Predicting Essential Genes of Alzheimer Disease based on Module
Partition and Gravity-like Method in Heterogeneous Network
HAIYAN GUO1, SHUJUAN CAO1, CHEN ZHOU1, XIAOLU WU1, YONGMING ZOU2
1School of Mathematical Sciences, Tiangong University, Tianjin, 300382, CHINA
2Department of Neurology, Tianjin Huanhu Hospital, Tianjin, 300222, CHINA
Abstract: The pathogenic mechanism of Alzheimer's disease (AD) is complicated. Predicting AD essential
genes is an important task in biomedical research, as it helps to elucidate AD mechanisms and
reveal therapeutic targets. In this paper, we propose a random walk with restart algorithm in the
heterogeneous network based on module partition and a gravity-like method (RWRHNMGL) for identifying
AD essential genes. The phenotype-gene heterogeneous network (PGHN) is constructed from multiple data
sources by considering similarity information. The nodes of the optimal module, which is selected by module
partition and covers most functions of the AD gene network, are taken as gene seeds. A refined random walk
algorithm is developed to work in the PGHN: the transition matrix is modified by adding a gravity-like method
based on subcellular location information, and candidate genes are scored and ranked by the stable probability
vector. Finally, the receiver operating characteristic (ROC) curve and the Mean Reciprocal Rank (MRR) are used
to evaluate the prediction results of RWRHNMGL. The results show that the RWRHNMGL algorithm performs
better than the compared random walk methods in predicting essential genes of AD.
Key-Words: Alzheimer's disease, essential genes, random walk, heterogeneous network, modular division,
subcellular location information.
Received: October 15, 2021. Revised: September 17, 2022. Accepted: October 19, 2022. Published: November 16, 2022.
1 Introduction
Alzheimer's disease (AD) is a chronic degenerative
disease of the central nervous system. About 4-8% of
people over 65 suffer from AD, and AD has become
the fourth leading killer among senile diseases after
heart disease, cancer, and stroke, which brings great
psychological and economic pressure to families and
society, [1].
Genetic disorders, such as cartilage dysplasia,
type I diabetes and AD, may arise from mutations
and abnormal expression of one or more genes.
The genes related to a genetic disorder are called
the essential genes of that specific disease.
Essential gene prediction is crucial in illness
prevention, diagnosis and therapy, and aims at the
fundamental prevention and control of genetic
diseases, [2]. Some researchers employ machine
learning methods to predict candidate genes of
specific diseases or to identify personalized drugs
for diseases caused by specific gene mutations, [3],
[4]. The random walk algorithm is a typical method
for predicting essential genes from candidate genes
based on relational network models; it obtains local
or global characteristics of the relational network by
information dissemination. Weston et al.
constructed the predecessor of random walks, which
incorporated the idea of information dissemination
into protein-protein interaction (PPI) networks and
built a semi-supervised learning method by extending
the original RankProp algorithm, [5]. Zhu et al.
constructed a biological process-related
subnetwork, which is expanded by a random walk
with restart process to generate a so-called
expanded modularized network, [6]. Luo et al. fused
various networks to create a heterogeneous network,
then developed a random walk-based method on the
heterogeneous network (RWRHN) to rank possible
candidate genes for inherited disorders, [7]. Zhang
et al. proposed KRWRHN, an improved random walk
with restart algorithm in the heterogeneous network
based on the KNN (K-Nearest Neighbor) method, to
predict essential genes; it modifies the initial
probability vector by enlarging the seed set so that
it contains known disease genes and their neighbor
genes, [8].
The prioritization of candidate genes based on a
network is a common method for predicting
essential genes, which assigns rankings and scores
to all candidate genes according to the probability of
pathogenic risk. The prioritization methods can be
grouped roughly into two categories. The first
category comprises algorithms on the PPI network of a
WSEAS TRANSACTIONS on APPLIED and THEORETICAL MECHANICS
DOI: 10.37394/232011.2022.17.20
Haiyan Guo, Shujuan Cao,
Chen Zhou, Xiaolu Wu, Yongming Zou
E-ISSN: 2224-3429
158
Volume 17, 2022
single disease. The second category comprises methods
on the phenotype-gene heterogeneous network
(PGHN) that study the influence of each gene on the
disease. Some diseases with similar phenotypes
have been confirmed to be related to certain specific
genes, [9]. In other words, it is meaningful to add
information on similar disease phenotypes when
predicting essential genes. Joana et al. proposed a
network-based prioritization strategy that is related
to local clustering on graphs and considers the full
topology of weighted gene association networks
integrating heterogeneous sources, [10]. Based on
the hypothesis that functionally related gene
mutations may lead to similar disease phenotypes,
Wang et al. proposed a PU induction matrix
completion algorithm based on heterogeneous
information fusion to predict candidate genes
involved in the pathogenicity of human diseases,
[11]. Dursun et al. introduced PhenoGeneRanker, a
network-propagation algorithm for multiplex
heterogeneous networks; the results showed that
PhenoGeneRanker performed better in ranking
hypertension-related strains using multiplex
phenotype networks than using single or aggregated
phenotype networks, [12].
In this paper, to identify AD essential genes, a refined
algorithm, named RWRHNMGL, is proposed based
on module partition and a gravity-like method in the
heterogeneous network.
2 Materials and Methods
2.1 Data Preparation
The datasets utilized in this study are described in
this section, including PPI data, phenotype
similarity data, and gene-phenotype relationships
data. These data are extracted from public databases
described as follows.
2.1.1 PPI Data
The PPI data used in this paper comes from the
HumanNet database, which contained 399
interactions among 16,243 genes, [13]. In this work,
to build gene similarity networks, genes are
represented by nodes, and interactions are edges
with confidence scores, which indicate the
likelihood of pairwise genes interacting with each
other. Self-looped edges, isolated nodes, and edges
with scores less than 0.5 are removed from the PPI
network, [14]. Finally, the PPI network comprises
16079 genes and 404208 edges. Its adjacency
matrix $A_G$ has dimension 16079 × 16079 (see
Supplementary Table S1), where a nonzero entry
$A_G(i,j)$ indicates that gene i is associated with
gene j, and $A_G(i,j) = 0$ otherwise.
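The filtering steps described above can be sketched in a few lines of Python. This is only an illustration on made-up data: the function names `filter_ppi` and `adjacency`, the toy gene labels, and the choice of a 0/1 adjacency matrix are assumptions, not the authors' code.

```python
def filter_ppi(edges, min_score=0.5):
    """Drop self-loops and edges scoring below min_score.

    `edges` is an iterable of (gene_a, gene_b, score) tuples.
    Returns the kept edges and the sorted list of genes that still
    touch at least one edge (isolated genes drop out automatically).
    """
    kept = [(a, b, s) for a, b, s in edges if a != b and s >= min_score]
    nodes = sorted({g for a, b, _ in kept for g in (a, b)})
    return kept, nodes


def adjacency(kept, nodes):
    """Build a symmetric 0/1 adjacency matrix as a list of lists."""
    index = {g: k for k, g in enumerate(nodes)}
    mat = [[0] * len(nodes) for _ in nodes]
    for a, b, _ in kept:
        mat[index[a]][index[b]] = 1
        mat[index[b]][index[a]] = 1
    return mat


# Toy input: one low-score edge and one self-loop are removed,
# which leaves the hypothetical gene "g3" isolated.
toy_edges = [("g1", "g2", 0.9), ("g2", "g3", 0.4),
             ("g3", "g3", 0.8), ("g1", "g4", 0.6)]
kept, nodes = filter_ppi(toy_edges)
A_G = adjacency(kept, nodes)
```

Here the threshold 0.5 is the one stated in the text; whether the final matrix keeps the confidence scores as weights instead of 0/1 entries is not specified, so the binary form is a simplification.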
2.1.2 Phenotype Similarity Data
The phenotype similarity data, covering 5080
disease phenotypes, are incorporated for
prioritization of candidate disease genes and are
obtained from the literature, [15]. These phenotypes
are described by applying a text mining method to
Online Mendelian Inheritance in Man (OMIM)
phenotype records, [16]. According to that analysis,
similarity values below the threshold of 0.3 are
uninformative and are ignored. Finally, we obtain a
5080 × 5080 similarity matrix $A_P$ (see
Supplementary Table S1), where a nonzero entry
$A_P(i,j)$ indicates that phenotype i is associated
with phenotype j, and $A_P(i,j) = 0$ otherwise.
2.1.3 Gene-phenotype Relationships Data
DisGeNET is the largest public platform that links
human genes to diseases. It assigns a confidence
score to measure the reliability of gene-phenotype
relationships, [17]. In this work, we downloaded the
curated gene-disease association file and filtered out
gene-phenotype relationships with a score less than
0.4, as in the literature, [18]. Then, the 16079 genes
in the PPI network and the 5080 phenotypes in the
phenotype similarity network are selected from
DisGeNET to construct the gene-phenotype
relationship network, and an adjacency matrix $B$
(16079 × 5080) is obtained (see Supplementary
Table S3), where a nonzero entry $B_{ij}$ indicates
that gene i is associated with phenotype j, and
$B_{ij} = 0$ otherwise.
2.1.4 Construction of PGHN
A PGHN is composed of the PPI network, the
phenotype similarity network, and the gene-phenotype
relationship network, as shown in Fig 1(a). The PGHN
constructed here, shown in Fig 1(b), consists of
16079 genes and 5080 phenotypes, and its adjacency
matrix $A$ has dimension 21159 × 21159:
$$A = \begin{pmatrix} A_G & B \\ B^{T} & A_P \end{pmatrix} \tag{1}$$
where $A_G$ is the adjacency matrix of the PPI network,
$A_P$ is the adjacency matrix of the phenotype similarity
network, $B$ is the adjacency matrix of the gene-phenotype
relationship network, and $B^{T}$ is the transpose
of $B$.
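As a small illustration of how the block structure of Formula (1) fits together, the following Python sketch stacks three toy matrices into one heterogeneous adjacency matrix. The function name `build_pghn` and the 3-gene, 2-phenotype example are hypothetical; the real matrices are far larger and would normally be stored in a sparse format.

```python
def build_pghn(A_G, A_P, B):
    """Assemble [[A_G, B], [B^T, A_P]] from list-of-list matrices.

    A_G is genes x genes, A_P is phenotypes x phenotypes and
    B is genes x phenotypes; genes are placed before phenotypes.
    """
    n_genes, n_phen = len(A_G), len(A_P)
    if len(B) != n_genes or any(len(row) != n_phen for row in B):
        raise ValueError("B must have shape (genes, phenotypes)")
    B_T = [list(col) for col in zip(*B)]                 # transpose of B
    top = [A_G[i] + B[i] for i in range(n_genes)]        # gene rows
    bottom = [B_T[j] + A_P[j] for j in range(n_phen)]    # phenotype rows
    return top + bottom


# Toy example: a 3-gene path, 2 linked phenotypes, 2 gene-phenotype links.
A_G = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_P = [[0, 1], [1, 0]]
B = [[1, 0], [0, 0], [0, 1]]
A = build_pghn(A_G, A_P, B)
```

Because the off-diagonal blocks are transposes of each other, the assembled matrix is symmetric whenever the two within-network matrices are.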
Fig. 1: (a) Illustration of the PGHN. The upper
sub-network is the phenotype similarity network, the
lower sub-network is the PPI network, and the middle
sub-network is the bipartite graph of gene-phenotype
relationships. (b) The PGHN, where yellow rectangles
stand for phenotypes and blue rectangles stand for
genes.
2.2 RWRHNMGL Algorithm
In this paper, we propose a random walk with
restart in the heterogeneous network based on
module partition and a gravity-like method
(RWRHNMGL) by extending the random walk with
restart (RWR). The RWR algorithm ranks candidate
genes and refines their relative importance to known
diseases. It works like a walker that, at each step,
either moves from the current gene to a randomly
chosen neighbor gene or returns to the seed genes
with the restart probability, [19]. The formula is as
follows:
$$P_{t+1} = (1-\gamma) M P_t + \gamma P_0 \tag{2}$$
where $M$ is the transition matrix, $\gamma \in (0,1)$
is the restart probability, and $P_0$ is the initial
probability vector. After step t, $P_0$ has iterated to
$P_t$, whose i-th entry $P_t(i)$ is the current
probability of the random walker being at gene i.
After a number of steps, the probability vector
stabilizes, that is, the difference between $P_{t+1}$
and $P_t$ becomes negligible. The steady-state entry
$P_\infty(i)$ is then the score of gene i, and all genes
in the network are ranked by their scores.
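The iteration in Formula (2) can be sketched as follows. This is a minimal Python illustration, assuming a column-normalised transition matrix so that multiplying it by a probability vector keeps the total probability at 1. The function name `rwr`, the tolerance `tol`, the L1 stopping rule and the three-node toy graph are assumptions made for the example, not values taken from the paper.

```python
def rwr(M, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate P_{t+1} = (1 - restart) * M P_t + restart * P_0.

    M[i][j] is the probability of stepping from node j to node i,
    so every column of M must sum to 1.
    """
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(M[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop when the L1 change between two steps is below tol.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 with node 0 as the only seed.
M = [[0.0, 0.5, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
scores = rwr(M, [1.0, 0.0, 0.0])
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
```

In the toy graph the score decreases with distance from the seed, which is the behaviour that makes the steady-state vector usable as a ranking.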
2.2.1 Construction of Initial Probability Vector
Seed nodes consist of phenotype seeds and gene
seeds. The phenotype seeds of the RWRHNMGL
algorithm are determined by disease association
analysis from the Human Phenotype Ontology
(HPO) database, [20]. The HPO contains over
13,000 terms arranged in a directed acyclic graph
and connected by is-a (subclass-of) edges, such
that a term represents a more specific or limited
instance of its parent terms. The intersection of the
25 AD-related phenotypes from the HPO database and
the nodes of the phenotype similarity network from
2.1.2 is taken as the set of phenotype seeds, giving a
total of 19 seeds. The AD gene network is divided into
modules by Molecular complex detection
(MCODE), then the key module is selected through
functional enrichment analysis, [21]. In this study,
the 356 nodes of the key module are set as gene seeds.
The vector $u_0$ holds the initial probabilities of the
PPI network, and the vector $v_0$ holds the initial
probabilities of the phenotype similarity network.
The initial probability vector $u_0$ of genes is
constructed such that equal probability is assigned
to each gene seed and probability 0 is assigned to
the other genes in the PPI network, with the
probabilities summing to 1. The initial probability
vector $v_0$ of phenotypes is obtained in the same
way. The initial probability vector $P_0$ of the PGHN
can be described as (see Supplementary Table S4):
$$P_0 = \begin{pmatrix} (1-\eta)\, u_0 \\ \eta\, v_0 \end{pmatrix} \tag{3}$$
where $\eta \in (0,1)$ weights the relative importance
of the PPI network and the phenotype similarity
network.
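A short Python sketch of how such an initial vector could be assembled is given below. It assumes that the gene part is weighted by 1 - η and the phenotype part by η, as in the random walk on heterogeneous networks of [30]; the function name `initial_vector` and the toy seed indices are hypothetical.

```python
def initial_vector(n_genes, gene_seeds, n_phen, phen_seeds, eta=0.5):
    """Uniform mass on the seeds of each network, then weight by eta.

    Genes come first, phenotypes second. The gene part carries total
    mass (1 - eta) and the phenotype part carries total mass eta.
    """
    u0 = [0.0] * n_genes
    for g in gene_seeds:
        u0[g] = 1.0 / len(gene_seeds)      # equal probability per gene seed
    v0 = [0.0] * n_phen
    for p in phen_seeds:
        v0[p] = 1.0 / len(phen_seeds)      # equal probability per phenotype seed
    return [(1 - eta) * x for x in u0] + [eta * x for x in v0]


# Toy case: 4 genes with seeds {0, 2}, 3 phenotypes with seed {1}.
P0 = initial_vector(4, [0, 2], 3, [1], eta=0.5)
```

With η = 0.5 both networks start with the same total probability, so neither the gene seeds nor the phenotype seeds dominate the first steps of the walk.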
2.2.2 Calculation of Transition Matrix M in the
First Part of RWRHNMGL
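The transition matrix $M$ of the PGHN has the block form
$$M = \begin{pmatrix} M_G & M_{GP} \\ M_{PG} & M_P \end{pmatrix} \tag{4}$$
The parameter $\lambda$ is the jump probability of the
random walker between the PPI network and the
phenotype similarity network. The transition
probability from gene i to gene j can be described as:
$$(M_G)_{ij} = p(g_j \mid g_i) = \begin{cases} \dfrac{A_G(i,j)}{\sum_n A_G(i,n)}, & \text{if } \sum_m B_{im} = 0 \\[2ex] (1-\lambda)\dfrac{A_G(i,j)}{\sum_n A_G(i,n)}, & \text{otherwise} \end{cases} \tag{5}$$
The transition probability from gene i to phenotype j
can be described as:
$$(M_{GP})_{ij} = p(p_j \mid g_i) = \begin{cases} \lambda \dfrac{B_{ij}}{\sum_m B_{im}}, & \text{if } \sum_m B_{im} \neq 0 \\[2ex] 0, & \text{otherwise} \end{cases} \tag{6}$$
The transition probability from phenotype i to gene j
can be described as:
$$(M_{PG})_{ij} = p(g_j \mid p_i) = \begin{cases} \lambda \dfrac{B_{ji}}{\sum_n B_{ni}}, & \text{if } \sum_n B_{ni} \neq 0 \\[2ex] 0, & \text{otherwise} \end{cases} \tag{7}$$
The transition probability from phenotype i to
phenotype j can be described as:
$$(M_P)_{ij} = p(p_j \mid p_i) = \begin{cases} \dfrac{A_P(i,j)}{\sum_m A_P(i,m)}, & \text{if } \sum_n B_{ni} = 0 \\[2ex] (1-\lambda)\dfrac{A_P(i,j)}{\sum_m A_P(i,m)}, & \text{otherwise} \end{cases} \tag{8}$$
where the sums over $m$ and $n$ run over all nodes of the
corresponding network.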
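The four transition rules above can be sketched together in one Python function. This is an illustration only: the function name `transition_matrix`, the toy matrices and the jump probability value 0.5 are assumptions, and dense lists of lists stand in for the sparse matrices a real network of this size would need.

```python
def transition_matrix(A_G, A_P, B, lam=0.5):
    """Row i, column j holds the probability of stepping from node i to j.

    Genes are indexed before phenotypes. lam is the jump probability
    between the two networks (0.5 is an illustrative value).
    """
    n_g, n_p = len(A_G), len(A_P)
    size = n_g + n_p
    M = [[0.0] * size for _ in range(size)]

    for i in range(n_g):                        # walker currently at gene i
        deg_g = sum(A_G[i])                     # sum_n A_G(i, n)
        deg_b = sum(B[i])                       # sum_m B_im
        stay = 1.0 if deg_b == 0 else 1.0 - lam
        for j in range(n_g):                    # gene -> gene
            if deg_g:
                M[i][j] = stay * A_G[i][j] / deg_g
        for j in range(n_p):                    # gene -> phenotype
            if deg_b:
                M[i][n_g + j] = lam * B[i][j] / deg_b

    for i in range(n_p):                        # walker currently at phenotype i
        deg_p = sum(A_P[i])                     # sum_m A_P(i, m)
        deg_b = sum(B[n][i] for n in range(n_g))  # sum_n B_ni
        stay = 1.0 if deg_b == 0 else 1.0 - lam
        for j in range(n_p):                    # phenotype -> phenotype
            if deg_p:
                M[n_g + i][n_g + j] = stay * A_P[i][j] / deg_p
        for j in range(n_g):                    # phenotype -> gene
            if deg_b:
                M[n_g + i][j] = lam * B[j][i] / deg_b
    return M


A_G = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_P = [[0, 1], [1, 0]]
B = [[1, 0], [0, 0], [0, 1]]
M = transition_matrix(A_G, A_P, B)
```

Each row of the result sums to 1 as long as the node has at least one neighbour inside its own network. Because the rows hold p(j | i), a walk step that updates a column vector of probabilities would multiply by the transpose of this matrix; that orientation is an assumption of the sketch.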
We substituted M into Formula (2) to start iteration
until $P_t$ reaches a steady state. The steady-state
vector $P_\infty$ is recorded for the next part of the
RWRHNMGL algorithm (see Supplementary Table S6).
This first part of RWRHNMGL is called RWRHNM.
2.2.3 Gravity-like Algorithm
In this paper, to reduce the false positives of the PPI
network, a gravity-like method based on subcellular
location information is added to the refined
algorithm. The basic idea behind the method is that
two proteins should have a higher possibility to
physically interact with each other if they are active
together at least at a time point in the cell cycle and
appear together at the same subcellular location,
[22]. COMPARTMENTS is a weekly updated web
resource that integrates evidence on protein
subcellular localization, [23]. The subcellular
location information from COMPARTMENTS can
accurately express the cellular location of the gene
using GO terms. We download the subcellular
location information and let $C_{ij}$ denote the number
of common GO terms between gene i and gene j.
Traditionally, Newton's law of universal gravitation
measures the gravitation between two objects by
their masses and distance as follows:
$$G_{ij} = \frac{k M_i M_j}{r_{ij}^{2}} \tag{9}$$
where $M_i$ and $M_j$ represent the masses of the two
objects, $r_{ij}$ represents the distance between them,
and $k$ is the gravitational constant. In this study, the
parameter $M_i$, as proposed in the literature, [24],
stands for the probability that a random walker
starting from the seed nodes reaches candidate node i
in the steady state of RWRHNM. Note that $C_{ij}$ is
the number of common GO terms between gene i
and gene j, which is inversely proportional to
distance; thus the topological distance $r_{ij}$ between
gene i and gene j is measured by $1/C_{ij}$. In the
second part of RWRHNMGL, the transfer
matrix $M'$ is calculated as follows:
$$M' = \begin{pmatrix} M'_G & M_{GP} \\ M_{PG} & M_P \end{pmatrix} \tag{10}$$
where
$$(M'_G)_{ij} = p(g_j \mid g_i) = \begin{cases} \dfrac{G_{ij} A_G(i,j)}{\sum_n A_G(i,n)}, & \text{if } \sum_m B_{im} = 0 \\[2ex] (1-\lambda)\dfrac{G_{ij} A_G(i,j)}{\sum_n A_G(i,n)}, & \text{otherwise} \end{cases} \tag{11}$$
We substituted $M'$ into
$P_{t+1} = (1-\gamma) M' P_t + \gamma P_0$
and iterated until the probability vector reached a
steady state. The steady-state entry $P_\infty(i)$
is the final score of gene i related to AD (see
Supplementary Table S7).
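The gravity-like weight can be illustrated with a short Python sketch. It assumes that the "mass" of a gene is its steady-state score from the first part of the algorithm and that the distance between two genes is the reciprocal of their number of shared GO terms, as described above. The function name `gravity`, the constant `k = 1`, the treatment of gene pairs with no shared term as having zero attraction, and all toy numbers are assumptions made for the example.

```python
def gravity(scores, common_go, k=1.0):
    """Gravity-like weight k * s_i * s_j / r_ij^2 with r_ij = 1 / c_ij.

    scores[i]       steady-state score of gene i (plays the role of mass)
    common_go[i][j] number of GO terms shared by gene i and gene j
    Pairs with no shared term are treated as infinitely far apart.
    """
    n = len(scores)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = common_go[i][j]
            if i != j and c > 0:
                r = 1.0 / c                    # more shared terms, shorter distance
                G[i][j] = k * scores[i] * scores[j] / r ** 2
    return G


# Toy data: three genes, scores from a previous walk, shared GO-term counts.
scores = [0.5, 0.25, 0.25]
common_go = [[0, 2, 0],
             [2, 0, 1],
             [0, 1, 0]]
G = gravity(scores, common_go)
```

An edge between two high-scoring genes that share many subcellular locations receives a large weight, while an edge between genes with no shared location receives weight zero, which is how the method downweights likely false-positive interactions. Whether the reweighted rows are renormalised afterwards is not stated in the text, so the sketch stops at the weights.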
3 Results
Table 1 shows the top 100 essential genes predicted by
RWRHNMGL, which include numerous known
Alzheimer's disease genes: PSEN2, PSEN1,
ABCA7, A2M, HFE, CD2AP, APP, PICALM,
APOE, NOS3, CR1, ADAM10, CD33, MPO,
PLAU, TREM2 and so on, [25]. These genes, which are
highly related to AD, are all ranked in the top 1%,
which indicates that the RWRHNMGL algorithm is
effective in predicting AD essential genes.
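Table 1. Some prediction results of AD essential genes by RWRHNMGL

| Gene Symbol | Rank | Gene Symbol | Rank | Gene Symbol | Rank |
|---|---|---|---|---|---|
| PSEN2 | 1 | CORIN | 34 | ZFHX3 | 67 |
| PSEN1 | 2 | STOX1 | 35 | CDK6 | 68 |
| APP | 3 | MAPT | 36 | MPO | 69 |
| ACHE | 4 | NCSTN | 37 | FLT1 | 70 |
| ATP5F1A | 5 | APOE | 38 | TNF | 71 |
| WNT2B | 6 | BACE1 | 39 | INS | 72 |
| ABCA7 | 7 | IDE | 40 | NOS3 | 73 |
| TFAM | 8 | ADAM10 | 41 | IL6 | 74 |
| EPHA1 | 9 | CASP3 | 42 | PTGS2 | 75 |
| MS4A4A | 10 | HFE | 43 | NPY | 76 |
| INPP5D | 11 | CHRNA7 | 44 | SNCA | 77 |
| SORL1 | 12 | CST3 | 45 | EIF2AK3 | 78 |
| TOMM40 | 13 | TREM2 | 46 | CYBB | 79 |
| APOC1 | 14 | PLAU | 47 | GAPDH | 80 |
| CYP46A1 | 15 | INSR | 48 | PPP3CC | 81 |
| VSNL1 | 16 | GSK3B | 49 | CHRM1 | 82 |
| CR1 | 17 | NOS2 | 50 | PPP3R1 | 83 |
| BCHE | 18 | IL1B | 51 | FZD3 | 84 |
| DPYSL2 | 19 | GRN | 52 | RTN4 | 85 |
| NECTIN2 | 20 | CHMP2B | 53 | FYN | 86 |
| UROD | 21 | PDE3A | 54 | BECN1 | 87 |
| TARDBP | 22 | DHCR24 | 55 | NOS1 | 88 |
| PPOX | 23 | CD2AP | 56 | IL1A | 89 |
| PLAT | 24 | A2M | 57 | AMBRA1 | 90 |
| ALOX5AP | 25 | HJV | 58 | AXIN1 | 91 |
| PLA2G7 | 26 | MAOB | 59 | TUBB4A | 92 |
| LRCH1 | 27 | GLA | 60 | ATP2A2 | 93 |
| ALB | 28 | BAX | 61 | DVL1 | 94 |
| HDAC9 | 29 | KCNK3 | 62 | DVL3 | 95 |
| ABO | 30 | CLU | 63 | WNT5A | 96 |
| LPA | 31 | PICALM | 64 | HSD17B10 | 97 |
| PTGIS | 32 | CD33 | 65 | FZD2 | 98 |
| ADD1 | 33 | AGTR1 | 66 | CYLD | 99 |
| | | | | CHAT | 100 |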
3.1 Functional Analysis of Essential Genes
The top 100 genes are selected for GO enrichment
analysis and Kyoto encyclopedia of genes and
genomes (KEGG) enrichment analysis based on the
Database for Annotation, Visualization, and
Integrated Discovery (DAVID), [26]. GO terms or
pathways whose Fisher exact test P value is less than
0.05 are considered statistically significant. Some
results of GO enrichment analysis, including high
enrichment items in biological process (BP), cellular
component (CC) and molecular function (MF), are
described in Supplementary Tables S8-S10. Table
S11 presents the results of the KEGG pathway
enrichment analysis, together with the number of
genes under each GO term or pathway.
According to the GO enrichment analysis and KEGG
pathway enrichment analysis, AD essential genes
are involved in many biological processes, such as
the beta-amyloid metabolic process, negative
regulation of beta-amyloid formation, the amyloid
precursor protein catabolic process and so on, which
is consistent with previous research on AD, [27],
[28], [29]. Some terms, such as β-amyloid protein and
tau protein binding, correspond to the two clinical
biomarkers of AD: the extracellular β-amyloid
protein found in senile plaques and
hyperphosphorylated tau protein aggregates. The
results of the KEGG pathway enrichment analysis
include the AD pathway (hsa05010) with the
smallest P value. These results indicate that the AD
essential genes predicted by the RWRHNMGL
algorithm have high reliability.
3.2 Performance Evaluation
First, the receiver operating characteristic (ROC)
curve, which reflects the relationship between
sensitivity and 1-specificity, is employed to evaluate
the performance of the RWRHNMGL algorithm.
The first positive control group includes known
AD genes from the KEGG pathway
(hsa05010: Alzheimer's disease). The negative
control group is composed of lung cancer genes,
leukemia genes, breast cancer genes, Parkinson
genes and epilepsy genes that are not related to AD.
The scores of these genes are computed by the
RWRHNMGL algorithm. The resulting ROC curve
is shown with a dark purple line in Fig 2(a), and the
area under the ROC curve (AUC) is 0.986. The second
positive control group is composed of the
essential genes from the AD network based on three
mini metabolic networks, [25]. With the same negative
control group as before, the ROC curve of
RWRHNMGL is shown with a green line in Fig 2(a),
and the AUC is 0.988. The third positive control
group is the union of the first and second positive
control groups. With the same negative control group,
the ROC curve of RWRHNMGL is shown with
a navy blue line in Fig 2(a), and the AUC is 0.980.
These results indicate that RWRHNMGL predicts AD
essential genes well in all three control groups.
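For readers who want to reproduce this kind of evaluation, the AUC can be computed directly from the two groups of scores without drawing the curve: it equals the fraction of positive-negative pairs in which the positive gene receives the higher score. The Python sketch below uses this pairwise definition; the function name `auc` and the toy scores are hypothetical and unrelated to the values reported above.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve from two groups of scores.

    Counts, over all (positive, negative) pairs, how often the positive
    score is larger; ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores: one positive gene is ranked below one negative gene.
toy_auc = auc([0.9, 0.8, 0.3], [0.5, 0.2])
```

The pairwise count is quadratic in the group sizes, which is acceptable for control groups of a few hundred genes; a rank-based formula gives the same value more cheaply for larger sets.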
We further analyze the scores that RWRHNMGL assigns
to the genes of the third control group. According to
the Shapiro-Wilk test at the 5% significance level,
neither group of scores follows a normal
distribution. Therefore, the Wilcoxon-Mann-Whitney
test, a robust and largely assumption-free statistical
method, is used to test whether the scores of the
positive and negative control groups differ. The
two-sided P value of the Wilcoxon-Mann-Whitney test
is below the significance threshold, and the average
ranks of gene scores in the positive control group
and the negative control group are 560.02 and
196.98, respectively. These results show that the two
groups of scores are significantly different. The
one-sided P value of the Wilcoxon-Mann-Whitney test
is also below the threshold, so the alternative
hypothesis is accepted, which means that the scores
of the negative control group are significantly lower
than the scores of the positive control group. All
analyses are conducted with R v 4.22.
4 Comparison with Other Methods
To illustrate the utility of the present method, ROC
curves are used to compare RWRHNMGL with several
methods. The seed set of RWR consists of the known
AD essential genes on the PPI network. The ROC
curves of RWRHNMGL and RWR in the three
control groups are shown in Fig 2(a). RWRMGL is
a random walk with restart algorithm in the PPI
network based on module partition and the
gravity-like method. Its gene seeds are the same as
those of RWRHNMGL, so that it tests the effect of the
heterogeneous network model on the algorithm. The
ROC curves of RWRHNMGL and RWRMGL in the three
control groups are shown in Fig 2(b), which shows
that the establishment of the PGHN is crucial to
RWRHNMGL and that phenotype information plays a
positive role in predicting AD essential genes.
RWRHNGL is a random walk with restart algorithm in
the heterogeneous network based on the gravity-like
method, and its seed set consists of AD-related nodes.
The ROC curves of RWRHNMGL and RWRHNGL in the
three control groups are shown in Fig 2(c), which
shows that selecting the seed set by module partition
can improve the accuracy of predicting AD essential
genes. RWRHNM is a random walk with restart
algorithm in the heterogeneous network based on
module partition that does not use the gravity-like
algorithm. The ROC curves of RWRHNMGL
and RWRHNM in the three control groups are shown in
Fig 2(d), which shows that the gravity-like
algorithm mitigates the problem of false positives in
the PPI network to some extent.
Fig. 2: ROC curves of RWRHNMGL, RWR,
RWRMGL, RWRHNGL and RWRHNM. (a) ROC
curves of RWRHNMGL and RWR, (b) ROC curves
of RWRHNMGL and RWRMGL, (c) ROC curves of
RWRHNMGL and RWRHNGL, (d) ROC curves of
RWRHNMGL and RWRHNM
RWRHN is a random walk on the reliable
heterogeneous network, [7]. KRWRHN is a random
walk with restart in the heterogeneous network
based on KNN, [8]. The ROC curves of RWRHNMGL,
RWR, RWRHN and KRWRHN in the three control
groups are shown in Fig 3(a)-(c). Compared with the
previous algorithms, RWRHNMGL generates better AUC
scores. In addition, we calculate the cumulative
distribution function (CDF) of the genes of the third
positive control group among the top 500 genes
ranked by RWRHNMGL, RWR, RWRHN and KRWRHN,
shown in Fig 3(d), which represents the
relationship between gene rank and the
cumulative percentage of positive control genes for
the different algorithms. The proposed algorithm
achieves a distinct improvement in both evaluation
metrics for identifying AD essential genes in
comparison with the other methods.
Fig. 3: ROC curves of RWRHNMGL, RWR,
RWRHN and KRWRHN in three control groups. (a)
ROC curves of the first control group, (b) ROC
curves of the second control group, (c) ROC curves
of the third control group, (d) The CDF of the ranks
Another statistical measure of model performance
is the Mean Reciprocal Rank (MRR),
which is computed as follows:
$$\mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathrm{rank}_i} \tag{12}$$
where $\mathrm{rank}_i$ is the rank of gene i and $N$ is the
number of genes.
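Formula (12) translates almost directly into code. The Python sketch below is illustrative only: the function name `mean_reciprocal_rank` and the toy gene labels are hypothetical, and ranks are assumed to start at 1 for the best-scoring gene.

```python
def mean_reciprocal_rank(ranking, positives):
    """Average of 1 / rank over the genes in `positives`.

    `ranking` lists all candidate genes from best to worst score;
    rank 1 is the top position. Every positive must appear in it.
    """
    rank_of = {gene: pos for pos, gene in enumerate(ranking, start=1)}
    return sum(1.0 / rank_of[g] for g in positives) / len(positives)


# Toy ranking of four hypothetical genes; two of them are positives.
toy_mrr = mean_reciprocal_rank(["g1", "g2", "g3", "g4"], ["g1", "g3"])
```

Because reciprocal ranks shrink quickly, the value is dominated by positives near the top of the list, so moving a positive gene from rank 2 to rank 1 changes the MRR far more than moving one from rank 200 to rank 100.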
In this paper, N represents the number of genes in
the positive control group. A higher MRR value
means that the genes in the positive control group
obtain smaller (better) ranks on average, which
indicates that the result is reliable. We compute the
MRR values of the genes in the three positive control
groups with the different methods, as shown in
Table 2. RWRHNMGL achieves the best performance,
with the highest MRR value in each group.
Table 2. MRR of RWRHNMGL, RWR, RWRHN and KRWRHN

| Control group | RWRHNMGL | RWR | RWRHN | KRWRHN |
|---|---|---|---|---|
| first | 0.01201205 | 0.011707858 | 0.00660667 | 0.007274614 |
| second | 0.046685512 | 0.002488351 | 0.027417649 | 0.042457116 |
| third | 0.011763921 | 0.010807044 | 0.006534787 | 0.007253108 |
RWRHNMGL has three parameters: the weighting
coefficient $\eta$, the jumping probability $\lambda$,
and the restart probability $\gamma$. The parameter
$\eta$ weights the importance of each sub-network.
The algorithm works slightly better when $\eta$ is set
to 0.5, according to the literature, [30]. The
reinforcement between the PPI network and the
phenotype similarity network is controlled by the
parameter $\lambda$. This paper chooses $\lambda$ so
that the phenotype similarity network is as important
as the PPI network in predicting essential genes, as
proposed in the literature, [31]. The parameter
$\gamma$ is the restart probability, which
determines the possibility of jumping from any node
back to the starting points. RWRHNMGL is also
run with values of $\gamma$ ranging from 0.1 to 0.9
in steps of 0.1 to evaluate the influence of this
parameter. As shown in Fig 4, the AUC reaches a
maximum of 0.988 at $\gamma = 0.7$. Therefore, the
restart probability is fixed at 0.7 in this paper.
Fig. 4: AUC of RWRHNMGL for restart probabilities
from 0.1 to 0.9
5 Conclusion
AD is a complex multi-genic neurodegenerative
disorder, and current clinical drugs only relieve
symptoms but do not cure the disease. The
identification of essential genes associated with AD
is crucial in illness prevention, diagnosis and
therapy, and aims at fundamental prevention and
effective therapeutics.
In this paper, a refined random walk algorithm,
RWRHNMGL, is proposed to identify AD essential
genes based on the framework of RWR. The PGHN
is constructed from multiple data sources by
considering similarity information. The nodes of the
optimal module, which is selected by module
partition and covers most functions of the AD gene
network, are set as gene seeds. A refined random walk
algorithm is developed to work in the PGHN: the
transition matrix is modified by adding a gravity-like
method based on subcellular location information, and
candidate genes are scored and ranked by the stable
probability vector. Finally, ROC and MRR are used
to evaluate the prediction results of RWRHNMGL.
The results show that the RWRHNMGL
algorithm performs better than RWR, RWRHN and
KRWRHN in predicting essential genes of AD.
Some improvements may be necessary in future
research. First, gene seeds are taken from the key
module, which covers most functions of the AD
gene network; further combining module partition
with the random walk algorithm may be an
interesting direction. Second, adding a gravity-like
method based on subcellular location information to
the random walk algorithm solves the problem of
false positives to some extent.
Acknowledgments:
This research is funded by the Science Fund of
Tianjin Education Commission for Higher
Education, grant number 2019KJ025. This paper is
in memory of Professor Jishou Ruan.
References:
[1] P. Pandey, M. Singh, I. Gambhir, Alzheimer's
disease: A Threat to mankind. Journal of
Stress Physiology & Biochemistry, Vol. 7,
2011, pp. 15-30.
[2] A.M. Glazier, J.H. Nadeau, T.J. Aitman,
Finding Genes That Underlie Complex Traits.
Science,Vol.298, No.5602, 2002, pp. 2345-
2349.
[3] R. Melchiotti, D. Liberati. Candidate gene
discriminating gliomas identification via a
supervised iteration of bipartitive k-means
initialised via partititve division according to
principal components. WSEAS Transactions
on Biology and Biomedicine, Vol. 15, 2018,
pp. 87-100.
[4] V. Hima, P. Namboori. Identification of
Lapatinib Derivatives and Analogs to Control
Metastatic Breast Cancer-specific to South
Asian Population - a Pharmacogenomic
Approach. WSEAS Transactions on Biology
and Biomedicine, Vol.18, 2021, pp. 51-62.
[5] W. Jason, K. Rui, L. Christina, S.N. William,
Protein Ranking by Semi-Supervised Network
Propagation. BMC Bioinformatics,
Vol.7,No.14, 2006, pp. 2345-2349.
[6] L.J. Zhu, J. Xiang, Q.L. Wang, A.L. Wang, C.
Li, G. Tian, H.J. Zhang, S.Z. Chen,
Revealing the Interactions Between Diabetes,
Diabetes-Related Diseases, and Cancers
Based on the Network Connectivity of Their
Related Genes. Frontiers in Genetics,
Vol.11,No.14, 2020, pp. 617136.
[7] J. Luo, S. Liang, Prioritization of potential
candidate disease genes by topological
similarity of protein-protein interaction
network and phenotype data. Journal of
Biomedical Informatics, Vol.53,No.7, 2015,
pp. 229-236.
[8] S.W. Zhang, D.D. Shao, S.Y. Zhang,
Prediction of risk pathogenic genes based on
pathogenic gene network modularity. Journal
of Biophysics, Vol.11,No.3, 2014, pp. 11.
[9] B. Xu, Y. Liu, S. Yu, L. Wang, J. Dong, H.F.
Lin, Z.H. Yang, J. Wang, F. Xia, A network
embedding model for pathogenic genes
prediction by multi-path random walking on
heterogeneous network. BMC Medical
Genomics, Vol.12,No.5, 2019, pp. 118.
[10] P.G. Joana, P.F. Alexandre, M. Yves, C.M.
Sara, Interactogeneous: Disease Gene
Prioritization Using Heterogeneous Networks
and Full Topology Scores. PLOS ONE, Vol.7,
No.11, 2019, pp. e49634.
[11] C.Y. Wang, J. Zhang, X.P. Wang, K. Han,
M.Z. Guo. Pathogenic Gene Prediction
Algorithm Based on Heterogeneous
Information Fusion. Frontiers in
genetics,Vol.11, No.4, 2020, pp. 1-13.
[12] C. Dursun, A Kwitek, S Bozdag,
PhenoGeneRanker: Gene and Phenotype
Prioritization Using Multiplex Heterogeneous
Networks. IEEE/ACM transactions on
computational biology and bioinformatics,
2021,PP.
[13] I. Lee, U.M. Blom, P.I. Wang, J.E. Shim,
E.M. Marcotte, Prioritizing candidate disease
genes by network-based boosting of genome-
wide association data. Genome Res. Vol.21,
No.7, 2011, pp. 1109-1121.
[14] J.W. Lin, Biological network node
classification based on network representation
learning. Xiamen University, 2019.
[15] M.A. Driel, J. Bruggeman, G. Vriend, H.G.
Brunner, J.A. Leunissen. A text-mining
analysis of the human phenome. Eur J Hum
Genet, Vol.14, No.5, 2006, pp. 535-542.
[16] A Hamosh, A.F. Scott, J.S. Amberger, C.A.
Bocchini, V.A. McKusick, Online Mendelian
Inheritance in Man (OMIM), a
knowledgebase of human genes and genetic
disorders. Nucleic Acids Res,Vol.33, 2005, pp.
514-517.
[17] P. Janet, Q.R. Núria, B. Àlex, et al.
DisGeNET: a discovery platform for the
dynamical exploration of human diseases and
their genes. Database, 2015,bav028.
[18] B. Ljubic, M. Pavlovski, S. Roychoudhury,
N.C. Van, A. Salhi, M. Essack, V.B. Bajic,
Obradovic Z. Genes and comorbidities of
thyroid cancer. Informatics in Medicine
Unlocked,2021.
[19] S. Kohler, S. Bauer, D. Horn, P.N. Robinson.
Walking the interactome for prioritization of
candidate disease genes. Am J Hum
Genet,Vol. 82,No.4, 2008, pp. 949-958.
[20] S. Köhler, M. Gargano, N. Matentzoglu, L.C.
Carmody, P.N. Robinson, The Human
Phenotype Ontology in 2021. Nucleic acids
research,2020.
[21] C Zhou, H.Y. Guo, S.J. Cao, Gene Network
Analysis of Alzheimer’s Disease Based on
Network and Statistical Methods.
Entropy ,Vol. 23,No.10, 2021, pp. e2310136.
[22] M. Li, P. Ni, X. Chen, J. Wang, F.X. Wu, Y.
Pan, Construction of Refined Protein
Interaction Network for Predicting Essential
Proteins. IEEE/ACM Transactions On
Computational Biology And Bioinformatics,
Vol. 16,No.4, 2019, pp. 1386-1397.
[23] J.X. Binder, F.S. Pletscher, K. Tsafou, et al.
COMPARTMENTS: Unification and
visualization of protein subcellular
localization evidence, Database,
2014,pp.bau012.
[24] L.M. Lin, T.H. Yang, L. Fang, J. Yang, F.
Yang, J. Zhao. Gene gravity-like algorithm
for disease gene prediction based on
phenotype-specific network. BMC systems
biology,Vol. 11,No.1, 2017, pp. 121.
[25] S.J. Cao, L. Yu, J.Y. Mao, et al. Uncovering
the Molecular Mechanism of Actions between
Pharmaceuticals and Proteins on the
Alzheimer’s Disease Network. Plos One, Vol.
10,No.12, 2015, pp. e0144387.
[26] D.W. Huang, B.T. Sherman, R.A. Lempicki.
Systematic and integrative analysis of large
gene lists using DAVID Bioinformatics
Resources. Nature Protoc, Vol. 4,No.1, 2009,
pp. 44-57.
[27] T. Fan, J. Wang.The prediction of potential
risk genes for Alzheimer' s disease. Aerospace
medicine and medical engineering, Vol. 32,
No.6, 2019, pp. 497-502.
[28] C. Joachim, D. Games, J. Morris, et al.
Antibodies to non-beta regions of the beta-
amyloid precursor protein detect a subset of
senile plaques. American Journal of
Pathology, Vol. 138,No.2, 1991, pp. 373-384.
[29] L.K. Lu, X. Yu, Y.L. Cai, M. Sun, H. Yang,
Application of CRISPR/Cas9 in Alzheimer’s
Disease. Frontiers in Neuroscience,2021.
[30] Y.J. Li, J.C. Patra. Genome-wide inferring
gene-phenotype relationship by walking on
the heterogeneous network.
Bioinformatics,Vol. 26,No.9, 2010, pp. 1219-
1224.
[31] C. Lei, J. Ruan. A novel link prediction
algorithm for reconstructing protein-protein
interaction networks by topological similarity.
Bioinformatics. Vol. 29,No.3, 2013, pp. 355-
364.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US