Predicting Essential Genes of Alzheimer Disease based on Module Partition and Gravity-like Method in Heterogeneous Network

HAIYAN GUO1, SHUJUAN CAO1, CHEN ZHOU1, XIAOLU WU1, YONGMING ZOU2

1School of Mathematical Sciences, Tiangong University, Tianjin, 300382, CHINA

2Department of Neurology, Tianjin Huanhu Hospital, Tianjin, 300222, CHINA

Abstract: The pathogenic mechanism of Alzheimer's disease (AD) is complicated. Predicting AD essential genes is an important task in biomedical research that helps elucidate AD mechanisms and reveal therapeutic targets. In this paper, we propose a random walk with restart in the heterogeneous network based on module partition and a gravity-like method (RWRHNMGL) for identifying AD essential genes. A phenotype-gene heterogeneous network (PGHN) is constructed from multiple data sources by integrating similarity information. The nodes of the optimal module, which is selected by module partition and covers most functions of the AD gene network, are taken as gene seeds. A refined random walk algorithm is developed to work in the PGHN: the transition matrix is modified by a gravity-like method based on subcellular location information, and candidate genes are scored and ranked by the steady-state probability vector. Finally, the receiver operating characteristic (ROC) curve and the mean reciprocal rank (MRR) are used to evaluate the prediction results of RWRHNMGL. The results show that RWRHNMGL outperforms the compared methods in predicting AD essential genes.

Key-Words: Alzheimer's disease, essential genes, random walk, heterogeneous network, modular division,

subcellular location information.

Received: October 15, 2021. Revised: September 17, 2022. Accepted: October 19, 2022. Published: November 16, 2022.

1 Introduction

Alzheimer's disease (AD) is a chronic degenerative disease of the central nervous system. About 4-8% of people over 65 suffer from AD, and AD has become the fourth leading cause of death among the elderly after heart disease, cancer and stroke, placing a heavy emotional and economic burden on families and society, [1].

Genetic disorders such as cartilage dysplasia, type I diabetes and AD may arise from mutations or abnormal expression of one or more genes. The genes related to a specific genetic disorder are called the essential genes of that disease. Essential gene prediction is crucial for disease prevention, diagnosis and therapy, and ultimately for the fundamental prevention and control of genetic diseases, [2]. Some researchers employ machine learning methods to predict candidate genes of specific diseases or to identify personalized drugs for diseases caused by specific gene mutations, [3], [4]. The random walk is a typical method for predicting essential genes among candidate genes using relational network models; it captures local or global characteristics of the network by propagating information over it. Weston et al. constructed a predecessor of random walk methods, which incorporated the idea of information propagation into protein-protein interaction (PPI) networks and performed semi-supervised learning by extending the original RankProp algorithm, [5]. Zhu et al. constructed a subnetwork related to biological processes and expanded it by a random walk with restart to generate a so-called expanded modularized network, [6]. Luo et al. fused various networks into a heterogeneous network and then developed a random walk-based method on the heterogeneous network (RWRHN) to rank candidate genes for inherited disorders, [7]. Zhang et al. proposed KRWRHN, an improved random walk with restart in the heterogeneous network based on the K-nearest neighbor (KNN) method, which modifies the initial probability vector by enlarging the seed set to contain both known disease genes and their neighboring genes, [8].

The prioritization of candidate genes on a network is a common approach to predicting essential genes: all candidate genes are scored and ranked according to their probability of pathogenic risk. Prioritization methods fall roughly into two categories. The first category comprises algorithms on the PPI network of a single disease. The second category comprises methods on the phenotype-gene heterogeneous network (PGHN), which study the influence of each gene on the disease. Some diseases with similar phenotypes have been confirmed to be related to certain specific genes, [9]; in other words, it is meaningful to add similar disease phenotype information when predicting essential genes. Joana et al. proposed a network-based prioritization strategy related to local clustering on graphs that considers the full topology of weighted gene association networks integrating heterogeneous sources, [10]. Based on the hypothesis that mutations in functionally related genes may lead to similar disease phenotypes, Wang et al. proposed a PU-learning (positive-unlabeled) inductive matrix completion algorithm based on heterogeneous information fusion to predict candidate genes involved in the pathogenicity of human diseases, [11]. Dursun et al. introduced PhenoGeneRanker, a network propagation algorithm for multiplex heterogeneous networks, and showed that it ranked hypertension disease-related strains better using multiplex phenotype networks than using single or aggregated phenotype networks, [12]. In this paper, to identify AD essential genes, a refined algorithm named RWRHNMGL is proposed based on module partition and a gravity-like method in the heterogeneous network.

WSEAS TRANSACTIONS on APPLIED and THEORETICAL MECHANICS

DOI: 10.37394/232011.2022.17.20

Haiyan Guo, Shujuan Cao,

Chen Zhou, Xiaolu Wu, Yongming Zou

E-ISSN: 2224-3429

158

Volume 17, 2022

2 Materials and Methods

2.1 Data Preparation

This section describes the datasets used in this study: PPI data, phenotype similarity data, and gene-phenotype relationship data, all extracted from the public databases described below.

2.1.1 PPI Data

The PPI data used in this paper come from the HumanNet database, which contains 399 interactions among 16,243 genes, [13]. To build the gene similarity network, genes are represented as nodes and interactions as edges weighted by confidence scores, which indicate the likelihood that two genes interact. Self-loops, isolated nodes, and edges with scores less than 0.5 are removed from the PPI network, [14]. The resulting PPI network comprises 16079 genes and 404208 edges; its adjacency matrix $A_G$ has dimension 16079 × 16079 (see Supplementary Table S1), where $A_G(i,j)=1$ indicates that gene i is associated with gene j, and $A_G(i,j)=0$ otherwise.
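The filtering steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the HumanNet edges are available as (gene, gene, confidence score) tuples, and the helper names `build_ppi_adjacency` and `to_dense`, as well as the toy gene pairs and scores, are hypothetical. A dense nested-list matrix is only practical for toy inputs; for 16079 genes a sparse representation would be used instead.

```python
from collections import defaultdict


def build_ppi_adjacency(edges, min_score=0.5):
    """Filter scored gene-gene edges into a neighbour map.

    edges: iterable of (gene_a, gene_b, confidence_score) tuples (assumed format).
    Self-loops and edges scoring below min_score are dropped; genes left with
    no surviving edge never enter the map, so isolated nodes disappear too.
    """
    neighbors = defaultdict(set)
    for a, b, score in edges:
        if a == b or score < min_score:
            continue  # self-loop or low-confidence edge
        neighbors[a].add(b)
        neighbors[b].add(a)  # treat interactions as undirected
    genes = sorted(neighbors)  # fixed order = row/column order of A_G
    return genes, dict(neighbors)


def to_dense(genes, neighbors):
    """Binary adjacency A_G as nested lists: A_G[i][j] = 1 if genes i, j interact."""
    index = {g: i for i, g in enumerate(genes)}
    A = [[0] * len(genes) for _ in genes]
    for g, nbrs in neighbors.items():
        for h in nbrs:
            A[index[g]][index[h]] = 1
    return A


# Hypothetical toy input: gene pairs and scores are made up for illustration.
edges = [("APP", "PSEN1", 0.9), ("APP", "APP", 1.0),
         ("APOE", "TOMM40", 0.3), ("PSEN1", "NCSTN", 0.7)]
genes, nbrs = build_ppi_adjacency(edges)
print(genes)                  # ['APP', 'NCSTN', 'PSEN1']
print(to_dense(genes, nbrs))  # [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```

Note that APOE and TOMM40 vanish entirely: their only edge is below the threshold, so they would be isolated nodes.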

2.1.2 Phenotype Similarity Data

The phenotype similarity data, covering 5080 disease phenotypes, are obtained from [15] and incorporated for the prioritization of candidate disease genes. These similarities were computed by applying text mining to Online Mendelian Inheritance in Man (OMIM) phenotype records, [16]. According to that analysis, similarity values below the threshold of 0.3 are uninformative and are ignored. Finally, we obtain a 5080 × 5080 similarity matrix $A_P$ (see Supplementary Table S1), where $A_P(i,j)=1$ indicates that phenotype i is associated with phenotype j, and $A_P(i,j)=0$ otherwise.
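A minimal sketch of the thresholding step, assuming the similarity scores from [15] are already loaded into a square array indexed by phenotype. Treating values at exactly 0.3 as informative and clearing the diagonal are assumptions of this sketch, and `threshold_similarity` is a hypothetical helper name.

```python
import numpy as np


def threshold_similarity(S, threshold=0.3):
    """Binary phenotype adjacency A_P from a symmetric similarity matrix S.

    Entries below `threshold` are treated as uninformative (0); entries at or
    above it become edges (1). The diagonal is cleared on the assumption that
    a phenotype's similarity to itself is not an edge.
    """
    S = np.asarray(S, dtype=float)
    A_P = (S >= threshold).astype(int)
    np.fill_diagonal(A_P, 0)
    return A_P


# Hypothetical 3-phenotype similarity matrix.
S = [[1.0, 0.45, 0.10],
     [0.45, 1.0, 0.30],
     [0.10, 0.30, 1.0]]
print(threshold_similarity(S).tolist())  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```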

2.1.3 Gene-phenotype Relationships Data

DisGeNET is one of the largest public platforms linking human genes to diseases; it assigns a confidence score to measure the reliability of each gene-phenotype relationship, [17]. In this work, we downloaded the curated gene-disease association file and, following [18], removed gene-phenotype relationships with a score less than 0.4. Then, the associations between the 16079 genes of the PPI network and the 5080 phenotypes of the phenotype similarity network are extracted from DisGeNET to construct the gene-phenotype relationship network, giving a 16079 × 5080 adjacency matrix $B$ (see Supplementary Table S3), where $B(i,j)=1$ indicates that gene i is associated with phenotype j, and $B(i,j)=0$ otherwise.
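The construction of $B$ can be sketched as follows, assuming the curated DisGeNET rows have already been parsed into (gene, phenotype, score) tuples and that gene and phenotype orderings match those of $A_G$ and $A_P$. The helper name and the placeholder phenotype identifiers are hypothetical.

```python
import numpy as np


def build_gene_phenotype_matrix(associations, genes, phenotypes, min_score=0.4):
    """Binary bipartite matrix B (len(genes) x len(phenotypes)).

    associations: iterable of (gene, phenotype, score) tuples, e.g. rows parsed
    from a curated gene-disease file (assumed format). Pairs scoring below
    min_score, or whose gene/phenotype is absent from the PPI or phenotype
    network, are skipped.
    """
    g_idx = {g: i for i, g in enumerate(genes)}
    p_idx = {p: j for j, p in enumerate(phenotypes)}
    B = np.zeros((len(genes), len(phenotypes)), dtype=int)
    for gene, pheno, score in associations:
        if score < min_score or gene not in g_idx or pheno not in p_idx:
            continue
        B[g_idx[gene], p_idx[pheno]] = 1
    return B


# Hypothetical identifiers and scores, for illustration only.
genes = ["APP", "PSEN1", "TNF"]
phenotypes = ["pheno_A", "pheno_B"]
assoc = [("APP", "pheno_A", 0.9), ("PSEN1", "pheno_A", 0.35),
         ("TNF", "pheno_B", 0.5), ("MAPT", "pheno_A", 0.8)]
print(build_gene_phenotype_matrix(assoc, genes, phenotypes).tolist())
# [[1, 0], [0, 0], [0, 1]]
```

The PSEN1 pair is dropped by the 0.4 cut-off and the MAPT pair because MAPT is not in the toy gene list.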

2.1.4 Construction of PGHN

A PGHN is composed of the PPI network, the phenotype similarity network, and the gene-phenotype relationship network, as shown in Fig 1(a). The constructed PGHN, shown in Fig 1(b), consists of 16079 genes and 5080 phenotypes, and its adjacency matrix $A$ has dimension 21159 × 21159:

$$A = \begin{bmatrix} A_G & B \\ B^{T} & A_P \end{bmatrix} \tag{1}$$

where $A_G$ is the adjacency matrix of the PPI network, $A_P$ is the adjacency matrix of the phenotype similarity network, $B$ is the adjacency matrix of the gene-phenotype relationship network, and $B^{T}$ is the transpose of $B$.
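Eq. (1) is a plain block assembly. The sketch below shows it with NumPy's `np.block` on a toy network; it is an illustration, not the authors' code, and `assemble_pghn` is a hypothetical name. At the real size (21159 nodes) a sparse equivalent such as `scipy.sparse.bmat` would be the more practical choice.

```python
import numpy as np


def assemble_pghn(A_G, A_P, B):
    """PGHN adjacency of Eq. (1): [[A_G, B], [B^T, A_P]]."""
    A_G, A_P, B = (np.asarray(x) for x in (A_G, A_P, B))
    if A_G.shape[0] != B.shape[0] or A_P.shape[0] != B.shape[1]:
        raise ValueError("B must be (#genes x #phenotypes)")
    return np.block([[A_G, B], [B.T, A_P]])


# Toy PGHN: 2 genes, 1 phenotype linked to gene 0.
A = assemble_pghn([[0, 1], [1, 0]], [[0]], [[1], [0]])
print(A.shape)     # (3, 3)
print(A.tolist())  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```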


Fig. 1: (a) Illustration of the PGHN. The upper sub-network is the phenotype similarity network, the lower sub-network is the PPI network, and the middle sub-network is the bipartite graph of gene-phenotype relationships. (b) The PGHN, where yellow rectangles stand for phenotypes and blue rectangles stand for genes.

2.2 RWRHNMGL Algorithm

In this paper, we propose a random walk with restart in the heterogeneous network based on module partition and a gravity-like method (RWRHNMGL) by extending the random walk with restart (RWR). The RWR algorithm ranks candidate genes by their relative importance to known disease genes. It simulates a walker that, at each step, moves from its current gene to a randomly chosen neighbor or returns to the seed genes with probability $\gamma$, [19]. The iteration is:

$$P^{t+1} = (1-\gamma)\, M P^{t} + \gamma P^{0} \tag{2}$$

where $M$ is the transition matrix, $\gamma \in (0,1)$ is the restart probability, and $P^{0}$ is the initial probability vector. After step t, $P^{t}$ is updated to $P^{t+1}$, whose i-th entry $P^{t+1}(i)$ is the probability that the random walker is at gene i. After a number of steps the probability vector stabilizes, i.e. the change between $P^{t+1}$ and $P^{t}$ falls below a small threshold; the steady-state value $P^{\infty}(i)$ is then taken as the score of gene i, and all genes in the network are ranked by their scores.
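The iteration in Eq. (2) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the transition matrix is taken to be row-stochastic, with $M(i,j)$ the probability of stepping from node i to node j as in Eqs. (5)-(8), so the probability vector is propagated with the transpose of $M$. The convergence tolerance and iteration cap are illustrative values, not ones reported in the paper.

```python
import numpy as np


def random_walk_with_restart(M, p0, gamma=0.7, tol=1e-10, max_iter=10_000):
    """Iterate Eq. (2) until the L1 change between successive vectors < tol.

    M is taken to be row-stochastic, M[i, j] = P(j | i), as in Eqs. (5)-(8),
    so probability mass is propagated with M.T. tol and max_iter are
    illustrative defaults, not values from the paper.
    """
    M = np.asarray(M, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - gamma) * M.T @ p + gamma * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy 3-gene path graph 0 - 1 - 2 with gene 0 as the only seed.
M = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
scores = random_walk_with_restart(M, [1.0, 0.0, 0.0])
print(np.round(scores, 4))  # [0.7346 0.2308 0.0346]
```

Genes closer to the seed receive higher steady-state scores, and the scores still sum to 1 because a row-stochastic $M$ conserves probability mass.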

2.2.1 Construction of Initial Probability Vector

Seed nodes consist of phenotype seeds and gene seeds. The phenotype seeds of RWRHNMGL are determined by disease association analysis using the Human Phenotype Ontology (HPO) database, [20]. The HPO contains over 13,000 terms arranged in a directed acyclic graph and connected by is-a (subclass-of) edges, so that a term represents a more specific or limited instance of its parent terms. The intersection of the 25 AD-related phenotypes from the HPO database and the nodes of the phenotype similarity network in Section 2.1.2 is used as the phenotype seed set, giving 19 seeds. The AD gene network is divided into modules by Molecular Complex Detection (MCODE), and the key module is then selected through functional enrichment analysis, [21]. In this study, the 356 nodes of the key module are used as gene seeds.

The vector $u_0$ gives the initial probabilities on the PPI network, and $v_0$ gives those on the phenotype similarity network. $u_0$ assigns equal probability to each gene seed and probability 0 to the other genes in the PPI network, so that its entries sum to 1; $v_0$ is obtained in the same way from the phenotype seeds. The initial probability vector $P^{0}$ of the PGHN is (see Supplementary Table S4):

$$P^{0} = \begin{bmatrix} (1-\eta)\, u_0 \\ \eta\, v_0 \end{bmatrix} \tag{3}$$

where $\eta \in (0,1)$ weights the importance of the PPI network and the phenotype similarity network.
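A minimal sketch of Eq. (3), assuming seeds are given as integer positions in the gene and phenotype orderings used for $A$. Giving $(1-\eta)$ to the gene part and $\eta$ to the phenotype part follows the usual heterogeneous-network RWR convention of [30] and is an assumption here; with $\eta = 0.5$, the value used in this paper, the two readings coincide. The function name is hypothetical.

```python
import numpy as np


def initial_probability(n_genes, n_phenos, gene_seeds, pheno_seeds, eta=0.5):
    """P^0 = [(1 - eta) * u0 ; eta * v0] as in Eq. (3).

    gene_seeds / pheno_seeds are index lists into the gene and phenotype
    orderings; u0 and v0 each spread probability 1 uniformly over their seeds.
    """
    u0 = np.zeros(n_genes)
    u0[list(gene_seeds)] = 1.0 / len(gene_seeds)
    v0 = np.zeros(n_phenos)
    v0[list(pheno_seeds)] = 1.0 / len(pheno_seeds)
    return np.concatenate([(1 - eta) * u0, eta * v0])


p0 = initial_probability(4, 2, gene_seeds=[0, 2], pheno_seeds=[1])
print(p0.tolist())  # [0.25, 0.0, 0.25, 0.0, 0.0, 0.5]
```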

2.2.2 Calculation of Transition Matrix M in the First Part of RWRHNMGL

$$M = \begin{bmatrix} M_G & M_{GP} \\ M_{PG} & M_P \end{bmatrix} \tag{4}$$

The parameter $\lambda$ is the probability that the random walker jumps between the PPI network and the phenotype similarity network. The transition probability from gene i to gene j can be described as:

$$M_G(i,j) = P(g_j \mid g_i) = \begin{cases} (1-\lambda)\dfrac{A_G(i,j)}{\sum_{n} A_G(i,n)}, & \text{if } \sum_{j} B(i,j) \neq 0 \\[2ex] \dfrac{A_G(i,j)}{\sum_{n} A_G(i,n)}, & \text{otherwise} \end{cases} \tag{5}$$

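The transition probability from gene i to phenotype j can be described as:

$$M_{GP}(i,j) = P(p_j \mid g_i) = \begin{cases} \lambda\dfrac{B(i,j)}{\sum_{m} B(i,m)}, & \text{if } \sum_{j} B(i,j) \neq 0 \\[2ex] 0, & \text{otherwise} \end{cases} \tag{6}$$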

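The transition probability from phenotype i to gene j can be described as:

$$M_{PG}(i,j) = P(g_j \mid p_i) = \begin{cases} \lambda\dfrac{B(j,i)}{\sum_{n} B(n,i)}, & \text{if } \sum_{j} B(j,i) \neq 0 \\[2ex] 0, & \text{otherwise} \end{cases} \tag{7}$$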

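The transition probability from phenotype i to phenotype j can be described as:

$$M_P(i,j) = P(p_j \mid p_i) = \begin{cases} (1-\lambda)\dfrac{A_P(i,j)}{\sum_{m} A_P(i,m)}, & \text{if } \sum_{j} B(j,i) \neq 0 \\[2ex] \dfrac{A_P(i,j)}{\sum_{m} A_P(i,m)}, & \text{otherwise} \end{cases} \tag{8}$$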

where $g_i$ denotes gene i and $p_i$ denotes phenotype i. We substitute $M$ into Eq. (2) and iterate until $P^{t}$ reaches a steady state; the steady-state vector $P^{\infty}$ is recorded for the second part of the RWRHNMGL algorithm (see Supplementary Table S6). This first part of RWRHNMGL is called RWRHNM.
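Eqs. (4)-(8) can be assembled with a few array operations. The sketch below is an illustration on dense toy matrices, not the authors' code; the helper names are hypothetical. One property of the formulas themselves is worth noting: a node with no neighbour inside its own sub-network (an all-zero row of $A_G$ or $A_P$) yields a row of $M$ that does not sum to 1, which the sketch leaves as is.

```python
import numpy as np


def row_normalize(X):
    """Divide each row by its sum; all-zero rows are left as zeros."""
    X = np.asarray(X, dtype=float)
    s = X.sum(axis=1, keepdims=True)
    return np.divide(X, s, out=np.zeros_like(X), where=s != 0)


def transition_matrix(A_G, A_P, B, lam=0.5):
    """Row-stochastic M = [[M_G, M_GP], [M_PG, M_P]] following Eqs. (4)-(8)."""
    B = np.asarray(B, dtype=float)
    gene_links = B.sum(axis=1) != 0   # sum_j B(i, j) != 0 for gene i
    pheno_links = B.sum(axis=0) != 0  # sum_j B(j, i) != 0 for phenotype i
    # Within-network steps keep (1 - lam) only where a cross-network jump exists.
    M_G = row_normalize(A_G) * np.where(gene_links, 1 - lam, 1.0)[:, None]
    M_P = row_normalize(A_P) * np.where(pheno_links, 1 - lam, 1.0)[:, None]
    # Cross-network jumps carry probability lam.
    M_GP = lam * row_normalize(B)
    M_PG = lam * row_normalize(B.T)
    return np.block([[M_G, M_GP], [M_PG, M_P]])


M = transition_matrix([[0, 1], [1, 0]], [[0, 1], [1, 0]], [[1, 0], [0, 0]])
print(M.sum(axis=1))  # [1. 1. 1. 1.]
```

The resulting matrix can be passed directly to an RWR iteration such as the one sketched after Eq. (2).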

2.2.3 Gravity-like Algorithm

In this paper, to reduce false positives in the PPI network, a gravity-like method based on subcellular location information is added to the refined algorithm. The basic idea is that two proteins are more likely to physically interact if they are active together at least at one time point in the cell cycle and appear together at the same subcellular location, [22]. COMPARTMENTS is a weekly updated web resource that integrates evidence on protein subcellular localization, [23]; it describes the cellular location of each gene product using GO terms. We downloaded the subcellular location information from COMPARTMENTS, and $c_{ij}$ denotes the number of GO terms shared by gene i and gene j.

Newton's law of universal gravitation measures the gravitational force between two objects from their masses and distance:

$$G_{ij} = k\,\frac{M_i M_j}{r_{ij}^{2}} \tag{9}$$

where $M_i$ and $M_j$ are the masses of the two objects, $r_{ij}$ is the distance between them, and $k$ is the gravitational constant. In this study, the parameter $k$ is set as proposed in [24], and the mass $M_i$ is the steady-state probability that a random walker starting from the seed nodes reaches candidate node i under RWRHNM, i.e. $M_i = P^{\infty}(i)$. Because the number of common GO terms $c_{ij}$ between gene i and gene j is inversely related to their distance, the topological distance between gene i and gene j is measured by $r_{ij} = 1/c_{ij}$. In the second part of RWRHNMGL, the transition matrix $M'$ is calculated as follows:

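$$M' = \begin{bmatrix} M'_G & M_{GP} \\ M_{PG} & M_P \end{bmatrix} \tag{10}$$

where

$$M'_G(i,j) = P(g_j \mid g_i) = \begin{cases} (1-\lambda)\dfrac{G_{ij}A_G(i,j)}{\sum_{n} G_{in}A_G(i,n)}, & \text{if } \sum_{j} B(i,j) \neq 0 \\[2ex] \dfrac{G_{ij}A_G(i,j)}{\sum_{n} G_{in}A_G(i,n)}, & \text{otherwise} \end{cases} \tag{11}$$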

We substitute $M'$ into $P^{t+1} = (1-\gamma)\, M' P^{t} + \gamma P^{0}$ and iterate until $P^{t}$ reaches a steady state; the steady-state value $P^{\infty}(i)$ is the final score of gene i related to AD (see Supplementary Table S7).
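The gravity-like reweighting of Eqs. (9) and (11) can be sketched as follows, assuming the RWRHNM gene scores and a matrix of shared-GO-term counts $c_{ij}$ are available as arrays. Because every row of $M'_G$ is normalized, the constant $k$ cancels and does not affect the result. The fallback for a gene whose neighbours all have zero gravity is an assumption of this sketch, not something the paper specifies; the helper names and toy numbers are hypothetical.

```python
import numpy as np


def gravity_weights(m, C, k=1.0):
    """Eq. (9) with masses m_i = P_inf(i) from RWRHNM and distance r_ij = 1 / c_ij.

    C[i, j] counts GO terms shared by genes i and j, so 1 / r_ij**2 = c_ij**2;
    pairs with c_ij = 0 get weight 0. k cancels in the row normalization below.
    """
    m = np.asarray(m, dtype=float)
    return k * np.outer(m, m) * np.asarray(C, dtype=float) ** 2


def gravity_gene_block(A_G, G, B, lam=0.5):
    """M'_G of Eq. (11): rows of G * A_G normalized, scaled as in Eq. (5)."""
    A = np.asarray(A_G, dtype=float)
    W = A * G
    s = W.sum(axis=1, keepdims=True)
    # Assumption (not from the paper): if all neighbours of gene i have zero
    # gravity, fall back to the unweighted row of A_G to avoid a dead end.
    W = np.where(s > 0, W, A)
    s = W.sum(axis=1, keepdims=True)
    M_G = np.divide(W, s, out=np.zeros_like(W), where=s != 0)
    scale = np.where(np.asarray(B).sum(axis=1) != 0, 1 - lam, 1.0)
    return M_G * scale[:, None]


# Toy example: 3 genes, RWRHNM scores m, shared-GO-term counts C.
m = [0.5, 0.3, 0.2]
C = [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
A_G = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
B = [[1], [0], [0]]  # only gene 0 has a phenotype link
print(np.round(gravity_gene_block(A_G, gravity_weights(m, C), B), 4).tolist())
# [[0.0, 0.5, 0.0], [0.9091, 0.0, 0.0909], [0.0, 1.0, 0.0]]
```

Replacing the $M_G$ block of Eq. (4) by this $M'_G$ gives $M'$ of Eq. (10), which is then iterated exactly as in the first part. In the toy example the edge between genes 0 and 2 receives no weight because they share no GO term.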

3 Results

Table 1 lists the top 100 essential genes predicted by RWRHNMGL, which include numerous known Alzheimer's disease genes such as PSEN2, PSEN1, ABCA7, A2M, HFE, CD2AP, APP, PICALM, APOE, NOS3, CR1, ADAM10, CD33, MPO, PLAU and TREM2, [25]. These genes, which are highly related to AD, all rank within the top 1% of the 16079 genes, which suggests that RWRHNMGL is effective in predicting AD essential genes.

Table 1. Top 100 AD essential genes predicted by RWRHNMGL

     Gene Symbol  Gene Symbol  Gene Symbol
Top  PSEN2        CORIN        ZFHX3
     PSEN1        STOX1        CDK6
     APP          MAPT         MPO
     ACHE         NCSTN        FLT1
     ATP5F1A      APOE         TNF
     WNT2B        BACE1        INS
     ABCA7        IDE          NOS3
     TFAM         ADAM10       IL6
     EPHA1        CASP3        PTGS2
     MS4A4A       HFE          NPY
     INPP5D       CHRNA7       SNCA
     SORL1        CST3         EIF2AK3
     TOMM40       TREM2        CYBB
     APOC1        PLAU         GAPDH
     CYP46A1      INSR         PPP3CC
     VSNL1        GSK3B        CHRM1
     CR1          NOS2         PPP3R1
     BCHE         IL1B         FZD3
     DPYSL2       GRN          RTN4
     NECTIN2      CHMP2B       FYN
     UROD         PDE3A        BECN1
     TARDBP       DHCR24       NOS1
     PPOX         CD2AP        IL1A
     PLAT         A2M          AMBRA1
     ALOX5AP      HJV          AXIN1
     PLA2G7       MAOB         TUBB4A
     LRCH1        GLA          ATP2A2
     ALB          BAX          DVL1
     HDAC9        KCNK3        DVL3
     ABO          CLU          WNT5A
     LPA          PICALM       HSD17B10
     PTGIS        CD33         FZD2
     ADD1         AGTR1        CYLD
100  CHAT

3.1 Functional Analysis of Essential Genes

The top 100 genes are selected for GO enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID), [26]. GO terms or pathways with a Fisher's exact test P value less than 0.05 are considered statistically significant. Selected highly enriched terms in biological process (BP), cellular component (CC) and molecular function (MF) are listed in Supplementary Tables S8-S10. The results of the KEGG pathway enrichment analysis are given in Table S11, together with the number of genes under each GO term or pathway.

According to the GO and KEGG pathway enrichment analyses, the AD essential genes are involved in many biological processes, such as the beta-amyloid metabolic process, negative regulation of beta-amyloid formation and the amyloid precursor protein catabolic process, which is consistent with previous AD research, [27], [28], [29]. Enriched functions such as β-amyloid binding and tau protein binding correspond to the two core biomarkers of AD: extracellular β-amyloid protein deposited in senile plaques and aggregates of hyperphosphorylated tau protein. The KEGG pathway with the smallest P value is the AD pathway (hsa05010). These results support the reliability of the AD essential genes predicted by RWRHNMGL.
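For readers unfamiliar with the significance test used above, the one-sided Fisher's exact test for over-representation reduces to an upper tail of the hypergeometric distribution. The sketch below computes it directly with exact integer arithmetic; it is a generic illustration rather than DAVID's implementation (DAVID's default EASE score is a more conservative variant of this test), and the function name and toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric upper-tail) enrichment P value.

    N: background genes, K: background genes annotated to the term,
    n: genes in the list (e.g. the top 100), k: list genes annotated to the term.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total


# Toy case: both annotated genes of a 4-gene background fall in a 2-gene list.
print(enrichment_p_value(2, 2, 2, 4))  # 0.16666666666666666
```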

3.2 Performance Evaluation

First, the receiver operating characteristic (ROC) curve, which plots sensitivity against 1-specificity, is used to evaluate the performance of RWRHNMGL. The first positive control group consists of known AD genes from the KEGG pathway (hsa05010: Alzheimer's disease), and the negative control group consists of genes associated with lung cancer, leukemia, breast cancer, Parkinson's disease and epilepsy but not with AD. The AD-related scores of these genes are computed by RWRHNMGL; the resulting ROC curve is shown as a dark purple line in Fig 2(a), with an area under the ROC curve (AUC) of 0.986. The second positive control group consists of the essential genes of the AD network based on three mini metabolic networks, [25]; with the same negative control group, the ROC curve of RWRHNMGL is shown as a green line in Fig 2(a), with an AUC of 0.988. The third positive control group is the union of the first and second positive control groups; with the same negative control group, the ROC curve of RWRHNMGL is shown as a navy blue line in Fig 2(a), with an AUC of 0.980. These high AUC values indicate that RWRHNMGL predicts AD essential genes accurately.
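The AUC values above can be obtained from two lists of scores without tracing the curve, because the AUC equals the probability that a randomly chosen positive-control gene scores higher than a randomly chosen negative-control gene (ties counting one half). The sketch below is a generic illustration of that identity, not the authors' R code; the helper name and scores are made up.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC = probability that a random positive outscores a random negative.

    Ties count as one half. This O(n*m) pairwise form is for clarity; it equals
    the Mann-Whitney U statistic divided by len(pos) * len(neg).
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: 5 of the 6 positive/negative pairs are ordered correctly.
print(roc_auc([0.9, 0.8, 0.3], [0.5, 0.1]))  # 0.8333333333333334
```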

We further analyze the RWRHNMGL scores of the genes in the third control group. The Shapiro-Wilk test at the 5% significance level shows that neither group of scores follows a normal distribution. Therefore, the Wilcoxon-Mann-Whitney test, a robust and largely assumption-free statistical method, is used to test whether the scores of the two control groups differ. The two-sided P value of the Wilcoxon-Mann-Whitney test is less than   , and the average ranks of the gene scores in the positive and negative control groups are 560.02 and 196.98, respectively, showing that the two groups of scores differ significantly. The one-sided P value of the Wilcoxon-Mann-Whitney test is also less than   , so the alternative hypothesis is accepted: the scores of the negative control group are significantly lower than those of the positive control group. All analyses are conducted with R v 4.22.
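The average ranks quoted above come from pooling both groups, ranking all scores together in ascending order, and giving tied scores the average of their positions. A minimal pure-Python sketch of that step follows; the test itself would normally be run with R's `wilcox.test` or SciPy's `scipy.stats.mannwhitneyu`, and `average_ranks_by_group` is a hypothetical helper name with made-up scores.

```python
def average_ranks_by_group(pos_scores, neg_scores):
    """Mean rank of each group after pooling; ties receive the average rank.

    Rank 1 is the smallest score, so the higher-scoring group gets the larger
    mean rank.
    """
    pooled = [(s, 0) for s in pos_scores] + [(s, 1) for s in neg_scores]
    pooled.sort(key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = mid
        i = j + 1
    pos = [r for r, (_, g) in zip(ranks, pooled) if g == 0]
    neg = [r for r, (_, g) in zip(ranks, pooled) if g == 1]
    return sum(pos) / len(pos), sum(neg) / len(neg)


print(average_ranks_by_group([0.9, 0.4, 0.4], [0.1, 0.4]))
# (3.6666666666666665, 2.0)
```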

4 Comparison with Other Methods

To illustrate the utility of the present method, ROC curves are used to compare RWRHNMGL with several other methods. The seed set of RWR consists of the known AD essential genes on the PPI network; the ROC curves of RWRHNMGL and RWR in the three control groups are shown in Fig 2(a). RWRMGL is a random walk with restart in the PPI network based on module partition and the gravity-like method; its gene seeds are the same as those of RWRHNMGL, so it tests the effect of the heterogeneous network model. The ROC curves of RWRHNMGL and RWRMGL in the three control groups, shown in Fig 2(b), indicate that constructing the PGHN is crucial to RWRHNMGL and that phenotype information plays a positive role in predicting AD essential genes. RWRHNGL is a random walk with restart in the heterogeneous network based on the gravity-like method, whose seed set consists of AD-related nodes. The ROC curves of RWRHNMGL and RWRHNGL in the three control groups, shown in Fig 2(c), indicate that selecting the seed set by module partition improves the accuracy of predicting AD essential genes. RWRHNM is a random walk with restart in the heterogeneous network based on module partition, without the gravity-like method. The ROC curves of RWRHNMGL and RWRHNM in the three control groups, shown in Fig 2(d), indicate that the gravity-like method alleviates the false-positive problem of the PPI network to some extent.


Fig. 2: ROC curves of RWRHNMGL, RWR, RWRMGL, RWRHNGL and RWRHNM: (a) ROC curves of RWRHNMGL and RWR; (b) ROC curves of RWRHNMGL and RWRMGL; (c) ROC curves of RWRHNMGL and RWRHNGL; (d) ROC curves of RWRHNMGL and RWRHNM.

RWRHN is a random walk with restart on a reliable heterogeneous network, [7]. KRWRHN is a random walk with restart in the heterogeneous network based on KNN, [8]. The ROC curves of RWRHNMGL, RWR, RWRHN and KRWRHN in the three control groups are shown in Fig 3; RWRHNMGL achieves higher AUC values than the previous algorithms. In addition, we calculate the cumulative distribution function (CDF) of the ranks of the third positive control group genes within the top 500 genes of RWRHNMGL, RWR, RWRHN and KRWRHN, shown in Fig 3(d), which relates gene rank to the cumulative percentage of positive control genes for each algorithm. The proposed algorithm achieves a distinct improvement over the other methods in both evaluation metrics for identifying AD essential genes.
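The CDF in Fig 3(d) can be computed by walking down each method's ranked list and recording, at every cutoff, the fraction of positive-control genes already seen. A minimal sketch, assuming each method's output is available as a list of gene symbols sorted by decreasing score; multiply by 100 for the percentage scale. The helper name and toy lists are hypothetical.

```python
def rank_cdf(ranked_genes, positives, top=500):
    """Cumulative fraction of positive-control genes within each rank cutoff.

    Returns a list whose r-th entry (0-based) is the fraction of `positives`
    ranked at position <= r + 1, for cutoffs up to `top`.
    """
    positives = set(positives)
    found, cdf = 0, []
    for gene in ranked_genes[:top]:
        if gene in positives:
            found += 1
        cdf.append(found / len(positives))
    return cdf


print(rank_cdf(["A", "B", "C", "D"], ["B", "D", "Z"], top=3))
# [0.0, 0.3333333333333333, 0.3333333333333333]
```

A curve that rises faster toward 1 means more positive-control genes sit near the top of that method's ranking.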

Fig. 3: ROC curves of RWRHNMGL, RWR, RWRHN and KRWRHN in the three control groups: (a) ROC curves of the first control group; (b) ROC curves of the second control group; (c) ROC curves of the third control group; (d) CDF of the ranks of the third positive control group genes.

Another statistical measure for model performance

is obtained from the Mean Reciprocal Rank (MRR),

which is computed as follows:

$$\mathrm{MRR} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{\mathrm{rank}_i} \tag{12}$$

where $\mathrm{rank}_i$ is the rank of gene i and N is the number of genes.

In this paper, N is the number of genes in the positive control group. A higher MRR value means that the genes in the positive control group are ranked closer to the top, indicating a more reliable result. We compute the MRR values of the genes in the three positive control groups for the different methods; as shown in Table 2, RWRHNMGL achieves the best performance, with the highest MRR value in every control group.
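Eq. (12) translates directly into code. The sketch below assumes a method's output is a list of gene symbols sorted by decreasing score, so that a gene's 1-based position is its rank; skipping positive-control genes that are absent from the ranking (and dividing by the number actually found) is an assumption of this sketch. The helper name and toy lists are hypothetical.

```python
def mean_reciprocal_rank(ranked_genes, positives):
    """MRR of Eq. (12): average of 1 / rank_i over the positive-control genes.

    rank_i is the 1-based position of gene i in the full ranking; positives
    missing from the ranking are skipped here (an assumption).
    """
    position = {g: r for r, g in enumerate(ranked_genes, start=1)}
    ranks = [position[g] for g in positives if g in position]
    return sum(1.0 / r for r in ranks) / len(ranks)


print(mean_reciprocal_rank(["A", "B", "C", "D"], ["B", "D"]))  # 0.375
```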

Table 2. MRR of RWRHNMGL, RWR, RWRHN and KRWRHN

Control group  RWRHNMGL     RWR          RWRHN        KRWRHN
first          0.01201205   0.011707858  0.00660667   0.007274614
second         0.046685512  0.002488351  0.027417649  0.042457116
third          0.011763921  0.010807044  0.006534787  0.007253108

RWRHNMGL has three parameters: the weighting coefficient η, the jumping probability λ, and the restart probability γ. The parameter η weights the importance of each sub-network; following [30], the algorithm works slightly better when η is set to 0.5. The parameter λ controls the reinforcement between the PPI network and the phenotype similarity network; following [31], this paper sets λ so that the phenotype similarity network is as important as the PPI network in predicting essential genes. The parameter γ is the restart probability, which determines the probability of jumping from any node back to the seed nodes. RWRHNMGL is run with γ ranging from 0.1 to 0.9 in steps of 0.1 to evaluate the influence of this parameter. As shown in Fig 4, the AUC reaches a maximum of 0.988 at γ = 0.7. Therefore, the restart probability γ is fixed at 0.7 in this paper.


Fig. 4: AUC of RWRHNMGL for restart probability γ from 0.1 to 0.9

5 Conclusion

AD is a complex multigenic neurodegenerative disorder, and current clinical drugs only relieve its symptoms rather than cure it. Identifying the essential genes associated with AD is therefore crucial for disease prevention, diagnosis and therapy, with the ultimate aim of fundamental prevention and effective treatment.

In this paper, a refined random walk algorithm, RWRHNMGL, is proposed within the RWR framework to identify AD essential genes. The PGHN is constructed from multiple data sources by integrating similarity information. The nodes of the optimal module, which is selected by module partition and covers most functions of the AD gene network, are used as gene seeds. The random walk works in the PGHN, its transition matrix is modified by a gravity-like method based on subcellular location information, and candidate genes are scored and ranked by the steady-state probability vector. Finally, ROC and MRR are used to evaluate the prediction results of RWRHNMGL. The results show that RWRHNMGL outperforms the compared methods in predicting AD essential genes.

Some aspects merit further research. First, gene seeds are taken from the key module, which covers most functions of the AD gene network; combining module partition more closely with the random walk algorithm may be an interesting direction. Second, adding the gravity-like method based on subcellular location information to the random walk algorithm alleviates the problem of false positives only to some extent.

Acknowledgments:

This research is funded by the Science Fund of Tianjin Education Commission for Higher Education, grant number 2019KJ025. This paper is in memory of Professor Jishou Ruan.

References:

[1] P. Pandey, M. Singh, I. Gambhir, Alzheimer's disease: A Threat to mankind. Journal of Stress Physiology & Biochemistry, Vol. 7, 2011, pp. 15-30.

[2] A.M. Glazier, J.H. Nadeau, T.J. Aitman, Finding Genes That Underlie Complex Traits. Science, Vol. 298, No. 5602, 2002, pp. 2345-2349.

[3] R. Melchiotti, D. Liberati. Candidate gene discriminating gliomas identification via a supervised iteration of bipartitive k-means initialised via partititve division according to principal components. WSEAS Transactions on Biology and Biomedicine, Vol. 15, 2018, pp. 87-100.

[4] V. Hima, P. Namboori. Identification of Lapatinib Derivatives and Analogs to Control Metastatic Breast Cancer-specific to South Asian Population - a Pharmacogenomic Approach. WSEAS Transactions on Biology and Biomedicine, Vol. 18, 2021, pp. 51-62.

[5] W. Jason, K. Rui, L. Christina, S.N. William, Protein Ranking by Semi-Supervised Network Propagation. BMC Bioinformatics, Vol. 7, No. 14, 2006, pp. 2345-2349.

[6] L.J. Zhu, J. Xiang, Q.L. Wang, A.L. Wang, C. Li, G. Tian, H.J. Zhang, S.Z. Chen, Revealing the Interactions Between Diabetes, Diabetes-Related Diseases, and Cancers Based on the Network Connectivity of Their Related Genes. Frontiers in Genetics, Vol. 11, No. 14, 2020, pp. 617136.

[7] J. Luo, S. Liang, Prioritization of potential candidate disease genes by topological similarity of protein-protein interaction network and phenotype data. Journal of Biomedical Informatics, Vol. 53, No. 7, 2015, pp. 229-236.

[8] S.W. Zhang, D.D. Shao, S.Y. Zhang, Prediction of risk pathogenic genes based on pathogenic gene network modularity. Journal of Biophysics, Vol. 11, No. 3, 2014, pp. 11.

[9] B. Xu, Y. Liu, S. Yu, L. Wang, J. Dong, H.F. Lin, Z.H. Yang, J. Wang, F. Xia, A network embedding model for pathogenic genes prediction by multi-path random walking on heterogeneous network. BMC Medical Genomics, Vol. 12, No. 5, 2019, pp. 118.

[10] P.G. Joana, P.F. Alexandre, M. Yves, C.M. Sara, Interactogeneous: Disease Gene Prioritization Using Heterogeneous Networks and Full Topology Scores. PLOS ONE, Vol. 7, No. 11, 2019, pp. e49634.

[11] C.Y. Wang, J. Zhang, X.P. Wang, K. Han, M.Z. Guo. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Frontiers in Genetics, Vol. 11, No. 4, 2020, pp. 1-13.

[12] C. Dursun, A. Kwitek, S. Bozdag, PhenoGeneRanker: Gene and Phenotype Prioritization Using Multiplex Heterogeneous Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021, PP.

[13] I. Lee, U.M. Blom, P.I. Wang, J.E. Shim, E.M. Marcotte, Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res., Vol. 21, No. 7, 2011, pp. 1109-1121.

[14] J.W. Lin, Biological network node classification based on network representation learning. Xiamen University, 2019.

[15] M.A. Driel, J. Bruggeman, G. Vriend, H.G. Brunner, J.A. Leunissen. A text-mining analysis of the human phenome. Eur J Hum Genet, Vol. 14, No. 5, 2006, pp. 535-542.

[16] A. Hamosh, A.F. Scott, J.S. Amberger, C.A. Bocchini, V.A. McKusick, Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res, Vol. 33, 2005, pp. 514-517.

[17] P. Janet, Q.R. Núria, B. Àlex, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database, 2015, bav028.

[18] B. Ljubic, M. Pavlovski, S. Roychoudhury, N.C. Van, A. Salhi, M. Essack, V.B. Bajic, Z. Obradovic. Genes and comorbidities of thyroid cancer. Informatics in Medicine Unlocked, 2021.

[19] S. Kohler, S. Bauer, D. Horn, P.N. Robinson. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet, Vol. 82, No. 4, 2008, pp. 949-958.

[20] S. Köhler, M. Gargano, N. Matentzoglu, L.C. Carmody, P.N. Robinson, The Human Phenotype Ontology in 2021. Nucleic Acids Research, 2020.

[21] C. Zhou, H.Y. Guo, S.J. Cao, Gene Network Analysis of Alzheimer's Disease Based on Network and Statistical Methods. Entropy, Vol. 23, No. 10, 2021, pp. e2310136.

[22] M. Li, P. Ni, X. Chen, J. Wang, F.X. Wu, Y. Pan, Construction of Refined Protein Interaction Network for Predicting Essential Proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 16, No. 4, 2019, pp. 1386-1397.

[23] J.X. Binder, F.S. Pletscher, K. Tsafou, et al. COMPARTMENTS: Unification and visualization of protein subcellular localization evidence. Database, 2014, bau012.

[24] L.M. Lin, T.H. Yang, L. Fang, J. Yang, F. Yang, J. Zhao. Gene gravity-like algorithm for disease gene prediction based on phenotype-specific network. BMC Systems Biology, Vol. 11, No. 1, 2017, pp. 121.

[25] S.J. Cao, L. Yu, J.Y. Mao, et al. Uncovering the Molecular Mechanism of Actions between Pharmaceuticals and Proteins on the Alzheimer's Disease Network. PLOS ONE, Vol. 10, No. 12, 2015, pp. e0144387.

[26] D.W. Huang, B.T. Sherman, R.A. Lempicki. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protoc, Vol. 4, No. 1, 2009, pp. 44-57.

[27] T. Fan, J. Wang. The prediction of potential risk genes for Alzheimer's disease. Aerospace Medicine and Medical Engineering, Vol. 32, No. 6, 2019, pp. 497-502.

[28] C. Joachim, D. Games, J. Morris, et al. Antibodies to non-beta regions of the beta-amyloid precursor protein detect a subset of senile plaques. American Journal of Pathology, Vol. 138, No. 2, 1991, pp. 373-384.

[29] L.K. Lu, X. Yu, Y.L. Cai, M. Sun, H. Yang, Application of CRISPR/Cas9 in Alzheimer's Disease. Frontiers in Neuroscience, 2021.

[30] Y.J. Li, J.C. Patra. Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network. Bioinformatics, Vol. 26, No. 9, 2010, pp. 1219-1224.

[31] C. Lei, J. Ruan. A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity. Bioinformatics, Vol. 29, No. 3, 2013, pp. 355-364.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
