
Stroke prediction model based on decision tree
YUHENG LIU, CHENXUAN ZHANG, XIAOYANG ZHENG, YUHAN LIU, JIANGPING HE
School of Artificial Intelligence, Liangjiang
Chongqing University of Technology
Chongqing, 401135, P.R.CHINA
Abstract: In this paper, a stroke prediction model based on a decision tree is implemented in Python and used to
predict the stroke probability of ten samples. A stroke dataset is collected and preprocessed, the Gini index of
each feature is calculated to select the splits, and the decision tree model is then built. Finally, the stroke
probability is predicted for the ten samples. In addition, a Naive Bayes model is applied to the same prediction
task for comparison with the decision tree method. The experimental results show that older people with high
blood pressure, heart disease, or a smoking habit are more likely to have a stroke; the prediction accuracy is
88% for the decision tree method and 79% for the Naive Bayes model.
Key-Words: Stroke prediction; Decision tree model; Naive Bayes model
Received: April 15, 2022. Revised: January 2, 2023. Accepted: February 3, 2023. Published: March 7, 2023.
1 Introduction
As society develops, people place increasing
importance on their physical health [1]. Stroke is an
acute cerebrovascular disease: a group of conditions
in which brain tissue is damaged because a blood
vessel in the brain ruptures suddenly or because a
blockage prevents blood from reaching the brain. It
poses a great threat to people's health [2].
Therefore, it is important to understand the
connection between a person's physical condition and
the probability of stroke, and to take different
precautions for different groups of people. In medical
diagnosis, time-series prediction of irreversible
diseases is especially valuable: predicting how a
disease will develop lets patients intervene in
advance, which is of great significance for effective
disease control. For this reason, machine learning
algorithms are widely used in the field of medical
forecasting. In this paper, stroke probability is
predicted with a decision tree model implemented
using a Python extension package.
2 Problem Formulation
2.1 Decision tree based on CART
The CART (Classification and Regression Tree)
algorithm consists of two parts: generating the
decision tree and pruning it. At each step, the feature
with the minimum Gini index is chosen as the best
feature for constructing a binary tree. The steps for
constructing a CART decision tree are as follows [2]:
1) Calculate the Gini index for all the features and
select the feature with the smallest Gini index as
the splitting feature for branching.
2) Within each resulting branch, calculate the Gini
index again for the candidate features and select
the one with the smallest index as the splitting
node; repeat this process until the Gini index can
no longer be reduced or the branching stops
because a threshold is reached.
3) Complete the construction of the decision tree.
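The split-selection step above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: the function names (gini, gini_of_split, best_split), the feature names, and the four toy records are all hypothetical, and only equality splits on categorical features are considered.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_of_split(rows, labels, feature, value):
    """Weighted Gini index of the binary split (feature == value) vs. the rest."""
    left = [y for x, y in zip(rows, labels) if x[feature] == value]
    right = [y for x, y in zip(rows, labels) if x[feature] != value]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels):
    """Return (gini, feature, value) for the split with the smallest Gini index."""
    candidates = []
    for feature in rows[0]:
        for value in sorted({x[feature] for x in rows}):
            score = gini_of_split(rows, labels, feature, value)
            candidates.append((score, feature, value))
    return min(candidates)


# Hypothetical toy records (not taken from the paper's dataset).
rows = [
    {"hypertension": 1, "smoker": 1},
    {"hypertension": 1, "smoker": 0},
    {"hypertension": 0, "smoker": 1},
    {"hypertension": 0, "smoker": 0},
]
labels = [1, 1, 0, 0]  # 1 = stroke, 0 = no stroke

score, feature, value = best_split(rows, labels)
print(feature, value, score)
```

On this toy data, splitting on the hypothetical "hypertension" feature separates the two classes completely, so it has the smallest weighted Gini index and is selected. A full CART implementation would apply best_split recursively to each branch and then prune the resulting tree.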
For the classification problem, suppose that there are
$K$ classes, and the probability that a sample point
belongs to the $k$-th class is $p_k$, then the Gini index of
WSEAS TRANSACTIONS on BIOLOGY and BIOMEDICINE
DOI: 10.37394/23208.2023.20.3