Exploring Feature Selection and Classification Algorithms

For Cardiac Arrhythmia Disease Prediction

RAVINDER AHUJA1, SC SHARMA2

1 Galgotias University Greater Noida and IIT Roorkee Saharanpur Campus, INDIA

2 IIT Roorkee Saharanpur Campus, INDIA

Abstract: Cardiac arrhythmia is a condition in which the heart beats abnormally and which may cause death if it is not diagnosed in time. Timely and accurate detection of cardiac arrhythmia can save the life of the patient. In this study, fourteen classification algorithms and six feature selection algorithms are explored to find the combination that detects cardiac arrhythmia most accurately. The fourteen classification algorithms are applied to the features selected by each feature selection technique. Random forest feature selection combined with the random forest classification algorithm was found to be the best of all the models applied, with an accuracy of 86.57%, precision of 79.12%, recall of 79.12%, and F1-score of 79.12%.

Keywords: Cardiac arrhythmia, Confusion Matrix, Disease prediction, Ensemble Algorithms, Feature

Selection, multiclass Classification.

Received: June 14, 2021. Revised: June 10, 2022. Accepted: July 8, 2022. Published: August 5, 2022.

1. Introduction

Heart disease affects a significant share of the population, not only in India but all over the world. Any sort of irregular behavior in the heart may be life-threatening. The ElectroCardioGram (ECG), which produces a graph of the heart's electrical pulses, is a tool for checking whether the heart is working properly [1]. Specific parameters are taken into account when examining the heart. A slight change in these parameters indicates an ailment of the heart, which may occur due to several reasons [2]. An arrhythmia is a form of irregularity detected in the electrical pulses generated by the heart. If left untreated for a long time, it poses a threat to human life and may lead to cardiac arrest. Therefore, accurate detection and classification of arrhythmia are essential. These conditions may cause the heart to beat too fast or too slow, or to skip beats. Because of these behaviors, the ECG forms different graphical patterns, from which arrhythmia can be detected [3].

There are generally two categories: bradycardia, in which the heart beats too slowly, usually below 60 beats per minute (bpm), and tachycardia, in which the heart beats faster than 100 bpm [4]. There are other types of arrhythmia too. Arrhythmias can be identified by where in the heart they originate and by the change in the rhythm generated by the heart. Supraventricular arrhythmia starts in the atria; therefore, it is also known as atrial arrhythmia. ECG recordings are sometimes of long duration, which makes it difficult for a doctor to inspect them and find irregularities [5]. Machine learning can therefore be used to automate arrhythmia diagnosis. In this paper, six feature selection techniques are applied to reduce the dimension of the data, and fourteen classification algorithms, including ensembles of them, are then applied to classify each record into one of sixteen classes. The major points of the work done in this study are as follows:

1. Six feature selection and fourteen classification techniques are explored to find the combination that gives the best performance.

2. Because the dataset is small, the cross-validation technique is applied with different values of k (2, 5, and 10). The best results are reported at k=10.

The rest of the paper is divided into the following sections: section 2 contains related work; section 3 contains the dataset description, the preprocessing techniques, the feature selection techniques, the classification algorithms, and the methodology used; section 5 contains experimental results; and section 6 contains the conclusion.

2. Related Work

Many techniques have been developed in the past for cardiac arrhythmia detection. The Principal Component Analysis approach was applied to the ECG dataset to reduce the features, and six neural networks were then used to classify the records as normal or as having cardiac arrhythmia [6]. The development in the field of automation of ECG analysis and recording the patient's

WSEAS TRANSACTIONS on BIOLOGY and BIOMEDICINE

DOI: 10.37394/23208.2022.19.19

Ravinder Ahuja, SC Sharma

E-ISSN: 2224-2902

168

Volume 19, 2022

health status was reported by [7]. Methods that have helped in the development of ECG analysis include neural networks [8] and self-organizing maps [9]. The first step in saving a patient's life is the proper diagnosis of arrhythmia [10]. Missing values in the dataset also play an important role. For the standard 12-lead ECG, each missing value is replaced by the nearest value of the attribute, and a multilayer perceptron algorithm is applied to classify arrhythmia [11]. In [12], the authors used a correlation-based feature selection technique to select essential features from the UCI repository dataset, and a neural network trained with the Levenberg-Marquardt method was applied to classify arrhythmia. A random forest classification algorithm combined with a resampling technique was used to classify cardiac arrhythmia [13]. In [14], various preprocessing techniques are applied first, then various feature selection techniques, and finally different classification algorithms, including neural network, random forest, and gradient boosting, are used on the ECG dataset. In [15-17], the authors applied various classification algorithms to the dataset to classify records into 16 classes. In [18], Principal Component Analysis (PCA) is used to extract essential features, and an SVM classification algorithm is then applied to classify arrhythmia. In [19], the authors applied neural networks to the dataset for prediction of cardiac arrhythmia and achieved an accuracy of 76.67%. In [20], the authors applied the Naïve Bayes classification algorithm with a train-test split ratio of 70-30 and achieved an accuracy of 70.50%. In [21], the authors applied feature selection in two steps, a wrapper part and a filtering part. SVM and KNN classification algorithms were then applied to the dataset, and the best accuracy (73.80%) was achieved with 20-fold cross-validation. In [22], the authors first replaced each missing value with the closest value and then applied a feature selection technique that selected a total of 198 features. A modular neural network with three layers was then used for classification and achieved an accuracy of 78.89% with a train-test split ratio of 90-10.

3. Materials and Methods

The methodology applied has different components which

are as follows:

3.1. Dataset Description: The dataset is taken from the UCI machine learning repository [23]. It consists of 452 rows, one per patient. Two hundred seventy-nine attributes are recorded for each patient, such as age, weight, ECG-related data, heart-related data, and QRS duration. There are sixteen classes associated with the dataset. Class 1 corresponds to no arrhythmia, and classes 2 to 15 correspond to different types of arrhythmia. Class 16 corresponds to unlabeled patients. Two hundred forty-five rows carry the class 1 label, the remaining 185 labeled rows belong to the 14 arrhythmia classes, and 22 rows are unlabeled.

3.2. Data Pre-processing: The dataset contains missing values and outliers. Missing values are filled with the mean of the attribute. Outliers are detected using the interquartile range and handled with a logarithmic transformation. A standard scaler is used to normalize the data.
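The three preprocessing steps can be sketched for a single attribute as follows. This is a minimal, dependency-free illustration and not the authors' code. The paper does not say how the logarithm is applied, so the sketch assumes that a column containing any value outside the conventional 1.5 x IQR fences is passed through a sign-preserving log transform. The function names and the sample readings are hypothetical.

```python
import math
import statistics


def impute_mean(column):
    """Fill missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]


def has_iqr_outliers(column, k=1.5):
    """Flag a column if any value lies outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier; the paper does not state one.
    """
    q1, _, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    return any(v < q1 - k * iqr or v > q3 + k * iqr for v in column)


def signed_log(v):
    """Sign-preserving log1p, so zero and negative readings stay defined."""
    return math.copysign(math.log1p(abs(v)), v)


def standardize(column):
    """Rescale to zero mean and unit variance, as a standard scaler does."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def preprocess_column(column):
    """Impute, compress outliers (assumed rule), then standardize."""
    filled = impute_mean(column)
    if has_iqr_outliers(filled):
        filled = [signed_log(v) for v in filled]
    return standardize(filled)


# Hypothetical readings for one attribute; None marks a missing value.
readings = [80.0, 91.0, None, 100.0, 88.0, 95.0, 84.0, 400.0]
print(preprocess_column(readings))
```

Applying the log transform to the whole column rather than only to the flagged values keeps the ordering of the readings intact, which is one reason to prefer it under the assumption made here.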

3.3. Feature Selection Methods: The dataset contains 279 features. To reduce complexity and make training faster, six feature selection techniques are applied: (i) Random Forest (RF) [24], (ii) Analysis of Variance (ANOVA) [25], (iii) Variance Threshold (VT) [26], (iv) Dropping Highly Correlated Features [27], (v) Recursive Feature Elimination [28], and (vi) Chi-Square Test [29].
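Two of the six techniques, variance threshold and dropping highly correlated features, are simple enough to sketch without any library. The code below is an illustration under stated assumptions, not the authors' implementation: features are held as a list of columns, the correlation filter is a greedy pass that keeps a feature only if its absolute Pearson correlation with every feature kept so far stays within the cutoff, and the function names and toy columns are hypothetical. The results section of the paper reports a variance threshold of 0.01 and a correlation factor of 0.25.

```python
import statistics


def variance_threshold(columns, threshold):
    """Return indices of features whose population variance exceeds threshold."""
    return [i for i, col in enumerate(columns)
            if statistics.pvariance(col) > threshold]


def pearson(x, y):
    """Pearson correlation; defined as 0.0 when either column is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_correlated(columns, cutoff):
    """Greedy filter: keep a feature only if |r| <= cutoff with all kept ones."""
    kept = []
    for i, col in enumerate(columns):
        if all(abs(pearson(col, columns[j])) <= cutoff for j in kept):
            kept.append(i)
    return kept


# Toy feature columns (hypothetical): feature 1 duplicates feature 0 up to
# scale, and feature 2 is constant.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [5.0, 5.0, 5.0, 5.0],
    [1.0, -1.0, 1.0, -1.0],
]
print(variance_threshold(features, 0.01))
print(drop_correlated(features, 0.9))
```

A constant column passes the correlation filter because its correlation is undefined and treated as zero here, so in practice the variance filter would be run first.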

3.4. Classification Algorithms: Fourteen classification algorithms are applied: (i) Support Vector Machine (SVM) [30], (ii) Bernoulli Naive Bayes (NB) [31], (iii) Random Forest (RF) [32] (number of estimators 1200, random state 42), (iv) K-Nearest Neighbors (KNN) [33] (Euclidean distance, three neighbors), (v) Logistic Regression [34], (vi) Decision Tree [35], (vii) Averaging [36], (viii) Bagging [37] (linear kernel with C = 1), (ix) Light GBM [38], (x) AdaBoost [39], (xi) XGBoost [40] (learning rate = 0.01), (xii) Stacking [41], (xiii) ID3 [42], and (xiv) Majority Voting [43].
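The two simplest ensembles in this list, majority voting and averaging, only combine the outputs of already trained base classifiers. The sketch below illustrates that combination step on toy outputs. It assumes that "averaging" means averaging the class probabilities predicted by the base models and taking the most probable class, which the paper does not spell out. The function names and the toy inputs are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine hard labels: predictions[m][s] is model m's label for sample s.

    Ties go to the label that appears first among the tied models' votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def average_probabilities(probabilities):
    """Combine soft outputs: probabilities[m][s] is model m's list of class
    probabilities for sample s. Returns the index of the class with the
    highest mean probability for each sample."""
    n_models = len(probabilities)
    labels = []
    for per_model in zip(*probabilities):
        mean = [sum(p) / n_models for p in zip(*per_model)]
        labels.append(max(range(len(mean)), key=mean.__getitem__))
    return labels


# Three hypothetical base classifiers voting on three samples.
hard_labels = [
    [1, 2, 1],
    [1, 2, 3],
    [2, 2, 3],
]
print(majority_vote(hard_labels))

# Two hypothetical base classifiers giving two-class probabilities.
soft_outputs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.4, 0.6], [0.4, 0.6]],
]
print(average_probabilities(soft_outputs))
```

Averaging lets a confident model outweigh an uncertain one, while majority voting counts every model equally, which is the main design difference between the two.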

5. Experimental Results

Four performance evaluation parameters are considered: accuracy, F1-score, precision, and recall.
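For reference, the four measures can be computed from true and predicted labels as sketched below. The paper does not state which multiclass averaging scheme is used, so micro-averaging here is an assumption; it is one scheme under which precision, recall, and F1-score coincide. On a single set of single-label predictions these micro-averaged values also equal the accuracy. The function names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_precision_recall_f1(y_true, y_pred):
    """Pool true positives, false positives, and false negatives over all
    classes, then compute precision, recall, and F1 from the pooled counts."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels (hypothetical) for six samples and three classes.
y_true = [1, 1, 2, 3, 3, 3]
y_pred = [1, 2, 2, 3, 3, 1]
print(accuracy(y_true, y_pred))
print(micro_precision_recall_f1(y_true, y_pred))
```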

The hyperparameters of the classification algorithms are tuned using a grid search approach. The cleaned dataset is given as input to the feature selection algorithms, and the fourteen classification algorithms are then applied. Because the dataset is small, k-fold cross-validation is applied with k set to 2, 4, 5, and 10. The results are presented in this section; the best are obtained with k=10.

The results for random forest feature selection with the fourteen classification algorithms are presented in Table 1. The threshold value is set to 0.001, which returns 163 features in the random forest feature selection technique. The random forest classifier produced the best results, with an accuracy of 86.57%, recall of 79.12%, precision of 79.12%, and F1-score of 79.12%. Next to the random forest is the maximum voting classifier, with an accuracy of 73.05%, precision 77.12%, recall 77.12%, and F1-score 77.12%. Next to maximum voting is averaging, with an accuracy of 71.93%, precision 76.92%, recall 76.92%, and F1-score 76.92%. The worst performance is given by the support vector machine classifier, with an accuracy of 55.34%, precision 58.24%, recall 58.24%, and F1-score 58.24%.
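The cross-validation protocol used for these results can be sketched as follows. This is a dependency-free illustration, not the authors' code. The shuffling seed, the `fit_predict` callable, and the majority-class baseline that stands in for a real classifier are all hypothetical. The toy labels only mimic the dataset's 452 rows with 245 in the normal class and lump every other row into one class.

```python
import random


def k_fold_indices(n_samples, k, seed=42):
    """Yield (train, test) index lists for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out
                 for idx in fold]
        yield train, test


def cross_val_accuracy(fit_predict, X, y, k):
    """Mean accuracy over k folds.

    fit_predict(train_X, train_y, test_X) is a hypothetical callable that
    trains a model and returns one predicted label per test row.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predictions = fit_predict([X[i] for i in train],
                                  [y[i] for i in train],
                                  [X[i] for i in test])
        hits = sum(p == y[i] for p, i in zip(predictions, test))
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def majority_baseline(train_X, train_y, test_X):
    """Toy stand-in for a classifier: predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)


# Toy data: 452 rows, 245 labeled 1 (normal) and the rest labeled 0.
X = [[float(i)] for i in range(452)]
y = [1] * 245 + [0] * 207
for k in (2, 5, 10):
    print(k, round(cross_val_accuracy(majority_baseline, X, y, k), 4))
```

With 452 rows and k=10, each model is trained on about 407 rows and tested on 45 or 46, so a larger k leaves more data for training on a small dataset.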

Table 1: Results based on the Random Forest feature selection technique

Algorithm              Accuracy   F1-Score / Recall / Precision
Naive Bayes            68.78      72.52
Decision Tree          60.55      57.14
KNN                    57.77      63.73
SVM                    55.34      58.24
Random Forest          86.57      79.12
Bagging                64.72      69.23
Adaboost               61.67      61.43
Averaging              71.93      76.92
XGBoost                71.38      75.82
Light GBM              71.33      76.24
Max Voting             73.05      77.12
Logistic Regression    62.67      65.93
Stacking               71.38      75.82
ID3                    65.89      70.55

Figure 1: Overview of the methodology used (C1-C14 are the classifiers)

The results for variance threshold feature selection with the fourteen classification algorithms are presented in Table 2. A threshold value of 0.01 is set for selecting features, and 195 features are returned. The random forest classifier produced the best results, with an accuracy of 80.21%, recall of 78.23%, precision of 78.23%, and F1-score of 78.23%. Next to the random forest is the maximum voting classifier, with an accuracy of 73.22%, precision 77.12%, recall 77.12%, and F1-score 77.12%. Next to maximum voting is ID3, with an accuracy of 71.23%, precision 74.85%, recall 74.85%, and F1-score 74.85%. The KNN classifier produced poor results, with an accuracy of 58.05%, precision 63.73%, recall 63.73%, and F1-score 63.73%.

The results for ANOVA feature selection with the fourteen classification algorithms are presented in Table 3. The number of selected features is set manually to 55. The random forest classifier produced the best results, with an accuracy of 76.87%, recall of 78.29%, precision of 78.29%, and F1-score of 78.29%. Next to the random forest is the maximum voting classifier, with an accuracy of 75.27%, precision 77.21%, recall 77.21%, and F1-score 77.21%. Next to maximum voting is stacking, with an accuracy of 70.83%, precision 75.82%, recall 75.82%, and F1-score 75.82%. The Light GBM classifier produced the poorest results, with an accuracy of 59.55%, precision 62.22%, recall 62.22%, and F1-score 62.22%.

The results for CHI-2 feature selection with the fourteen classification algorithms are presented in Table 4. The number of features is set manually to 69, so all the algorithms work on the best 69 features. The random forest classifier and maximum voting produced the best results among all the classification algorithms, each with an accuracy of 74.22%, recall of 77.70%, precision of 77.70%, and F1-score of 77.70%. The next best performing classifier is stacking, with an accuracy of 72.36%, precision 75.39%, recall 75.39%, and F1-score 75.39%. The worst performance is given by the SVM classifier, with an accuracy of 55.34%, precision 58.24%, recall 58.24%, and F1-score 58.24%.

Table 2: Results based on the Variance Threshold feature selection technique

Algorithm              Accuracy   F1-Score / Recall / Precision
Naive Bayes            66.72      72.52
Decision Tree          59.72      58.24
KNN                    58.05      63.73
SVM                    68.72      71.42
Random Forest          80.21      78.23
Bagging                54.72      63.73
Adaboost               61.67      63.73
Averaging              70.67      74.35
Gradient Boosting      71.11      74.72
Light GBM              58.22      61.31
Max Voting             73.22      79.12
Stacking               71.11      74.72
Logistic Regression    67.58      69.23
ID3                    71.23      74.85

The results for feature selection by dropping correlated features with the fourteen classification algorithms are presented in Table 5. A correlation factor of 0.25 is used, through which 221 features were dropped. The random forest classifier produced the best results, with an accuracy of 80.22%, precision of 79.12%, recall of 79.12%, and F1-score of 79.12%. Next to the random forest are the maximum voting and XGBoost classifiers, each with an accuracy of 72.83%, precision 76.92%, recall 76.92%, and F1-score 76.92%. The KNN classifier produced the poorest results, with an accuracy of 58.05%, precision 60.73%, recall 60.73%, and F1-score 60.73%.

The results for recursive feature elimination with the fourteen classification algorithms are presented in Table 6. The random forest classifier produced the best results, with an accuracy of 74.72%, precision of 66.63%, recall of 66.63%, and F1-score of 66.63%. Random forest is a collection of decision trees. Each decision tree is built on a random sample of the data, and the trees are combined into a single classifier known as a random forest.

Table 3: Results based on the ANOVA technique

Algorithm              Accuracy   F1-Score / Recall / Precision
Naive Bayes            64.44      67.62
Decision Tree          63.33      66.63
KNN                    61.38      65.93
SVM                    67.77      69.30
Random Forest          76.87      78.29
Bagging                64.72      67.77
Adaboost               63.05
Averaging              68.14      71.81
Gradient Boosting      69.83      74.28
Light GBM              59.55      62.22
Max Voting             75.27      77.21
Stacking               70.83      75.82
Logistic Regression    65.93      66.63
ID3                    59.99      63.83

Table 4: Results based on the CHI-2 feature selection technique

Algorithm              Accuracy   F1-Score / Recall / Precision
Naive Bayes            64.72      67.77
Decision Tree          63.33      66.63
KNN                    61.38      65.93
SVM                    55.34      58.24
Random Forest          74.22      77.70
Bagging                59.55      62.22
Adaboost               60.83      63.33
Averaging              63.64      67.29
Gradient Boosting      65.82      68.30
Light GBM              58.24      61.62
Max Voting             74.22      77.70
Stacking               72.36      75.39
Logistic Regression    65.21      69.87
ID3                    67.85      72.54

Different approaches reported in the literature on the same dataset are compared with our approach in Table 7. The results show that the methodology designed in this work performs around 3% better. The results presented in Tables 1 to 6 show that the random forest classifier performed best among the fourteen classifiers for all six feature selection techniques. Random forest is a robust classifier because it produces its result by combining the outputs of many decision trees (each created from a subset of the training data). Furthermore, the random forest feature selection technique helped us select, from a large set, the features that carry high weight in categorising cardiac arrhythmia. Figure 2 gives a comparison of the six feature selection techniques used in our study. The accompanying data table provides the performance parameters reported for each of the feature selection techniques. The random forest feature selection technique performs better than all the other feature selection techniques applied.

Table 5: Results based on the Drop Correlated feature selection technique

Classification Algorithm   Accuracy   F1-Score / Recall / Precision
Naive Bayes                67.16      71.42
Decision Tree              62.77      61.53
KNN                        58.05      60.73
SVM                        66.72      71.42
Random Forest              80.22      79.12
Bagging                    66.31      75.82
Ada Boost                  62.22      61.53
Averaging                  72.10      75.65
XGBoost                    72.83      76.92
Light GBM                  68.37      71.11
Max Voting                 72.83      76.92
Logistic Regression        64.11      69.23
Stacking                   70.83      76.92
ID3                        67.77      71.54

Table 6: Results based on the Recursive Feature Elimination feature selection technique

Algorithm              Accuracy   F-Score / Recall / Precision
Naive Bayes            67.77      57.14
Decision Tree          68.61      54.94
KNN                    59.16      53.80 / 53.80 / 53.84
SVM                    55.55      49.45
Random Forest          74.72      66.63
Bagging                66.38      57.14
Ada Boost              66.94      58.24
Averaging              74.07      64.69
XGBoost                72.77      63.73
Light GBM              67.50      58.94
Max Voting             72.77      64.63
Logistic Regression    66.38      56.74 / 56.72 / 56.74
Stacking               72.77      64.63
ID3                    64.32      57.14

Figure 2: Comparison of performance concerning feature

selection techniques

Kruskal-Wallis Test: This is a non-parametric statistical test. H0 (null hypothesis): there is no significant difference between the performances of the models. Alpha is taken as 0.05, and the p-value observed from the test is less than alpha, so the null hypothesis is rejected. The test is performed with the scipy package in Python. Table 8 shows the p-value and test statistic observed for the accuracy variable.
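To make the test statistic concrete, the sketch below computes the tie-corrected Kruskal-Wallis H statistic by hand. It is an illustration on made-up accuracy scores, not a reproduction of the value in Table 8, because the paper does not list the samples that were compared. The p-value uses the chi-square approximation with k - 1 degrees of freedom, written here only for the three-group case, where it reduces to exp(-H/2). For other group counts a chi-square survival function is needed, which `scipy.stats.kruskal` provides together with the statistic. The function names and the scores are hypothetical.

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more samples."""
    pooled = sorted(v for group in groups for v in group)
    n = len(pooled)
    # Give each distinct value the average of the 1-based ranks it occupies.
    rank_of = {}
    start = 0
    while start < n:
        end = start
        while end < n and pooled[end] == pooled[start]:
            end += 1
        rank_of[pooled[start]] = (start + 1 + end) / 2
        start = end
    rank_term = sum(sum(rank_of[v] for v in group) ** 2 / len(group)
                    for group in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom: exp(-h / 2)."""
    return math.exp(-h / 2)


# Made-up accuracy scores (%) for three hypothetical models.
scores_a = [71.2, 73.5, 74.1, 72.8]
scores_b = [65.0, 66.3, 64.8, 67.1]
scores_c = [58.9, 60.2, 59.5, 61.0]
h = kruskal_wallis_h(scores_a, scores_b, scores_c)
p = p_value_three_groups(h)
print(f"H = {h:.3f}, p = {p:.4f}, reject H0 at alpha 0.05: {p < 0.05}")
```

Because the statistic depends only on ranks, it makes no normality assumption about the accuracy scores, which suits a comparison across a small number of folds or models.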

6. Conclusion

The paper presents a machine learning framework for the detection of multiclass arrhythmia. First, the dataset is preprocessed to make it suitable for further processing, and then six feature selection techniques are applied to extract the essential features from the dataset, which has 279 attributes. Fourteen classification algorithms, including some ensemble techniques, are applied to detect the presence of cardiac arrhythmia and classify it into one of sixteen categories. All fourteen classification algorithms were compared based on accuracy, precision, recall, and F1-score. Of all the combinations of six feature selection techniques and fourteen classifiers, the best performance is obtained with random forest feature selection and the random forest classification algorithm. The highest performance achieved with this combination is an accuracy of 86.57% with precision, recall, and F1-score of 79.12%. The random forest classifier also performs best among all fourteen classifiers for each of the six feature selection techniques applied. Thus it can be concluded from this study that random forest is the best-performing classifier on this dataset and that, combined with random forest feature selection, it performs best among all the combinations of feature selection techniques and classifiers applied.

Table 7: Comparison with existing approaches

Paper                         Accuracy (%)   Classification Algorithms
A. Batra and V. Jawa [15]     84%            SVM
E. Namsrai et al. [20]        70.50%
K. A. K. Niazi et al. [21]    73.80%         KNN
D. Gao et al. [19]            76.67%         ANN
S. M. Jadhav et al. [22]      78.89%         Modular NN
Vasu Gupta et al. [5]         77.4%          RF+SVM
Our Approach                  86.57%         Random Forest Feature Selection + Random Forest Classifier

Table 8: Kruskal-Wallis test statistics

Test Statistic   Variable   P-Value
59.571           Accuracy   p=0.000000062666

Figure 2 data: Performance (%) by feature selection technique

             RF      VT      AFS     CHI-2   DCF     RFE
Accuracy     86.57   79.12   80.21   74.22   79.12   74.72
Precision    79.12   78.23   77.21   77.70   79.12   66.63
Recall       79.12   78.23   77.21   77.70   79.12   66.63
F1-Score     79.12   78.23   77.21   77.70   79.12   66.63

References

[1] Zuo, W. M., Lu, W. G., Wang, K. Q., & Zhang, H. (2008, September). Diagnosis of cardiac arrhythmia using kernel difference weighted KNN classifier. In 2008 Computers in Cardiology (pp. 253-256). IEEE.

[2] Alickovic, E., & Subasi, A. (2016). Medical decision support system for diagnosis of heart arrhythmia using DWT and random forest classifier. Journal of medical systems, 40(4), 108.

[3] Kumar, S. U., & Inbarani, H. H. (2017). Neighborhood rough set based ECG signal classification for diagnosis of cardiac diseases. Soft Computing, 21(16), 4721-4733.

[4] Özbay, Y., & Karlik, B. (2001). A recognition of ECG arrhythmias using artificial neural networks. Selçuk Univ., Konya (Turkey), Electrical and Electronics Engineering.

[5] Gupta, V., Srinivasan, S., & Kudli, S. S. (2014). Prediction and Classification of Cardiac Arrhythmia.

[6] A. M. Elsayad, "Classification of ECG arrhythmia using learning vector quantization neural networks," in Proceedings of the 2009 International Conference on Computer Engineering and Systems, ICCES'09, pp. 139–144, Egypt, December 2009.

[7] Raut, R. D., & Dudul, S. V. (2008, July). Arrhythmias classification with MLP neural network and statistical analysis. In 2008 First International Conference on Emerging Trends in Engineering and Technology (pp. 553-558). IEEE.

[8] Yu, S. N., & Chen, Y. H. (2007). Electrocardiogram beat classification based on wavelet transformation and probabilistic neural network. Pattern Recognition Letters, 28(10), 1142-1150.

[9] Hussain, H., & Fatt, L. L. (2007, December). Efficient ECG signal classification using a sparsely connected radial basis function neural network. In Proceeding of the 6th WSEAS International Conference on Circuits, Systems, Electronics, Control, and Signal Processing (pp. 412-416).

[10] Bhardwaj, P., Choudhary, R. R., & Dayama, R. (2012). Analysis and classification of cardiac arrhythmia using ECG signals. International Journal of Computer Applications, 38(1), 37-40.

[11] S. M. Jadhav, S. L. Nalbalwar, and A. A. Ghatol, "Arrhythmia disease classification using Artificial Neural Network model," in Proceedings of the 2010 IEEE International Conference on Computational Intelligence and Computing Research, ICCIC 2010, pp. 653–656, India, December 2010.

[12] M. Mitra and R. Samanta, "Cardiac Arrhythmia Classification Using Neural Networks with Selected Features," Procedia Technology, vol. 10, pp. 76–84, 2013.

[13] A. Ozçift, "Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis," Computers in Biology and Medicine, vol. 41, no. 5, pp. 265–271, 2011.

[14] A. Batra and V. Jawa, "Classification of Arrhythmia Using Conjunction of Machine Learning Algorithms and ECG Diagnostic Criteria," Training Journal, 1975.

[15] T. Soman and P. O. Bobbie, "Classification of arrhythmia using machine learning techniques," WSEAS Transactions on Computers, vol. 4, no. 6, pp. 548–552, 2005.

[16] A. Fazel, F. Algharbi, and B. Haider, Classification of Cardiac Arrhythmias Patients.

[17] S. Samad, S. A. Khan, A. Haq, and A. Riaz, "Classification of arrhythmia," International Journal of Electrical Energy, vol. 2, no. 1, pp. 57–61, 2014.

[18] N. Kohli and N. Verma, "Arrhythmia classification using SVM with selected features," International Journal of Engineering, Science and Technology, vol. 3, no. 8, pp. 22–31, 2012.

[19] D. Gao, M. Madden, D. Chambers, and G. Lyons, "Bayesian ANN Classifier for ECG arrhythmia diagnostic system: A comparison study," in Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005, pp. 2383–2388, Canada, August 2005.

[20] E. Namsrai, T. Munkhdalai, M. Li, J. Shin, O. Namsrai, and K. H. Ryu, "A feature selection-based ensemble method for arrhythmia classification," Journal of Information Processing Systems, vol. 9, no. 1, pp. 31–40, 2013.

[21] K. A. K. Niazi, S. A. Khan, A. Shaukat, and Akhtar, "Identifying best feature subset for cardiac arrhythmia classification," in Proceedings of the Science and Information Conference, SAI 2015, pp. 494–499, UK, July 2015.

[22] S. M. Jadhav, S. L. Nalbalwar, and A. A. Ghatol, "Modular neural network-based arrhythmia classification system using ECG signal data," International Journal of Information Technology and Knowledge Management, vol. 4, no. 1, pp. 205–209, 2011.

[23] https://archive.ics.uci.edu/ml/datasets/arrhythmia

[24] Díaz-Uriarte, R., & De Andres, S. A. (2006). Gene selection and classification of microarray data using the random forest. BMC bioinformatics, 7(1), 3.

[25] Ding, H., Feng, P. M., Chen, W., & Lin, H. (2014). Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Molecular BioSystems, 10(8), 2229-2235.

[26] Demir, Ö., & Yılmaz Çamurcu, A. (2015). Computer-aided detection of lung nodules using exterior surface features. Bio-medical materials and engineering, 26(s1), S1213-S1222.

[27] Koller, D., & Sahami, M. (1996). Toward optimal feature selection. Stanford InfoLab.

[28] Granitto, P. M., Furlanello, C., Biasioli, F., & Gasperi, F. (2006). Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Systems, 83(2), 83-90.

[29] Jin, X., Xu, A., Bie, R., & Guo, P. (2006, April). Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles. In International Workshop on Data Mining for Biomedical Applications (pp. 106-115). Springer, Berlin, Heidelberg.

[30] Schölkopf, B. (2005). Introduction to Kernel Methods. The Analysis of Patterns, Erice, Italy.

[31] Vincent, P., & Bengio, Y. (2002). K-local hyperplane and convex distance nearest neighbor algorithms. In Advances in neural information processing systems (pp. 985-992).

[32] Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R news, 2(3), 18-22.

[33] Soucy, P., & Mineau, G. W. (2001). A simple KNN algorithm for text categorization. In Proceedings 2001 IEEE International Conference on Data Mining (pp. 647-648). IEEE.

[34] Tabaei, B., and Herman, W., A Multivariate logistic regression equation to screen for diabetes. Diabetes Care 25:1999–2003, 2002.

[35] Freund, Y., & Mason, L. (1999, June). The alternating decision tree learning algorithm. In icml (Vol. 99, pp. 124-133).

[36] Sanders, S. R., Noworolski, J. M., Liu, X. Z., & Verghese, G. C. (1990). Generalized averaging method for power-conversion circuits. Technical report (No. AD-A-221977/2/XAB; LIDS-P--1970). Massachusetts Inst. of Tech., Cambridge, MA (USA). Lab. for Information and Decision Systems.

[37] Dietterich, T. G. (2000, June). Ensemble methods in machine learning. International workshop on multiple classifier systems (pp. 1-15). Springer, Berlin, Heidelberg.

[38] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., ... & Liu, T. Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (pp. 3146-3154).

[39] Korada, N. K., Kumar, N. S. P., & Deekshitulu, Y. V. N. H. (2012). Implementation of naïve Bayesian classifier and ADA-boost algorithm using maize expert system. International Journal of Information Sciences and Techniques (IJIST), 2.

[40] Ridgeway, G. (2007). Generalized Boosted Models: A guide to the gbm package. Update, 1(1), 2007.

[41] Naess, O. E. (1979). Superstack—an iterative stacking

algorithm. Geophysical Prospecting, 27(1), 16-28.

[42] Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1-2), 1-39.

[43] Umanol, M., Okamoto, H., Hatono, I., Tamura, H. I. R. O. Y. U. K. I., Kawachi, F., Umedzu, S., & Kinoshita, J. (1994, June). Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems. In Proceedings of 1994 IEEE 3rd International Fuzzy Systems Conference (pp. 2113-2118). IEEE.

[44] Ruta, D., & Gabrys, B. (2005). Classifier selection for majority voting. Information fusion, 6(1), 63-81.

[45] Mukhopadhyay, S., & Sircar, P. (1996). Parametric modeling of ECG signal. Medical and Biological Engineering and Computing, 34(2), 171-174.

[46] Mustaqeem, A., Anwar, S. M., & Majid, M. (2018). Multiclass classification of cardiac arrhythmia using improved feature selection and SVM invariants. Computational and mathematical methods in medicine, 2018.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative

Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
