Parallel feature subset selection wrappers using k-means classifier

NIKOLAOS PAPAIOANNOU1, ALKIVIADIS TSIMPIRIS1, CHRISTOS TALAGOZIS1, LEONIDAS FRAGIDIS2, ATHANASIOS ANGEIOPLASTIS1, SOTIRIOS TSAKIRIDIS1, DIMITRIOS VARSAMIS1
1Department of Computer, Informatics and Telecommunications Engineering, International Hellenic University, Serres, GREECE
2Department of Management Science and Technology, International Hellenic University, Kavala, GREECE
Abstract: - In a world where the volume of data is constantly increasing, the execution time of various processes increases significantly. Therefore, proper management and the effort to reduce the dimensionality of datasets are considered imperative. Feature selection can reduce the size of a dataset by keeping a smaller subset of features, while improving the accuracy of the classification. The main purpose of this paper is to propose and examine the efficiency of parallel feature selection wrappers based on the k-means classifier. The simple k-means algorithm and a parallel version of it are used. Different parallelization variants of feature subset selection (FSS) are presented, and their accuracy and computation time are evaluated on four different datasets. The comparison is performed among the different parallelization variants and the serial implementation of FSS with the k-means clustering algorithm. Finally, the results of the research are presented, highlighting the importance of parallelization in reducing the execution time of the proposed algorithms.
Key-Words: - Classification, Feature selection, k-means, Parallelization, Wrappers
Received: May 9, 2022. Revised: January 17, 2023. Accepted: February 11, 2023. Published: March 9, 2023.
1 Introduction
Nowadays, the amount of data is constantly increasing. Considering big data, the evolution of technology and the internet, it is clear that the amount of data grows rapidly year by year, as does the diversity of the sources which produce it. Consequently, there are systems and problems that contain thousands of variables, which are likely to lead to millions, or even billions, of relationships with each other. Furthermore, managing and processing this large amount of data turns into a challenge for data mining researchers and engineers.
Data mining is the process of extracting useful information, such as dependencies and patterns, from large amounts of data. More specifically, it helps to draw useful conclusions and information from seemingly unrelated data. Data mining consists of a set of important steps, such as data cleaning and formatting, database creation and maintenance, data visualization, the application of algorithms and feature selection. Various methods are used to draw conclusions from the data mining process. Some of these methods are related to association, prediction, classification, clustering analysis, decision trees and neural networks, [1]. Although all these methods are extremely powerful and useful tools in the process of data mining, it is very important to choose the right method, depending on the specificity of each problem and the information that is to be extracted.
Clustering is an unsupervised method of pattern recognition used in machine learning. The main goal of this process is to better understand the underlying information and structure of the dataset, based on the similarities between the data. This goal is accomplished by dividing the data into smaller subsets (classes), so that objects in the same class are similar to each other and different from those belonging to the other classes. Deep insight into the dataset is not the primary concern of clustering; the emphasis is on grouping the data and, since it is an unsupervised method, this makes it a very useful preparatory stage before a deeper analysis.
One of the most common and widely used clustering algorithms is k-means, [2]. k-means is an unsupervised learning algorithm that determines how different objects are grouped based on their attributes (features). Like any other algorithm,
k-means has its pros and cons. On the one hand, it is easy to implement and can form subsets from the original complex datasets without any further preprocessing. The k-means algorithm has often been found to be faster than hierarchical clustering and is also very efficient in terms of computational cost, which is O(k*n*d) for the assignment step and O(n*d) for updating the centroids, [3]. On the other hand, although k-means and other clustering methods, [4], are great at initially separating the data into clusters, the partition depends heavily on the location of the points, rather than on the actual relations within the data. Furthermore, k-means can give significantly different results depending on the number of clusters (k) that is set, which increases the importance of the initial choice of k. Last but not least, it is also sensitive to outliers.
Outliers, and big data in general, can reduce the efficiency and therefore the reliability of the k-means algorithm, [5]. Such datasets may contain a large amount of unrelated and redundant information, which is likely to affect the performance of the classification algorithm. In order to avoid these problems, feature selection methods have been used, [6], [7], [8].
Feature selection is often used as a preparatory stage of machine learning or in conjunction with it, [9]. The selection of the useful features is an important step for increasing the efficiency of clustering and classification algorithms. Especially when the analysis concerns classification and/or regression problems with many observable features, feature selection becomes an important stage of the analysis; it can reduce the size of the dataset and the computational cost by removing irrelevant and unnecessary data such as outliers, while also improving accuracy and providing a deeper understanding of the structure of the model or the data, [10]. Therefore, feature selection is considered necessary when dealing with big data cases. The purpose of feature selection algorithms is to create a subset of features from the initial one (the whole dataset), based on some criteria. Feature selection methods are divided into three main categories: filter methods, wrapper methods and embedded methods. Filter methods depend on the general characteristics of the data, without involving any learning algorithm. These methods use dependency measures such as mutual information, [11], conditional mutual information, [12], and Pearson's correlation, [13], in order to examine the relationships between the features and to create the best possible feature subset. They are relatively resistant to overfitting, but they often do not provide the optimal subset of features. However, because their computational cost is much lower than that of the other approaches, they are widely used by researchers when dealing with big data. Wrapper methods, [14], [15], require a predefined learning algorithm and use its performance to evaluate and determine which features will be selected. Thus, for each new subset of features, the wrapper method needs to train and evaluate the predefined hypothesis or classifier. Wrappers tend to find features that fit the predefined learning algorithm well, which results in very good learning performance, but at a higher computational cost. Although this approach is really efficient and achieves high performance, its computational cost makes it prohibitive for large-scale datasets (big data), because the classifier needs to be trained many times. Embedded methods, [16], [17], [18], combine the advantages of both filter and wrapper methods. In these methods, the algorithm learns which features contribute most to the accuracy of the model during the process of building the model. Although they usually have a lower computational cost than wrapper methods, they are much slower than filter methods and, in addition, the selected features depend highly on the learning machine.
In this paper, a wrapper feature selection method is used. As mentioned earlier, wrapper methods lead to really good learning performance, but at a high computational cost. Parallelization can reduce the computational cost without compromising the efficiency of the methods. Recently, different parallelization techniques for feature selection have been proposed and used for high-dimensional cases involving large amounts of data, such as cancer classification, [19], and wind forecasting, [20]. In order to overcome the limitations related to the increase of computation time when using feature selection methods, different approaches have been proposed. Such examples are a novel hybrid parallel implementation that uses MPI and OpenMP for feature selection on distributed-memory clusters, [21], a parallel heterogeneous ensemble feature selection based on multi-core architectures (applying parallelism to the multi-core CPU along with the GPU), [22], and a framework, QCFS, which exploits the capabilities of modern CPUs by executing the feature selection process in parallel on four cores, [23]. Finally, a feature selection library has been proposed, [24], in order to help accelerate the feature selection process. This library includes seven methods, which follow a hybrid MPI/multithreaded approach. In this paper, different parallelization variants of feature selection algorithms based on k-means are examined, in order to reduce the computational cost, to quantify the reduction in time resulting from each method and, finally, to select the most effective one. The rest of the paper is organized as follows: in Chapter 2, basic elements of theory, as well as the methodology followed, are presented. In Chapter 3, the datasets
are presented; in Chapters 4 and 5, a simulation study on various datasets and a case study are performed, respectively; Chapter 6 discusses the findings, while Chapter 7 presents the conclusions.
2 Methodology
The main purpose of this paper is to compare different parallelized variants of a wrapper feature selection algorithm (FSS) with each other and with serial implementations, both in terms of efficiency and execution time. As the classifier, the k-means algorithm and a variant of it are selected, while two similarity metrics (CRI and Accuracy) are used as evaluation functions.
2.1 Classification algorithms
2.1.1 k-means
k-means, [25], is a simple unsupervised learning algorithm, which is widely used in clustering problems. Given a number of clusters k fixed a priori, the algorithm sets k centroids, one for each cluster. Then, the data objects of a given dataset are divided into smaller classes, satisfying a minimum-distance criterion. Specifically, for each object, the distance to all centroids is calculated and the object is assigned to the class of the centroid to which it has the minimum distance. Then, by calculating the mean of each class, the new centroids emerge. The process is repeated until there are no more changes in the resulting centroids.
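For illustration, a minimal MATLAB sketch of this iteration is given below. It is not the implementation used in the paper; it assumes a data matrix X of size n-by-d and uses pdist2 from the Statistics and Machine Learning Toolbox for the distance computations.

```matlab
function [labels, C] = kmeansSimple(X, k, maxIter)
% Minimal k-means sketch: X is n-by-d, k clusters, Euclidean distance.
n = size(X, 1);
C = X(randperm(n, k), :);                    % k random data points as initial centroids
for it = 1:maxIter
    [~, labels] = min(pdist2(X, C), [], 2);  % assignment: nearest centroid
    Cnew = C;
    for j = 1:k
        members = (labels == j);
        if any(members)
            Cnew(j, :) = mean(X(members, :), 1);   % update: class mean
        end
    end
    if isequal(Cnew, C), break; end          % stop when centroids no longer change
    C = Cnew;
end
end
```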
2.1.2 k-meansRA
The k-meansRA clustering method, [26], is similar to the k-means clustering method, but with a significant difference in the first step. In contrast to the k-means clustering method, which randomly selects a defined number of k centroids from the whole dataset before the rest of the process is applied, k-meansRA assigns the patterns of the dataset randomly to k classes, and the centroids are calculated after these initial random assignments. The rest of the process is the same as in the simple k-means algorithm, and the best solution results from the assignment for which the sum of the distances of the patterns from the respective centroids is minimized. Although the differences between the methods are very small and concern only the first step, according to [26] the k-meansRA method performs better and at a lower computational cost compared to the other implementations.
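A sketch of the random-assignment initialization, under the same assumptions as the k-means sketch above (data matrix X, k clusters), could look as follows; the subsequent iterations are unchanged.

```matlab
% k-meansRA initialization sketch: assign patterns to k classes at random
% and compute the initial centroids as the means of those random classes.
labels0 = randi(k, size(X, 1), 1);        % random initial class assignment
C = zeros(k, size(X, 2));
for j = 1:k
    C(j, :) = mean(X(labels0 == j, :), 1);
end
% The assignment/update iterations then proceed exactly as in simple k-means,
% and the best of several random restarts (minimum total within-class
% distance) is kept as the final solution.
```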
2.2 Classification efficiency
In order to measure the classification performance,
different metrics have been widely used. Accuracy,
CRI (corrected rand index), [27], log-loss, [28], and
AUC-ROC, [29], are some of the most popular met-
rics. In this paper, Accuracy and CRI will be used.
2.2.1 CRI
The Corrected Rand Index (CRI) is a standard metric for classification performance. With this metric, two partitions of the same K objects can be compared and the similarity between them quantified. Let the first partition consist of R clusters and the second of C clusters. The index is defined as

\[
\mathrm{CRI} = \frac{\sum_{i=1}^{R}\sum_{j=1}^{C}\binom{k_{ij}}{2} - \binom{K}{2}^{-1}\sum_{i=1}^{R}\binom{k_{i\cdot}}{2}\sum_{j=1}^{C}\binom{k_{\cdot j}}{2}}
{\frac{1}{2}\left[\sum_{i=1}^{R}\binom{k_{i\cdot}}{2}+\sum_{j=1}^{C}\binom{k_{\cdot j}}{2}\right] - \binom{K}{2}^{-1}\sum_{i=1}^{R}\binom{k_{i\cdot}}{2}\sum_{j=1}^{C}\binom{k_{\cdot j}}{2}},
\]

where k_ij is the number of objects in the i-th cluster of the first partition and the j-th cluster of the second, k_i. is the number of objects in the i-th cluster of the first partition and k_.j is the number of objects in the j-th cluster of the second partition. CRI ranges from -1 to 1: a value of 1 indicates exact agreement between the two partitions, values near zero indicate random agreement and negative values indicate disagreement. In our computations of CRI, the first partition corresponds to the known classes.
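A possible MATLAB implementation of the CRI from two label vectors, built on the contingency table of the two partitions, is sketched below (crosstab requires the Statistics and Machine Learning Toolbox; this is an illustration, not the authors' code).

```matlab
function cri = criIndex(labelsTrue, labelsPred)
% Corrected (adjusted) Rand index between two partitions of the same K objects.
K     = numel(labelsTrue);
T     = crosstab(labelsTrue, labelsPred);       % contingency table, entries k_ij
sumIJ = sum(T(:)      .* (T(:)      - 1)) / 2;  % sum_ij C(k_ij, 2)
sumI  = sum(sum(T, 2) .* (sum(T, 2) - 1)) / 2;  % sum_i  C(k_i., 2)
sumJ  = sum(sum(T, 1) .* (sum(T, 1) - 1)) / 2;  % sum_j  C(k_.j, 2)
expIJ = sumI * sumJ / nchoosek(K, 2);           % expected value under randomness
cri   = (sumIJ - expIJ) / (0.5 * (sumI + sumJ) - expIJ);
end
```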
2.2.2 Accuracy
Accuracy is a widely used metric for evaluating classification performance. It is a very simple metric: the ratio of the number of correct predictions to the total number of samples,

\[
\mathrm{accuracy} = \frac{C}{T},
\]

where C is the number of correct predictions and T is the total number of samples. For binary classification, it can easily be expressed in terms of the confusion matrix as

\[
\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN},
\]

where TP (true positives) is the number of true positive instances, FP (false positives) the number of false positive instances, FN (false negatives) the number of false negative instances and TN (true negatives) the number of true negative instances.
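For completeness, the two expressions above translate directly into MATLAB; predicted and truth are assumed to be label vectors of equal length, with classes coded 0/1 in the binary case.

```matlab
% Overall accuracy from label vectors.
acc = sum(predicted == truth) / numel(truth);

% Binary case, written in confusion-matrix terms.
TP = sum(predicted == 1 & truth == 1);
TN = sum(predicted == 0 & truth == 0);
FP = sum(predicted == 1 & truth == 0);
FN = sum(predicted == 0 & truth == 1);
accBinary = (TP + TN) / (TP + TN + FP + FN);
```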
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.10
Nikolaos Papaioannou, Alkiviadis Tsimpiris,
Christos Talagozis, Leonidas Fragidis,
Athanasios Angeioplastis,
Sotirios Tsakiridis, Dimitrios Varsamis
E-ISSN: 2224-3402
78
Volume 20, 2023
2.3 FSS
In order to reduce the dimensionality and increase the efficiency of the clustering, a wrapper feature selection is used. Forward Sequential Selection (FSS), [30], [31], is a widely used algorithm that can reduce the processing time. The subset of features is built up progressively, adding one feature in each cycle. First, the feature which has the maximum similarity with the class is found, based on the evaluation function (CRI or Accuracy). Then, in each cycle, the feature whose evaluation-function value is better than that of all the other candidate features and than the maximum value of the previous cycle is added. The algorithm stops when the termination criterion is satisfied, given a predefined threshold: when the value of the evaluation function does not increase enough with the addition of a new feature, or even reduces the performance, the algorithm stops and the subset from the previous cycle is selected as the optimal feature subset. The FSS algorithm reads as follows (a MATLAB sketch is given after the steps):
1. Set $l = 1$ and find the best single-feature clustering, i.e., $S_l = \{q^{(l)}\}$, where $q^{(l)} = \arg\max_{q_i} \mathrm{CRI}(\{q_i\})$, $i = 1, \dots, d$.

2. For $l > 1$, compute clusterings for the feature subsets $S_l^i = S_{l-1} \cup \{q_i\}$, where $q_i \notin S_{l-1}$, and find the one with the largest CRI, i.e., $q^{(l)} = \arg\max_{q_i \notin S_{l-1}} \mathrm{CRI}(S_l^i)$.

3. If $T = \frac{\mathrm{CRI}(S_l^{(l)}) - \mathrm{CRI}(S_{l-1})}{\mathrm{CRI}(S_{l-1})} > \text{threshold}$ (here 0.001), then set $S_l = S_l^{(l)}$, $l = l + 1$, and go to step 2; otherwise stop.
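A serial MATLAB sketch of the loop described above is given below; evalSubset is a hypothetical helper, passed as a function handle, that clusters on the columns S of X and returns the CRI against the known classes y. The sketch assumes, as in the criterion above, that the CRI of the first selected feature is positive.

```matlab
function S = fssWrapper(X, y, evalSubset, threshold)
% Forward sequential selection sketch: add one feature per cycle while the
% relative CRI improvement exceeds the threshold (e.g. 0.001).
d = size(X, 2);
S = [];
bestCRI = -Inf;
while true
    cri = -Inf(1, d);
    for q = 1:d                                % evaluate every candidate feature
        if ~ismember(q, S)
            cri(q) = evalSubset(X, y, [S q]);
        end
    end
    [criBest, qBest] = max(cri);
    if isempty(S) || (criBest - bestCRI) / bestCRI > threshold
        S = [S qBest];                         % accept the best candidate
        bestCRI = criBest;
    else
        break;                                 % improvement too small: stop
    end
end
end
```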
2.4 Parallelization
To reduce the time of the aforementioned process, parallelization variants are presented and evaluated, using different MATLAB parallelization constructs such as parfor and spmd. The aim is to find which parallelization method (depending on which part of the algorithm it is applied to) is the most efficient compared to the others, in terms of lower computational cost. The parallelization techniques are applied both to the feature selection process and to the k-means classifier.
2.4.1 MATLAB parallelization functions
In this paper, the parfor and spmd MATLAB commands are used. spmd defines a parallel region, while parfor is a parallel for loop. The spmd command offers more flexibility, because it can be used both inside loops and to operate on distributed arrays and vectors. spmd allows a piece of code to be defined and run simultaneously on multiple workers, whereas parfor is a parallelization technique for loops and comes with several restrictions.
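The toy example below (assuming the Parallel Computing Toolbox and an open parallel pool) illustrates the two constructs on a task unrelated to the paper, the summation of powers of a large vector.

```matlab
x = rand(1e6, 1);

% parfor: a parallel for loop; iterations must be independent.
s = zeros(4, 1);
parfor k = 1:4
    s(k) = sum(x .^ k);                   % each iteration runs on some worker
end

% spmd: the same block runs on every worker and can use distributed data.
spmd
    xd = codistributed(x);                % split x across the workers
    partial = sum(getLocalPart(xd));      % each worker sums its own part
end
total = sum([partial{:}]);                % gather the per-worker partial sums
```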
2.4.2 Parallelization variations
fssPkmeansRA In this variant, the entire FSS loop over candidate features is parallelized using the parfor MATLAB command, and the serial k-meansRA clustering algorithm is used as the classifier (a sketch is given after the steps).
1. Set $l = 1$, parallelize the loop and find the best single-feature clustering, using the serial k-meansRA as classifier, i.e., $S_l = \{q^{(l)}\}$, where $q^{(l)} = \arg\max_{q_i} \mathrm{CRI}(\{q_i\})$, $i = 1, \dots, d$.

2. For $l > 1$, parallelize the loop and compute clusterings using k-meansRA as classifier for the feature subsets $S_l^i = S_{l-1} \cup \{q_i\}$, where $q_i \notin S_{l-1}$, and find the one with the largest CRI, i.e., $q^{(l)} = \arg\max_{q_i \notin S_{l-1}} \mathrm{CRI}(S_l^i)$.

3. If $T = \frac{\mathrm{CRI}(S_l^{(l)}) - \mathrm{CRI}(S_{l-1})}{\mathrm{CRI}(S_{l-1})} > \text{threshold}$ (here 0.001), then set $S_l = S_l^{(l)}$, $l = l + 1$, and go to step 2; otherwise stop.
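A sketch of how the candidate loop of steps 1 and 2 can be parallelized with parfor is shown below; evalSubsetRA is a hypothetical helper that runs the serial k-meansRA on the columns [S q] of X and returns the CRI against the known classes y.

```matlab
% One FSS cycle of fssPkmeansRA: the loop over candidate features runs in
% parallel, each worker evaluating one candidate subset with serial k-meansRA.
d   = size(X, 2);
inS = ismember(1:d, S);                        % features already selected
cri = zeros(1, d);
parfor q = 1:d
    if inS(q)
        cri(q) = -Inf;                         % already selected: exclude
    else
        cri(q) = evalSubsetRA(X, y, [S q]);    % iterations are independent
    end
end
[criBest, qBest] = max(cri);                   % best candidate of this cycle
```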
fsskmeansRAP This variant denotes the serial FSS algorithm that uses as classifier the k-meansRA parallelized with the parfor command.
1. Set $l = 1$. Using the serial loop, parallelize the k-meansRA classifier (here, with the MATLAB parfor command) and find the best single-feature clustering, i.e., $S_l = \{q^{(l)}\}$, where $q^{(l)} = \arg\max_{q_i} \mathrm{CRI}(\{q_i\})$, $i = 1, \dots, d$.

2. For $l > 1$, use the serial loop and compute clusterings using the parallel k-meansRA as classifier (here, with the MATLAB parfor command) for the feature subsets $S_l^i = S_{l-1} \cup \{q_i\}$, where $q_i \notin S_{l-1}$, and find the one with the largest CRI, i.e., $q^{(l)} = \arg\max_{q_i \notin S_{l-1}} \mathrm{CRI}(S_l^i)$.

3. If $T = \frac{\mathrm{CRI}(S_l^{(l)}) - \mathrm{CRI}(S_{l-1})}{\mathrm{CRI}(S_{l-1})} > \text{threshold}$ (here 0.001), then set $S_l = S_l^{(l)}$, $l = l + 1$, and go to step 2; otherwise stop.
fsskmeansRASPMD The fsskmeansRASPMD variant is similar to fsskmeansRAP, but the k-meansRA clustering algorithm is parallelized with the spmd command instead of the parfor command (a sketch is given after the steps).
1. Set $l = 1$. Using the serial loop, parallelize the k-meansRA classifier (here, with the MATLAB spmd command) and find the best single-feature clustering, i.e., $S_l = \{q^{(l)}\}$, where $q^{(l)} = \arg\max_{q_i} \mathrm{CRI}(\{q_i\})$, $i = 1, \dots, d$.

2. For $l > 1$, use the serial loop and compute clusterings using the parallel k-meansRA as classifier (here, with the MATLAB spmd command) for the feature subsets $S_l^i = S_{l-1} \cup \{q_i\}$, where $q_i \notin S_{l-1}$, and find the one with the largest CRI, i.e., $q^{(l)} = \arg\max_{q_i \notin S_{l-1}} \mathrm{CRI}(S_l^i)$.

3. If $T = \frac{\mathrm{CRI}(S_l^{(l)}) - \mathrm{CRI}(S_{l-1})}{\mathrm{CRI}(S_{l-1})} > \text{threshold}$ (here 0.001), then set $S_l = S_l^{(l)}$, $l = l + 1$, and go to step 2; otherwise stop.
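As an illustration of how the assignment step of k-meansRA might be parallelized with spmd, a sketch is given below (again assuming the Parallel Computing Toolbox and pdist2); it is not claimed to reproduce the authors' exact spmd implementation.

```matlab
% Parallel assignment step: each worker labels its own block of rows of X
% against the current centroids C.
spmd
    Xd   = codistributed(X, codistributor1d(1));   % distribute rows over workers
    Xloc = getLocalPart(Xd);
    [~, labLoc] = min(pdist2(Xloc, C), [], 2);     % local nearest-centroid labels
end
labels = vertcat(labLoc{:});                       % gather labels in original row order
% The centroid update and the convergence check remain as in the serial version.
```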
fsskmeans fsskmeans is the serial implementation of the FSS process, using the simple serial k-means clustering algorithm as classifier, [32]. This variant is used for comparison with the examined parallel variants.
1. Set $l = 1$ and find the best single-feature clustering using k-means as classifier, i.e., $S_l = \{q^{(l)}\}$, where $q^{(l)} = \arg\max_{q_i} \mathrm{CRI}(\{q_i\})$, $i = 1, \dots, d$.

2. For $l > 1$, compute clusterings using k-means as classifier for the feature subsets $S_l^i = S_{l-1} \cup \{q_i\}$, where $q_i \notin S_{l-1}$, and find the one with the largest CRI, i.e., $q^{(l)} = \arg\max_{q_i \notin S_{l-1}} \mathrm{CRI}(S_l^i)$.

3. If $T = \frac{\mathrm{CRI}(S_l^{(l)}) - \mathrm{CRI}(S_{l-1})}{\mathrm{CRI}(S_{l-1})} > \text{threshold}$ (here 0.001), then set $S_l = S_l^{(l)}$, $l = l + 1$, and go to step 2; otherwise stop.
fsskmeansRA Similarly to fsskmeans, fsskmeansRA is the serial implementation of the FSS process, using as classifier a variant of the k-means clustering algorithm, k-meansRA.
1. Set $l = 1$ and find the best single-feature clustering using k-meansRA as classifier, i.e., $S_l = \{q^{(l)}\}$, where $q^{(l)} = \arg\max_{q_i} \mathrm{CRI}(\{q_i\})$, $i = 1, \dots, d$.

2. For $l > 1$, compute clusterings using k-meansRA as classifier for the feature subsets $S_l^i = S_{l-1} \cup \{q_i\}$, where $q_i \notin S_{l-1}$, and find the one with the largest CRI, i.e., $q^{(l)} = \arg\max_{q_i \notin S_{l-1}} \mathrm{CRI}(S_l^i)$.

3. If $T = \frac{\mathrm{CRI}(S_l^{(l)}) - \mathrm{CRI}(S_{l-1})}{\mathrm{CRI}(S_{l-1})} > \text{threshold}$ (here 0.001), then set $S_l = S_l^{(l)}$, $l = l + 1$, and go to step 2; otherwise stop.
3 Dataset
This study is applied to four different datasets. The datasets are derived from the regression systems of [12]. The study is performed for binary classification for Datasets 1, 2 and 3, and for five classes for Dataset 4. The variable y gives the classes after discretization. For Datasets 1, 2 and 3, each element of the variable y goes to the first class if its value is below zero, while all elements with values equal to or greater than zero go to the second class. The features f_i are random variables. The systems are generated for 33, 50 and 100 features of 10000 samples each. For Dataset 4 (Mackey-Glass differential equation), the system is generated for 365 features of 1000 samples each and concerns classification into five different classes (regimes).
3.1 Dataset 1
Dataset 1 results from

\[
y = \kappa y_1 + (1 - \kappa) y_2,
\]

where $y_1$ and $y_2$ are

\[
y_1 = \beta_1 f_1 + \beta_2 f_2 + e_1, \qquad
y_2 = \beta_3 f_3 + \beta_4 f_4 + \beta_5 f_5 + e_2,
\]

where $\kappa$ is a weighting parameter ($\kappa = 0.5$), $f_i$, $i = 1, \dots, 5$, are the predictors of $y_1$ and $y_2$ and consequently are features relevant to the class variable C, while $e_1$ and $e_2$ are standard normal random variables. The coefficients are $\beta_1 = -3$, $\beta_2 = 2$, $\beta_3 = 3$, $\beta_4 = 2$ and $\beta_5 = -4$. This dataset is generated for a total of 33, 50 and 100 features. Beyond the five independent features, the rest are irrelevant to C, while they are identically correlated to each other. The features are drawn from a multivariate normal distribution with zero mean and unit variance for each feature component and cross-correlation r = 0.5 for all feature pairs.
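The following sketch shows one way the 33-feature version of Dataset 1 could be generated in MATLAB (mvnrnd requires the Statistics and Machine Learning Toolbox); the signs of the coefficients follow the reconstruction above, and the exact generation procedure of [12] may differ in details.

```matlab
n = 10000;  d = 33;  r = 0.5;  kappa = 0.5;
Sigma = r * ones(d) + (1 - r) * eye(d);        % unit variances, cross-correlation 0.5
F     = mvnrnd(zeros(1, d), Sigma, n);         % features f_1 ... f_d
beta  = [-3 2 3 2 -4];
y1 = beta(1)*F(:,1) + beta(2)*F(:,2) + randn(n, 1);
y2 = beta(3)*F(:,3) + beta(4)*F(:,4) + beta(5)*F(:,5) + randn(n, 1);
y  = kappa * y1 + (1 - kappa) * y2;
C  = double(y >= 0);                           % binary class after discretization
```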
3.2 Dataset 2
\[
y = \beta_1 f_1 + \beta_2 f_2 + \beta_3 f_3 + \beta_4 f_4 + e,
\]

where

\[
f_1 = x_1, \quad f_2 = x_2, \quad f_3 = \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3, \quad f_4 = \alpha_4 x_1^2 + \alpha_5 x_2^2.
\]
Here $f_i$, $i = 1, \dots, 4$, are the predictors of the class variable y. The features $f_i$ are related to the $x_i$, $i = 1, 2, 3$, which are independent standard normal random variables. The coefficients are $\alpha_1 = 0.2$, $\alpha_2 = 0.3$, $\alpha_3 = 2$, $\alpha_4 = 0.1$, $\alpha_5 = 0.1$, $\beta_1 = 1$, $\beta_2 = 1$, $\beta_3 = 0.2$, $\beta_4 = 0.3$. This dataset is also generated for a total of 33, 50 and 100 features. Beyond the four independent features, the rest are irrelevant to C.
3.3 Dataset 3
\[
y = \beta_1 f_1 + \beta_2 f_2 + \beta_3 f_3 + \beta_4 f_4 + \beta_5 f_5 + \beta_6 f_6 + e,
\]

where

\[
f_1 = x_1, \quad f_2 = x_2, \quad f_3 = x_1 x_2, \quad f_4 = x_3, \quad f_5 = x_4^2, \quad f_6 = x_1 x_5.
\]

The class variable y is built from six features $f_i$, $i = 1, \dots, 6$, which are functionally related to $x_i$, $i = 1, \dots, 4$ (independent standard normal random variables), and to $x_5$, which is independent of the rest of the $x_i$ and follows the distribution of the product of two random variables (product distribution). The $\beta_i$ are set to $1/\sigma_i$, $i = 1, \dots, 6$, where $\sigma_i$ is the standard deviation of $f_i$. Consequently, all terms $\beta_i f_i$ contribute equally to y.
3.4 Dataset 4
\[
\dot{x} = \frac{dx}{dt} = \frac{0.2\, x(t - \Delta)}{1 + [x(t - \Delta)]^{10}} - 0.1\, x(t)
\]

Dataset 4 is based on the Mackey-Glass delay differential equation, a chaotic system. Different measures are computed on different regimes in order to form the classes to be identified. For Δ equal to 17 the system becomes chaotic, and as Δ increases, so does the complexity. In terms of complexity, two similar settings are considered: one for Δ = 120, 140, ..., 200 and a second one for Δ = 110, 130, ..., 190. Each regime is derived from 200 samples of time series of length 1000. 365 different measures are computed on each time series and are considered as the features of this dataset. The features have different relations with each other: some are strongly correlated, while others are irrelevant. For this dataset, the truly relevant features for the classification of the five dynamical regimes are not known. The challenge in this dataset is to select the optimal feature subset that best discriminates the five chaotic regimes.
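An Euler-type integration sketch of the Mackey-Glass equation is shown below; the step size, initial history and burn-in length are illustrative assumptions, not the exact settings used to build Dataset 4.

```matlab
% Generate one Mackey-Glass time series for a given delay Delta (in time steps).
dt = 1;  Delta = 170;  N = 1000;  burnIn = 1000;
x  = 0.5 * ones(Delta + burnIn + N, 1);        % constant history as initial condition
for t = Delta + 1 : numel(x) - 1
    xd = x(t - Delta);
    x(t + 1) = x(t) + dt * (0.2 * xd / (1 + xd^10) - 0.1 * x(t));
end
series = x(end - N + 1 : end);                 % keep the last N points after burn-in
```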
4 Results
In this paper, three datasets (1, 2, 3) are generated for 33, 50 and 100 features of 10000 samples each, while Dataset 4 is generated for 365 different measures (features) of 1000 samples. Based on these datasets, different parallelized variants of FSS are compared in terms of classification accuracy and execution time. For Datasets 1, 2 and 3, binary classification is examined, while for Dataset 4 the study is performed for five classes. During the feature selection process, the CRI is used as the evaluation function, as it is more robust. Once the optimal subset of features has been selected, the performance of the algorithm is evaluated with the accuracy metric. The study is performed for 10 iterations and the results correspond to the averages of execution time and accuracy.
The hardware used for the evaluation of the algorithms was a computing system with a 12th Gen Intel(R) Core(TM) i7-12700 at 2.10 GHz and 16 GB RAM. The operating system was Windows 11 Pro and the storage was a 1000 GB SSD.
4.1 Results for Dataset 1
In Table 1, the results of the examined variants for a total of 33, 50 and 100 features are illustrated. The process is repeated for 10 iterations. f refers to the selected features, t corresponds to the average execution time (in seconds), while ac refers to the average value of the Accuracy similarity measure.
Table 1: Results for Dataset 1

Number of Features        33                    50                    100
                     f      t     ac       f      t     ac       f       t     ac
fsskmeans           5,1   60.61  0.64     5,1   92.51  0.64     5,1   180.63  0.64
fsskmeansRA         5,1   23.43  0.64     5,1   35.37  0.64     5,1    67.62  0.64
fssPkmeansRA        5,1    2.95  0.64     5,1    4.30  0.64     5,1     7.70  0.64
fsskmeansRAP        5,1   26.03  0.64     5,1   36.86  0.64     5,1    76.49  0.64
fsskmeansRASPMD     5,1   31.88  0.64     5,1   47.65  0.64     5,1    92.29  0.64
The results for Dataset 1 (Table 1) suggest that all the examined variants are equally reliable in terms of the selected features and the accuracy. For all the examined sets of features (33, 50, 100), all methods select features 1 and 5 as the optimal subset. For this dataset, only the features f_i, i = 1, ..., 5, are relevant to the class variable, while the rest of the features included in the dataset are irrelevant to it. The methods are also equal in classification accuracy, as they all achieve about 64%. However, there is a significant difference in the execution time. The fssPkmeansRA is significantly faster than the other examined methods. Specifically, on average, when the initial dataset consists of 33, 50 or 100 features, it is respectively 7.9, 8.2 and 8.8 times faster than the second fastest (fsskmeansRA). Additionally, compared to the simple serial implementation of FSS
using the simple k-means as classifier (fsskmeans), the fssPkmeansRA is on average 21.8 times faster. It is worth mentioning that, when comparing the two serial methods, the fsskmeansRA employing k-meansRA as classifier is considerably quicker than the one using the conventional k-means as classifier (fsskmeans). In particular, when the dataset consists of 33 features, fsskmeans needs 60.61 seconds on average to complete the execution of the algorithm, while fsskmeansRA needs only 23.43 seconds. The same applies when the dataset consists of 50 or 100 features, where fsskmeans needs 92.51 and 180.63 seconds respectively, while fsskmeansRA needs 35.37 and 67.62 seconds respectively. Finally, it is observed that for this dataset, the serial implementation which uses k-meansRA as classifier is faster than the one parallelized with the spmd MATLAB command (fsskmeansRASPMD). In terms of execution time, the parallelized method fsskmeansRASPMD had the second worst performance, needing on average 57.27 seconds to determine the optimal subset.
4.2 Results for Dataset 2
In Table 2, the results of the examined variants for a total of 33, 50 and 100 features are illustrated. The process is repeated for 10 iterations. f refers to the selected features, t corresponds to the average execution time (in seconds), while ac refers to the average value of the Accuracy similarity measure.
Table 2: Results for Dataset 2

Number of Features        33                    50                    100
                     f      t     ac       f      t     ac       f       t     ac
fsskmeans            2    49.06  0.74      2    70.72  0.74      2    148.11  0.74
fsskmeansRA          2    17.36  0.74      2    24.42  0.74      2     52.30  0.74
fssPkmeansRA         2     2.78  0.74      2     3.29  0.74      2      6.58  0.74
fsskmeansRAP         2    15.54  0.74      2    20.15  0.74      2     42.05  0.74
fsskmeansRASPMD      2    22.18  0.74      2    30.57  0.74      2     63.17  0.74
For Dataset 2 (Table 2), the same conclusions as for Dataset 1 are derived. The datasets consist of 33, 50 and 100 features. Only the features f_i, i = 1, ..., 4, are relevant to the class variable for this system, while all the other features are irrelevant. The optimal subsets consist of only one feature, specifically f_2, which is the same for all the examined methods. It is observed that all the methods under study have the same accuracy, which is on average 74%, but their execution times differ significantly (Table 2). The fssPkmeansRA is the fastest one, needing on average 2.78, 3.29 and 6.58 seconds to determine the optimal subsets when the dataset consists of 33, 50 and 100 features respectively. fssPkmeansRA is about 6 times faster than fsskmeansRAP, which is the second fastest, while compared to the slowest one (fsskmeans), the fssPkmeansRA is about 20 times faster. The serial FSS method which uses the parallelized k-meansRA as classifier (fsskmeansRAP) is slightly faster than the corresponding serial one (fsskmeansRA). When the dataset consists of 33 features, fsskmeansRAP is 1.82 seconds faster than the serial fsskmeansRA. The same applies when the dataset consists of 50 and 100 features, where fsskmeansRAP is faster than the serial fsskmeansRA by 4.27 and 10.25 seconds respectively. Comparing the two serial methods fsskmeans and fsskmeansRA, similarly to Dataset 1, the fsskmeansRA is about 2.82, 2.89 and 2.83 times faster than fsskmeans, when the dataset consists of 33, 50 and 100 features respectively. The serial FSS method which utilizes as classifier the k-meansRA parallelized with the spmd command, although it is equal to the other methods in terms of accuracy, is the second worst in terms of execution time, needing 38.64 seconds on average to complete the feature selection process.
4.3 Results for Dataset 3
Dataset 3 is considered a little more complicated than the others. Similarly to Datasets 1 and 2, the features truly relevant to the class variable are known (f_i, i = 1, ..., 6). It is observed that, contrary to Datasets 1 and 2, the optimal feature subsets obtained from the examined methods differ. Specifically, the methods that utilize k-meansRA as classifier select two features for the optimal subset, particularly f_4 and f_2, which are both truly relevant to the class variable (Table 3). On the contrary, the serial FSS using the simple k-means as classifier (fsskmeans) selects the feature f_4 for the optimal subset, but also adds the feature f_21, which is a random normal variable and irrelevant to the class variable y. The accuracy of the methods that use k-meansRA as classifier is about 67%, while for fsskmeans it is about 64%. Similarly to the aforementioned examples (Sections 4.1 and 4.2), the fssPkmeansRA is the fastest method; specifically, it is on average 4.43 times faster than the second fastest (fsskmeansRAP) and 11.4 times faster than the serial implementation of FSS using the simple k-means as classifier (fsskmeans). The fsskmeans had the worst performance both in terms of accuracy and execution time. In order to complete the feature selection process, fsskmeans needed 43.73, 65.45 and 150.20 seconds on average, when the datasets consist of 33, 50 and 100 features respectively. The fsskmeansRAP method, which uses the MATLAB parfor command in order to parallelize the k-meansRA classifier, is shown to be faster than the corresponding one that uses the spmd MATLAB command (fsskmeansRASPMD), regardless of the size of the dataset. It is worth noting that for this dataset (but also for Dataset 4 and the case study), the fsskmeans algorithm kept adding more features to
the optimal subset, but we stopped it at two features in order to compare the execution times between the algorithms. Specifically, for this dataset, fsskmeans would normally also add the features f_7 and f_12 to the optimal subset, which are likewise random normal variables and irrelevant to the class variable y.

In Table 3, the results of the examined variants for a total of 33, 50 and 100 features are illustrated. The process is repeated for 10 iterations. f refers to the selected features, t corresponds to the average execution time (in seconds), while ac refers to the average value of the Accuracy similarity measure.
Table 3: Results for Dataset 3

Number of Features        33                    50                    100
                     f      t     ac       f      t     ac       f       t     ac
fsskmeans           4,21  43.73  0.64     4,21  65.45  0.64     4,21  150.20  0.64
fsskmeansRA         4,2   24.78  0.67     4,2   36.34  0.67     4,2    89.58  0.67
fssPkmeansRA        4,2    4.87  0.67     4,2    5.55  0.67     4,2    11.28  0.67
fsskmeansRAP        4,2   18.26  0.67     4,2   25.90  0.67     4,2    55.05  0.67
fsskmeansRASPMD     4,2   29.25  0.67     4,2   43.07  0.67     4,2    96.41  0.67
4.4 Results for Dataset 4
For this dataset, the classification problem is to discriminate five highly complex regimes. In Table 4, the results of the examined variants for the Mackey-Glass delay differential equation chaotic system are presented. f refers to the selected features, t corresponds to the average execution time (in seconds) and ac refers to the average value of the similarity measure Accuracy. For each setting we consider five different regimes. For the Δ1 setting we considered the values Δ = 110, 130, ..., 190, and for Δ2 we considered Δ = 120, 140, ..., 200.
Table 4: Results for Dataset 4

Setting                   Δ1                       Δ2
                     f       t      ac       f       t      ac
fsskmeans          54,71  108.81   0.76    54,74  107.58   0.75
fsskmeansRA        54,71   51.53   0.76    54,74   48.57   0.75
fssPkmeansRA       54,71    7.68   0.76    54,74    7.07   0.75
fsskmeansRAP       54,71   64.98   0.76    54,74   60.16   0.75
fsskmeansRASPMD    54,71  110.92   0.76    54,74  102.80   0.75
For this dataset, the actually relevant features for the classification of the five dynamic regimes are unknown, in contrast to Datasets 1, 2 and 3. However, the results (Table 4) are quite similar to the previous datasets in terms of accuracy and computational time. The fastest method is fssPkmeansRA, which needed on average 7.68 seconds to select the optimal feature subset for the setting Δ1 (five regimes for Δ = 110, 130, ..., 190) and 7.07 seconds for the setting Δ2 (Δ = 120, 140, ..., 200). It is also faster than fsskmeansRA (the second fastest) by 6.70 and 6.87 times on average, for the settings Δ1 and Δ2 respectively. Compared to the serial implementation fsskmeans, fssPkmeansRA is 14.17 times faster for the setting Δ1 and 15.21 times faster for Δ2. In terms of Accuracy, all algorithms are equal and achieve an average classification accuracy of 76% for Δ1 and 75% for Δ2. For the setting Δ2, the common features selected by the algorithms in all iterations are those presented in Table 4, while, depending on the iteration (each algorithm is repeated 10 times and different results can occur), the feature f_51 is selected in two of ten runs instead of f_54. It was also observed that fsskmeans, which uses the simple k-means as classifier, added many more features to the optimal subset compared to the other methods using k-meansRA as classifier. We stopped the algorithm when the optimal subset included two features, in order to compare the execution time with the other examined algorithms. The extra features that fsskmeans added to the optimal subset lead to an extra computational cost, but they also seem to yield slightly better classification accuracy than the other methods (0.77 for Δ1 and 0.79 for Δ2). The optimal feature subset that best discriminates the five chaotic regimes is unknown, and the algorithms selected a feature set that best served this purpose.
5 Case study
In order to evaluate the performance of the examined methods, a study on a real-world dataset was carried out. The study was performed on the Human Activity Recognition Using Smartphones Dataset, which was collected from a group of 30 volunteers within an age range of 19-48 years; each person performed six activities (walking, walking upstairs, walking downstairs, sitting, standing, laying) while wearing a smartphone on the waist. The dataset consists of 561 features and 10299 instances. The features truly relevant to the class variable are not known. The study was performed similarly as before, but on real-world data. As for Dataset 4, the problem does not concern binary classification (there are 6 classes here) and the Accuracy metric was used in order to evaluate the classification performance of the different examined methods. The results of the examined variants for the Human Activity Recognition Using Smartphones Dataset are presented in Table 5. f refers to the selected features, t corresponds to the average execution time (in seconds), while ac refers to the average value of the similarity measure Accuracy.

The fssPkmeansRA achieved the highest accuracy (about 77%), while it needed only 66.05 seconds to determine the optimal subset. Compared with the methods that use k-meansRA as classifier, fssPkmeansRA is about 2.16, 6.40 and 8.14 times faster than fsskmeansRAP, fsskmeansRA and
fsskmeansRASPMD respectively.

Table 5: Results of Case Study

                     f        t      ac
fsskmeans          10,42   737.56   0.71
fsskmeansRA        10,57   422.90   0.77
fssPkmeansRA       10,57    66.05   0.77
fsskmeansRAP       10,57   142.84   0.77
fsskmeansRASPMD    10,57   538.28   0.77
implementation fsskmeans had the worst classifica-
tion accuracy (about 71%). Compared to the serial
fsskmeans, fssPkmeansRA was 11.17 times faster. It
is also observed that the methods that use k-meansRA
as classifier, select only different features com-
pared to fsskmeans. Specifically fsskmeansRA, fssP-
kmeansRA, fsskmeansRAP and fsskmeansRASPMD
select the features f10 and f57 for the optimal subset,
while the serial implementation fsskmeans selects the
features f10 and f42. It is worth noting that normally
the fsskmeans selected five or more features for the
optimal subset (f10,f42,f53,f147,f133). Similarly to
the Dataset 3 and the Dataset 4, the serial fsskmeans
algorithm was stopped when two features were se-
lected, since only a slight improvement was observed
in terms of accuracy and in order to compare it with
the other methods, in terms of the execution time. The
serial FSS method utilizing parallelized k-meansRA
with the MATLAB spmd command, needed on aver-
age 538.28 seconds in order to complete the feature
selection process, while achieving about 77% accu-
racy. Compared to the serial FSS method using the
parfor MATLAB command in order to parallelize the
k-meansRA classifier, it is observed that fsskmean-
sRASPMD is about 3.77 times slower, without im-
proving the accuracy.
In a study conducted in 2015, [33], on the same dataset, the authors achieved a classification accuracy of 60% in 582.1 seconds using a serial implementation of k-means. As shown in Table 5, fssPkmeansRA achieves 77% classification accuracy, while it only takes 66.05 seconds to run, which, as mentioned in their paper, is really important in real-time activity monitoring, because it requires the model to be built dynamically from the captured data.
6 Discussion
For datasets with known functionally relevant features to the class variable, such as Dataset 1, Dataset 2 and Dataset 3, the fssPkmeansRA was faster than both the serial implementations and the other examined parallelized methods, while the classification accuracy was equal or even better. Such an example concerns Dataset 3: taking into account that, based on the design of the system, only the features f_1, ..., f_6 are important and related to the class variable, it was observed that the serial implementation selected only f_4 among them and included the feature f_21 in the optimal subset of features. This could lead to misleading conclusions, since for Dataset 3, beyond the features f_1, ..., f_6, all the other features are random normal variables and irrelevant to the class variable y. It is therefore understood that the fssPkmeansRA, contrary to the other examined methods, can dramatically reduce the processing time without compromising the efficiency.

For datasets where the actually relevant features are unknown, the serial implementation fsskmeans appears slightly more accurate in terms of classification accuracy when more features are added to the optimal subset (but at a high computational cost), and less accurate when the comparison is made for the same size of the optimal subset.

For all the examined datasets, the fsskmeansRASPMD, which uses the spmd MATLAB command to parallelize the k-meansRA classifier, was slower even than the corresponding serial one (fsskmeansRA), which indicates that this method might not be appropriate for these systems.

In general, the proposed fssPkmeansRA variant, where the whole feature selection process is parallelized, appeared to be the fastest feature selection method compared to the rest of the examined methods, while also achieving high accuracy rates, showing that it could also be used in on-line processes for reducing the dimensionality in clustering and classification problems.
7 Conclusion
In this work, different parallelization variants of feature selection algorithms using k-means as classifier were proposed and examined, in order to evaluate the significance of the parallelization process in terms of computational cost and classification accuracy, as well as to emphasize the importance of the methods and the tools chosen for the parallelization. Specifically, different methods such as fsskmeans and fsskmeansRA were employed in order to examine the importance of the choice of classifier, since fsskmeans uses the simple k-means and fsskmeansRA uses the k-meansRA. The fssPkmeansRA and fsskmeansRAP were compared in order to examine the significance of selecting which part of the algorithm to parallelize, since in the fssPkmeansRA the whole feature selection process was parallelized, while in fsskmeansRAP only the classifier was parallelized. The fsskmeansRASPMD was employed for comparing different MATLAB parallelization techniques, namely parfor and spmd, when all the other parameters remained the
same. The results suggest that, in order to maximize accuracy and minimize the execution time, it is crucial to carefully determine both the classifier to be used and the part of the algorithm that will be parallelized. At least for the datasets used in this study, the fssPkmeansRA is considered the most reliable variant both in terms of accuracy and execution time. In fssPkmeansRA, the whole feature selection process was parallelized using the parfor MATLAB command and the k-meansRA was employed as classifier.
Further research would involve a different kind of
parallelization, not only on the CPU but also on the
GPU. A comparison of execution time and accuracy
with other feature selection methods, using different
classifiers, could also be performed to further exam-
ine the importance of parallelization.
8 Declarations
All authors declare that they have no conflicts of in-
terest.
References:
[1] S. Mittal, M. Shuja, and M. Zaman, “A review of
data mining literature,” IJCSIS, vol. 14, pp. 437–
442, 2016.
[2] J. A. Hartigan and M. A. Wong, “Algorithm as
136: A k-means clustering algorithm,” Journal
of the Royal Statistical Society. Series C (Ap-
plied Statistics), vol. 28, no. 1, pp. 100–108,
1979.
[3] M. Capo, A. Pérez, and J. Lozano, “An efficient
k-means algorithm for massive data,” 2016.
[4] M. Omran, A. Engelbrecht, and A. Salman, “An
overview of clustering methods,” Intell. Data
Anal., vol. 11, pp. 583–605, 2007.
[5] X.-D. Wang, R.-C. Chen, F. Yan, Z.-Q. Zeng,
and C.-Q. Hong, “Fast adaptive k-means sub-
space clustering for high-dimensional data,”
IEEE Access, vol. 7, pp. 42639–42651, 2019.
[6] R. Chen, C. Dewi, S. Huang, and R. Caraka,
“Selecting critical features for data classification
based on machine learning methods,” Journal
Of Big Data, vol. 7, p. 26, 2020.
[7] R. Shang, J. Chang, L. Jiao, and Y. Xue,
“Unsupervised feature selection based on self-
representation sparse regression and local simi-
larity preserving,” International Journal of Ma-
chine Learning and Cybernetics, vol. 10, 2019.
[8] S. Shekhar, N. Hoque, and K. Bhattacharyya,
“Pknn-mifs: A parallel knn classifier over an op-
timal subset of feature,” Intelligent Systems with
Applications, vol. 14, p. 200073, 2022.
[9] I. Guyon and A. Elisseeff, An Introduction to
Feature Extraction, vol. 207, pp. 1–25. 2008.
[10] C. C. Aggarwal and C. K. Reddy, eds., Data
Clustering: Algorithms and Applications. CRC
Press, 2014.
[11] C. Ding and H. Peng, "Minimum redundancy feature selection from microarray gene expression data," Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 185–205, 2005.
[12] A. Tsimpiris, I. Vlachos, and D. Kugiumtzis,
“Nearest neighbor estimate of conditional mu-
tual information in feature selection,” Expert
Systems with Applications, vol. 39, p. 12697–
12708, 2012.
[13] H. Chen and X. Chang, “Photovoltaic power
prediction of lstm model based on pearson
feature selection,” Energy Reports, vol. 7,
pp. 1047–1054, 2021. 2021 International Con-
ference on Energy Engineering and Power Sys-
tems.
[14] J. Maldonado, M. Riff, and B. Neveu, “A re-
view of recent approaches on wrapper feature
selection for intrusion detection,” Expert Sys-
tems with Applications, vol. 198, p. 116822,
2022.
[15] K. Bouzoubaa, Y. Taher, and B. Nsiri, “Pre-
dicting dos-ddos attacks: Review and evalua-
tion study of feature selection methods based
on wrapper process,” International Journal of
Advanced Computer Science and Applications,
vol. 12, 2021.
[16] X. Zhang, G. Wu, Z. Dong, and C. Craw-
ford, “Embedded feature-selection support vec-
tor machine for driving pattern recognition,”
Journal of the Franklin Institute, vol. 352, no. 2,
pp. 669–685, 2015. Special Issue on Control and
Estimation of Electrified vehicles.
[17] M. Zhu and J. Song, “An embedded backward
feature selection method for mclp classifica-
tion algorithm,” Procedia Computer Science,
vol. 17, pp. 1047–1054, 2013. First Interna-
tional Conference on Information Technology
and Quantitative Management.
[18] N. Mahendran and P. Vincent, “A deep learn-
ing framework with an embedded-based feature
selection approach for the early detection of the
alzheimers disease,” Computers in Biology and
Medicine, vol. 141, p. 105056, 2022.
[19] L. Venkataramana, S. Jacob, and R. Rama-
doss, “A parallel multilevel feature selection
algorithm for improved cancer classification,”
Journal of Parallel and Distributed Computing,
vol. 138, 2019.
[20] Q. L., J. W., and H. Z., “A wind speed interval
forecasting system based on constrained lower
upper bound estimation and parallel feature se-
lection,” Knowledge-Based Systems, vol. 231,
p. 107435, 2021.
[21] J. González-Domínguez, V. Bolón-Canedo,
B. Freire Castro, and J. Touriño, “Parallel fea-
ture selection for distributed-memory clusters,”
Information Sciences, vol. 496, 2019.
[22] N. Hijazi, H. Faris, and I. Aljarah, “A paral-
lel metaheuristic approach for ensemble feature
selection based on multi-core architectures,”
Expert Systems with Applications, vol. 182,
p. 115290, 2021.
[23] H. Kizilöz and A. Deniz, “An evolutionary
parallel multiobjective feature selection frame-
work,” Computers and Industrial Engineering,
vol. 159, p. 107481, 2021.
[24] B. Beceiro, J. González-Domínguez, and
J. Touriño, “Parallel-fst: A feature selection
library for multicore clusters,” Journal of
Parallel and Distributed Computing, vol. 169,
pp. 106–116, 2022.
[25] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, 1967.
[26] D. Varsamis and A. Tsimpiris, “Parallel imple-
mentations of k-means in matlab,” Contempo-
rary Engineering Sciences, vol. 13, pp. 359–
366, 2020.
[27] L. Hubert and P. Arabie, “Comparing parti-
tions,” Journal of Classification, vol. 2, 1985.
[28] C. E. Shannon, “A mathematical theory of com-
munication,” The Bell System Technical Jour-
nal, vol. 27, no. 3, pp. 379–423, 1948.
[29] J. Hanley and B. Mcneil, “The meaning and use
of the area under a receiver operating character-
istic (roc) curve,” Radiology, vol. 143, pp. 29–
36, 1982.
[30] K. Fu, J.-C. Simon, A. Checroun, C. Roche,
E. Coffman, and J. Eve, “Sequential methods
in pattern recognition and machine learning,”
Comptes Rendus Hebdomadaires des Séances
de l’Académie des Sciences, Série A, 1971.
[31] D. W. Aha and R. L. Bankert, A Compara-
tive Evaluation of Sequential Feature Selection
Algorithms, pp. 199–206. New York, NY:
Springer New York, 1996.
[32] A. Tsimpiris and D. Kugiumtzis, "Feature selection for classification of oscillating time series," Expert Systems, vol. 29, no. 5, pp. 456–477, 2012.
[33] M. Yamin and G. Chetty, “Intelligent human ac-
tivity recognition scheme for e health applica-
tions,” Malaysian Journal of Computer Science,
2015.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US