k-means has its pros and cons. On the one hand, it is easy to implement and can partition the original, complex dataset into subsets without any additional preprocessing. In most cases the k-means algorithm is faster than hierarchical clustering and is also efficient in terms of computational cost, which is O(K*n*d) for the assignment step and O(n*d) for updating the centroids, [3].
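As a rough illustration of these per-iteration costs (an illustrative NumPy sketch, not code from [3]), one Lloyd iteration can be written as follows; the assignment step compares each of the n points of dimension d with all K centroids, while the update step passes over the points only once:

import numpy as np

def kmeans_iteration(X, centroids):
    # X: (n, d) data matrix, centroids: (K, d) current centers
    # Assignment step: squared distances to all K centroids -> O(K*n*d)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # Update step: mean of the points assigned to each cluster -> O(n*d)
    new_centroids = np.array([
        X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
        for k in range(centroids.shape[0])
    ])
    return labels, new_centroids

Note that this vectorized assignment step also allocates an intermediate array of size n*K*d, which can become a practical limitation for big data.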
On the other hand, although k-means and other clustering methods, [4], are good at an initial separation of the data into clusters, k-means depends heavily on the location of the points rather than on the actual relations within the data itself. Furthermore, it can give significantly different results depending on the number of clusters k that is set, which makes the initial choice of k particularly important. Last but not least, it is also sensitive to outliers.
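A minimal demonstration of this sensitivity to k and to the initialization (the synthetic dataset and parameter values below are only illustrative) can be obtained with scikit-learn:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
# Different values of k and different random initializations can
# lead to noticeably different partitions and inertia values.
for k in (2, 4, 8):
    for seed in (0, 1):
        km = KMeans(n_clusters=k, n_init=1, random_state=seed).fit(X)
        print(k, seed, round(km.inertia_, 1))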
Outliers, and big data in general, can reduce the efficiency and therefore the reliability of the k-means algorithm, [5]. Such datasets may contain a large amount of unrelated and redundant information, which is likely to degrade the performance of the classification algorithm. In order to avoid these problems, feature selection methods have been used, [6], [7], [8].
Feature selection is often used as a preparatory stage of machine learning or in conjunction with it, [9]. Selecting the useful features is an important step towards increasing the efficiency of clustering and classification algorithms. Especially when the analysis concerns classification and/or regression problems with many observable features, feature selection becomes an important stage of the analysis: it can reduce the size of the dataset and the computational cost by removing irrelevant and unnecessary data such as outliers, while also improving accuracy and providing a deeper understanding of the structure of the model or of the data, [10]. Therefore, feature selection is considered necessary when dealing with big data. The purpose of feature selection algorithms is to create a subset of features from the initial one (the whole dataset), based on some criteria. Feature selection methods are divided into three main categories: filter methods, wrapper methods, and embedded methods.
Filter methods rely on the general characteristics of the data, without involving any learning algorithm. They use dependency measures such as mutual information, [11], conditional mutual information, [12], and Pearson's correlation, [13], in order to examine the relationships between the features and to build the best possible feature subset. These methods are relatively resistant to overfitting, but they often do not provide the optimal subset of features. However, because their computational cost is much lower than that of the other approaches, they are widely used by researchers when dealing with big data.
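For instance, a simple filter-style ranking can be built directly from such dependency measures; the following sketch (an illustration only, not the selection criterion used in this work) scores every feature against the target with mutual information and with Pearson's correlation and keeps the top-ranked ones:

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def filter_select(X, y, n_keep=10):
    # Dependency measures between each feature and the target
    mi = mutual_info_classif(X, y)                    # mutual information
    pearson = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                        for j in range(X.shape[1])])  # |Pearson correlation|
    # Rank by mutual information; the Pearson scores could be used
    # (or combined) in the same way.
    ranked = np.argsort(mi)[::-1]
    return ranked[:n_keep], mi, pearson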
Wrapper methods, [14], [15], require a predefined learning algorithm and use its performance to evaluate and determine which features will be selected. Thus, for each new subset of features, the wrapper method has to train and evaluate the predefined hypothesis or classifier. It tends to find the features that best fit the chosen learning algorithm, which results in very good learning performance, but at a higher computational cost. Although this approach is effective and achieves high performance, its computational cost makes it prohibitive for large-scale datasets (big data), because the classifier needs to be trained many times.
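The general wrapper loop can be sketched as follows (a hedged illustration using a greedy forward search and an arbitrary classifier; the concrete learning algorithm and search strategy of [14], [15], and of this paper, may differ):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_wrapper(X, y, n_keep=5, estimator=None):
    estimator = estimator or KNeighborsClassifier()
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_keep:
        # The classifier is retrained and scored for every candidate
        # subset, which is where the high computational cost comes from.
        scores = [cross_val_score(estimator, X[:, selected + [j]], y, cv=3).mean()
                  for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected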
Embedded methods, [16], [17], [18], combine the advantages of both filter and wrapper methods. Here, the algorithm learns which features contribute most to the accuracy of the model while the model itself is being built. Although they usually have a lower computational cost than wrapper methods, they are much slower than filter methods and, in addition, the selected features depend strongly on the learning machine.
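A common member of the embedded family (given only as an example; it is not necessarily the formulation used in [16], [17], [18]) is an L1-regularized linear model, which selects features while it is being fitted because the coefficients of irrelevant features are driven to zero:

import numpy as np
from sklearn.linear_model import LogisticRegression

def embedded_select(X, y, C=0.1):
    # The L1 penalty performs the selection during model fitting:
    # features whose coefficients are shrunk to zero are discarded.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    return np.flatnonzero(np.abs(model.coef_).max(axis=0) > 0)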
In this paper, a wrapper feature selection method is used. As mentioned earlier, wrapper methods lead to very good learning performance, but at a high computational cost. Parallelization can reduce this cost without compromising the efficiency of the methods. Recently, different parallelization techniques for feature selection have been proposed and applied to high-dimensional cases involving large amounts of data, such as cancer classification, [19], and wind forecasting, [20]. Several approaches have been proposed to overcome the increase in computation time caused by feature selection. Examples include a hybrid parallel implementation that uses MPI and OpenMP for feature selection on distributed-memory clusters, [21], a parallel heterogeneous ensemble feature selection scheme for multi-core architectures that applies parallelism to the CPU cores together with the GPU, [22], and the QCFS framework, which exploits the capabilities of modern CPUs by executing the feature selection process in parallel on four cores, [23]. Finally, a feature selection library has been proposed, [24], to help accelerate the feature selection process; it includes seven methods that follow a hybrid MPI/multithreaded approach.
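As a hedged illustration of the basic idea only (not the implementation of [21], [22], [23], [24], nor the specific variants examined in this paper), the independent subset evaluations of a wrapper loop can be distributed over CPU cores, for example with Python's multiprocessing:

import numpy as np
from multiprocessing import Pool
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def score_subset(args):
    X, y, subset = args
    # Each candidate subset is evaluated independently, so the
    # corresponding classifier trainings can run on separate cores.
    return cross_val_score(KNeighborsClassifier(), X[:, subset], y, cv=3).mean()

def parallel_scores(X, y, candidate_subsets, n_workers=4):
    with Pool(n_workers) as pool:
        return pool.map(score_subset, [(X, y, s) for s in candidate_subsets])

With p workers, the wall-clock time of the evaluation phase ideally approaches 1/p of the serial cost, which is the kind of time reduction quantified in the remainder of this paper.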
In this paper, different variations of the parallelization of feature selection algorithms based on k-means will be examined, in order to reduce the computational cost, to quantify the reduction in time resulting from each method and, finally, to select the most effective one. The rest of the paper is organized as follows: in Chapter 2, basic elements of theory, as well as the methodology followed, are presented. In Chapter 3, the Datasets