Machine learning approach for intrusion detection systems as a cyber

security strategy for Small and Medium Enterprises

NEVILA BACI, KRESHNIK VUKATANA, MARIUS BACI

Department of Statistics and Applied Informatics

University of Tirana

Nënë Tereza Square, 4

ALBANIA

Abstract: Small and medium enterprises (SMEs) account for a large percentage of the economy in many countries, but they often lack adequate cyber security. The present study examines different supervised machine learning methods, with a focus on intrusion detection systems (IDSs), that can help improve SMEs' security. The algorithms tested on a real dataset are Naïve Bayes, Sequential Minimal Optimization (SMO), C4.5 decision tree, and Random Forest. The experiments were run using the Waikato Environment for Knowledge Analysis (WEKA) 3.8.4 tools, and the metrics used to evaluate the results were accuracy, false-positive rate (FPR), and total time to train and build a classification model. The results obtained from the original dataset with 130 features show high accuracy, but the computation time to build the classification model was notably high for C4.5 (1 hr. and 20 mins) and SMO (4 hrs. and 20 mins). To reduce this time, the Information Gain (IG) feature selection method was applied, with impressive results: the time needed to train the model was reduced to a few minutes and the accuracy remained high (above 95%). In the end, the challenges that SMEs face when choosing an IDS, such as lack of scalability and autonomic self-adaptation, can be addressed by using a correct methodology with machine learning techniques.

Key-Words: Intrusion detection systems, Machine learning, Small and medium enterprises, Cyber-security.

Received: August 9, 2021. Revised: January 7, 2022. Accepted: January 19, 2022. Published: January 20, 2022.

1 Introduction

This paper presents an overview of machine learning techniques applied in intrusion detection systems, with a focus on Small and Medium Enterprises (SMEs). Data breaches and cyber-attacks continue to increase, not only in SMEs but in every business in the market. The growing number of attacks drives up the cost of dealing with them, which is why security should be a priority for businesses. Most of the Intrusion Detection Systems (IDSs) on the market are signature-based, and for this reason discovering new intrusions is a big challenge. Selecting appropriate classification algorithms for IDSs is a challenging task and has become a priority in the security field. Many machine learning techniques have been proposed to improve on the accuracy of signature-based methods[1]. These techniques have enormous potential and can be used to build robust models for classifying malicious activities on SME information systems. An IDS must be capable of identifying existing malware and of discovering new malware.

Different factors should be considered when applying these techniques, such as the dataset size and the features present in the dataset, because they have a big impact on system performance[2]. Irrelevant features in the dataset can lead machine learning techniques to issues such as misclassification, overfitting, reduced generality, increased model run time, and higher processing complexity. For SMEs, one of the challenges with IDSs is a high false-positive rate (FPR), which results in a heavy workload for analyzing the logs. SMEs that have IDSs in place can reduce this workload and improve system accuracy. Applying machine learning techniques can be a solution for the intrusion detection process. Classifying attacks into different classes is the most important task performed by an IDS and can be carried out using different machine learning techniques. These techniques must be properly tuned, not blindly applied, so that complexity is reduced without affecting the performance of the system.

Nowadays, SMEs use different means of communication, such as cloud services, social media, and mobile devices. This leads to more breaches, rendering SME systems more vulnerable. SMEs are more exposed to cyber-criminals than big enterprises. Their vulnerability is shown by the number of breaches on their systems, which increased by 424% in 2018[3]. Hackers are increasingly targeting small businesses rather than big

WSEAS TRANSACTIONS on BUSINESS and ECONOMICS

DOI: 10.37394/23207.2022.19.43

Nevila Baci, Kreshnik Vukatana, Marius Baci

E-ISSN: 2224-2899

474

Volume 19, 2022

ones. The main reason that SMEs are becoming a target for cybercriminals is the assumption that their security systems are weaker than those of big companies. In SMEs, vulnerabilities often arise because adequate cyber security measures are not taken, mainly due to a lack of financial and human resources. As a result, the confidentiality and integrity of their clients' data are put at greater risk. For example, in 2019 around 58% of SMEs were victims of a cyberattack, with an average downtime of more than 8 hours per breach[4]. In terms of money, these attacks are estimated to cost around $3 million, resulting in lost profits and, more importantly, lost clients because of damaged trust. On the other hand, big enterprises, unlike SMEs, have the human resources, technical expertise, and finances to protect their information assets, as explained by Kshetri[5]. So, the solution is to increase cyber security investment.

Machine learning techniques are widely used in IDSs and are most effective with datasets that do not suffer from irrelevant and redundant features. The aim of this work is to analyze the impact and consequences of cyber-attacks on information systems, with a focus on SMEs, and to show the effectiveness of applying machine learning techniques in intrusion detection systems. For example, when an attacker tries to gain access to or interrupt the normal operations of an information system, the intent is almost always to cause damage and malfunctions. Different supervised and unsupervised machine learning techniques are used to address the major challenges faced by IDSs, such as the Decision Tree algorithm (DTA) and Support Vector Machine (SVM), as shown by Ektefa et al.[6]. Some methods outperform others in terms of classification accuracy, but less attention has been paid to computational time, which is an important factor in choosing the right algorithm and is addressed in this work.

With the General Data Protection Regulation (GDPR)[7], which came into force in May 2018, new rules must be followed by enterprises when a data breach occurs. If the company's systems incur a data breach, it should be documented no later than 72 hours after the company becomes aware of it. In these circumstances, implementing a strong IDS enables enterprises to monitor the network or the systems for malicious activity and policy violations, and to document them, for example through logs.

In this paper, the focus is to investigate the different machine learning techniques used in the context of IDSs and to determine, through experimental exploration, whether any of them is suitable for SME scenarios, showing the power of feature selection methods in improving the classification of different attacks into classes. The purpose is to show the effectiveness of using the right machine learning techniques for an IDS to solve the most significant challenges faced, such as high computational time and low accuracy. To evaluate these two parameters, several experiments were conducted with real data from the Aegean Wi-Fi Intrusion Dataset (AWID)[8]. Initially, the data were pre-processed, and then the relevant features were selected to reduce the dimensionality of the dataset. These two steps were important for improving classification accuracy and reducing computation time. In the end, different machine learning methods were applied, and the results were compared through the metrics of accuracy, FPR, and total time to build the classification model.

2 Materials and Methods

2.1 Intrusion Detection Systems

Cyber security experts implement different methods to defend against malicious attacks, such as firewalls, Intrusion Prevention Systems (IPSs), or IDSs. The latter is one of the most essential components of computer security, used to detect attacks before they spread widely. An intrusion is defined as a set of actions aimed at compromising the security goals of integrity, confidentiality, and availability of computer resources[9]. An IDS is a device or software that detects malicious activity or attacks on protected assets. It analyzes the data collected in a given network to identify malicious behavior or policy violations and then prepares a report for the system administrator to handle the intrusion. The functions of an IDS can be summarized as follows:

• to monitor user and system activity;

• to detect attacks as soon as possible;

• to enforce the network traffic;

• to analyse statistical patterns;

• to audit the operating system.

IDSs are also classified by type as Network-based IDS (NIDS) or Host-based IDS (HIDS), depending on whether the system monitors a network or a single host[10].

2.1.1 HIDS

A HIDS relies heavily on audit trails, which limits its ability to find new attacks. It monitors and analyses the input/output packets of a single device, performing log analysis, file integrity checking, policy monitoring, etc. Nevertheless, a HIDS tends to be desirable for several reasons. For example, because it can monitor access to information in terms of “who accessed what”, it can trace the activities of a specific user and determine whether an attack has occurred. Moreover, it is capable of operating in an encrypted computing environment. Since a HIDS comes with the system, it also has cost advantages over other systems.

On the other hand, there are some disadvantages. The main one is that it cannot monitor network traffic and is heavily dependent on the operating system that hosts it. In more detail, a typical HIDS must raise a flag or report information about any malicious activity that occurs. This can degrade the performance of the host machine, as the HIDS uses the same resources and does not have a standalone operating system like other types of IDS[11].

Table 1: Difference between HIDS and NIDS.

| Types | Advantages | Disadvantages |
|-------|------------|---------------|
| NIDS | Monitors multiple hosts at a time. Attacks of different hosts can be correlated. It does not decrease the host performance. | Problem with encrypted network traffic. Must keep up with the network speed. |
| HIDS | Can analyse data for specific use in the host by using audit trails. Operates in environments that are encrypted. | Cross platform based. Can't see network traffic. Large cost in setting up. |

Table 2: Comparison of detection methods in IDS.

| Types | Advantages | Disadvantages |
|-------|------------|---------------|
| AIDS | Able to detect new attacks. Signature database can be updated based on new attacks. | Cannot handle encrypted packets. False-positive alarms are high. Difficult to classify alerts in different categories. Training phase is complex. |
| SIDS | False-positives are low. Better for detecting the known attacks. Simple design. | Cross platform based. Can't see network traffic. Large cost in setting up. |

Table 3: AWID Dataset Characteristics[8].

| Dataset | Purpose | No. of Records | Target variables |
|---------|---------|----------------|------------------|
| AWID-CLS-R-Trn | Training phase | 1,795,575 | 4 classes |
| AWID-CLS-R-Tst | Testing phase | 530,643 | 4 classes |

2.1.2 NIDS

The NIDS offers a different approach: the data are collected from the network rather than from a single host. A NIDS checks for misbehavior by inspecting Internet Protocol (IP) protocol-level activities and network packet structure, which allows it to detect many IP-based Denial of Service (DoS) attacks, such as Transmission Control Protocol (TCP) synchronize (SYN) attacks.

The disadvantage of a NIDS is that it has limited visibility within the host machine, and there is no effective way for it to analyze encrypted network traffic to defend the system. Software tools with different solutions are available, such as Snort, a network intrusion detection and prevention system, or NetSTAT, a command-line network utility on Unix-like operating systems that monitors network traffic in real time.

Table 1 briefly summarizes the advantages and disadvantages of both NIDS and HIDS. The first focuses more on vulnerability abuse, while the second focuses on privilege abuse. From a financial perspective, a NIDS costs less and has a faster response time than a HIDS, because it monitors the traffic in real time or close to real time.

2.2 Intrusion Detection Approaches

Based on the detection method, IDSs can be classified into two main categories, signature-based and anomaly-based, although in practice some systems operate as hybrids.

Signature-based Intrusion Detection Systems (SIDS) detect attacks using the most common method, pattern matching. The patterns used for detection are known as signatures. The system tries to match a new event against the signatures stored in the signature database, and when a match occurs an alarm is triggered. Among experts, SIDS is also known as Knowledge-Based Detection[12]. Such a system can detect known attacks whose signatures already exist in the system, but it is incapable of detecting new attacks because their signatures are not stored in the database. The problem is that the signature database must be updated frequently; otherwise, attacks whose signatures do not exist in the catalog are unlikely to be detected. In practice, SIDS gives good classification accuracy for the detection of previously known attacks.
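The pattern-matching idea behind SIDS can be illustrated with a minimal sketch. The `Signature` class, the two example signatures, and the `match_signatures` function are hypothetical names introduced only for illustration; real signature databases use far richer rule formats than a plain byte-substring search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    name: str
    pattern: bytes  # byte sequence assumed to identify a known attack


# Hypothetical signature database used only for this illustration.
SIGNATURE_DB = [
    Signature("example-sql-injection", b"' OR 1=1"),
    Signature("example-path-traversal", b"../../"),
]


def match_signatures(payload: bytes, database=SIGNATURE_DB):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [sig.name for sig in database if sig.pattern in payload]


# A payload containing a stored pattern raises an alert; anything else passes silently.
alerts = match_signatures(b"GET /login?user=admin' OR 1=1--")
```

A payload that carries a new attack whose pattern is not in `SIGNATURE_DB` returns an empty list, which is exactly the limitation described above.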

Anomaly-based Intrusion Detection Systems (AIDS) have drawn interest because they overcome the limitations of SIDS. In AIDS, a model of normal behavior is developed using approaches such as machine learning, statistical methods, or knowledge-based methods. The observed behavior is compared with the model, and every significant deviation is treated as an anomaly. These anomalies can be classified as intrusions. The statistical method deals with anomaly detection from randomness, while the knowledge-based method captures the alleged behavior from network traffic instances and other relevant system data[13]. AIDS goes through two phases: the training phase, in which a model is built from normal behavior data, and the testing phase, in which a new data set is used to test the capability of the system to detect the “not normal behavior” classified as an intrusion. Compared to SIDS, AIDS is better at identifying zero-day attacks because it does not depend on matching the data with patterns in the signature database. The main differences between these two types of systems are shown in Table 2.
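The two phases of an anomaly-based detector can be sketched with a very simple statistical model. This is only an illustration under stated assumptions: the monitored quantity (for example, packets per second), the function names, and the threshold of three standard deviations are all hypothetical choices, not the method used in the experiments of this paper.

```python
from statistics import mean, stdev


def train_normal_model(normal_values):
    """Training phase: summarise normal behaviour by its mean and standard deviation."""
    return mean(normal_values), stdev(normal_values)


def is_anomaly(value, model, threshold=3.0):
    """Testing phase: flag values that deviate strongly from the normal model."""
    mu, sigma = model
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical measurements collected while the network behaved normally.
model = train_normal_model([100, 104, 98, 101, 97, 103, 99, 102])
```

Because the model describes normal behaviour rather than known attacks, a previously unseen attack is flagged as long as it moves the measurement far enough from the mean; the price is that unusual but legitimate traffic is flagged too, which is the source of the higher false-positive rate.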

Taking into consideration the advantages of both systems, a hybrid system can be implemented that combines both methods. Such a system can detect zero-day attacks and reduce the number of false alarms. Butun et al.[13] found that no system was purely signature-based or anomaly-based; IDSs are usually deployed as hybrid systems.

2.3 Datasets

The experiments conducted in this research are evaluated on the AWID dataset[8]. This dataset is publicly available. It focuses on 802.11 networks and was introduced in 2015. Different datasets are available for intrusion detection systems, but this one is recommended because it contains real data, captured from Wireless Local Area Network (WLAN) traffic in a packet-based format.

The data collected amount to around 37 million packets, captured in one hour. Originally this dataset has 155 attributes and one target variable. There are two types of datasets available, “CLS” and “ATK”. The first type, “CLS”, has four target classes: normal, impersonation, overflow, and injection. In the second type, “ATK”, the attacks are classified into 15 different categories. For research purposes, the dataset creators have produced two versions of the “CLS” type, a complete one and a reduced one. To simplify our work, and because the experiments run on a single personal computer (PC), the reduced version of the data is used. The properties of the dataset are shown in Table 3.

2.4 Data pre-processing

To enhance classification accuracy, the data are pre-processed. First, all string attributes are converted into numeric ones. Some missing values were discovered in different attributes of the dataset. No imputation methods were applied to the missing values, as the focus of this work is not on those techniques; instead, all missing values were replaced with zero. Some machine learning techniques do not work with missing data, so this transformation was necessary. In addition, some attributes have the same constant value for all instances and therefore carry no information relevant to distinguishing the classes. These attributes were simply discarded. Finally, the data were normalized to remove the influence of the measurement scale, transforming the raw feature values to lie between zero and one.
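The pre-processing steps described above can be sketched in plain Python. This is a minimal illustration, not the WEKA filters used in the experiments: the function name `preprocess`, the markers `""` and `"?"` for missing values, and the integer coding of non-numeric strings are assumptions made for the example.

```python
def preprocess(rows):
    """rows: list of lists of raw string fields (features only, no class label)."""
    n_cols = len(rows[0])

    # Step 1: convert strings to numbers and replace missing values with zero.
    # Non-numeric strings receive an integer code per column (an assumed encoding).
    codes = [dict() for _ in range(n_cols)]
    numeric = []
    for row in rows:
        converted = []
        for j, field in enumerate(row):
            if field in ("", "?"):
                converted.append(0.0)
                continue
            try:
                converted.append(float(field))
            except ValueError:
                code = codes[j].setdefault(field, len(codes[j]) + 1)
                converted.append(float(code))
        numeric.append(converted)

    # Step 2: discard attributes that have the same value for every instance.
    keep = [j for j in range(n_cols) if len({r[j] for r in numeric}) > 1]

    # Step 3: min-max normalisation of the remaining attributes to [0, 1].
    lo = {j: min(r[j] for r in numeric) for j in keep}
    hi = {j: max(r[j] for r in numeric) for j in keep}
    return [[(r[j] - lo[j]) / (hi[j] - lo[j]) for j in keep] for r in numeric]


# Hypothetical toy input: the third column is constant and is dropped.
toy = [["10", "tcp", "1", "?"], ["20", "udp", "1", "5"], ["30", "tcp", "1", ""]]
cleaned = preprocess(toy)
```

Dropping constant columns before normalising also avoids a division by zero in the min-max step, since every remaining column has distinct minimum and maximum values.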

2.5 Algorithms for Machine Learning

Classification

This section describes the machine learning methods that were applied in the experiments. Many methods could be applied to the selected dataset, but only some of them, identified during the literature review, were chosen. The classification algorithms used were as follows: Bayesian Network, SVM, C4.5 decision tree, and Random Forest.

The Bayesian Network method provides a graphical representation of the probabilistic relationships between variables[14]. Compared to other statistical methods, it has several advantages:

• Graphical representation of the relationship between all the variables.

• Can be used to learn about causal relationships.

• Identifies random relationships.

• Addresses multidimensional statistical problems.

The dependencies between variables are represented graphically by a Directed Acyclic Graph (DAG) and a probability table. Bayesian networks are widely used because they provide an efficient method for preventing overfitting.

SVM is a linear model used for classification or regression problems, in which each data item is plotted as a point in n-dimensional space[15], where n is the number of attributes in the dataset. This method is recommended for small and medium datasets. SVM uses a high-dimensional feature space and a kernel function, and its training phase is based on optimization theory. Sequential minimal optimization (SMO) is a typical algorithm for solving the optimization problem that arises during the training of SVMs[16].
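For reference, the optimization problem in question is usually written as the dual quadratic program of the soft-margin SVM. The notation below is an assumption introduced for illustration and does not appear elsewhere in this paper: m is the number of training examples, x_i are the feature vectors, y_i ∈ {−1, +1} are the class labels, α_i are the Lagrange multipliers, K is the kernel function, and C is the regularization parameter.

```latex
\max_{\alpha}\; \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\qquad \text{subject to} \qquad
0 \le \alpha_i \le C \;\; (i = 1, \dots, m), \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
```

SMO solves this problem by repeatedly picking two multipliers and optimizing them analytically while the others stay fixed. The double sum runs over all pairs of training examples, which gives an intuition for why training time grows quickly with the number of records.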

The C4.5 decision tree is a model based on decision trees. It is used for supervised classification problems. C4.5 is an extension of Iterative Dichotomiser 3 (ID3), which has two limitations[17]. The first occurs when two or more cases with identical values belong to different classes. The second is related to the risk of overfitting. The decision tree is built using the concept of entropy. According to this algorithm, the attribute selected for a tree node is the one that separates the objects most effectively compared to the other attributes, as measured by information gain. This algorithm is based on probability concepts to create a complete data table. The most important benefit of decision trees is comprehensibility, because they can be visualized in the form of a tree. The tree is created by a recursive partitioning algorithm which, at each non-terminal node, determines a value for a variable. In this manner, the remaining branches or classes are better differentiated. The root of the tree is a question or a qualitative variable, which has several categories. The main disadvantages are that decision trees suffer from overfitting when the dataset is small and that they test only one attribute at a time.

Random Forest is one of the most popular ensemble supervised machine learning algorithms and is capable of classification or regression using unpruned trees. It builds decision trees on data samples and is very efficient on large datasets. Random forest models are robust to overfitting and are based on the bagging technique to combine the decision trees[18]. Compared to decision trees, random forest models are more difficult to interpret.
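The bagging idea can be sketched as follows. This is a deliberately simplified illustration, not the WEKA Random Forest implementation: it bags one-level decision stumps instead of full unpruned trees, and it omits the random selection of a feature subset at each split that a random forest adds on top of bagging. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for j in range(len(samples[0][0])):
        for x, _ in samples:
            t = x[j]
            left = [y for xi, y in samples if xi[j] <= t]
            right = [y for xi, y in samples if xi[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]


def predict_stump(stump, x):
    j, t, l_lab, r_lab = stump
    return l_lab if x[j] <= t else r_lab


def train_bagged(samples, n_models=25, seed=0):
    """Train each model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(samples) for _ in samples]) for _ in range(n_models)]


def predict_bagged(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature toy data.
data = [((1.0,), "normal"), ((2.0,), "normal"), ((3.0,), "normal"),
        ((8.0,), "attack"), ((9.0,), "attack"), ((10.0,), "attack")]
forest = train_bagged(data)
```

Each individual model sees a slightly different sample, so its errors differ from those of the others, and the vote averages them out. The same mechanism explains the loss of interpretability: the prediction is no longer a single readable tree but the outcome of many.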

2.6 Experimental environment and

Evaluation metrics

The experiments are run using WEKA 3.8.4 (Waikato Environment for Knowledge Analysis) on a PC with a 4-core Intel i7-4900MQ CPU. The steps involved in the experiments and the logical flow of the process are shown in Fig. 1.

Figure 1: Steps Involved in the Experiments.

Different metrics[19] can be used to evaluate the results of machine learning methods, such as accuracy or recognition rate, confusion matrix, recall, FPR, sensitivity or true positive rate (TPR), specificity, learning time, precision, and the Receiver Operating Characteristic (ROC) curve. When it comes to IDSs, True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) are important quantities for measuring system reliability. Accuracy and FPR are defined as follows:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

\[ \mathrm{FPR} = \frac{FP}{FP + TN} \tag{2} \]

where accuracy (1) is the percentage of records that are correctly classified, FPR (2) is the proportion of normal events classified as attacks, and training time is the total time to build a classification model.
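The two definitions translate directly into code. The sketch below is a minimal illustration; the function names and the confusion-matrix counts in the example are hypothetical and are not taken from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all records that are classified correctly, as in equation (1)."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp, tn):
    """Share of normal records that are wrongly classified as attacks, as in equation (2)."""
    return fp / (fp + tn)


# Hypothetical counts: 1,000 records, of which 900 are normal traffic.
acc = accuracy(tp=90, tn=880, fp=20, fn=10)
fpr = false_positive_rate(fp=20, tn=880)
```

With these counts the accuracy is 970/1000 and the FPR is 20/900. The example shows why both metrics are reported: on data dominated by normal traffic a classifier can reach high accuracy while still generating many false alarms.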

2.7 Feature selection process

To show the differences between the original dataset and a reduced one, we applied a feature selection method known as Information Gain (IG). In this technique, the significance of each feature is measured by its information gain with respect to the class. It is a rank-based method that provides the final list of selected features, and it belongs to the filter-based feature selection techniques.
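The ranking can be illustrated with a small sketch that computes information gain as the class entropy minus the class entropy that remains after splitting on a feature. It assumes discrete feature values (continuous attributes would need to be discretized first); the function names and the toy columns are hypothetical, and this is not the WEKA implementation used in the experiments.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Class entropy minus the weighted class entropy within each feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: one feature predicts the class perfectly, the other not at all.
labels = ["normal", "normal", "attack", "attack"]
columns = {"f_noise": [0, 1, 0, 1], "f_useful": [0, 0, 1, 1]}
selected = rank_features(columns, labels, k=1)
```

Because the score of each feature is computed independently of any classifier, the ranking is done once and the same reduced dataset can be reused for every algorithm, which is what characterizes a filter-based method.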

Feature selection is an important task when preparing the training data set, as it influences correct classification. The literature stresses that reducing the feature space can often contribute to increasing system accuracy[20]. Table 4 shows the results on the AWID dataset with the IG feature selection method applied. All the selected machine learning methods are run on both datasets, the original one and the one reduced using the IG method, to allow a comparison.

3 Results and Discussion

Table 4 shows the results obtained from the original dataset with 130 variables, taking into consideration the accuracy and the FPR. The accuracy is high: the best result is obtained by the Random Forest algorithm, which achieved 95.98% accuracy, and the lowest, 83.02%, by the Naïve Bayes algorithm. The accuracy achieved is quite satisfactory, but the computation time is notably high, especially for C4.5 (1 hour and 20 minutes) and SMO (4 hours and 20 minutes). The IG method is used to reduce the time required for the training phase.

Table 4: Classifier Evaluation with/without applying feature selection method.

| Method | Accuracy | FPR | Model Training Time | Feature Selection [No. of features] |
|--------|----------|-----|---------------------|-------------------------------------|
| Naïve Bayes | 83.02% | 0.138 | 00:04:47 | Original dataset [130] |
| SMO | 95.21% | 0.512 | 04:19:23 | Original dataset [130] |
| C4.5 | 93.99% | 0.114 | 01:20:29 | Original dataset [130] |
| Random Forest | 95.98% | 0.231 | 00:10:32 | Original dataset [130] |
| Naïve Bayes | 95.02% | 0.143 | 00:01:01 | IG [39] |
| SMO | 96.69% | 0.021 | 00:04:22 | IG [39] |
| C4.5 | 99.76% | 0.072 | 00:05:29 | IG [39] |
| Random Forest | 97.98% | 0.029 | 00:01:67 | IG [39] |

After the feature selection method is applied, the original dataset is reduced to 39 features, with the irrelevant features removed by the IG method. All the machine learning methods are then re-executed. The results for the reduced dataset are shown in Table 4. The table shows that all the methods reach an accuracy above 95%. In more detail, the accuracy of Naïve Bayes, SMO, C4.5, and Random Forest increases by 12, 1.48, 5.77, and 2 percentage points, respectively.

The experimental results also demonstrate an improvement in FPR for all the algorithms except Naïve Bayes, whose FPR remains of the same order. Most importantly, the time needed to train the model is reduced drastically, to about 5 minutes even for the C4.5 and SMO methods. This is a satisfactory result and shows that the feature selection algorithm has selected the right components, improving the performance of intrusion classification in terms of accuracy and computing time.
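The comparison can be reproduced from the values reported in Table 4 with a few lines of arithmetic. The sketch below is only a convenience for checking the figures; the helper names are hypothetical, and the training times are parsed exactly as printed in the table, including the Random Forest entry for the reduced dataset.

```python
def to_seconds(hms):
    """Convert an hh:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# method: (accuracy original %, accuracy IG %, time original, time IG), from Table 4.
TABLE_4 = {
    "Naive Bayes": (83.02, 95.02, "00:04:47", "00:01:01"),
    "SMO": (95.21, 96.69, "04:19:23", "00:04:22"),
    "C4.5": (93.99, 99.76, "01:20:29", "00:05:29"),
    "Random Forest": (95.98, 97.98, "00:10:32", "00:01:67"),
}


def summarize(table):
    """Return accuracy gain (percentage points) and training-time ratio per method."""
    summary = {}
    for method, (acc_before, acc_after, t_before, t_after) in table.items():
        gain = round(acc_after - acc_before, 2)
        speedup = to_seconds(t_before) / to_seconds(t_after)
        summary[method] = (gain, speedup)
    return summary


summary = summarize(TABLE_4)
for method, (gain, speedup) in summary.items():
    print(f"{method}: +{gain} points, training {speedup:.1f}x faster")
```

The ratio of training times makes the effect of feature selection easier to compare across algorithms than the raw durations, since the slowest methods on the original dataset are also the ones that benefit most.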

4 Conclusions and Further Work

SMEs are considered the backbone of the economy, as they have great potential for job creation, growth, and innovation. Cloud services, social media, and mobile devices bring a range of challenges and demands to SMEs, affecting how business is done via different means of communication. Nowadays, it is common for SMEs to face cyber security incidents in which personal data is stolen. Most of these security incidents go undetected or unreported. While big enterprises have long employed cyber security strategies, including IDS and IPS, to defend against attacks, SMEs are still standing on the edge.

This paper aimed to address the challenges that SMEs face related to IDS, the most important defense tool against network attacks. Several experiments were conducted applying different supervised machine learning methods to improve the detection of attacks in these systems. The algorithms applied were SMO, C4.5, Random Forest, and Naïve Bayes. All the experiments were conducted on the AWID dataset, which has been widely used in research for simulating IDSs, and all were run with the WEKA 3.8.4 tools.

In our findings, once the feature selection method is applied, the dataset complexity and dimensionality are drastically reduced, improving system accuracy and reducing computation time. This result is based on experiments comparing the original dataset with the one reduced by the IG feature selection method.

The dataset used in this paper is unbalanced: there is a huge difference between the number of records of the normal class (majority class) and the malicious classes (minority classes). Different balancing approaches are known in the literature, and in the future these findings can be enriched with experiments on the effect of balancing on system performance.

References:

[1] Feizollah, A., Anuar, N. B., Salleh, R. and Wahab,

A. W. A., A review on feature selection in mobile

malware detection, Digital investigation, Vol. 13,

2015, pp. 22–37.

[2] Faris, H., Hassonah, M. A., Ala’M, A.Z., Mir-

jalili, S. and Aljarah, I., A multi-verse optimizer

approach for feature selection and optimizing pa-

rameters based on a robust system architecture,

Neural Computing and Applications, Vol. 30, No.

8, 2018, pp. 2355–2369.

[3] Identity Breach Report 2019, [Online] Available:

https://4iq.com/2019-identity-breach-report/

(Last accessed September 15, 2021).

[4] Yeng, P., Nimbe, P., Weyori, B., Solvoll, T. and

Yang, B., Web Vulnerability Measures for SMEs,

NISK, Vol 12., 2019, pp. 1-16.

[5] Kshetri, N., The Economics of Cyber-Insurance,

IT Professional, Vol. 20, No. 6, 2018, pp. 9-14.

[6] Ektefa, M., Memar, S., Sidi, F., and Affendey, L.

S., Intrusion Detection Using Data Mining Tech-

niques, In the proceedings of IEEE International

Conference on Information Retrieval & Knowl-

edge Management, Exploring Invisible World,

CAMP10, 2010, pp. 200-203.

[7] Regulation (EU) 2016/679 of the European Par-

liament and of the Council of 27 April 2016 on

the protection of natural persons with regard to

the processing of personal data and on the free

movement of such data, and repealing Directive

95/46/EC (General Data Protection Regulation),

Official Journal of the European Union Vol. 59:

L. 119, 2016, pp. 1-89.

[8] Kolias C., Kambourakis, G., Stavrou, A. and

Gritzalis, S., Intrusion Detection in 802.11 Net-

works: Empirical Evaluation of Threats and a

Public Dataset, IEEE Communications Surveys &

Tutorials, Vol. 18, 2016, pp. 184-208.

[9] Heady, R., Luger, G., Maccabe, A. and Servilla,

M., The architecture of a network level intrusion

detection system, 1990.

[10] Othman, S., Alsohybe, N., Ba-Alwi, F., Zahary, A., Survey on Intrusion Detection Types, International Journal of Cyber-Security and Digital Forensics, Vol. 7, No. 4, 2018, pp. 444-462.

[11] Singh, A. P. and Singh, M., Analysis of Host-

Based and Network-Based Intrusion Detection

System, International Journal of Computer Net-

work and Information Security, Vol. 8, 2014, pp.

41-47.

[12] Modi C., Patel D., Borisaniya B., Patel H., Patel

A., and Rajarajan, M., A survey of intrusion de-

tection techniques in cloud, The Journal of Net-

work and Computer Applications, Vol. 36, No. 1,

2013, pp. 42–57.

[13] Butun, I., Morgera, SD. and Sankar, R., A survey

of intrusion detection systems in wireless sensor

networks. IEEE Communications Surveys & Tu-

torials, Vol. 16, No.1, 2014, pp. 266–282.

[14] Holmes, D.E. and Jain, L.C., Innovations in

Bayesian Networks. Studies in Computational In-

telligence, Vol 156, Springer, Berlin, Heidelberg,

2008.

[15] Cortes, C., Vapnik, V., Support-vector networks,

Machine Learning, Vol. 20, 1995, pp. 273–297.

[16] Platt, J., Sequential Minimal Optimization :

A Fast Algorithm for Training Support Vector

Machines, Microsoft Research Technical Report,

1998.

[17] Salzberg, S.L., C4.5: Programs for Machine

Learning by J. Ross Quinlan. Morgan Kaufmann

Publishers, Inc., 1993, Machine Learning, Vol.

16, 1994, pp. 235–240.

[18] Breiman, L., Bagging predictors, Machine

Learning, Vol. 24, 1996, pp. 123–140.

[19] Sokolova M., Japkowicz N., Szpakowicz S., Be-

yond Accuracy, F-Score and ROC: A Family of

Discriminant Measures for Performance Evalua-

tion. In: Sattar A., Kang B. (eds) AI 2006: Ad-

vances in Artificial Intelligence. AI 2006. Lecture

Notes in Computer Science, Vol. 4304, Springer,

Berlin, Heidelberg, 2006.

[20] Liu, H. and Motoda, H., Computational methods

of feature selection. Chapman & Hall/CRC Data

Mining and Knowledge Discovery Series, Taylor

& Francis, New York, 2007.

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
