DDoS Attacks Classification using SVM
VANYA IVANOVA, TASHO TASHEV, IVO DRAGANOV
French Faculty of Electrical Engineering
Technical University of Sofia
8 Kliment Ohridski Blvd., 1756 Sofia
BULGARIA
Abstract: - In this paper, two types of classifiers of Distributed Denial of Service (DDoS) attacks, based on
Support Vector Machines, are presented: a binary and a multiclass one. They use numerical samples,
aggregated from packet-switched network connection records, captured between attacking machines, most
typically IoT bots, and a victim machine. Ten of the most popular DDoS attacks are studied and represented as
either 10- or 8-feature vectors. Detection rate and classification accuracy are measured in both cases, along
with a number of other parameters, such as Precision, Recall, F1-measure, training and testing time, and others.
Variations with Linear, Polynomial, RBF and Sigmoid kernels are tried with the SVM. The most accurate
turns out to be the RBF SVM, both as a detector and as a multiclass classifier, achieving classification accuracy as
high as 0.9999 for some of the attacks. Testing times reveal the practical fitness of the implemented classifiers
for real-world application.
Key-Words: - distributed denial of service, network attack, Internet of Things, Support Vector Machine, kernel
function, optimized classifier
Received: March 2, 2021. Revised: December 3, 2021. Accepted: January 15, 2022. Published: February 9, 2022.
1 Introduction
DDoS attacks induce large financial losses [1] by
interrupting mass-type services, causing data loss
and sometimes easing the compromise of various
Internet-based machines, which leads to data theft
and other malicious activities. One of the more
recent challenges related to the prevention of DDoS
attacks concerns Software Defined Networks (SDN)
and the ways of efficiently limiting their scale
extension by analyzing the influence over the SDN
controller [2]. Neural networks are not considered a
practical solution for this particular instance, as Ye
et al. [2] show, while employing 6 component
features from the flow of the switch. They use them
as training samples for Support Vector Machines
(SVM) and obtain 95.24% accuracy in detecting
DDoS attacks.
Detection and classification of DDoS attacks,
along with their further prediction, has been
proposed by Yusof et al. [3], using K-Nearest
Neighbor in combination with SVM (KNN-SVM).
Good partitioning of the pre-attack data flow and of
the actual influence over the network, with its peak
traffic and other effects, has been achieved. The
proposed approach is considered useful for intrusion
detection as well. The classification rate is in the
order of 96.4% for the SVM and 96.6% for the KNN
when using the KDD99 dataset. In [4] Daneshgadeh et al.
combine Kernel Online Anomaly Detection, SVM
and principles from the Information Theory with the
aim of differentiating DDoS and Flash Events, the
latter being completely legitimate activities. The
authors also find their scheme useful for normal
traffic discrimination. Another study, performed by
Khuphiran et al. [5], gives a comparison between
Deep Feed Forward (DFF) and SVM models capable
of detecting DDoS attacks. Testing these models over
the DARPA 2009 dataset produces an accuracy of
99.63% for the DFF and 93.01% for the SVM. The
DFF appears to be around 1.28 times faster than the
SVM in the training phase. Despite this relation, the
authors report that the SVM is faster during the
testing phase and should be preferred if accuracy is
not the primary issue.
A more general approach is undertaken in [6],
where Ali et al. propose a machine-learning-based
framework directed towards the prevention of DDoS
attacks in SDN, which at the same time is capable of
reducing the dimensionality of the processed data,
transferred in huge amounts. Thus, it becomes
possible to reduce the risk of launching a spoof
controller and changing the routing tables. Principal
Component Analysis (PCA) is also implemented in
the framework, along with SVM, which proves a
successful solution towards a smaller false positive
rate and an increased overall accuracy. Another,
more complex approach, relying on a hybrid
algorithm for spotting DDoS attacks, is developed by
Adhikary et al. [7]. It is specifically oriented towards
Vehicular Adhoc Networks (VANETs). Kernel
methods based on AnovaDot and RBFDot lie at the
base of an SVM for solving that task. With regard to
the normal traffic, many real-world factors, e.g.
packet loss, jitter and collisions, are introduced to
make the testbed as close to real networks as
possible when the time comes to discriminate a
DDoS attack from that normal traffic. The AnovaDot
and RBFDot realizations appear more efficient when
combined than when applied independently.
In recent years, it was also demonstrated that
simpler realizations of SVM classifiers, such as the
linear l1 type developed by Nazih et al. [8], could be
efficiently exploited for discovering attacks in
Session Initiation Protocol (SIP) Voice over Internet
Protocol (VoIP) networks. Denial of Service (DoS)
and Spam over Internet Telephony (SPIT) attacks
have been successfully discovered due to the
introduction of n-gram string features, projected in a
space with high dimensionality. When discovering
SPIT attacks, the detection speed of the l1-SVM is in
the order of 20 times higher than that of the
combined classifier comprised of a Markov Chain
and an SVM, while at the same time the F1 measure
of the proposed classifier is close to 100%, against
an accuracy rate of 96.3% for the combined model.
The l1-SVM has almost the same accuracy in
discovering DoS attacks as the Dual SVM, and
likewise for the Malformed Msgs when compared to
the SDP Parser.
Malware and spoofing discovery, along with
DDoS attacks, is the object of the study presented in
[9] by Kajal and Nandal. Feature selection is done
with the use of a Genetic Algorithm as a first step,
with the features refined later by Artificial Bee
Colony and Discrete Wavelet Transform algorithms.
Combining an Artificial Neural Network (ANN) and
an SVM in a hybrid approach allows more accurate
detection of malicious behaviors of separate nodes in
the communication network. The increase in
precision and recall while detecting DDoS attacks,
compared to an earlier strategy of query expansion
with convolution kernels and dependency parses, is
0.112 and 0.049, respectively. The applicability of
SVM in discovering DDoS network attacks,
consisting primarily of HTTP (Hypertext Transfer
Protocol) flood and DDoS using SQL (Structured
Query Language) injection (SIDDoS), has been
investigated in [10]. Multiple classifiers for the same
task have been compared, such as Naïve Bayes,
Decision Trees and the Multilayer Perceptron
(MLP). It appears that an Enhanced Multi Class
SVM (EMCSVM) could detect DDoS-related events
accurately enough, while maintaining a low false
alarm rate, when taking as input a 14-component
feature vector and trying to discriminate 10 kinds of
attacks.
Based on all the recent developments described
above in the field of detection and classification of
DDoS attacks with the independent or combined use
of SVM, in this study we seek the optimal
configuration of a single SVM capable of detecting
the presence of at least one type of such attack,
discriminating it from normal traffic, and then, in a
second, extended configuration, of an SVM that
could classify each of 10 kinds of attacks precisely
enough, given the contemporary efficiency of
comparable classifiers. Consideration has been given
to the kernel function and configuration parameters
while taking as input 10- and 8-element (reduced
set) feature vectors from a recent and popular UNSW
(University of New South Wales, Canberra,
Australia) IoT-based (Internet of Things) DDoS
attacks dataset [11].
In the second part of the paper, a brief
description is given of the distribution of the
features, depending on the attack present in the test
dataset, as well as a general mathematical
description of an SVM using various kernel
functions and their related parameters, and the
general optimization procedure used to obtain the
most efficient configuration of a classifier of this
type. In Section 3, experimental results are given
from testing SVMs as detectors and classifiers of
DDoS attacks, revealing the optimal configuration,
followed by a discussion in Section 4. Finally, a
conclusion is presented in Section 5.
2 Classifier Structure
2.1 Dataset and Feature Distribution
The dataset [11] used in this study, which is freely
available and used by other authors in their research,
contains 2 934 817 training and 733 705 test samples.
Each sample has 10 components, aggregated from IP
network flows: the packet rate from the source to the
destination machine (srate), the rate in the opposite
direction (drate), the standard deviation (square root
of the estimated variance) of the recorded
connections (stddev), the consecutive number of a
captured sequence (seq), the minimal, average and
maximal period of exchange (min, mean, max), an
identifier of the state of a feature (state_number),
and the number of incoming connections per
destination and per source machine
(N_IN_Conn_P_DstIP, N_IN_Conn_P_SrcIP). All
the feature components are represented as numerical
values, and the target variables are of 2 kinds: a
binary one with values 0 (non-attack) and 1 (attack)
for testing the binary SVM classifier (attack
detector); and a numerical one with values 0
(non-attack), 1 (DoS TCP attack), 2 (DoS UDP
attack), 3 (DoS HTTP attack), 4 (DDoS TCP attack),
5 (DDoS UDP attack), 6 (DDoS HTTP attack), 7
(Keylogging), 8 (Data Exfiltration), 9 (OS
Fingerprint), 10 (Service Scan) for testing the SVM
with multiclass outputs. Within the training set the
number of non-attack records is 370, and in the test
set 107.
All the records are gathered from an internal
network setup with 4 simulated IoT devices acting
as bots and generating malicious traffic
corresponding to the attacks described above. For
this purpose Kali Linux is used on conventional
machines, while a workstation with Windows 7, a
mobile station and a server under the control of the
Ubuntu operating system play the role of attacked
machines. The information about the established
connections is recorded by a separate monitoring
station, connected to the switch through which all
other machines communicate.
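As an illustration of the data layout described above, the sketch below shows how the two targets could be prepared in Python. The file names and column labels are assumptions about a local export of the Bot-IoT CSV files [11], not the authors' actual scripts.

```python
# A minimal sketch of loading the training and test samples described above.
# File names and the "attack"/"category" column labels are assumptions.
import pandas as pd

FEATURES = ["srate", "drate", "stddev", "seq", "min", "mean", "max",
            "state_number", "N_IN_Conn_P_DstIP", "N_IN_Conn_P_SrcIP"]

train_df = pd.read_csv("bot_iot_train.csv")   # 2 934 817 samples (assumed file name)
test_df = pd.read_csv("bot_iot_test.csv")     # 733 705 samples (assumed file name)

# Binary target for the detector: 0 = non-attack, 1 = attack.
X_train, y_train_bin = train_df[FEATURES], train_df["attack"]
X_test, y_test_bin = test_df[FEATURES], test_df["attack"]

# Multiclass target for the classifier: 0 = non-attack, 1..10 = the ten attack types
# listed in the text (DoS TCP/UDP/HTTP, DDoS TCP/UDP/HTTP, Keylogging,
# Data Exfiltration, OS Fingerprint, Service Scan); the column name is assumed.
y_train_multi = train_df["category"]
y_test_multi = test_df["category"]
```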
Ranking of the features based on their
informative significance, related to the separability
of the various classes, is performed with the use of
the χ2 parameter, according to [14]:

\chi^2 = \sum_{i=1}^{N} (O_i - E_i)^2 / E_i ,   (1)

where Df is the number of degrees of freedom, N is
the sample size, Oi are the observed values, and Ei are
the expected values for i = 1, 2, …, N. The number of
degrees of freedom is associated with the number of
values that are not bound by any logical connection,
that is, values that vary independently of one another.
Most often, it is true that [14]:

D_f = N - 1 ,   (2)

where 1 represents the number of constraints
introduced when gathering all the samples during
testing. In that case, the random variable χ2 will
follow a χ2-distribution and will be equal to the
superposition of a number of variables, e.g. M,
following the standard normal distribution [14]:

\chi^2 = \sum_{i=1}^{M} X_i^2 ,  X_i \sim N(0, 1) .   (3)
The results are given in Table 1.
Table 1. χ2 values for all features
mean            1357708.10     drate
srate           1324153.22     min
max             1168444.70     N_IN_Conn_P_DstIP
state_number    1121064.98     seq
stddev          666877.36      N_IN_Conn_P_SrcIP
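As a minimal sketch of how a ranking like Table 1 could be reproduced, the snippet below applies scikit-learn's chi2 scoring to the ten features. The min-max scaling step and the choice of target are assumptions, so the resulting scores need not match Table 1 exactly.

```python
# Sketch of a chi-squared feature ranking; X_train, y_train_multi and FEATURES
# come from the loading sketch above. sklearn's chi2 expects non-negative inputs,
# so the features are min-max scaled first (an assumption about preprocessing).
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

X_scaled = MinMaxScaler().fit_transform(X_train)
scores, _ = chi2(X_scaled, y_train_multi)       # score each feature against the classes

ranking = sorted(zip(FEATURES, scores), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name:20s} {score:14.2f}")
```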
The probability of each feature being connected to
a particular attack, given that it takes a particular
value, is shown in Fig. 1. The first 6 cases
(Fig. 1 a-f) clearly indicate relatively good
separability among the attacks over the range of
these features: state_number, srate, max, mean,
stddev and min. For N_IN_Conn_P_DstIP, the
distributions of the 1st, 2nd and 10th attacks together
cover almost the entire range of this feature, and
significant portions of that range are also covered by
the distributions of the other attacks. For drate, the
9th and 10th attacks are somewhat hard to
distinguish. Finally, for seq and N_IN_Conn_P_SrcIP
the distributions by attack are almost identical
between these 2 features, and also overlap for each of
them within the typical range of their change. Given
the almost 59 times difference in the χ2 parameter
between the most informative mean feature and
N_IN_Conn_P_DstIP, it is reasonable to try a
classification without that feature and also without
the seq feature, since their distributions are very
similar. In other words, in our experimentation we
perform the classification tests once with the full set
of 10 features, and a second time with only 8
features, in order to evaluate the effect of this
reduction on classification accuracy and execution
time.
Fig. 1: Probability of different kinds of attacks, given a particular value of a feature: a) state_number, b) srate (x1e+06), c) max, d) mean, e) stddev, f) min, g) N_IN_Conn_P_DstIP, h) drate (x1e+03), i) seq, j) N_IN_Conn_P_SrcIP, k) color legend of the attacks
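Since only the caption of Fig. 1 survives here, the following sketch indicates how such per-attack feature distributions could be drawn; the histogram binning and normalization are assumptions, not the authors' plotting setup.

```python
# Sketch of Fig. 1-like panels: one normalized histogram per attack class for every
# feature. X_train, y_train_multi and FEATURES come from the loading sketch above.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 5, figsize=(20, 7))
for ax, feature in zip(axes.ravel(), FEATURES):
    for cls in sorted(y_train_multi.unique()):
        ax.hist(X_train.loc[y_train_multi == cls, feature],
                bins=50, density=True, histtype="step", label=str(cls))
    ax.set_title(feature)
axes[0, 0].legend(title="attack", fontsize=6)
plt.tight_layout()
plt.show()
```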
2.2 SVM Description
SVM is a supervised training algorithm [12], which
for the purposes of the current study produces a
model capable of discriminating samples of
malicious network activities from those of normal
traffic, and also, as a separate model, of classifying
various types of DDoS attacks. In the first case,
which could also be viewed as a binary
classification problem, the simplified graphical
representation in Fig. 2 may be used.
A hyperplane needs to be found which separates,
in the most discriminating fashion, the clusters of
samples corresponding to the different classes, in
this instance the attack samples and those calculated
from the normal traffic (non-attack samples). All
samples located at minimal distance to the
hyperplane are known as support vectors, while the
distance itself is called a margin. The very idea of
the algorithm is to find a hyperplane which
maximizes the margin.
Fig. 2: SVM operational principle
In N dimensions, the hyperplane equation could
be expressed as [12]:

w^T X_i + b = 0 ,   (4)

where w = (w_1, w_2, …, w_N)^T is a vector of weights, b
is a bias equal to w_0, and X_i = (x_{1i}, x_{2i}, …, x_{Ni}) is a
vector representing an input sample to be
discriminated from the samples belonging to the other
cluster. If the following condition is met [12]:

y_i (w^T X_i + b) \geq 1 ,   (5)
then X_i is associated with the correct class. If all
vector points from both classes are linearly
separable, the hyperplane satisfying the above
relations will completely separate the classes;
however, if a new point of a class falls on the other
side of the hyperplane, it will be incorrectly classified.
This is known as an SVM with a hard margin.
In order to overcome the limitations of the strict
rule from above, a slack variable ξ is put in (5) [12]:

y_i (w^T X_i + b) \geq 1 - \xi_i ,   (6)

and there will be a correct classification only when
ξ_i = 0. For every case ξ_i > 0, ξ_i represents the error of
classification, which on average over all classifications
will be [12]:

(1/n) \sum_{i=1}^{n} \xi_i .   (7)
Then, naturally, the following objective function
emerges [12]:

\min_{w, b, \xi} (1/2) \|w\|^2 + (C/n) \sum_{i=1}^{n} \xi_i ,   (8)

which must hold true when (6) is satisfied for all i =
1, 2, …, n, and all input samples are correctly
classified. This is known as an SVM with a soft margin.
The loss function is zero, given Z_i = y_i(w^T X_i + b) \geq 1,
and increases as the condition Z_i < 1 becomes ever
stronger [12]. So the loss could be derived as max(0, 1 - Z_i).
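A small numeric illustration of this hinge-type loss, with made-up weights and samples, might look as follows.

```python
# Illustration of the hinge loss max(0, 1 - Z_i); w, b and the samples are
# made-up numbers, not values fitted on the dataset.
import numpy as np

w = np.array([0.8, -0.3])
b = 0.1
X = np.array([[1.5, 0.2],     # correctly classified, outside the margin
              [0.5, 0.4],     # correct side, but inside the margin
              [-1.0, 0.1]])   # on the wrong side of the hyperplane
y = np.array([1, 1, 1])

Z = y * (X @ w + b)
hinge = np.maximum(0.0, 1.0 - Z)
print(Z, hinge)               # the loss is 0 only where Z_i >= 1
```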
Using Lagrange multipliers, it becomes possible to
project the data from a low-dimensional space into a
higher number of dimensions in order to get better
separability of the samples from different classes,
which is known as the SVM dual form [12]:

\max_{\alpha} \sum_{i=1}^{n} \alpha_i - (1/2) \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j X_i^T X_j ,   (9)

which should be satisfied given α_i \geq 0 for i = 1, 2, …,
n, while \sum_{i=1}^{n} α_i y_i = 0. Every X_i represents a support
vector when α_i > 0 and is not one when α_i = 0.
Finding the solution for the dual form of the SVM
depends on α, since the intermediate results depend
on the scalar products of vector pairs, including the
bias b. The search for the dot products is eased by the
introduction of a kernel, performing all calculations
in another space with a higher number of dimensions
than the initial one [12]:

\max_{\alpha} \sum_{i=1}^{n} \alpha_i - (1/2) \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(X_i, X_j) ,   (10)

for 0 \leq α_i \leq C, when i = 1, 2, …, n, and
\sum_{i=1}^{n} α_i y_i = 0.
In our study we use 4 different kernel functions. A
linear one, defined by [12]:

K(X_1, X_2) = X_1^T X_2 ,   (11)

where K is the kernel function, and X_1 and X_2 are
input vectors.
A polynomial kernel could be expressed as [12]:

K(X_1, X_2) = (g X_1^T X_2 + c)^d ,   (12)

where d is the degree of the kernel, c is a constant, and
g is a coefficient of proportion (gamma).
A Gaussian Radial Basis Function (RBF) as a kernel
is another option, which could be represented as [12]:

K(X_1, X_2) = \exp(-g \|X_1 - X_2\|^2) ,   (13)

where \|X_1 - X_2\| is the Euclidean distance between the
vectors X_1 and X_2. With the change of g from small to
large values, the general observed effect is that the
classifier goes from underfitting to overfitting, passing
through some optimal configuration.
The sigmoidal function as a kernel corresponds to the
following equation [12]:

K(X_1, X_2) = \tanh(g X_1^T X_2 + c) .   (14)
2.3 SVM Optimization Procedure
During experimental testing the following
evaluation parameters are found from both the
validation process over the full training set and
classification over the full test set:
- AUC: the Area Under the ROC (Receiver Operating Characteristic) curve;
- CA: Classification Accuracy, the ratio of correctly marked samples with regard to all input samples;
- Precision: the share of correctly classified instances among all instances marked as positive;
- Recall: the share of correctly classified instances with regard to the whole number of positive instances of the same type (class) in the dataset;
- F1: the harmonic mean of the precision and recall parameters;
- Specificity: the share of marked true negatives, related to all negative samples from the input;
- LogLoss: the loss value, for which the cross-entropy function is used; it accounts for the uncertainty level of the prediction being made (to which class a test sample belongs) and depends on the degree of deviation from the actual class;
- Train Time: the full time needed to train the classifier;
- Test Time: the full time necessary to perform classification over the test set after the classifier has been completely trained;
- Confusion Matrix: a square matrix representation of the predicted samples by class (as rows) and the actual class of each sample (as columns).
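Most of these parameters can be obtained for the binary detector with scikit-learn's metrics module, as in the sketch below; the helper name and the assumption that predicted labels and scores are already available are illustrative only.

```python
# Sketch of computing the listed evaluation parameters for a binary detector.
# y_true: actual labels; y_pred: predicted labels; y_score: predicted probabilities
# of the positive class (all assumed to come from an already trained model).
from sklearn.metrics import (roc_auc_score, accuracy_score, precision_score,
                             recall_score, f1_score, log_loss, confusion_matrix)

def detection_report(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "AUC": roc_auc_score(y_true, y_score),
        "CA": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, zero_division=0),
        "Recall": recall_score(y_true, y_pred, zero_division=0),
        "F1": f1_score(y_true, y_pred, zero_division=0),
        "Specificity": tn / (tn + fp),      # true negatives over all negative samples
        "LogLoss": log_loss(y_true, y_score),
    }
```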
The general optimization procedure that we
propose here is presented in Fig. 3. According to the
recommendations given in [13], the initial value for
the Cost parameter of the SVM is C = 1, the
constant term c = 1, the degree d = 3, the Numerical
Tolerance NT = 10^-3, and the Iteration Limit IL = 10^5.
One of the major effects of the training is finding
the optimal value of g. As an empirical rule [13], its
initial value may be set to 1/k, where k is the
number of components of the feature vectors, that is,
0.1 in our case. The following categorical variable is
introduced, denoting the type of the SVM kernel: t
= 1 for Linear, t = 2 for Polynomial, t = 3 for
RBF, and t = 4 for Sigmoid. The current iteration
during training is denoted with il.
Fig. 3: Optimization procedure for the SVM
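A rough equivalent of this procedure, sketched with scikit-learn's SVC (on which the Orange SVM widget is based), is given below. The initial parameter values and the reduced 8-feature set follow the text of Sections 2.1 and 2.3, while the F1-based scoring and the loop structure are assumptions.

```python
# Sketch of the optimization loop from Fig. 3 for the binary detector.
# X_train, X_test, y_train_bin, y_test_bin and FEATURES come from the loading sketch.
from sklearn.svm import SVC
from sklearn.metrics import f1_score

FEATURES_8 = [f for f in FEATURES
              if f not in ("seq", "N_IN_Conn_P_DstIP")]   # reduced set from Section 2.1
feature_sets = {8: FEATURES_8, 10: FEATURES}
kernels = ["linear", "poly", "rbf", "sigmoid"]            # t = 1..4 in the text

scores = {}
for f, cols in feature_sets.items():
    for kernel in kernels:
        clf = SVC(kernel=kernel, C=1.0, coef0=1.0, degree=3, gamma=0.1,
                  tol=1e-3, max_iter=100_000)
        clf.fit(X_train[cols], y_train_bin)
        y_pred = clf.predict(X_test[cols])
        scores[(f, kernel)] = f1_score(y_test_bin, y_pred)

best = max(scores, key=scores.get)                        # analogue of t_opt = max{Score_t}
print(best, scores[best])
```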
3 Experimental Results
Experimental results are gathered using an IBM PC
compatible computer with an Intel Xeon E5-1620
processor, comprising 4 cores operating in hyper-
threading mode at 3.5 GHz. The amount of cache is
256 kB at the first level, 1 MB at the second and 10
MB at the third one, while the size of the RAM is
64 GB. Testing is performed within the
Orange v. 3.28 application for machine learning
under the control of the Microsoft Windows 10
Professional operating system.
Processing times for the training and testing
phases of the SVM, which performs simple
discrimination of the traffic into normal (Class 0) and
attack (Class 1), are given in Table 2.
The time periods in Table 2 correspond to a
maximum of 100 000 training iterations, set
in advance as a limit. In an additional experiment,
the SVM for binary discrimination is tested at
1 000 000 maximum iterations, only for the RBF
kernel. It turns out that the training time is 62 620 s,
the validation time 115 s, and the testing time 31
s, using 10 features. Obviously, the criterion for
terminating the learning process is met long before
reaching the iteration limit, at a number of iterations
almost equal to 100 000.
Table 2. SVM processing times for detecting malicious network activities vs. normal traffic
SVM type     Features   Training time, s   Validation time, s   Testing time, s
Linear       8          8 776              68                   15
Linear       10         11 760             134                  34
Polynomial   8          21 341             85                   26
Polynomial   10         12 677             143                  36
RBF          8          76 539             228                  57
RBF          10         71 458             272                  70
Sigmoid      8          137 980            218                  54
Sigmoid      10         23 425             145                  32
The processing times when classifying multiple
attacks with a set limit of 1 000 000 iterations for
the SVM using the RBF kernel, which turns out to be
the best option among the tested kernels, as the
experimental results below demonstrate, are given in
Table 3.
Table 3. SVM processing times for classifying various attacks
SVM type   Features   Training time, s   Validation time, s   Testing time, s
RBF        8          1 124 612          55 543               32 558
RBF        10         508 111            133 136              33 237
The detection rates for attacks vs. non-attacks,
measured during both the training and testing
phases, are given in Table 4.
Table 4. Detected attacks as proportion of the actual attacks
SVM type     Features   Non-attacks, %          Attacks, %
                        Train       Test        Train       Test
Linear       8          2.7         3.7         100.0       100.0
Linear       10         2.7         3.7         100.0       100.0
Polynomial   8          10.5        8.4         100.0       100.0
Polynomial   10         33.8        29.9        100.0       100.0
RBF          8          11.9        11.2        100.0       100.0
RBF          10         35.7        33.6        100.0       100.0
Sigmoid      8          0.8         0.9         100.0       100.0
Sigmoid      10         2.2         1.9         100.0       100.0
All evaluation parameters of the detection
efficiency, using all 10 features on the train set as a
full validation for the 4 kernels (Fcn.) of the SVM,
are visible in Table 5. Class (Cl.) 0 corresponds to
non-attack, and 1 to attack.
Table 5. Attack detection efficiency on the train set, using 10 features
Cl.  Fnc   AUC     CA      F1      Precision  Recall  LogLoss  Specificity
0    Lin.  0.9866  0.9998  0.0526  1.0        0.0270  0.0010   1.0
0    Pol.  0.9977  0.9999  0.4882  0.8802     0.3378  0.0020   0.9999
0    Rbf   0.8322  0.9999  0.5217  0.9705     0.3567  0.0010   0.9999
0    Sig.  0.6936  0.9998  0.0295  0.0465     0.0216  0.0022   0.9999
1    Lin.  0.9866  0.9998  0.9999  0.9998     1.0     0.0010   0.0270
1    Pol.  0.9977  0.9999  0.9999  0.9999     0.9999  0.0020   0.3378
1    Rbf   0.8322  0.9999  0.9999  0.9999     0.9999  0.0010   0.3567
1    Sig.  0.6936  0.9998  0.9999  0.9998     0.9999  0.0022   0.0216
The detection efficiency results for 10 features
when working with the test set are given in Table 6.
Table 6. Attack detection efficiency on the test set, using 10 features
Cl.  Fnc   AUC     CA      F1      Precision  Recall  LogLoss  Specificity
0    Lin.  0.9809  0.9998  0.0720  1.0        0.0373  0.0017   1.0
0    Pol.  0.9972  0.9998  0.4475  0.8888     0.2990  0.0024   0.9999
0    Rbf   0.8121  0.9999  0.5     0.9729     0.3364  0.0013   0.9999
0    Sig.  0.6973  0.9997  0.0246  0.0363     0.0186  0.0024   0.9999
1    Lin.  0.9809  0.9998  0.9999  0.9998     1.0     0.0017   0.0373
1    Pol.  0.9972  0.9998  0.9999  0.9998     0.9999  0.0024   0.2990
1    Rbf   0.8121  0.9999  0.9999  0.9999     0.9999  0.0013   0.3364
1    Sig.  0.6973  0.9997  0.9998  0.9998     0.9999  0.0024   0.0186
Complete validation over the train set for the
detection capabilities of the SVM, using 8 features,
leads to the results from Table 7.
Table 7. Attack detection efficiency on the train set, using 8 features
Cl.  Fnc   AUC     CA      F1      Precision  Recall  LogLoss  Specificity
0    Lin.  0.9846  0.9998  0.0526  1.0        0.0270  0.0017   1.0
0    Pol.  0.9250  0.9998  0.1884  0.8863     0.1054  0.0021   0.9999
0    Rbf   0.7235  0.9998  0.2110  0.9361     0.1189  0.0011   0.9999
0    Sig.  0.6076  0.9997  0.0097  0.0122     0.0081  0.0020   0.9999
1    Lin.  0.9846  0.9998  0.9999  0.9998     1.0     0.0017   0.0270
1    Pol.  0.9250  0.9998  0.9999  0.9998     0.9999  0.0021   0.1054
1    Rbf   0.7235  0.9998  0.9999  0.9998     0.9999  0.0011   0.1189
1    Sig.  0.6076  0.9997  0.9998  0.9998     0.9999  0.0020   0.0081
The unknown samples from the test set lead to
the detection results, again for 8 features, shown in
Table 8.
Table 8. Attack detection efficiency on the test set, using 8 features
Cl.  Fnc   AUC     CA      F1      Precision  Recall  LogLoss  Specificity
0    Lin.  0.9781  0.9998  0.0720  1.0        0.0373  0.0019   1.0
0    Pol.  0.9157  0.9998  0.1525  0.8181     0.0841  0.0024   0.9999
0    Rbf   0.6884  0.9998  0.2016  1.0        0.1121  0.0014   1.0
0    Sig.  0.6109  0.9997  0.0107  0.0126     0.0093  0.0023   0.9998
1    Lin.  0.9781  0.9998  0.9999  0.9998     1.0     0.0019   0.0373
1    Pol.  0.9157  0.9998  0.9999  0.9998     0.9999  0.0024   0.0841
1    Rbf   0.6884  0.9998  0.9999  0.9998     1.0     0.0014   0.1121
1    Sig.  0.6109  0.9997  0.9998  0.9998     0.9998  0.0023   0.0093
When working with the SVM as a classifier of
multiple attacks, using only the RBF kernel, the
relative amount of correctly discovered attack
instances is given in Table 9.
Table 9. Correctly classified attacks as portion of the actual attacks in %
Attack   Train set, 8 feats.   Test set, 8 feats.   Train set, 10 feats.   Test set, 10 feats.
0        27.0                  26.2                 66.5                   66.4
1        71.6                  71.7                 96.7                   96.6
2        98.9                  98.8                 99.0                   99.0
3        8.8                   10.0                 25.8                   26.6
4        94.0                  93.9                 91.7                   91.7
5        95.0                  95.0                 97.1                   97.1
6        0.0                   0.0                  4.2                    2.0
7        22.0                  35.7                 18.6                   35.7
8        0.0                   0.0                  0.0                    0.0
9        4.7                   5.1                  14.6                   15.5
10       96.3                  96.2                 95.5                   95.6
The RBF kernel is used for that purpose,
because it yields the highest detection rate in
comparison with the other 3 types of kernels, as shown
in Tables 4-8.
The confusion matrices from classifying attacks
over the test set at 10 and 8 features are depicted in
Fig. 4.
Fig. 4: Confusion matrices from classification of the test set at: a) 10 features, b) 8 features, c) color legend of the classes
Classification efficiency at 10 features is shown
in Table 10 for the train set.
Table 10. Classification efficiency of SVM over all types of attacks at 10 features after full validation over the training set
Cl.  AUC     CA      F1      Precision  Recall  LogLoss  Specificity
0    0.9992  0.9999  0.6852  0.7068     0.6648  0.0003   0.9999
1    0.9949  0.9739  0.9255  0.8879     0.9666  0.0634   0.9753
2    0.9988  0.9895  0.9816  0.9735     0.9898  0.0362   0.9894
3    0.9993  0.9996  0.4002  0.8970     0.2576  0.0009   0.9999
4    0.9959  0.9716  0.9451  0.9747     0.9173  0.0690   0.9913
5    0.9981  0.9895  0.9796  0.9887     0.9706  0.0375   0.9961
6    0.9994  0.9997  0.0805  1.0        0.0419  0.0006   1.0
7    0.9999  0.9999  0.2972  0.7333     0.1864  4.66e-5  0.9999
8    0.9999  0.9999  N/A     N/A        N/A     2.45e-5  1.0
9    0.9942  0.9954  0.2381  0.6502     0.1458  0.0115   0.9996
10   0.9986  0.9930  0.8456  0.7585     0.9553  0.0142   0.9938
Applying the test set as input to the RBF SVM
classifier at 10 features, we get the results from
Table 11.
Table 11. Classification efficiency of SVM over all types of attacks at 10 features after processing the test set
Cl.  AUC     CA      F1      Precision  Recall  LogLoss  Specificity
0    0.9992  0.9999  0.6826  0.7029     0.6635  0.0003   0.9999
1    0.9949  0.9737  0.9252  0.8876     0.9661  0.0637   0.9753
2    0.9988  0.9895  0.9816  0.9736     0.9898  0.0363   0.9894
3    0.9975  0.9996  0.4123  0.9195     0.2657  0.0009   0.9999
4    0.9958  0.9715  0.9448  0.9744     0.9170  0.0692   0.9912
5    0.9981  0.9895  0.9797  0.9887     0.9708  0.0375   0.9961
6    0.9994  0.9997  0.0386  1.0        0.0197  0.0006   1.0
7    0.9999  0.9999  0.5263  1.0        0.3571  3.84e-5  1.0
8    0.9936  0.9954  0.2511  0.6631     0.1549  0.0115   0.9996
9    0.9987  0.9930  0.8458  0.7582     0.9564  0.0140   0.9938
10   0.9986  0.9930  0.8456  0.7585     0.9553  0.0142   0.9938
Using only the 8 features causes the RBF
SVM to produce the classification results over the train
set shown in Table 12.
Table 12. Classification efficiency of SVM over all types of attacks at 8 features after full validation over the training set
Cl.  AUC     CA      F1      Precision  Recall  LogLoss  Specificity
0    0.9853  0.9998  0.3816  0.6493     0.2702  0.0005   0.9999
1    0.9852  0.9386  0.7966  0.8971     0.7163  0.1269   0.9834
2    0.9970  0.9839  0.9719  0.9558     0.9885  0.0564   0.9820
3    0.9677  0.9996  0.1569  0.7375     0.0878  0.0021   0.9999
4    0.9887  0.9366  0.8877  0.8412     0.9395  0.1316   0.9355
5    0.9965  0.9839  0.9682  0.9870     0.9502  0.0556   0.9956
6    0.9523  0.9997  N/A     N/A        N/A     0.0014   1.0
7    0.9884  0.9999  0.3421  0.7647     0.2203  0.0001   0.9999
8    0.9909  0.9999  N/A     N/A        N/A     5.43e-5  1.0
9    0.9882  0.9952  0.0894  0.7784     0.0474  0.0138   0.9999
10   0.9960  0.9905  0.8022  0.6874     0.9630  0.0173   0.9910
At the same time, the test set at 8 features has
been classified to the extent represented by the
parameters in Table 13.
Table 13. Classification efficiency of SVM over all types of attacks at 8 features after processing the test set
Cl.  AUC     CA      F1      Precision  Recall  LogLoss  Specificity
0    0.9857  0.9998  0.3708  0.6363     0.2616  0.0006   0.9999
1    0.9851  0.9385  0.7966  0.8964     0.7167  0.1271   0.9832
2    0.9968  0.9838  0.9718  0.9558     0.9883  0.0574   0.9821
3    0.9707  0.9996  0.1780  0.8333     0.0996  0.0020   0.9999
4    0.9886  0.9365  0.8873  0.8410     0.9391  0.1318   0.9356
5    0.9965  0.9838  0.9682  0.9868     0.9503  0.0562   0.9955
6    0.9512  0.9997  N/A     N/A        N/A     0.0015   1.0
7    0.9878  0.9999  0.5263  1.0        0.3571  0.0001   1.0
8    0.9882  0.9952  0.0950  0.7922     0.0505  0.0138   0.9999
9    0.9960  0.9905  0.8008  0.6858     0.9622  0.0173   0.9910
10   0.9960  0.9905  0.8022  0.6874     0.9630  0.0173   0.9910
4 Discussion
The fastest SVM detector of DDoS attacks in
terms of learning time is the one using a Linear
kernel, followed by that with a Polynomial
kernel, which is 1.08 times slower at 10 features
(Table 2). Then follows the classifier with a Sigmoid
kernel, around 2 times slower than the linear one,
and the last is the implementation with the RBF
kernel, around 6 times slower. It is worth
noting that for all detectors apart from the linear one,
the training time at 8 features is longer than that
at 10 features: the reduction of the
information included in the training set affects
the number of iterations needed until reaching the target
value of the cost function, or until the
whole designated learning budget is used. Similar
are the relations of the time periods during the
validation phase. The RBF-based SVM is the
slowest one, with an almost 2 times longer
classification process than the fastest, Linear
SVM. This tendency is preserved at the testing
phase as well. With the exception of the
Sigmoid SVM, for all other implementations it
takes less time to classify at 8 features than at
10, e.g. more than twice the difference for the
Linear SVM.
The most accurate SVM detector of DDoS
attacks is the one with the RBF kernel, achieving at 10
features a detection rate of 35.7% for the non-attack
samples (Table 4) during validation and 33.6%
during testing. The detection rate for the same
samples is more than 3 times lower when applying
only 8 features. At the same time, all attack samples
are discriminated completely correctly during
both phases. The latter is also true for the rest of the
detectors, with the Sigmoid SVM detecting just 0.8%
of the non-attack samples during training and 0.9%
during testing. Only the Linear SVM yields identical
detection rates when operating over 8 or 10 features,
which means that, if it is applied in practice, it
would be preferable to work with 8 features, due to
the twice faster classification. These results lead to
the conclusion that further testing with classifiers
capable of discriminating all 10 types of attacks
should be done with an SVM using the RBF kernel. All
observations from above are also supported by the
evaluation parameters from Tables 5-8.
Classifying all 10 DDoS attacks (indexes from 1
to 10) while recognizing the non-attack samples
(index 0) took the RBF SVM almost twice longer
during training when using 8 features rather than 10
(Table 3). At the same time, 8 features lead
to more than twice faster validation in
comparison with the 10-feature implementation. The
most accurately spotted attack is the DoS UDP flood
(Table 9), followed by the DDoS UDP flood, DoS
TCP flood, Service Scan, and so on. The hardest
attacks to discover are the Data Exfiltration and the
DDoS HTTP flood. The most probable reason for
this is the considerably smaller number of samples
for these attacks present in the training set,
compared to the amount of samples for the rest of the
attack types. Nevertheless, the proportion of data
exchanged during the various tested attacks
corresponds to real-world scenarios, and the
observed dependency should be taken as an inherited
peculiarity of the single SVM classifier itself.
Obviously, to bring the detection rate for these rarely
spotted types of attacks as close as possible to that of
the others, one possible direction for future work
would be to construct a cascade of classifiers. The
variation in the number of discovered attacks between
the training and testing phases is negligible. When
using 8 features, the difference in detection accuracy
for some of the attacks, compared to that for 10
features, goes as high as 3 times, as in the case of OS
Fingerprint, or around 40% for the non-attack
samples (Table 9).
The most mismatched non-attack samples, using
10 features (Fig. 4 a), are recognized as DoS TCP
flood (25.2%); the DoS TCP attack is confused with
the DDoS TCP flood (3.0%), the DoS UDP flood with
the DDoS UDP flood (1.0%), the DoS HTTP flood
with the DDoS TCP flood (45.2%), which is 81.6%
higher than the correctly found samples, the DDoS
TCP flood with the DoS TCP flood (7.5%), the DDoS
UDP flood with the DoS UDP flood (2.9%), the
DDoS HTTP flood with the DDoS TCP flood (62.1%),
close to 60% higher than the number of correctly
recognized samples for this particular attack, the
Keylogging with the Service Scan (57.1%), the OS
Fingerprint with the Service Scan (71.2%), again a
serious mismatch rate, and the Service Scan with the
DDoS TCP flood (2.0%). All these ratios could be
observed from the confusion matrix after
classification over the test set, representing the
proportion of the classified samples by attack out of
the actual number of samples for the same attack, as
shown in Fig. 4 a for 10 features and in Fig. 4 b for
8 features. Using 8 features leads to a change in the
proportion of mismatches with the closest incorrect
type of attack, as follows: an increase of twice for the
non-attack samples, 9 times for the DoS TCP flood,
1.2 times for the DoS UDP flood, and 1.08 times for
the DoS HTTP flood; a 1.5 times decrease for the
DDoS TCP flood, 1.7 times for the DDoS UDP flood,
a 1.2 times decrease for the DDoS HTTP flood, a
57.1% decrease for the Keylogging, and a 1.7 times
decrease for the Service Scan (Fig. 4 b). Apart from
the worsening of the classification accuracy for some
of the attacks, such as the DoS TCP flood or the
non-attack samples, there is also a positive trend for
other types of attacks, such as the Keylogging.
The decrease of the information redundancy in the
training set at 8 features, compared to 10, obviously
better preserves some of the relations for attacks
which have a smaller intensity in terms of the data
exchanged over the network, such as the Keylogging.
It would be practical to use this feature set, although
it is considerably less accurate for attacks with a high
intensity of the generated traffic, for some rarer
activities, when specifically searching for them in a
monitored network. All these results are also
supported by the evaluation parameters shown in
Tables 10-13. For some of the attacks with a really
small number of instances in the training and the
testing set, some of the parameters are hard to
calculate, as the denominator of the equations for
them tends to be very small, almost equal to 0, so
they are marked in the tables with N/A.
At the end of the discussion section, we make a
comparison with another implementation of a binary
SVM classifier (detector) of DDoS attacks,
proposed by other authors in [11]. It is tested over
the same dataset with the same 10 features as in this
study, it has its cost parameter set to C = 1, uses a
Linear kernel, and has a training limit of 100 000
iterations. The confusion matrices for this classifier
and for the best of our binary SVM classifiers (10
features, RBF kernel, 100 000 iterations limit, C = 1)
are shown in Table 14 as the proportion of the
detected samples to all actual samples of that type,
given in %.
Table 14. Confusion matrices of compared SVM detectors
Ours, in %                      Proposed in [11], in %
Attack   0      1               Attack   0       1
0        33.6   66.7            0        100.0   0
1        0.0    100.0           1        11.63   88.37
One of the advantages of the SVM detector,
proposed in [11], is the higher detection rate of non-
attack samples, practically 100%. On the other hand,
our implementation achieves 100% correct
identification of all attacks, while the SVM from
[11] achieves 88.37% correct classifications. Other
evaluating parameters, denoting the detection
efficiency of both classifiers, are presented in Table
15.
Table 15. Evaluation parameters of compared SVM detectors
Parameter     Ours     Proposed in [11]
Accuracy      0.9999   0.8837
Precision     0.9998   1
Recall        0.9999   0.8837
F1-measure    0.9998   0
Taken on average over both classes, 0 (non-
attack) and 1 (attack), and given also the size of the
test set of close to a million samples, our
implementation of an SVM detector shows better
performance, with the exception of the Precision
parameter, which is 0.0002 smaller, a difference
that makes both classifiers relatively equal by this
criterion. Nonetheless, more work is needed to make
the RBF SVM detector more efficient in terms of
detecting non-attack samples. One of the directions
for future work is to try a combined type of
classifier, which could achieve better accuracy.
Another possible path for further research is the
introduction of sampling which artificially increases
the number of non-attack samples, an interpolation
that will lead to an equalization of the number of
normal traffic samples and those from DDoS attacks
(floods), which always prevail over the normal ones
and create an imbalance in which the non-attack
samples are harder to find.
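One way such interpolation-based balancing could be sketched is with SMOTE from the imbalanced-learn package; this is only an assumption about a possible future direction, not an approach evaluated in this paper.

```python
# Possible future-work sketch: interpolate synthetic non-attack samples until the
# two classes of the detector's training set are balanced. X_train and y_train_bin
# come from the loading sketch in Section 2.1.
from imblearn.over_sampling import SMOTE

smote = SMOTE(sampling_strategy=1.0, k_neighbors=5, random_state=0)
X_balanced, y_balanced = smote.fit_resample(X_train, y_train_bin)
```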
5 Conclusion
In this paper two types of SVM classifiers for DDoS
attacks are presented, a binary and a multiclass one,
covering 10 types of the most popular malicious
activities, carried out often with the help of IoT
botnets. The most accurate detector uses an RBF
kernel, which is also considered the most proper
solution for the multiclass classification. The fastest
SVM classifier is the Linear one, followed by the
Polynomial, the Sigmoid and the RBF at the end,
which holds true for most of the cases of both
training and testing. Training with 8 features turns
out slower in most of the cases than with 10
features, but testing may be significantly faster with
8 features for some of the kernels used in the SVM. In
the multiclass classification, the 10-feature set leads
to observably higher accuracy than the 8-feature
set. This effect is less pronounced in the binary
classification, but still holds true as a general trend.
Given the achieved accuracies, both the binary and
multiclass SVMs presented in this study, in their
optimal configurations, are thought to be applicable
in real-world monitoring systems against DDoS
attacks. Still, more work is needed, especially in
increasing the accuracy of the binary SVM in
discovering non-attack samples against the background
of an ongoing DDoS attack with its characteristically
high intensity. Various strategies could be applied,
including a cascade of classifiers, intelligent sampling
of the training set and others. All these will be
considered during future work on the problem.
References:
[1] Behal, S., Kumar, K. Trends in Validation of
DDoS Research. Procedia Computer Science,
Vol. 85, 2016, pp. 7-15.
[2] Ye, J., Cheng, X., Zhu, J., Feng, L., Song, L., A
DDoS Attack Detection Method based on SVM
in Software Defined Network. Security and
Communication Networks, Vol. 2018, 2018,
Article ID 9804061.
[3] Yusof, A. R. A., Udzir, N. I., Selamat, A. An
Evaluation on KNN-SVM Algorithm for
Detection and Prediction of DDoS Attack. In
International Conference on Industrial,
Engineering and Other Applications of Applied
Intelligent Systems, Springer, Cham, August
2016, pp. 95-102.
[4] Daneshgadeh, S., Kemmerich, T., Ahmed, T.,
Baykal, N. An Empirical Investigation of
DDoS and Flash Event Detection using
Shannon Entropy, KOAD and SVM combined.
In 2019 International Conference on
Computing, Networking and Communications
(ICNC), IEEE, 2019, pp. 658-662
[5] Khuphiran, P., Leelaprute, P., Uthayopas, P.,
Ichikawa, K., Watanakeesuntorn, W.,
Performance Comparison of Machine Learning
Models for DDoS Attacks Detection. In 2018
22nd International Computer Science and
Engineering Conference (ICSEC), IEEE,
November 2018, pp. 1-4.
[6] Ali, J., Roh, B. H., Lee, B., Oh, J., Adil, M. A
Machine Learning Framework for Prevention
of Software-Defined Networking controller
from DDoS Attacks and Dimensionality
Reduction of Big Data. In 2020 International
Conference on Information and
Communication Technology Convergence
(ICTC), IEEE, October 2020, pp. 515-519.
[7] Adhikary, K., Bhushan, S., Kumar, S., Dutta,
K. Hybrid Algorithm to Detect DDoS Attacks
in VANETs. Wireless Personal
Communications, Vol. 114, No. 4, 2020, pp.
3613-3634.
[8] Nazih, W., Hifny, Y., Elkilani, W., Abdelkader,
T., Faheem, H. Efficient Detection of Attacks
in SIP based VoIP Networks using Linear l1-
SVM Classifier. International Journal of
Computers Communications & Control, Vol.
14, No. 4, 2019, pp. 518-529.
[9] Kajal, A., Nandal, S. K. ABC-ANN-SVM
Hybrid Approach to Enhance Cyber Security
against Malware, DDoS Attacks. Journal of
Critical Reviews, Vol. 7, Issue 19, 2020, pp.
4557-4570.
[10] Arshi, M., Nasreen, M. D., Madhavi, K. A
Survey of DDOS Attacks using Machine
Learning Techniques. In E3S Web of
Conferences, Vol. 184, 2020, p. 01052.
[11] Koroniotis, N., Moustafa, N., Sitnikova, E.,
Turnbull, B., Towards the Development of
Realistic Botnet Dataset in the Internet of
Things for Network Forensic Analytics: Bot-
IoT dataset. Future Generation Computer
Systems, Vol. 100, November 2019, pp. 779-
796.
[12] Wilmott, P., Machine Learning: An Applied
Mathematics Introduction, Panda Ohana
Publishing, 2019.
[13] SVM, Orange Visual Programming, Orange
Data Mining,
https://orange3.readthedocs.io/projects/orange-visual-programming/en/latest/widgets/model/svm.html,
last accessed on August 4th, 2021.
[14] Nikulin, M. S., Chimitova, E. V. Chi-squared
Goodness-of-fit Tests for Censored Data,
Wiley, 2017.