Normalization Technique and Weight Adjustment Analysis for
Keystroke Vector Dissimilarity Authentication
TANAPAT ANUSAS-AMORNKUL*, NAPHAT BUSSABONG
Department of Computer and Information Science,
King Mongkut’s University of Technology North Bangkok,
1518 Pracharat 1 Road, Wongsawang, Bangsue, Bangkok 10800,
THAILAND
*Corresponding Author
Abstract: - Keystroke dynamics authentication uses each user's keystroke rhythm on a keyboard to verify the real user. The idea is that each user has a unique keystroke rhythm, so the rhythm can be used to determine the identity of a user. To verify a user, a keystroke vector dissimilarity technique was proposed that uses keystroke features as a vector and calculates a weight using SoftMax+1 to overcome the Euclidean distance problem. However, the weight has yet to be analyzed in detail. Therefore, this paper aims to find a normalization technique and a weight adjustment that enhance the accuracy of the keystroke vector dissimilarity technique. The normalization techniques and activation functions analyzed in this study are the Euclidean norm, Mean normalization, Min-max normalization, Z-score normalization, SoftMax function, and ReLU function. The weight adjustment is varied from w+1000 to 1000-w. The results show that the Mean and Min-max normalizations with 10-w as a weight gave the same results at 96.97% accuracy and 3.03% error, which are better than the previous work.
Key-Words: - Keystroke rhythm, keystroke dynamics authentication, keystroke vector dissimilarity, Euclidean
distance, normalization technique analysis, weight adjustment analysis.
Received: March 8, 2024. Revised: July 2, 2024. Accepted: August 3, 2024. Published: September 23, 2024.
1 Introduction
Keystroke dynamics authentication is a biometric authentication method that uses a user's keystroke rhythm on a keyboard to verify the real user. Each user has his or her own keystroke rhythm, or keystroke dynamics, which can distinguish a real user from a fake user. There are many techniques to verify a keystroke rhythm. One example is a statistical method using the Euclidean distance, which measures the keystroke distance between a real user and a fake user, [1]. However, the Euclidean distance does not consider the sequence of input keys, so a fake user can be accepted as the real user even if the keystroke rhythm is different. For example, a fake user may type faster than the real user at the beginning but slower at the end, yet the Euclidean distance between the real user and the fake user is small, and the fake user is accepted as the real user. In [1], a keystroke vector dissimilarity technique was proposed to enhance keystroke dynamics authentication (KDA) by using a keystroke vector and a weight to reduce this flaw of the Euclidean distance; however, the weight needed to be thoroughly analyzed to improve the authentication performance. The contribution of this study is to extend the limited research on normalization techniques and weight adjustments for the keystroke vector dissimilarity technique.
In this study, normalization techniques and weight adjustments are analyzed to improve the accuracy of [1]. The dataset from [1] is used so that the results can be compared directly with the previous work. The normalization techniques and activation functions are the Euclidean norm, Mean normalization, Min-max normalization, Z-score normalization, SoftMax function, and ReLU function, and the weight is adjusted from w+1000 to 1000-w. The objective of this paper is to obtain a normalization technique and a weight adjustment such that the accuracy of the keystroke vector dissimilarity technique is improved. Therefore, a fake user has less chance of passing the authentication process, and the system is protected from mishaps such as information leaks, information modification, and privilege escalation.
The organization of this paper is as follows. The
next section presents the background of keystroke
dynamics, normalization techniques, and activation
functions. In addition, the literature review is also
explained. Section 3 presents the overview of
keystroke vector dissimilarity, which is used as a
model to analyze the normalization techniques and
weight adjustment. Performance metrics and the dataset used in this study are also presented. Next, the
experimental results are shown and analyzed in
detail with an example. The last section is the
conclusion, which summarizes the study results and
future work.
2 Background and Literature Review
This section presents background on keystroke dynamics features and normalization techniques, along with a literature review of this research area.
2.1 Background
Fundamental keystroke dynamics features are presented in this section to give the basic concepts of standard keystroke features. Moreover, normalization techniques and activation functions are explained to provide basic knowledge on how data are normalized in this study.
2.1.1 Keystroke Dynamics Features
Keystroke dynamics is the keystroke rhythm of a user on a keyboard. Each user has a unique keystroke rhythm that can be used to verify the user's identity. To collect keystroke dynamics, the keystroke features are defined as follows.
- Interkey time (I) is the time duration from releasing one key to pressing a following key.
- Hold time (H) is the time duration from pressing a key to releasing that key.
- Latency (L) is a combination of Interkey time and Hold time, or the time duration from pressing a key to pressing a following key.
The standard keystroke features are illustrated in
Figure 1.
[Diagram: Hold time, Interkey time, and Latency shown on a timeline for the keys P, A, S, S]
Fig. 1: Keystroke dynamics features
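To make the feature definitions concrete, the following Python sketch (not from the paper; the event format and function name are assumptions for illustration) computes Hold time, Interkey time, and Latency from per-key press and release timestamps.
```python
# Minimal sketch (not from the paper): computing keystroke features
# from hypothetical (key, press_time, release_time) events in seconds.

def keystroke_features(events):
    """Return hold, interkey, and latency lists for an ordered key sequence."""
    hold = [release - press for _, press, release in events]
    interkey = [events[i + 1][1] - events[i][2]   # next press - this release
                for i in range(len(events) - 1)]
    latency = [events[i + 1][1] - events[i][1]    # next press - this press
               for i in range(len(events) - 1)]
    return hold, interkey, latency

# Example: typing "PASS" with made-up timestamps.
events = [("P", 0.00, 0.09), ("A", 0.21, 0.28),
          ("S", 0.40, 0.47), ("S", 0.58, 0.66)]
print(keystroke_features(events))
```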
2.1.2 Normalization Techniques and
Activation Functions
This study investigates four normalization
techniques and two activation functions to enhance
the keystroke dynamics authentication performance.
The techniques and functions in this research are
described in detail as follows.
2.1.2.1 Euclidean Norm
The Euclidean norm, or L2 normalization, is based on the length or magnitude of a vector in mathematics, computed using (1).
\|V\| = \sqrt{\sum_{i=1}^{n} v_i^2}    (1)
where V is an n-dimension vector.
For the Euclidean norm, each value in the vector is divided by its length, as shown in (2).
v_i' = \frac{v_i}{\|V\|}, \quad i = 1, \ldots, n    (2)
where n is the dimension of the vector. The Euclidean norm outputs are between 0 and 1.
2.1.2.2 Mean Normalization
Mean normalization is calculated using the mean as a reference for data scaling, as shown in (3).
x' = \frac{x - \bar{x}}{\max(x) - \min(x)}    (3)
where \bar{x} is the mean, max(x) is the maximum value in the vector x, and min(x) is the minimum value in the vector x. The outputs from Mean normalization are between -1 and +1.
2.1.2.3 Min-max Normalization
Min-max normalization is used to scale data using the minimum value as a reference, as shown in (4).
x' = \frac{x - \min(x)}{\max(x) - \min(x)}    (4)
The outputs from Min-max normalization are between 0 and +1.
2.1.2.4 Z-score Normalization
Z-score normalization is applied to adjust data to have a mean of 0 and a standard deviation of 1, as shown in (5).
x' = \frac{x - \bar{x}}{\sigma}    (5)
where \sigma is the standard deviation. The outputs from Z-score normalization can be negative or positive values.
2.1.2.5 SoftMax function
SoftMax is a mathematical function commonly
employed in Deep learning for classification. This
function transforms data into probability values, as shown in (6).
\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}    (6)
where \sigma(z)_i is the SoftMax output for the i-th element, z is an input vector, e^{z_i} is the standard exponential function of the i-th element, K is the number of elements in the input vector, and \sum_{j=1}^{K} e^{z_j} is the summation of all the standard exponential functions.
The total sum of the SoftMax function outputs is 1.
2.1.2.6 ReLU (Rectified Linear Unit) function
ReLU is a mathematical function used in Deep learning for its simplicity; it produces no negative values. It maps data to the range from 0 to the maximum value, as shown in (7).
v_i' = \max(0, v_i)    (7)
where V is an input vector, and v_i is the i-th vector element.
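For reference, the following Python sketch collects equations (1)-(7) as small NumPy functions; it is an illustration, not code from the paper, and the function names are assumptions.
```python
# Minimal NumPy sketch (not from the paper) of the normalization
# techniques and activation functions in equations (1)-(7).
import numpy as np

def euclidean_norm(v):          # (1)-(2): divide by the vector length
    return v / np.linalg.norm(v)

def mean_norm(v):               # (3): outputs in [-1, +1]
    return (v - v.mean()) / (v.max() - v.min())

def min_max_norm(v):            # (4): outputs in [0, +1]
    return (v - v.min()) / (v.max() - v.min())

def z_score_norm(v):            # (5): mean 0, standard deviation 1
    return (v - v.mean()) / v.std()

def softmax(v):                 # (6): outputs sum to 1
    e = np.exp(v - v.max())     # shift for numerical stability
    return e / e.sum()

def relu(v):                    # (7): negative values become 0
    return np.maximum(0.0, v)

if __name__ == "__main__":
    sd_vector = np.array([0.0, 0.00234, 0.00229, 0.00255, 0.00177,
                          0.0023, 0.00118, 0.00086, 0.00085,
                          0.00133, 0.00109, 0.0])
    print(min_max_norm(sd_vector))
```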
2.2 Literature Review
Keystroke dynamics authentication research can be
classified into two main research techniques:
statistical and machine learning. The reviews for
each technique are as follows.
2.2.1 Statistical Techniques
For keystroke dynamics authentication, statistical techniques were proposed because they require less data; the amount of collected keystroke data is small, which is practical for real-world applications. In [1], a Keystroke Vector Dissimilarity technique was proposed that uses a vector to represent keystroke dynamics features. The technique uses a Master vector profile and an SD vector profile to verify a user, where the SD vector profile serves as a weight vector that assigns a different importance to each feature (or position in the vector). The proposed work gave 96.67% accuracy and 3.33% EER.
In [2], dynamic time warping was proposed for keystroke dynamics authentication with the CMU dataset. The proposed technique gave a False Accept Rate (FAR) of 39.9%, a False Reject Rate (FRR) of 3.3%, and an EER of 17.6%. In [3], the Generalized Fuzzy Model (GFM), a combination of the Mamdani-Larsen and Takagi-Sugeno fuzzy models, was proposed for keystroke dynamics authentication. It was tested with the CMU dataset, and the GFM model gave the best performance at 7.86% EER.
In [4], a simple statistical method was proposed that uses an average and a standard deviation to create a profile from crowdsourcing. The best results were obtained in two cases: the first case gave a FAR of 0.0% and an FRR of 2.54%, and the second case gave a FAR of 0.0% and an FRR of 2.87%.
2.2.2 Machine Learning Techniques
Machine learning techniques for KDA are actively studied in this research area. They also give high accuracy, but they require a large amount of data for training.
In [5], Chinese keystroke dynamics were used for continuous authentication by selecting 5 timing features and using a one-class SVM for authentication. The accuracy was 62.3%. In [6], the Artificial Bee Colony algorithm was proposed to classify users on the authors' own dataset. The keystroke dynamics features in this work were pressure, dwell time, and flight time. The best accuracy was 90.63%, using dwell time and pressure as the features.
In [7], four machine learning models, i.e., KNN, SVC, Random Forest, and XGBoost, were studied using the CMU dataset. XGBoost gave the best performance at 93.59% accuracy. In [8], the authors created their own application to collect keystroke data and tested several classifier algorithms. The best algorithm was a tree-based algorithm, and the accuracy was 94%.
In [9], five machine learning algorithms, i.e., Manhattan Distance, Manhattan Filtered Distance, Manhattan Scaled Distance, Gaussian Mixture Model, and Support Vector Machine, were studied for biometric authentication with the CMU dataset. The Manhattan Scaled detector, a statistical algorithm, was the best, giving an EER of 11.76%. In [10], behavioral biometrics (time series) were transformed into three-dimensional (3D) images to test 6 deep learning algorithms. The best algorithm was the GoogleNet model, which gave an EER of 4.49%.
Table 1. Literature work summary
Reference    Dataset        EER       FAR      FRR
Stat [1]     own            3.33%     3.33%    3.33%
Stat [2]     CMU Dataset    17.6%     39.9%    3.3%
Stat [3]     CMU Dataset    7.86%     -        -
Stat [4]     own            -         0%       2.54%
ML [5]       own            -         -        -
ML [6]       own            -         -        -
ML [7]       own            -         -        -
ML [8]       own            -         -        -
ML [9]       CMU Dataset    11.76%    -        -
ML [10]      own            4.49%     -        -
Note: Stat is a statistical technique, and ML is a machine learning technique.
Table 1 shows the summary of the literature
works using accuracy, EER, FAR, and FRR as
performance metrics to compare with other works. Some metrics could not be determined from those papers. The datasets came from different sources, and each source collected keystroke dynamics features differently, so the results are difficult to compare against one another. The highest accuracy is 96.67%, with 3.33% error, reported in [1] on their own dataset. However, this study uses the dataset from [1], so the results can be compared fairly on the same dataset.
3 Keystroke Vector Dissimilarity
The concept of the keystroke vector dissimilarity technique was proposed in [1]. This research aims to analyze the normalization techniques and weight adjustment of that technique. An overview of the keystroke vector dissimilarity is presented here to explain how the technique works.
3.1 Overview
The keystroke vector dissimilarity was proposed to
overcome a weakness of Euclidean distance by
using a keystroke vector and a standard deviation
(SD) vector as weights to verify a user. The
overview diagram for the keystroke vector
dissimilarity technique is shown in Figure 2.
[Diagram: 10 sets of username keystroke vectors are normalized with the Euclidean norm for each feature; their means form the Master Vector and their standard deviations form the SD Vector. The SD Vector passes through normalization and weight adjustment to become the SD Vector Profile, which is multiplied with the Master Vector to give the Master Vector Profile. An input keystroke vector is normalized with the Euclidean norm, multiplied by the SD Vector Profile to give the Modified Input Keystroke Vector, and compared with the Master Vector Profile by Euclidean distance to decide between a real user (0) and a fake user (1).]
Fig. 2: Keystroke vector dissimilarity diagram
This technique consists of 5 main components, i.e., 1) a keystroke vector, 2) a vector normalization, 3) a Master vector and SD vector profile, 4) a Master vector profile and modified input keystroke vector, and 5) a verification process. A keystroke vector is created from the keystroke dynamics features, Interkey time and Latency, for each key following the sequence of input keystrokes. A vector normalization is used to normalize the standard deviation of each feature in the vector, and the normalized value is used as the weight for that feature. This vector has to be adjusted so that the weights are not zero.
A Master vector is created from the means of 10 real user inputs, and a Master vector profile is created by multiplying it with an SD vector profile, which weighs each element of the vector to overcome the weakness of the Euclidean distance. When a user types his username on a keyboard, an input keystroke vector is created, and the Euclidean norm is calculated for each feature. Then, the vector is multiplied by the SD vector profile to create a modified input keystroke vector, which is compared with the Master vector profile. The verification process uses the Euclidean distance and a Six Sigma technique. If the distance between a new input vector and the Master vector profile is within the range of a real user, the verification passes, and vice versa.
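A minimal sketch of this flow is given below. It is not the authors' implementation: the Min-max 10-w weighting and a mean-plus-k-standard-deviations acceptance range (standing in for the Six Sigma rule) are simplifying assumptions, and all function names are illustrative.
```python
# Minimal sketch (not the authors' implementation) of the keystroke
# vector dissimilarity verification flow described above.
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

def min_max(v):
    return (v - v.min()) / (v.max() - v.min())

def enroll(samples, k=3.0):
    """samples: (10, n) array of enrollment keystroke vectors of one user."""
    normed = np.array([l2_normalize(s) for s in samples])
    master_vector = normed.mean(axis=0)
    sd_profile = 10.0 - min_max(normed.std(axis=0))   # SD Vector Profile (assumed 10-w)
    master_profile = master_vector * sd_profile       # Master Vector Profile
    # Distances of the enrollment samples define the acceptance range.
    dists = [np.linalg.norm(l2_normalize(s) * sd_profile - master_profile)
             for s in samples]
    return master_profile, sd_profile, np.mean(dists) + k * np.std(dists)

def verify(sample, master_profile, sd_profile, threshold):
    modified = l2_normalize(sample) * sd_profile      # Modified Input Keystroke Vector
    return np.linalg.norm(modified - master_profile) <= threshold
```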
3.2 Performance Metrics
In this study, standard keystroke dynamics
performance metrics are False Accept Rate (FAR),
False Reject Rate (FRR), and Equal Error Rate
(EER). Moreover, the machine learning
performance metrics, i.e., Accuracy (Acc.) and
Error (Err.), are used in this study when EER cannot
be evaluated in some cases. Therefore, Accuracy is
the main performance metric of this study for
comparing results.
3.2.1 False Accept Rate (FAR)
False Accept Rate (FAR) is the percentage of fake user inputs accepted as a real user. The FAR formula is displayed in (8).
\mathrm{FAR} = \frac{\text{number of fake user inputs accepted as a real user}}{\text{total number of fake user inputs}} \times 100\%    (8)
3.2.2 False Reject Rate (FRR)
False Reject Rate (FRR) is the percentage of real user inputs rejected as a fake user. The FRR formula is displayed in (9).
\mathrm{FRR} = \frac{\text{number of real user inputs rejected as a fake user}}{\text{total number of real user inputs}} \times 100\%    (9)
3.2.3 Equal Error Rate (EER)
The Equal Error Rate (EER) is a cross point
between FAR and FRR: the lower the EER, the
higher the accuracy of a biometric system.
3.2.4 Accuracy
Accuracy (Acc.) is a percentage of predictions that match the actual results. The accuracy formula is displayed in (10).
\mathrm{Acc.} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \times 100\%    (10)
3.2.5 Error
Error (Err.) is a percentage of predictions that do not match the actual results. It is the opposite of Accuracy. The error formula is displayed in (11).
\mathrm{Err.} = 100\% - \mathrm{Acc.}    (11)
3.3 Dataset
In this study, the dataset used for the experiments is from [1]. The dataset was collected from 18 users: 15 real users and 3 fake users. Each real user typed their username (not a password) on a keyboard in 2 sets, and in each set the user typed the username 10 times, giving 300 records for real users in total. Each fake user typed the usernames of the real users 30 times per username, giving 1,350 records for fake users in total. The dataset therefore contains 1,650 records.
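Given these counts, the following sketch (with hypothetical outcome counts, not the paper's results) illustrates how the metrics in equations (8)-(11) are computed; the eer helper, which locates the crossing point of FAR and FRR curves, is an assumed utility.
```python
# Minimal sketch (illustrative counts, not the paper's data) of the
# performance metrics in equations (8)-(11), plus a simple EER estimate.
import numpy as np

def far(false_accepts, total_fake_inputs):
    return 100.0 * false_accepts / total_fake_inputs

def frr(false_rejects, total_real_inputs):
    return 100.0 * false_rejects / total_real_inputs

def accuracy(correct_predictions, total_predictions):
    return 100.0 * correct_predictions / total_predictions

def error(acc):
    return 100.0 - acc

def eer(far_curve, frr_curve):
    """Approximate EER as the point where the FAR and FRR curves cross."""
    far_curve, frr_curve = np.asarray(far_curve), np.asarray(frr_curve)
    i = np.argmin(np.abs(far_curve - frr_curve))
    return (far_curve[i] + frr_curve[i]) / 2.0

# Hypothetical outcome for 1,350 fake and 300 real attempts.
acc = accuracy(correct_predictions=1600, total_predictions=1650)
print(far(45, 1350), frr(9, 300), acc, error(acc))
```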
4 Normalization Technique and
Weight Analysis
The previous section explained the overview of the keystroke vector dissimilarity technique. Figure 2 shows the overall components of the technique, and the component analyzed in this study is the normalization and weight adjustment component, shown in Figure 3. The main reason to examine this component is that it determines the weights for creating an SD vector profile to solve the Euclidean distance problem. The previous research in [1] only proposed to use the SoftMax function as the normalization technique and the SoftMax outputs (w)+1 as the weights. The accuracy and EER were 96.67% and 3.33%, respectively.
This study analyzes other normalization techniques and weight adjustment values to improve the accuracy of the keystroke vector dissimilarity technique. A preliminary analysis of the SD vector found that the standard deviation of each keystroke feature is very small, so the SD vector must first be normalized before its values can give significant weights to the keystroke vectors.
In this section, normalization techniques and weight adjustments are investigated to study their performance and how to calculate a proper weight for the keystroke vector dissimilarity technique. The normalization techniques in this study are the Euclidean norm, Mean normalization, Min-max normalization, Z-score normalization, SoftMax function, and ReLU function. It is important to note that the output from a normalization technique is represented as "w" in this study.
[Diagram: SD Vector → Normalization (w) → Weight adjustment → SD Vector Profile]
Fig. 3: Normalization and Weight adjustment.
In addition, the weight adjustment is an important component: it ensures that the weight is not zero when multiplied with the Master vector profile or a modified input keystroke vector, so that the effect of every element remains significant for the verification process. The weight adjustment can add, subtract, or multiply values to produce a proper weight for creating an SD vector profile. Sometimes the outputs from a normalization technique are small, so multiplying by 10, 100, or 1000 is a reasonable weight adjustment, and adding or subtracting 1 is crucial to ensure that the weights are not zero. Several weight adjustment values are investigated, such as w+1, w+1000, (1000*w)+1, 1-w, 1000-w, and 1-(1000*w), as illustrated in the sketch below.
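The following Python sketch (assumed helper code, not the authors' implementation) shows how such candidate adjustments can be generated and applied to the normalized SD vector w.
```python
# Minimal sketch (assumed helpers, not the authors' code): applying
# candidate weight adjustments to a normalized SD vector "w".
import numpy as np

WEIGHT_ADJUSTMENTS = {
    "w+1":        lambda w: w + 1,
    "w+1000":     lambda w: w + 1000,
    "(1000*w)+1": lambda w: 1000 * w + 1,
    "1-w":        lambda w: 1 - w,
    "10-w":       lambda w: 10 - w,
    "1000-w":     lambda w: 1000 - w,
    "1-(w*1000)": lambda w: 1 - 1000 * w,
}

def min_max(v):
    return (v - v.min()) / (v.max() - v.min())

def sd_vector_profile(sd_vector, normalize, adjustment):
    """Normalize the SD vector, then apply the chosen weight adjustment."""
    w = normalize(sd_vector)
    return WEIGHT_ADJUSTMENTS[adjustment](w)

# Example: Min-max normalization with the 10-w adjustment.
sd = np.array([0.0, 0.00234, 0.00229, 0.00255, 0.00177, 0.0023,
               0.00118, 0.00086, 0.00085, 0.00133, 0.00109, 0.0])
print(sd_vector_profile(sd, min_max, "10-w"))
```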
The SD vector that serves as the input to a normalization technique contains only standard deviations, which are always positive numbers. Each standard deviation value is
calculated from 10 username inputs. The standard
deviations for all elements in the SD vector are
small. A component of the SD vector with a high
standard deviation means that a user is unfamiliar
with that feature, and the weight should be small.
This hypothesis is shown later in this section.
Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 show the results from the different normalization techniques and activation functions: Euclidean norm, Mean normalization, Min-max normalization, Z-score normalization, SoftMax function, and ReLU function, respectively. The weights are varied, and the performance metrics are accuracy (Acc.), error (Err.), FAR, FRR, and EER. In the tables, some values of EER cannot be determined because the crossing point between FAR and FRR was not found. In each table, the row marked with an asterisk (*) gives the best performance compared to the other rows.
The results from the Euclidean norm are presented in Table 2. The outputs from the Euclidean norm range between 0 and 1. The best weight adjustment for the Euclidean norm is 10-w, with 96.91% accuracy and 3.09% error.
Table 2. Euclidean norm results
Weight Adjustment    Acc.      Err.      FAR       FRR       EER
w+1                  95.33%    4.67%     4.67%     4.67%     -
w+10                 96.67%    3.33%     3.33%     3.33%     3.33%
w+100                96.67%    3.33%     3.33%     3.33%     3.33%
w+1000               96.67%    3.33%     3.33%     3.33%     3.33%
(10*w)+1             91.09%    8.91%     8.89%     9.00%     -
(100*w)+1            90.00%    10.00%    10.00%    10.00%    10.00%
(1000*w)+1           90.06%    9.94%     9.93%     10.00%    -
1-w                  96.67%    3.33%     3.33%     3.33%     3.33%
10-w *               96.91%    3.09%     3.11%     3.00%     -
100-w                96.67%    3.33%     3.33%     3.33%     3.33%
1000-w               96.67%    3.33%     3.33%     3.33%     3.33%
1-(w*10)             88.06%    11.94%    11.93%    12.00%    -
1-(w*100)            89.94%    10.06%    10.07%    10.00%    -
1-(w*1000)           90.06%    9.94%     9.93%     10.00%    -
Table 3. Mean normalization results
Weight Adjustment    Acc.      Err.      FAR       FRR       EER
w+1                  93.52%    6.49%     6.52%     6.33%     -
w+10                 96.49%    3.52%     3.48%     3.67%     -
w+100                96.67%    3.33%     3.33%     3.33%     3.33%
w+1000               96.67%    3.33%     3.33%     3.33%     3.33%
(10*w)+1             87.39%    12.61%    12.59%    12.67%    -
(100*w)+1            86.24%    13.76%    13.78%    13.67%    -
(1000*w)+1           86.24%    13.76%    13.85%    13.33%    -
1-w                  96.67%    3.33%     3.33%     3.33%     3.33%
10-w *               96.97%    3.03%     3.04%     3.00%     -
100-w                96.73%    3.27%     3.26%     3.33%     -
1000-w               96.67%    3.33%     3.33%     3.33%     3.33%
1-(w*10)             87.70%    12.30%    12.30%    12.33%    -
1-(w*100)            86.18%    13.82%    13.85%    13.67%    -
1-(w*1000)           86.30%    13.70%    13.78%    13.33%    -
The outputs from Mean normalization range between -1 and +1, whereas the outputs from Min-max normalization range between 0 and 1. The best weight adjustment for both Mean and Min-max normalizations is 10-w, with the same accuracy of 96.97% and error of 3.03%, as shown in Table 3 and Table 4. This is because both normalization techniques have the same denominator, while the numerators differ only in subtracting the mean versus the minimum value. The overall results from Min-max normalization are better than those from Mean normalization.
Table 4. Min-max normalization results
Weight Adjustment    Acc.      Err.      FAR       FRR       EER
w+1                  94.49%    5.52%     5.48%     5.67%     -
w+10                 96.49%    3.52%     3.48%     3.67%     -
w+100                96.67%    3.33%     3.33%     3.33%     3.33%
w+1000               96.67%    3.33%     3.33%     3.33%     3.33%
(10*w)+1             90.55%    9.46%     9.48%     9.33%     -
(100*w)+1            90.06%    9.94%     9.93%     10.00%    -
(1000*w)+1           90.06%    9.94%     9.93%     10.00%    -
1-w                  95.64%    4.36%     4.37%     4.33%     -
10-w *               96.97%    3.03%     3.04%     3.00%     -
100-w                96.73%    3.27%     3.26%     3.33%     -
1000-w               96.67%    3.33%     3.33%     3.33%     3.33%
1-(w*10)             88.91%    11.09%    11.11%    11.00%    -
1-(w*100)            90.06%    9.94%     9.93%     10.00%    -
1-(w*1000)           90.06%    9.94%     9.93%     10.00%    -
For Z-score normalization, the outputs range between negative and positive numbers and can be less than -1 or more than 1. The weights can therefore be larger, and the best weight adjustment is 100-w, with 96.79% accuracy and 3.21% error, as shown in Table 5.
Table 5. Z-score normalization results
Weight Adjustment    Acc.      Err.      FAR       FRR       EER
w+1                  89.46%    10.55%    10.52%    10.67%    -
w+10                 95.70%    4.30%     4.30%     4.33%     -
w+100                96.67%    3.33%     3.33%     3.33%     3.33%
w+1000               96.67%    3.33%     3.33%     3.33%     3.33%
(10*w)+1             86.18%    13.82%    13.85%    13.67%    -
(100*w)+1            86.30%    13.70%    13.78%    13.33%    -
(1000*w)+1           86.24%    13.76%    13.85%    13.33%    -
1-w                  93.03%    6.97%     6.96%     7.00%     -
10-w                 96.67%    3.33%     3.33%     3.33%     3.33%
100-w *              96.79%    3.21%     3.19%     3.33%     -
1000-w               96.67%    3.33%     3.33%     3.33%     3.33%
1-(w*10)             86.36%    13.64%    13.63%    13.67%    -
1-(w*100)            86.30%    13.70%    13.70%    13.67%    -
1-(w*1000)           86.30%    13.70%    13.78%    13.33%    -
The SoftMax function outputs are probabilities
ranging between 0 and 1, and the summation of all
weights has to be 1. Since the values of SD are
small when the SoftMax function calculates
weights, the outputs from the function are almost identical for all positions in the vector. That makes the performance of all weight adjustments identical at 96.67% accuracy and 3.33% error, as shown in Table 6.
The last function is the ReLU function, which changes negative values to zero but keeps positive values as they are. This function does not normalize values; it only eliminates negative values, which in deep learning helps with the vanishing gradient problem. The best weight adjustment is 1-(w*10), at 96.85% accuracy and 3.15% error, as shown in Table 7.
Table 6. SoftMax function results
Weight Adjustment    Acc.      Err.      FAR       FRR       EER
w+1                  96.67%    3.33%     3.33%     3.33%     3.33%
w+10                 96.67%    3.33%     3.33%     3.33%     3.33%
w+100                96.67%    3.33%     3.33%     3.33%     3.33%
w+1000               96.67%    3.33%     3.33%     3.33%     3.33%
(10*w)+1             96.67%    3.33%     3.33%     3.33%     3.33%
(100*w)+1            96.67%    3.33%     3.33%     3.33%     3.33%
(1000*w)+1           96.67%    3.33%     3.33%     3.33%     3.33%
1-w                  96.67%    3.33%     3.33%     3.33%     3.33%
10-w                 96.67%    3.33%     3.33%     3.33%     3.33%
100-w                96.67%    3.33%     3.33%     3.33%     3.33%
1000-w               96.67%    3.33%     3.33%     3.33%     3.33%
1-(w*10)             96.67%    3.33%     3.33%     3.33%     3.33%
1-(w*100)            96.67%    3.33%     3.33%     3.33%     3.33%
1-(w*1000)           96.67%    3.33%     3.33%     3.33%     3.33%
Table 7. ReLU function results
Weight Adjustment    Acc.      Err.      FAR       FRR       EER
w+1                  96.67%    3.33%     3.33%     3.33%     3.33%
w+10                 96.67%    3.33%     3.33%     3.33%     3.33%
w+100                96.67%    3.33%     3.33%     3.33%     3.33%
w+1000               96.67%    3.33%     3.33%     3.33%     3.33%
(10*w)+1             96.67%    3.33%     3.33%     3.33%     3.33%
(100*w)+1            95.70%    4.30%     4.30%     4.33%     -
(1000*w)+1           91.33%    8.67%     8.67%     8.67%     -
1-w                  96.67%    3.33%     3.33%     3.33%     3.33%
10-w                 96.67%    3.33%     3.33%     3.33%     3.33%
100-w                96.67%    3.33%     3.33%     3.33%     3.33%
1000-w               96.67%    3.33%     3.33%     3.33%     3.33%
1-(w*10) *           96.85%    3.15%     3.19%     3.00%     -
1-(w*100)            96.42%    3.58%     3.56%     3.67%     -
1-(w*1000)           86.30%    13.70%    13.70%    13.67%    -
Table 8. Summary
Technique         Weight Adj.    Acc.      Err.      FAR       FRR       EER
Euclidean Norm    10-w           96.91%    3.09%     3.11%     3.00%     -
Mean              10-w           96.97%    3.03%     3.04%     3.00%     -
Min-Max           10-w           96.97%    3.03%     3.04%     3.00%     -
Z-score           100-w          96.79%    3.21%     3.19%     3.33%     -
SoftMax           All            96.67%    3.33%     3.33%     3.33%     3.33%
ReLU              1-(w*10)       96.85%    3.15%     3.19%     3.00%     -
The summary of the best weight adjustment for
each normalization technique and activation
function is shown in Table 8. The overall weights
are x-w, where x can be 1, 10, or 100, depending on
the technique. The best result is 96.97% accuracy
and 3.03% error for Mean and Min-max
normalizations with 10-w as weight adjustment.
4.1 Example Data
From the results in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7, numerical example data from one user are explained in detail. This user has a six-character username, and the SD vector contains 2 keystroke features for each key, giving a total of 12 elements in the vector. The SD vector is shown as follows.
SD vector
[0.00000, 0.00234, 0.00229, 0.00255, 0.00177,
0.00230, 0.00118, 0.00086, 0.00085, 0.00133,
0.00109, 0.00000]
The values in the SD vector are all positive
numbers with very small magnitudes. The SD vector
has to be normalized using one of six normalization
techniques. The outputs from all techniques are
shown as follows.
Euclidean norm
[0.00000, 0.41756, 0.40853, 0.45481, 0.31578,
0.40957, 0.21054, 0.15394, 0.15134, 0.23661,
0.19370, 0.00000]
Mean normalization
[-0.54096, 0.37714, 0.35728, 0.45904, 0.15336,
0.35958, -0.07803, -0.20249, -0.20821, -0.02071, -
0.11506, -0.54096]
Min-max normalization
[0.00000, 0.91810, 0.89824, 1.00000, 0.69432,
0.90054, 0.46293, 0.33847, 0.33275, 0.52025,
0.42590, 0.00000]
Z-score normalization
[-1.55996, 1.08757, 1.03029, 1.32374, 0.44224,
1.03691, -0.22502, -0.58391, -0.60041, -0.05971, -
0.33179, -1.55996]
Softmax function
[0.08322, 0.08341, 0.08341, 0.08343, 0.08337,
0.08341, 0.08332, 0.08329, 0.08329, 0.08333,
0.08331, 0.08322]
ReLU function
[0.00000, 0.00234, 0.00229, 0.00255, 0.00177,
0.00230, 0.00118, 0.00086, 0.00085, 0.00133,
0.00109, 0.00000]
In each normalized output, the minimum values occur at the first and last positions (the zero entries of the SD vector), and the maximum value occurs at the fourth position. Since the outputs from the normalization techniques can be zero, the weights must be manipulated so that no weight is zero. In addition, the output values are small, so the outputs (w) from each technique have to be adjusted to give the best accuracy. The best SD vector profiles for each technique, with their respective weight adjustments, are presented as follows.
SD vector profile (Best results)
at 10-w as weight adjustment
Euclidean norm
[10.00000, 9.58244, 9.59147, 9.54519, 9.68422,
9.59043, 9.78946, 9.84606, 9.84866, 9.76339,
9.80630, 10.00000]
Mean normalization
[10.54096, 9.62286, 9.64272, 9.54096, 9.84664,
9.64042, 10.07803, 10.20249, 10.20821, 10.02071,
10.11506, 10.54096]
Min-max normalization
[10.00000, 9.08190, 9.10176, 9.00000, 9.30568,
9.09946, 9.53707, 9.66153, 9.66725, 9.47975,
9.57410, 10.00000]
Softmax function
[9.91678, 9.91659, 9.91659, 9.91657, 9.91663,
9.91659, 9.91668, 9.91671, 9.91671, 9.91667,
9.91669, 9.91678]
at 100-w as weight adjustment
Z-score normalization
[101.55996, 98.91243, 98.96971, 98.67626,
99.55776, 98.96309, 100.22502, 100.58391,
100.60041, 100.05971, 100.33179, 101.55996]
at 1-(w*10) as weight adjustment
ReLU function
[1.00000, 0.97658, 0.97708, 0.97449, 0.98229,
0.97702, 0.98819, 0.99136, 0.99151, 0.98673,
0.98913, 1.00000]
In each SD vector profile, the maximum weights occur at the first and last positions and the minimum weight at the fourth position; that is, the minimum normalized value yields the maximum weight in every profile. The differences among the weights should be small. The Euclidean norm, Mean, and Min-max normalizations gave high accuracy with 10-w as the weight adjustment, which suggests that the weight values should be around 10. The Euclidean norm showed the smallest standard deviation of its weights, 0.158, compared to 0.347 for the Mean and Min-max normalizations; therefore, the spread of the weights affects the performance of the technique. The standard deviations of the Mean and Min-max weights are the same, and their accuracy is identical with the 10-w weight adjustment. The results also show that the SoftMax function is unsuitable for the keystroke vector dissimilarity technique because its weights are almost identical, meaning that every position is given a similar importance and the Euclidean distance problem is not solved. Z-score normalization gave the largest standard deviation of the weights (1.0) with 100-w as the weight adjustment, but its accuracy was lower than that of the Mean and Min-max normalizations. The ReLU function is not a good choice either, since the values of the SD vector are all positive and the output of the ReLU function is the same as the SD vector.
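As a cross-check of this example, the following sketch (not the authors' code) recomputes the Min-max SD vector profile with the 10-w adjustment from the printed SD vector; small differences from the values listed above come from rounding of the printed SD values.
```python
# Minimal sketch reproducing the Min-max 10-w SD vector profile above
# from the printed SD vector (values differ slightly due to rounding).
import numpy as np

sd = np.array([0.00000, 0.00234, 0.00229, 0.00255, 0.00177, 0.00230,
               0.00118, 0.00086, 0.00085, 0.00133, 0.00109, 0.00000])

w = (sd - sd.min()) / (sd.max() - sd.min())   # Min-max normalization
profile = 10.0 - w                            # 10-w weight adjustment
print(np.round(profile, 5))
# Approximately [10. 9.08235 9.10196 9. 9.30588 9.09804 9.53725
#                9.66275 9.66667 9.47843 9.57255 10.]
print(round(profile.std(ddof=1), 3))          # about 0.35, close to the 0.347 in the text
```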
5 Conclusion
This study examines normalization techniques,
activation functions, and weight adjustments for the
keystroke vector dissimilarity technique proposed in
[1]. The study aims to enhance the accuracy of the
technique to improve the authentication process.
The normalization techniques and activation
functions include Euclidean norm, Mean
normalization, Min-max normalization, Z-score
normalization, SoftMax function, and ReLU
function. Weight adjustments were varied during the
experiments. The results indicate that Mean
normalization and Min-max normalization with 10-
w as a weight gave the same highest result,
achieving 96.97% accuracy and 3.03% error,
outperforming the previous work in [1], which
achieved 96.67% accuracy and 3.33% error.
The analysis steps can be applied to other keystroke dynamics datasets in future research to enhance accuracy. In addition, the verification
process for keystroke vector dissimilarity can be
modified to employ machine learning techniques
rather than the Six Sigma technique.
References:
[1] N. Bussabong and T. Anusas-amornkul,
Enhanced Keystroke Dynamics
Authentication Using Keystroke Vector
Dissimilarity, in 2023 15th International
Conference on Information Technology and
Electrical Engineering (ICITEE), Chiang Mai,
Thailand, 2023, pp. 223-228,
doi: 10.1109/ICITEE59582.2023.10317643.
[2] M. I. W. Pramana, Suhardi, N. B. Kurniawan
and J. Sembiring, Keystroke dynamics for
authentication using dynamic time warping, in
2017 14th International Joint Conference on
Computer Science and Software Engineering
(JCSSE), NakhonSiThammarat, 2017, pp. 1-5,
doi: 10.1109/JCSSE.2017.8025915.
[3] A. Bhatia, M. Hanmandlu, S. Vasikarla and B.
K. Panigrahi, Keystroke Dynamics Based
Authentication Using GFM, in 2018 IEEE
International Symposium on Technologies for
Homeland Security (HST), Woburn, MA,
USA, 2018, pp. 1-5, doi:
10.1109/THS.2018.8574195.
[4] A. Foresi and R. Samavi, User Authentication
Using Keystroke Dynamics via
Crowdsourcing, in 2019 17th International
Conference on Privacy, Security and Trust
(PST), Fredericton, 2019, pp. 1-3, doi:
10.1109/PST47121.2019.8949031.
[5] T.-L. Lin and Y.-S. Chen, A Chinese
Continuous Keystroke Authentication Method
Using Cognitive Factors, in 2019 IEEE
International Conference on Consumer
Electronics - Taiwan (ICCE-TW), Yilan,
2019, pp. 1-2, doi: 10.1109/ICCE-
TW46550.2019.8991902.
[6] P. M. Koh and W. K. Lai, Keystroke
Dynamics Identification System using ABC
Algorithm, in 2019 IEEE International
Conference on Automatic Control and
Intelligent Systems (I2CACIS), Selangor,
Malaysia, 2019, pp. 108-113, doi:
10.1109/I2CACIS.2019.8825073.
[7] S. Singh, A. Inamdar, A. Kore and A. Pawar,
Analysis of Algorithms for User
Authentication using Keystroke Dynamics, in
2020 International Conference on
Communication and Signal Processing
(ICCSP), Chennai, India, 2020, pp. 0337-
0341, doi:
10.1109/ICCSP48568.2020.9182115.
[8] N. Çevik, S. Akleylek and K. Y. Koç,
Keystroke Dynamics Based Authentication
System, in 2021 6th International Conference
on Computer Science and Engineering
(UBMK), Ankara, Turkey, 2021, pp. 644-649,
doi: 10.1109/UBMK52708.2021.9559008.
[9] R. Chandok, V. Bhoir and S. Chinnaswamy,
Behavioural Biometric Authentication using
Keystroke Features with Machine Learning, in
2022 IEEE 19th India Council International
Conference (INDICON), Kochi, India, 2022,
pp. 1-6,
doi:
10.1109/INDICON56171.2022.10040168.
[10] Y. B. W. Piugie, J. Di Manno, C. Rosenberger
and C. Charrier, Keystroke Dynamics based
User Authentication using Deep Learning
Neural Networks, in 2022 International
Conference on Cyberworlds (CW), Kanazawa,
Japan, 2022, pp. 220-227, doi:
10.1109/CW55638.2022.00052.
Contribution of individual authors to the
creation of a scientific article (ghostwriting
policy)
- Tanapat Anusas-amornkul gave concepts,
supervised, analyzed, and wrote this research.
- Naphat Bussabong implemented and conducted
the experiments for this research.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US