Modified Metaheuristics Optimization for Cyberbullying Detection on Online Data Science Platform

NEBOJSA BACANIN 1,a, LUKA JOVANOVIC 2,b, ILJA UZELAC BUJISIC 2, JELENA KALJEVIC 3,c, JELENA CADJENOVIC 4,d, MILOS ANTONIJEVIC 1,e, MIODRAG ZIVKOVIC 1,f

1 Faculty of Informatics and Computing, Singidunum University, Belgrade, SERBIA
2 Faculty of Technical Sciences, Singidunum University, Belgrade, SERBIA
3 Faculty for Health and Business Studies, Singidunum University, Belgrade, SERBIA
4 Department of Business Economics, Singidunum University, Belgrade, SERBIA

a 0000-0002-2062-924X, b 0000-0001-9402-7391, c 0000-0002-5135-8083, d 0000-0002-3406-3347, e 0000-0002-5511-2531, f 0000-0002-4351-068X

Abstract: Online harassment detection faces significant challenges due to its expansive reach and anonymity. Addressing this issue demands effective detection mechanisms capable of processing vast data streams and adapting to evolving online language. Leveraging advancements in artificial intelligence, we propose a novel approach grounded in natural language processing (NLP) and metaheuristic algorithms. Our methodology integrates term frequency-inverse document frequency (TF-IDF) encoding and the AdaBoost algorithm for classification. To tackle the NP-hard problem of hyperparameter selection, we introduce a modified crayfish optimization algorithm (COA), termed GICOA. This paper represents a pioneering effort in utilizing metaheuristic algorithms for hyperparameter selection in harassment detection models. Through experimentation, we demonstrate the efficacy of our approach in fostering a safer online environment. The best performing optimized models demonstrate an accuracy exceeding 77%.

Key-words: Online harassment, Hyperparameter optimization, Metaheuristic algorithms, Natural language processing, Hybridization

Received: March 2, 2024. Revised: September 6, 2024. Accepted: October 4, 2024. Published: November 5, 2024.

WSEAS TRANSACTIONS on COMPUTERS, DOI: 10.37394/23205.2024.23.20, E-ISSN: 2224-2872, Volume 23, 2024

1. Introduction
In recent decades, electronic communication has become one of the most important communication channels. Informal in-person chats have been replaced by online forums, posting on social media and direct messaging. This has opened the door to a new kind of harassment: online harassment [1]. Unlike a few decades ago, when harassment was confined to direct face-to-face insults or written communication (letters, articles, street graffiti, etc.), it is now mainly conducted on online platforms. Harassment can therefore have a much broader reach, and the anonymity of the online landscape allows perpetrators to remain unaccountable. This can have a disastrous effect on more vulnerable people and groups, especially younger ones who have not yet developed appropriate coping mechanisms [2].
It is important to conduct research and development on detection and prevention mechanisms that will keep our online world friendly, decent and less toxic. In general, detection is difficult for two main reasons. First, online platforms generate vast data streams each second, so these mechanisms need to process large amounts of data. The second difficulty is the constant change of online language. Online trends (e.g. TikTok or Instagram trends and slang) change rapidly [3], and each one of them opens new types of potential harassment. Since the culture changes quickly, harassment takes new forms all the time, which in turn makes designing detection systems an ongoing challenge.
This is why the recent development of artificial intelligence (AI) comes in very handy. Instead of relying on explicit instructions, AI systems can adapt to new data, which makes them suitable for dealing with the constantly changing forms of online harassment. In general, AI models consist of an architecture and hyperparameters that are chosen at design time, and parameters that are tuned during training and constantly updated with the flow of new data. In this paper the problem is tackled with natural language processing (NLP) [4], where the selection of hyperparameters is an NP-hard problem. This is why a metaheuristic, concretely the crayfish optimisation algorithm (COA) [5], is used in the hyperparameter selection process.
This paper tackles this problem using a few novel approaches. The first question is how to handle data, i.e. how to encode words and their appearances in documents. To address this, term frequency-inverse document frequency (TF-IDF) [6] is implemented as the main metric. The classifier that labels documents as harassment or non-harassment is based on the AdaBoost [7] algorithm. The NP-hard problem of hyperparameter tuning is tackled by a modification of COA. This paper is a first step in applying metaheuristic algorithms to hyperparameter selection in models for harassment detection.
The rest of the paper starts with an overview of general work on NLP models and existing spam detection models, followed by a short discussion of metaheuristic algorithms in general. Brief descriptions of AdaBoost, TF-IDF, COA and the modified version of COA, GICOA, are then given. After the experimental setup is covered, simulation outcomes are presented and discussed before the conclusion of the paper.
2. Related Works
In general, work in harassment detection lies close to spam detection problems as part of wider NLP research. In this broad and fast-developing area, some of the most important research papers are [8], which introduced attention-based transformer models, [9], which introduced the original GPT model and demonstrated the effectiveness of generative pre-training on a diverse set of NLP tasks, and [10], which introduced the more powerful GPT-2 and demonstrated effectiveness on various NLP tasks to the point that it sparked ethical considerations about such models. Some relevant papers in spam detection include BERT for spam detection [11], which uses BERT, the bidirectional encoder representations from transformers introduced in [12], as well as earlier approaches based on support vector machines (SVMs). An SVM-based approach can be useful due to its effectiveness in dealing with high-dimensional data, but it can be computationally expensive and requires careful selection of hyperparameters and kernel functions, which might be impractical. BERT-based approaches, on the other hand, show strong results on a vast range of NLP tasks, including spam detection, and the large text corpus BERT was trained on makes it highly generalizable. However, its large number of hyperparameters makes the training process expensive.
When using machine learning algorithms, some architecture is chosen and then parameters are trained using learning data. The difficulty of the process lies in selecting an appropriate model, finding and processing training data, and then tuning the parameters using it. However, before the training process, it is often necessary to define some parameters that will not change during training but will significantly affect the performance of the model. These values are referred to as hyperparameters. The process of finding appropriate hyperparameters consists of determining which parameters are hyperparameters, defining the search range for each of them, choosing a metric to grade each choice, defining the search strategy, tuning the hyperparameters, selecting the best choice, and then testing the model with these hyperparameters on an independent test set. Since an exhaustive grid search can be an NP-hard problem, a metaheuristic approach is used to find a good choice of hyperparameters.
The literature presents many candidate optimization algorithms. Popular choices used by researchers include established optimizers like variable neighborhood search (VNS) [13], the genetic algorithm (GA) [14], and the sine cosine algorithm (SCA) [15]. Metaphor-based optimizers, including the bat algorithm (BA) [16], whale optimization algorithm (WOA) [17], and Harris hawks optimizer (HHO) [18], are also popular choices. Some newer examples include the Botox optimization algorithm (BOA) [19], inspired by medical treatment processes.
Optimizers have been successfully applied in several fields such as cybersecurity, where these algorithms tackle intrusion detection [20], spam identification [21] as well as fraud identification [22]. Hybrid algorithms have also proven highly effective at tackling optimization tasks, with notable implementations showing interesting results in finance [23] and medicine [24], and more recent examples tackling problems in energy [25] and production forecasting [26].
2.1 AdaBoost
The AdaBoost [7] algorithm is widely considered a good choice for classification problems. This approach adopts ensemble techniques, using several simpler models to formulate a decision. By applying weights to classifier votes, better decisions can be made through a sort of consensus method. Should a classifier make correct classifications, its weight is increased; otherwise its weight is decreased. The classifier error can be determined as per Eq. (1):
\epsilon_t = \frac{\sum_{i=1}^{N} w_{i,t} \cdot I(h_t(x_i) \neq y_i)}{\sum_{i=1}^{N} w_{i,t}} \qquad (1)
where the term ϵ_t represents the weighted error of a weak model and w_{i,t} corresponds to the instance weight. The expression h_t(x_i) refers to the prediction made by the weak classifier, and y_i indicates the actual value or ground truth. The indicator function I evaluates to 1 if the classification is incorrect, and 0 otherwise.
Additional classifiers are built using the determined weights, and this weight-adjustment process is repeated. Generally, a large number of classifiers are accumulated to create accurate models. Each sub-model is given a score, and these are combined to form a linear model. A classifier's weight within the ensemble can be computed using Eq. (2):
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t} \qquad (2)
where the term α_t signifies the weight assigned to each weak classifier. This value is determined based on the error value ϵ_t and defines the classifier's contribution to the final result. The instance weights are updated according to the following rule:

w_{i,t+1} = w_{i,t} \cdot \exp(-\alpha_t \cdot y_i \cdot h_t(x_i)) \qquad (3)
where w_{i,t} represents the weight of a given instance, α_t denotes the weight of the weak classifier, y_i is the ground truth value, and h_t(x_i) signifies the prediction.
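To make these update rules concrete, the following is a minimal Python sketch of the training loop defined by Eqs. (1)-(3), assuming labels encoded in {-1, +1} and decision stumps as weak learners; this is an illustrative reconstruction, not the implementation used in the experiments.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_estimators=10):
    # y must be encoded in {-1, +1}
    n = len(y)
    w = np.full(n, 1.0 / n)  # instance weights w_{i,t}
    learners, alphas = [], []
    for t in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Eq. (1): weighted error of the weak classifier
        eps = np.sum(w * (pred != y)) / np.sum(w)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against division by zero
        # Eq. (2): weight (vote) of the weak classifier
        alpha = 0.5 * np.log((1 - eps) / eps)
        # Eq. (3): instance weight update; misclassified points gain weight
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()  # renormalize
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Weighted vote of all accumulated weak classifiers
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, learners)))

In practice, a library implementation such as scikit-learn's AdaBoostClassifier offers these same update rules with additional options (e.g. a learning rate).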
2.2 TF-IDF
It is difficult to directly convert words into numbers in a meaningful way. What matters is to quantify documents in a way that yields some insight about them. To do that, a measure called TF-IDF is applied. This measure captures how relevant each word is in a document, where the document belongs to some collection of documents, i.e. a corpus.
Intuitively speaking, a useful measure would be one that increases proportionally with each new appearance of a word in a document, and decreases with the frequency of the word's appearance in the corpus. This makes sense since, to extract relevant words for a specific document, words that appear in many documents should be penalised: they will not give any insight into the specific document being analysed. Formally, for a word w and document d in a corpus C of documents, TF and IDF are:
\mathrm{TF} = \frac{\text{number of appearances of } w \text{ in } d}{\text{number of words in } d} \qquad (4)

\mathrm{IDF} = \log_2 \frac{\text{number of documents in } C}{\text{number of documents in } C \text{ containing } w} \qquad (5)

Then TF-IDF is defined by TF-IDF := TF · IDF. This is the quantity that will be used to identify relevant words. Note that the base of the logarithm is not important, since it just scales TF-IDF by a constant. More on TF-IDF can be found in [27].
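As an illustration of Eqs. (4)-(5), the following Python sketch computes TF-IDF for a toy corpus; the corpus and function names are purely illustrative, and a library implementation (e.g. scikit-learn's TfidfVectorizer, which uses a smoothed IDF variant) would typically be preferred in practice.

import math

def tf_idf(word, document, corpus):
    words = document.split()
    tf = words.count(word) / len(words)  # Eq. (4)
    containing = sum(1 for d in corpus if word in d.split())
    if containing == 0:
        return 0.0
    idf = math.log2(len(corpus) / containing)  # Eq. (5)
    return tf * idf

corpus = ["you are great", "you are awful", "you have a great day"]
print(tf_idf("great", corpus[0], corpus))  # > 0: "great" is rarer in the corpus
print(tf_idf("you", corpus[0], corpus))    # 0: "you" appears in every document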
3. Methods
3.1 The Original Crayfish Optimisation Algorithm
Crayfish, also known as crawfish or freshwater lobsters, are crustaceans resembling small lobsters that typically live in river waters. The COA is a metaheuristic algorithm inspired by the behaviour of crayfish, which consists of a foraging, a competition and a summer vacation stage. The first two stages form the exploitation phase, while the summer vacation stage is the exploration phase of COA. First, the variables used in COA are explained. Let n be the number of crayfish, T the number of iterations, and d the dimension of the space in which the process takes place. The position of the crayfish at time t, for 0 ≤ t ≤ T, is an n × d matrix X^{(t)}, where X^{(t)}_i is the d-dimensional vector representing the location of the i-th crayfish at time t, for 1 ≤ i ≤ n.
The n × d matrices Lb and Ub are the lower and upper bounds on the positions of the crayfish. At each time step, the temperature temp of the system is updated. The temperature determines which of the three possible states (foraging, competition or summer vacation) the system is in, which is essential for computing the next position. The location of food (in cases when used) is X_{food}, X_{shade} is the location of the cave, f is the fitness function that grades each position, and p is a random variable depending on temp. More details on these variables can be found in [28]. The initial positions of the crayfish are chosen by

X^{(0)} = Lb + (Ub - Lb) \cdot rand

where rand is a random variable uniformly distributed in [0, 1]. (Essentially, a random point is picked in the d-dimensional space between the given lower and upper bounds for each crayfish.) In each new time step, the temperature temp is first updated:

temp = 20 + 15 \cdot rand \qquad (6)

where rand is a uniform random variable in [0, 1] that is redrawn in each step.
Depending on whether temp > 30 or not, there are two cases. The algorithm enters the first case if temp > 30 (the summer resort or competition stage). These two sub-cases are separated by a random draw: if rand < 0.5 the crayfish is in the summer resort stage, otherwise it is in the competition stage. In the summer resort stage, the crayfish approaches the cave to rest instead of fighting, and the positions X are updated according to the following equation:

X^{(t+1)}_{i,j} = X^{(t)}_{i,j} + \left(2 - \frac{t}{T}\right) \cdot rand \cdot \left(X_{shade} - X^{(t)}_{i,j}\right) \qquad (7)

(how X_{shade} is updated in each iteration is described in [28]). In the competition stage, the crayfish start fighting for a cave. To capture the fight, another variable z is introduced:

z := \mathrm{round}(rand \cdot (n - 1)) + 1 \qquad (8)

Then:

X^{(t+1)}_{i,j} = X^{(t)}_{i,j} - X^{(t)}_{z,j} + X_{shade} \qquad (9)
The algorithm enters the second case if temp ≤ 30. Here, the crayfish moves towards the food, which is located at X_{food}. The food size Q is defined as:

Q = 3 \cdot rand \cdot \frac{f(X^{(t)}_i)}{f(X_{food})} \qquad (10)

Depending on the size of the food, the crayfish either splits it into two parts, or moves to it and eats it directly. If Q > 2, then X_{food} becomes e^{-1/Q} \cdot X_{food} and

X^{(t+1)}_{i,j} = X^{(t)}_{i,j} + X_{food} \cdot p \cdot \left(\cos(2\pi \cdot rand) - \sin(2\pi \cdot rand)\right) \qquad (11)

Otherwise, if Q ≤ 2, then:

X^{(t+1)}_{i,j} = \left(\left(X^{(t)}_{i,j} - X_{food}\right) + rand \cdot X^{(t)}_{i,j}\right) \cdot p \qquad (12)
This gives an idea of how X evolves in time. The final X at time T is taken as the result of COA for hyperparameter selection. This was a brief overview of COA; for more details, see [28].
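For clarity, the following Python sketch assembles Eqs. (6)-(12) into a single position-update step. The handling of the food factor p and the bookkeeping for X_shade and X_food are simplified assumptions; see [28] for the full procedure.

import numpy as np

def coa_step(X, t, T, f, lb, ub, X_food, X_shade):
    # One position update for every crayfish, following Eqs. (6)-(12);
    # assumes f(X_food) != 0.
    n, d = X.shape
    X_new = X.copy()
    temp = 20 + 15 * np.random.rand()  # Eq. (6)
    for i in range(n):
        if temp > 30:
            if np.random.rand() < 0.5:  # summer resort stage, Eq. (7)
                X_new[i] = X[i] + (2 - t / T) * np.random.rand(d) * (X_shade - X[i])
            else:                       # competition stage, Eqs. (8)-(9)
                z = int(round(np.random.rand() * (n - 1)))  # 0-based index
                X_new[i] = X[i] - X[z] + X_shade
        else:                           # foraging stage
            Q = 3 * np.random.rand() * f(X[i]) / f(X_food)  # Eq. (10)
            p = np.random.rand()  # simplified stand-in for the food factor p, see [28]
            r = np.random.rand()
            if Q > 2:                   # food too large: shred it, Eq. (11)
                food = np.exp(-1.0 / Q) * X_food
                X_new[i] = X[i] + food * p * (np.cos(2 * np.pi * r) - np.sin(2 * np.pi * r))
            else:                       # eat directly, Eq. (12)
                X_new[i] = ((X[i] - X_food) + np.random.rand(d) * X[i]) * p
    return np.clip(X_new, lb, ub)  # keep positions within [Lb, Ub]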
3.2 Genetically Inspired COA
Although the initial COA performs well, it has some shortcomings when evaluated using standard CEC [29] functions. To address these limitations and enhance COA's effectiveness, this study introduces hybridization techniques. Inspired by the genetic algorithm (GA) [14], we propose a new algorithm called the genetically inspired COA (GICOA).
In the GICOA algorithm, a unique mechanism is activated after each iteration, selecting a random agent and merging it with the best solution found so far. Their parameters
are uniformly blended using a control parameter denoted as
pc, which is empirically set to 0.1. Additionally, parameter
mutation is introduced: this process selects a random value
within a specified range, then either adds or subtracts half of
this value from the parameter based on the mutation direction
parameter md, also set to 0.1.
After generating a new solution, the worst-performing so-
lution in the swarm is replaced by the new agent. The evalu-
ation of the new solution is deferred until the next iteration,
keeping the computational complexity the same as the original
algorithm. To provide a comprehensive understanding, the
pseudocode is presented in Algorithm 1.
Algorithm 1 Pseudocode of the introduced GICOA.
Generate population P
while the current iteration is less than the maximum number of iterations do
  Assess agents in P
  for each agent X in P do
    Update locations by applying the COA search
    Generate a new solution NS based on the introduced mechanism
    Mutate the genome of NS
    Replace the worst solution in P with NS
  end for
end while
return the best solution attained within P
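A possible Python reading of this merge-and-mutate mechanism is sketched below. The per-parameter blending and mutation details are assumptions based on the description above, not code from the original study.

import numpy as np

def gicoa_merge_and_mutate(population, fitness, best, lb, ub, pc=0.1, md=0.1):
    # Blend a random agent with the best solution found so far:
    # each parameter is taken from the best with probability pc.
    n, d = population.shape
    donor = population[np.random.randint(n)]
    mask = np.random.rand(d) < pc
    child = np.where(mask, best, donor)
    # Mutation: pick a random in-range value per parameter and add or
    # subtract half of it, with direction governed by md.
    step = 0.5 * np.random.rand(d) * (ub - lb)
    direction = np.where(np.random.rand(d) < md, 1.0, -1.0)
    child = np.clip(child + direction * step, lb, ub)
    # Replace the worst agent; its evaluation is deferred to the next
    # iteration, keeping complexity equal to the original COA.
    worst = np.argmax(fitness)  # assumes a minimization problem
    population[worst] = child
    return population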
4. Experimental Setup
To evaluate the performance of the introduced optimizer and the overall viability of the introduced approach, a modified real-world curated dataset acquired from Kaggle¹ is used. The parsed portion of the Kaggle dataset is used for the simulations conducted in this work. Several contemporary optimizers are subjected to a comparative analysis. The optimizers included in the comparisons are the modified algorithm alongside the original COA [5]. Well-established optimizers are also evaluated, such as the VNS [13], GA [14] and SCA [15]. Further metaphor-based optimizers such as the BA [16], WOA [17] and HHO [18] are evaluated. Finally, the recently introduced BOA [19] is assessed. Algorithms are independently implemented according to the parameter settings provided in the original works that introduced them.
AdaBoost parameters are subjected to optimization with the constraints provided in Table I. The respective constraints have been empirically determined. Each optimizer was allocated ten iterations and a population size of eight agents. Optimizations are carried out in 30 independent runs. Standard classification metrics are used to evaluate the tested optimizers, including accuracy, precision, F1-score and recall. Finally, Cohen's kappa [30] metric is used as the indicator function due to the imbalance observed in the dataset.
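As an illustration, a fitness evaluation wrapping AdaBoost under the Table I constraints might look as follows. The scikit-learn API (version 1.2 or later), the train/validation split, and the exact pairing of Cohen's kappa as objective and error rate as indicator are assumptions inferred from the reported tables, not code from the original study.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.tree import DecisionTreeClassifier

def evaluate(params, X_train, y_train, X_val, y_val):
    # params = [count of estimators, depth, learning rate], Table I ranges
    n_estimators = int(round(params[0]))  # [10, 20]
    depth = int(round(params[1]))         # [1, 5]
    learning_rate = float(params[2])      # [0.01, 2]
    model = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=depth),
        n_estimators=n_estimators,
        learning_rate=learning_rate,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    objective = cohen_kappa_score(y_val, pred)  # maximized (cf. Table II)
    error = 1.0 - accuracy_score(y_val, pred)   # indicator (cf. Table III)
    return objective, error

A metaheuristic then searches the three-dimensional box of Table I, calling this evaluation once per agent per iteration.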
TABLE I
ADABOOST HYPERPARAMETERS SUBJECTED TO OPTIMIZATION.
Parameter Constraints
count of estimators [10,20]
depth [1,5]
learning rate [0.01,2]
¹ https://www.kaggle.com/datasets/saurabhshahane/cyberbullying-dataset
TABLE II
OBJECTIVE FUNCTION OUTCOMES FOR THE CONDUCTED SIMULATIONS.
Method Best Worst Mean Median Std Var
AB-GICOA 0.479523 0.464299 0.471482 0.471334 0.005674 3.22E-05
AB-COA 0.463551 0.440244 0.454147 0.455728 0.006455 4.17E-05
AB-VNS 0.464634 0.448140 0.458246 0.458168 0.004878 2.38E-05
AB-GA 0.473523 0.452341 0.460195 0.458979 0.006538 4.27E-05
AB-BA 0.475901 0.444988 0.457535 0.454765 0.009970 9.94E-05
AB-SCA 0.473954 0.451173 0.461481 0.461740 0.007691 5.92E-05
AB-WOA 0.471007 0.450067 0.459053 0.458005 0.005476 3.00E-05
AB-HHO 0.464634 0.453648 0.458976 0.458444 0.003423 1.17E-05
AB-BOA 0.473954 0.453393 0.463898 0.464007 0.006615 4.38E-05
TABLE III
INDICATOR FUNCTION (ERROR) OUTCOMES FOR THE CONDUCTED SIMULATIONS.
Method Best Worst Mean Median Std Var
AB-GICOA 0.221340 0.226190 0.223821 0.223986 0.001777 3.16E-06
AB-COA 0.227072 0.232363 0.229773 0.229497 0.002061 4.25E-06
AB-VNS 0.226190 0.231922 0.229112 0.229056 0.002434 5.93E-06
AB-GA 0.223545 0.234127 0.229112 0.228395 0.003575 1.28E-05
AB-BA 0.223986 0.233686 0.229497 0.229277 0.003837 1.47E-05
AB-SCA 0.223986 0.233686 0.228560 0.228836 0.003436 1.18E-05
AB-WOA 0.223986 0.236332 0.228671 0.227734 0.003386 1.15E-05
AB-HHO 0.226190 0.235009 0.229497 0.229056 0.002736 7.48E-06
AB-BOA 0.223986 0.233245 0.226852 0.225970 0.002866 8.21E-06
5. Experimental Outcomes
Objective function outcomes for the classifiers optimized by each model are provided in Table II. Outcomes are showcased in terms of best, worst, mean and median values. Furthermore, details on algorithm stability are provided in the form of the standard deviation and variance in outcomes over 30 independent runs. As shown, the introduced optimizer outperformed competing models across all metrics, including stability.
Indicator function outcomes for the classifiers optimized by each model are provided in Table III. Outcomes are again showcased in terms of best, worst, mean and median values, together with the standard deviation and variance over 30 independent runs. Here too, the introduced optimizer outperformed competing models across all metrics, including stability.
Comparisons in terms of stability are provided in Figure 1. The introduced modified algorithm outperforms competing optimizers, including the original version, providing a high rate of stability as well as the highest quality of results.
Further detailed comparisons in terms of accuracy, per-class precision, as well as macro and weighted averages, are provided in Table IV. The introduced optimizer shows the highest rate of accuracy, as well as the best macro and weighted averages. However, certain algorithms outperform the introduced optimizer in terms of precision and recall. This is somewhat to be expected since, according to the no free lunch (NFL) theorem, no single optimizer performs equally well for all tasks and across all metrics. Hence, experimentation is needed to determine the optimal combination of problem and optimizer.
Further details on an optimizer's ability to avoid local optima and sufficiently explore the search space for good solutions can be discerned from the respective optimizers' convergence graphs. Graphs for the tested algorithms are provided in Figure 2. The introduced optimizer manages to avoid the local optima that constrain other optimizers with insufficient exploration ability, attaining the best comparative outcomes in the ninth iteration.
[Figure 1 shows per-algorithm distributions for the Cyberbullying Kaggle harassment (TF-IDF) simulations: an objective violin plot (objective values roughly 0.43-0.49) and an error box plot (error values roughly 0.222-0.236).]
Fig. 1. Objective and indicator function distributions.
TABLE IV
DETAILED PER-CLASS METRICS FOR THE BEST PERFORMING OPTIMIZED MODELS.
Method metric non-harassment harassment accuracy macro avg weighted avg
AB-GICOA precision 0.785420 0.758377 0.778660 0.771899 0.775941
recall 0.906993 0.540881 0.778660 0.723937 0.778660
f1-score 0.841840 0.631424 0.778660 0.736632 0.768083
AB-COA precision 0.779138 0.753623 0.772928 0.766380 0.770194
recall 0.907671 0.523270 0.772928 0.715471 0.772928
f1-score 0.838507 0.617669 0.772928 0.728088 0.761097
AB-VNS precision 0.778746 0.758242 0.773810 0.768494 0.771558
recall 0.910387 0.520755 0.773810 0.715571 0.773810
f1-score 0.839437 0.617450 0.773810 0.728443 0.761624
AB-GA precision 0.783118 0.756228 0.776455 0.769673 0.773692
recall 0.906993 0.534591 0.776455 0.720792 0.776455
f1-score 0.840516 0.626382 0.776455 0.733449 0.765456
AB-BA precision 0.786350 0.746141 0.776014 0.766245 0.772256
recall 0.899525 0.547170 0.776014 0.723347 0.776014
f1-score 0.839139 0.631350 0.776014 0.735244 0.766303
AB-SCA precision 0.784325 0.751313 0.776014 0.767819 0.772754
recall 0.903598 0.539623 0.776014 0.721610 0.776014
f1-score 0.839748 0.628111 0.776014 0.733930 0.765563
AB-WOA precision 0.781341 0.759494 0.776014 0.770417 0.773683
recall 0.909708 0.528302 0.776014 0.719005 0.776014
f1-score 0.840652 0.623145 0.776014 0.731899 0.764410
AB-HHO precision 0.778746 0.758242 0.773810 0.768494 0.771558
recall 0.910387 0.520755 0.773810 0.715571 0.773810
f1-score 0.839437 0.617450 0.773810 0.728443 0.761624
AB-BOA precision 0.784325 0.751313 0.776014 0.767819 0.772754
recall 0.903598 0.539623 0.776014 0.721610 0.776014
f1-score 0.839748 0.628111 0.776014 0.733929 0.765563
support 1473 795
[Figure 2 shows objective convergence graphs over ten iterations (objective roughly 0.445-0.480) for all tested algorithms on the Cyberbullying Kaggle harassment (TF-IDF) dataset.]
Fig. 2. Objective and indicator function convergence plots.
TABLE V
MODEL PARAMETER SELECTIONS MADE BY EACH OPTIMIZER.
Methods Number of estimators Depth Learning rate
AB-GICOA 19 2 1.327088
AB-COA 16 2 1.267089
AB-VNS 20 2 1.185109
AB-GA 20 2 1.377726
AB-BA 20 2 1.371097
AB-SCA 20 2 1.320228
AB-WOA 17 2 1.346582
AB-HHO 17 2 1.332248
AB-BOA 20 2 1.324441
Additional details on the performance of the best performing model, optimized by the introduced modified algorithm, are provided in Figure 3, where the confusion matrix and PR curve are shown. Parameter selections made by each optimizer for the respective best performing models are also provided in Table V to encourage experimental repeatability.
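The diagnostics of Figure 3 can be reproduced with standard scikit-learn utilities, as sketched below; model and the test split are assumed to come from the optimization pipeline above.

from sklearn.metrics import (average_precision_score, confusion_matrix,
                             precision_recall_curve)

def diagnostics(model, X_test, y_test):
    # Row-normalized confusion matrix, as displayed in Fig. 3
    cm = confusion_matrix(y_test, model.predict(X_test), normalize="true")
    # PR curve and average precision for the harassment (positive) class
    scores = model.predict_proba(X_test)[:, 1]
    precision, recall, _ = precision_recall_curve(y_test, scores)
    ap = average_precision_score(y_test, scores)
    return cm, precision, recall, ap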
6. Conclusion
Detecting online harassment poses substantial hurdles due to its wide-ranging scope and the anonymity it often involves. Overcoming these challenges requires robust detection methods capable of handling large volumes of data and adapting to the ever-evolving nature of online communication. Through the fusion of NLP and metaheuristic algorithms, particularly leveraging TF-IDF encoding and the AdaBoost algorithm, this work proposes a data-driven approach. To address the complexity of hyperparameter selection, a further contribution is made in the form of an introduced modification of the COA, dubbed GICOA. Experimental outcomes on real-world data show that the best-performing models achieve an accuracy surpassing 77%, contributing to fostering a safer online ecosystem.
[Figure 3 shows the AB-GICOA PR curve (no harassment AP: 0.784, harassment AP: 0.620, micro AP: 0.749) and the row-normalized confusion matrix (true no harassment: 0.907 correct, 0.093 misclassified; true harassment: 0.541 correct, 0.459 misclassified) on the Cyberbullying Kaggle harassment (TF-IDF) dataset.]
Fig. 3. Best performing model PR plot and confusion matrix.
Future work will focus on further refining the proposed approach. Additionally, further applications of the proposed optimizer will be explored.
References
[1] R. Kowalski, "Cyberbullying," in The Routledge International Handbook of Human Aggression, pp. 131-142, Routledge, 2018.
[2] M. O. Lwin, B. Li, and R. P. Ang, "Stop bugging me: An examination of adolescents' protection behavior against online harassment," Journal of Adolescence, vol. 35, no. 1, pp. 31-41, 2012.
[3] A. E. Mackey, Evaluating the Evolution of the English Language as Seen in TikTok Slang. PhD thesis, Wichita State University, 2023.
[4] K. Chowdhary, "Natural language processing," Fundamentals of Artificial Intelligence, pp. 603-649, 2020.
[5] H. Jia, H. Rao, C. Wen, and S. Mirjalili, "Crayfish optimization algorithm," Artificial Intelligence Review, vol. 56, no. Suppl 2, pp. 1919-1979, 2023.
[6] H. Christian, M. P. Agus, and D. Suhartono, "Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF)," ComTech: Computer, Mathematics and Engineering Applications, vol. 7, no. 4, pp. 285-294, 2016.
[7] T. Hastie, S. Rosset, J. Zhu, and H. Zou, "Multi-class AdaBoost," Statistics and its Interface, vol. 2, no. 3, pp. 349-360, 2009.
[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in NIPS, 2017.
[9] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., "Improving language understanding by generative pre-training," 2018.
[10] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., "Language models are unsupervised multitask learners," OpenAI Blog, vol. 1, no. 8, p. 9, 2019.
[11] V. S. Tida and S. Hsu, "Universal spam detection using transfer learning of BERT model," arXiv preprint arXiv:2202.03480, 2022.
[12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[13] N. Mladenović and P. Hansen, "Variable neighborhood search," Computers & Operations Research, vol. 24, no. 11, pp. 1097-1100, 1997.
[14] S. Mirjalili, "Genetic algorithm," Evolutionary Algorithms and Neural Networks: Theory and Applications, pp. 43-55, 2019.
[15] S. Mirjalili, "SCA: A sine cosine algorithm for solving optimization problems," Knowledge-Based Systems, vol. 96, pp. 120-133, 2016.
[16] X.-S. Yang and A. Hossein Gandomi, "Bat algorithm: A novel approach for global engineering optimization," Engineering Computations, vol. 29, no. 5, pp. 464-483, 2012.
[17] S. Mirjalili and A. Lewis, "The whale optimization algorithm," Advances in Engineering Software, vol. 95, pp. 51-67, 2016.
[18] A. A. Heidari, S. Mirjalili, H. Faris, I. Aljarah, M. Mafarja, and H. Chen, "Harris hawks optimization: Algorithm and applications," Future Generation Computer Systems, vol. 97, pp. 849-872, 2019.
[19] M. Hubálovská, Š. Hubálovský, and P. Trojovský, "Botox optimization algorithm: A new human-based metaheuristic algorithm for solving optimization problems," Biomimetics, vol. 9, no. 3, p. 137, 2024.
[20] M. Zivkovic, L. Jovanovic, M. Ivanovic, N. Bacanin, I. Strumberger, and P. M. Joseph, "XGBoost hyperparameters tuning by fitness-dependent optimizer for network intrusion detection," in Communication and Intelligent Systems: Proceedings of ICCIS 2021, pp. 947-962, Springer, 2022.
[21] M. Salb, L. Jovanovic, M. Zivkovic, E. Tuba, A. Elsadai, and N. Bacanin, "Training logistic regression model by enhanced moth flame optimizer for spam email classification," in Computer Networks and Inventive Communication Technologies: Proceedings of Fifth ICCNCT 2022, pp. 753-768, Springer, 2022.
[22] A. Petrovic, M. Antonijevic, I. Strumberger, L. Jovanovic, N. Savanovic, and S. Janicijevic, "The XGBoost approach tuned by TLB metaheuristics for fraud detection," in Proceedings of the 1st International Conference on Innovation in Information Technology and Business (ICIITB 2022), vol. 104, p. 219, Springer Nature, 2023.
[23] N. Bacanin, M. Zivkovic, L. Jovanovic, M. Ivanovic, and T. A. Rashid, "Training a multilayer perception for modeling stock price index predictions using modified whale optimization algorithm," in Computational Vision and Bio-Inspired Computing: Proceedings of ICCVBIC 2021, pp. 415-430, Springer, 2022.
[24] L. Jovanovic, M. Zivkovic, M. Antonijevic, D. Jovanovic, M. Ivanovic, and H. S. Jassim, "An emperor penguin optimizer application for medical diagnostics," in 2022 IEEE Zooming Innovation in Consumer Technologies Conference (ZINC), pp. 191-196, IEEE, 2022.
[25] N. Bacanin, L. Jovanovic, M. Zivkovic, V. Kandasamy, M. Antonijevic, M. Deveci, and I. Strumberger, "Multivariate energy forecasting via metaheuristic tuned long-short term memory and gated recurrent unit neural networks," Information Sciences, vol. 642, p. 119122, 2023.
[26] L. Jovanovic, D. Jovanovic, N. Bacanin, A. Jovancai Stakic, M. Antonijevic, H. Magd, R. Thirumalaisamy, and M. Zivkovic, "Multi-step crude oil price prediction based on LSTM approach tuned by salp swarm algorithm with disputation operator," Sustainability, vol. 14, no. 21, p. 14616, 2022.
[27] H. Christian, M. Agus, and D. Suhartono, "Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF)," ComTech: Computer, Mathematics and Engineering Applications, vol. 7, p. 285, 2016.
[28] H. Jia, H. Rao, C. Wen, and S. Mirjalili, "Crayfish optimization algorithm," Artificial Intelligence Review, vol. 1, p. 1, 2023.
[29] J. J. Liang, B. Y. Qu, and P. N. Suganthan, "Problem definitions and evaluation criteria for the CEC 2014 special session and competition on single objective real-parameter numerical optimization," Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou, China and Technical Report, Nanyang Technological University, Singapore, vol. 635, no. 2, p. 2014, 2013.
[30] M. J. Warrens, "Five ways to look at Cohen's kappa," Journal of Psychology & Psychotherapy, vol. 5, 2015.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors contributed equally to the present research, at all stages from the formulation of the problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US