X-Ray Images Analytics Algorithm based on Machine Learning
VESKA GANCHEVA, IVAYLO GEORGIEV, VIOLETA TODOROVA
Technical University of Sofia,
BULGARIA
Abstract: - The rapid development of information technology has led to huge amounts of data being generated by large or complex systems and devices. Applications in information technology, medicine, and many other fields generate large volumes of data that challenge analysts. Data mining finds application in areas where statistical and analytical methods, and the models built with them, are not sufficient. The paper discusses sources of medical data, use cases, and data analysis in medicine, as well as methods and algorithms for data analysis. The purpose of the study presented in the paper is to propose an algorithm for processing X-Ray images based on tools and techniques from the field of machine learning. The preprocessing phase is concerned with image transformation, feature extraction, and the selection of training and testing datasets. Preprocessing adapts the data to the requirements of each data mining procedure, which makes it possible to process data that would otherwise be unsuitable. In the second stage, each feature is examined to identify and classify any potential patterns. In the final stage, a machine learning algorithm is used to choose the model that best captures the pattern or behaviour of the data. The proposed algorithm is verified using publicly available X-Ray image datasets consisting of four classes: Normal, Lung Opacity, Pneumonia, and COVID-19. A medical image classification workflow was designed for verification. In the experimental workflow, five machine learning algorithms are implemented: Logistic Regression, Naive Bayes, Random Forest, SVM, and Neural Network. The experimental results show that the Neural Network achieves the best results compared with Random Forest, Logistic Regression, Naive Bayes, and SVM, and can therefore be considered the most reliable of the tested methods.
Key-Words: - Covid-19, data analytics, image classification, machine learning, X-Ray images.
Received: May 24, 2022. Revised: February 18, 2023. Accepted: March 26, 2023. Published: April 28, 2023.
1 Introduction
Nowadays, the growth of computer simulations has led to the accumulation of huge amounts of data. Medicine is a fundamental field that is highly dependent on big data technologies, and this stimulates the progress of data processing technologies and methods.
Big data techniques provide opportunities to collect large sets of biological samples and to store, manage, and analyze the data, [1].
Machine learning algorithms derive information that is not explicitly present in the original input data, thereby creating knowledge from data.
Rapid Learning Healthcare (RLHC) models using artificial intelligence can detect data of varying quality by comparing them with validated datasets. The extracted results are then processed by decision support systems (DSS) to provide knowledge-based healthcare.
1.1 Sources of Medical Data
Sources of big data in healthcare include smart
devices, genetic databases, government, and more
(Fig. 1).
Fig. 1: Sources of data for healthcare (health data, research, patient data, public records, electronic health records, Internet of Things, governments, other clinical data)
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.16
Veska Gancheva,
Ivaylo Georgiev, Violeta Todorova
E-ISSN: 2224-3402
136
Volume 20, 2023
1) Internet of Things (IoT):
• Wearable devices
• Smartphone applications
• Medical devices and sensors
2) Electronic Medical Records/Electronic Health
Records (EMR/EHR).
3) Other clinical data.
1.2 Big Data Use Cases and Data Analytics in Medicine
The insights gained from analyses of data provide
medical professionals with knowledge not
previously available, [2], [3]. In medicine, data is
used throughout the entire healthcare cycle: medical
examinations, laboratory data, patient history, and
outcomes. Real-world applications demonstrate how
an analytics approach can improve processes,
improve patient care, and ultimately save lives.
1) Diagnostics
2) Modeling and forecasts
3) Real-time monitoring of the patient's vital
signs
4) Treatment of serious diseases
5) Population health
6) Preventive care
7) Electronic Health Records (EHRs)
8) Telemedicine
9) Real-time alerts
10) Data integration with medical images
1.3 Related Works
The respiratory system is the main body system affected by COVID-19, a viral illness. Medical experts in many fields continue to look for new ways to identify and treat the disease. The radiographic lung effects of COVID-19 have received a lot of attention from researchers, [4].
Healthcare is increasingly utilizing artificial
intelligence and machine learning. The analyses
carried out using cutting-edge technology are
improving in precision and applicability across
many areas of medicine, facilitating quicker and
occasionally more accurate clinical diagnosis.
Analyzing X-ray images is crucial for diagnosing respiratory diseases, especially pneumonia caused by COVID-19 infection, [5], [6], [7], [8]. Machine learning, and convolutional neural networks in particular, are the areas of artificial intelligence that deal with image analysis, and they form the foundation of this type of diagnostic system. The best-known uses of convolutional neural networks and machine learning include object detection in images, image segmentation, and image classification.
Image classification is directly applied in two areas: the diagnosis of respiratory illnesses and the detection of pneumonia caused by COVID-19. Training the network requires significant computing and time resources, as well as a sufficient number of pre-classified X-ray images.
A model for automatic COVID-19 detection
using raw chest X-ray images is designed, [9], to
give precise diagnostics for binary classification
(COVID vs. No-Findings) and multi-class
classification (COVID vs. No-Findings vs.
Pneumonia). For binary classes, the model provided
a classification accuracy of 98.08%, and for multi-
class scenarios, it produced an accuracy of 87.02%.
Five machine learning algorithms (Support Vector Machine (SVM), K-Nearest Neighbors, Random Forest, Naive Bayes, and Decision Tree) were used to classify X-ray images as COVID-19 or normal, [10]. The evaluation shows that SVM achieves the highest accuracy, 96%, compared with the other four classifiers (K-Nearest Neighbors and Random Forest: 92%; Naive Bayes: 90%; Decision Tree: 82%).
The research described in the paper aims to propose a workflow for automated machine learning-based X-ray image processing, which includes preprocessing phases (image transformation, feature extraction, and selection of training and testing datasets), classification of images into four classes, and efficiency evaluation.
The paper is structured as follows: methods for data analytics, including types of data mining regularities and data analysis tools, are discussed in Section 2. Section 3 presents knowledge discovery based on biomedical data analytics and a conceptual model for biomedical data analytics. The proposed algorithm for medical image classification, the selected experimental datasets, and the workflow for COVID-19 image analysis are explained in Section 4. Section 5 concludes the paper with a discussion of the results.
2 Methods for Data Analytics
Data mining is the analysis of stored data with the aim of extracting knowledge by uncovering hidden relationships between seemingly unrelated quantities, [11], [12]. An important
feature of data mining is that it provides an
opportunity to process multidimensional arrays and
extract multidimensional dependencies while
automatically revealing exceptional situations - data
and dependencies that are not obvious in general
patterns. As a result, hypotheses are formed to
reveal relations between components and
parameters.
The requirements for useful yet comprehensive processing of such data can be summarized as follows:
• Practically unlimited volume of data;
• Great heterogeneity and variety of data;
• Results that are specific and understandable.
Data processing tools make it possible to systematize and use the data easily.
Modern data mining technologies are built around the idea of patterns or models that reflect the complex relationships between data. These templates are sets of regularities that select data based on specified properties and present them in a user-friendly, understandable form.
A key benefit of data mining analysis is that the patterns it discovers are not predetermined. The patterns found should therefore reveal unexpected dependencies in the data, the so-called hidden meanings. This has led to the hypothesis that "raw" data contains far deeper levels of concealed knowledge that can only be exposed by thorough, in-depth analysis, as summarized in Table 1. Data mining works from the bottom up to uncover deeply concealed knowledge that OLAP cannot reveal and analyze. Although OLAP systems and data mining analyses are closely related, there are important differences between the two techniques.
Table 1. Data mining technologies
Type of technology | Knowledge contained in stored data | Analytical tools used
Top-down | Surface | Language of simple queries
Top-down | Shallow | Online Analytical Processing (OLAP)
Bottom-up | Hidden | Data mining (data extraction)
OLAP (Online Analytical Processing) is the multidimensional analytical technology that enables efficient use of the data kept in a data warehouse. It typically comprises tools for interactively analyzing data that have been retrieved from numerous databases and gathered for a particular user's needs. Because OLAP can present data in multiple slices (dimensions), it is substantially more complex than conventional relational databases.
Data mining comprises technologies for detecting latent patterns and relationships in various samples, and it is also employed for general data analysis. In addition, there are so-called Data Marts, which are essentially local data warehouses that hold subsets of aggregated data.
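As a minimal illustration of the top-down, query-driven OLAP analysis described above, the sketch below rolls up and slices a small fact table of patient visits. The records, the field names (`region`, `year`, `diagnosis`, `visits`) and the functions `roll_up` and `slice_cube` are hypothetical and are not part of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table: each record is one cell of a small data cube.
FACTS = [
    {"region": "North", "year": 2021, "diagnosis": "Pneumonia", "visits": 12},
    {"region": "North", "year": 2022, "diagnosis": "COVID-19", "visits": 30},
    {"region": "South", "year": 2021, "diagnosis": "COVID-19", "visits": 25},
    {"region": "South", "year": 2022, "diagnosis": "Pneumonia", "visits": 8},
]


def roll_up(facts, dimensions, measure="visits"):
    """Sum a measure over the chosen dimensions (an OLAP roll-up)."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_cube(facts, **fixed):
    """Keep only records whose dimensions equal the fixed values (an OLAP slice)."""
    return [r for r in facts if all(r[k] == v for k, v in fixed.items())]


if __name__ == "__main__":
    print(roll_up(FACTS, ["region"]))     # totals per region
    print(roll_up(FACTS, ["diagnosis"]))  # totals per diagnosis
    print(roll_up(slice_cube(FACTS, year=2022), ["region"]))
```

Every question here is formulated in advance by the user; a data mining step would instead search the same records for patterns that were not specified beforehand.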
2.1 Types of Data Mining Regularities
Five types of regularities allow data mining analysis to be implemented.
1) Association: applied when several events are related to each other.
2) Sequence: applied when the events form a chain.
3) Classification: the features characterizing the groups to which a given object belongs are revealed. This is done by analyzing already classified objects and formulating certain rules.
4) Clustering: homogeneous groups of data with related characteristics are extracted from the data sets.
5) Prediction: the information stored in the data warehouse serves as the foundation for forecasts made with data mining analysis. It is the basis for templates that reflect the dynamics of the behavior of the target indicator and allow future system behavior to be predicted. A data warehouse is a collection of subject-specific, integrated databases with time-dependent data, organized to support decision-making. It contains both individual and highly aggregated data.
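The association regularity is usually quantified with two rule measures, support and confidence. The following is a minimal sketch over a hypothetical list of patient findings; the item names and the functions `support` and `confidence` are illustrative only.

```python
from typing import FrozenSet, List

# Hypothetical transactions: the set of findings recorded for each patient.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"fever", "cough", "opacity"}),
    frozenset({"fever", "cough"}),
    frozenset({"cough", "opacity"}),
    frozenset({"fever", "opacity"}),
    frozenset({"fever", "cough", "opacity"}),
]


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


if __name__ == "__main__":
    rule_conf = confidence(frozenset({"fever"}), frozenset({"cough"}), TRANSACTIONS)
    print(f"fever -> cough: confidence {rule_conf:.2f}")
```

Here 3 of the 5 patients have both fever and cough (support 0.6) and 4 have fever, so the rule "fever -> cough" holds with confidence 0.75.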
2.2 Data Analysis Tools
There is a wide range of tools to support data analysis, from commonly available algorithms for visualization and machine learning to complex software packages running on parallel processors. The most suitable tool for data mining analysis is determined by the conditions and objectives of the project. When choosing tools or algorithms, flexibility is very important, that is, how well the chosen strategy can achieve the desired result.
Developing a data analytics application involves several stages:
Step 1: Define the project's scope and identify the data that have to be gathered. The project has to be designed to accomplish specific objectives.
Step 2: Create the databases. The necessary data can be spread over numerous databases, so data from several applications need to be combined and aggregated to eliminate inconsistencies. The techniques used to create samples from the databases based on specific properties should not change as the analysis progresses. This step also includes checking data integrity and processing the existing values. The accuracy of data mining depends on the quality of the information chosen as its basis.
Step 3: Quantify the data elements. Collaboration with subject-domain experts helps answer questions and isolate the data elements that make the most sense for business needs.
Step 4: Select data analytics algorithms for determining the relationships in the data. Several different algorithms can be used to obtain the necessary dependencies; some may be used at the beginning of the process and others at the end. Sometimes several algorithms are also run in parallel to obtain results of different accuracy.
Step 5: Study the relationships found in the previous stage. At this stage, the help of an expert in the relevant subject area may be needed to determine whether these relationships are specific or general and to indicate in which direction the analysis should continue.
Step 6: Present the results in a report that shows the calculations for all interpreted relationships. Such a report is most beneficial when the expert can take a creative approach to analyzing the data and its benefits.
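Step 2 can be sketched as follows, assuming two hypothetical sources that describe the same patients by a shared identifier. The sketch merges the records and reports fields whose values disagree, which is one simple form of the integrity checking mentioned above. The source names, field names and the function `merge_sources` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Conflict = Tuple[str, str, object, object]


def merge_sources(a: Dict[str, Record],
                  b: Dict[str, Record]) -> Tuple[Dict[str, Record], List[Conflict]]:
    """Combine records keyed by patient ID and collect conflicting field values.

    Values from `a` take precedence; every disagreement is reported as
    (patient_id, field, value_in_a, value_in_b) so that it can be reviewed.
    """
    merged: Dict[str, Record] = {}
    conflicts: List[Conflict] = []
    for pid in sorted(set(a) | set(b)):
        ra, rb = a.get(pid, {}), b.get(pid, {})
        combined = dict(rb)
        combined.update(ra)  # prefer source `a` where both define a field
        for field in sorted(set(ra) & set(rb)):
            if ra[field] != rb[field]:
                conflicts.append((pid, field, ra[field], rb[field]))
        merged[pid] = combined
    return merged, conflicts


# Hypothetical extracts from two hospital systems.
EHR = {"p1": {"age": 54, "sex": "F"}, "p2": {"age": 61, "sex": "M"}}
LAB = {"p1": {"age": 54, "crp": 12.5}, "p2": {"age": 60, "crp": 3.1}, "p3": {"crp": 7.0}}

if __name__ == "__main__":
    merged, conflicts = merge_sources(EHR, LAB)
    print(merged)
    print(conflicts)  # the inconsistent age of patient p2 is flagged
```

Preferring one source is only one possible policy; in practice the flagged conflicts would be resolved with the help of a domain expert.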
During the development of a data mining project, other factors also influence the type of the final application: the availability and state of data repositories, and the volume, variety, and characteristics of the data.
A system for visualizing the data obtained from the analysis also occupies an important place. It provides a graphical presentation of the results (graphs, diagrams, schemes, tables, etc.) through an interface that makes it easy to associate the analyzed indicators with the various parameters of the diagrams.
3 Biomedical Data Analysis
The process of knowledge discovery based on data
analysis consists of the following main stages,
presented in Fig. 2, [13], [14], [15]:
Fig. 2: Knowledge Discovery and results
interpretation
Because the data are in many cases imperfect, containing inconsistencies and abbreviations, they cannot be used directly to start the data mining process. The rapid increase in the rate at which data are generated, and in their size, must also be considered in various academic and scientific applications.
Most of the collected data require more complex analysis mechanisms. Data preprocessing aims to adapt the data to the requirements of each data mining algorithm, which makes it possible to process data that would otherwise be unsuitable.
The data analytics conceptual model is presented in Fig. 3. The overall process of extracting and interpreting data patterns involves repeating the steps described below, [16], [17], [18], [19]:
Fig. 3: A conceptual model for biomedical data
analytics
1) Defining the goal of the knowledge discovery process: defining the task, the corresponding prior knowledge, and its application.
2) Defining the scope, the relevant end-user knowledge, and the goals.
3) Creating a target dataset: choosing a dataset or selecting variables, sets, or instances of data.
4) Filtering and preparation: removing redundant or negative values, collecting the data needed for modeling, and processing missing data fields.
5) Simplifying the data set by deleting unwanted variables: finding features suitable for representing the data with respect to the task purpose, and applying measurement or conversion tools to reduce the number of variables considered.
6) Matching the goal of the knowledge discovery process to a data mining technique: classification, clustering, regression, etc.
7) Selecting the data mining algorithm, with models and parameters appropriate for the overall process: selecting the method for searching for patterns in the data, determining appropriate models and parameters, and ensuring that the data mining method complies with the general criteria of the process.
8) Data mining: searching for patterns of interest, such as clusters, regression models, classification rules, or trees.
9) Interpreting the knowledge derived from the models.
10) Using the knowledge and integrating it into another system for further action.
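The filtering and simplification steps (4 and 5) can be sketched on a small tabular dataset. This is a minimal sketch under two assumptions that the text does not prescribe: records with any missing field are dropped, and variables whose value never changes (zero variance) are treated as unwanted and removed. The data and function names are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Filtering: keep only records in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_constant_variables(rows: List[Row], tol: float = 0.0) -> List[Row]:
    """Simplification: remove variables whose variance does not exceed `tol`.

    Assumes every row has the same set of fields.
    """
    if not rows:
        return []
    keep = [k for k in rows[0] if pvariance([r[k] for r in rows]) > tol]
    return [{k: r[k] for k in keep} for r in rows]


# Hypothetical measurements; `device_id` is constant and carries no information.
DATA: List[Row] = [
    {"age": 54.0, "crp": 12.5, "device_id": 1.0},
    {"age": 61.0, "crp": None, "device_id": 1.0},
    {"age": 47.0, "crp": 3.1, "device_id": 1.0},
    {"age": 70.0, "crp": 7.0, "device_id": 1.0},
]

if __name__ == "__main__":
    cleaned = drop_constant_variables(drop_incomplete(DATA))
    print(cleaned)  # three complete records with only `age` and `crp`
```

Dropping incomplete records is the simplest policy; imputing missing values is a common alternative when data are scarce.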
4 Materials and Methods
4.1 Algorithm for Medical Image Classification
As discussed in Section 3, the data are often imperfect and must first be preprocessed to meet the requirements of each data mining algorithm.
An image analysis algorithm using several
machine learning-based algorithms is proposed (Fig.
4). The preprocessing phase is concerned with
image transformation, feature extraction, and the
selection of training and testing datasets.
Once the specific problem has been determined, the most effective way to use the data is to create a suitable model for it. Because they are built on mathematical functions and intricate calculations, most machine learning algorithms accept only numbers as input and output. The input images are therefore transformed into numerical values.
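The conversion of an image into numbers can be illustrated with a minimal sketch: a grayscale image, represented here as a hypothetical nested list of 8-bit pixel intensities, is flattened row by row into a single vector and scaled to [0, 1]. The workflow in Section 4.3 derives its features from an Inception v3 embedding instead; this sketch only shows the basic idea of a numeric input representation, and `image_to_vector` is an illustrative name.

```python
from typing import List, Sequence


def image_to_vector(pixels: Sequence[Sequence[int]], max_value: int = 255) -> List[float]:
    """Flatten a 2-D grayscale image row by row and scale intensities to [0, 1]."""
    if not pixels:
        raise ValueError("empty image")
    width = len(pixels[0])
    if any(len(row) != width for row in pixels):
        raise ValueError("all rows must have the same width")
    vector: List[float] = []
    for row in pixels:
        for value in row:
            if not 0 <= value <= max_value:
                raise ValueError(f"pixel value {value} outside [0, {max_value}]")
            vector.append(value / max_value)
    return vector


if __name__ == "__main__":
    # Hypothetical 2x3 image; real chest X-ray images are far larger.
    tiny = [[0, 128, 255],
            [64, 32, 16]]
    print(image_to_vector(tiny))
```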
Once the images have been converted, the variables to be used in the model are selected. Selecting the appropriate features is the foundation of analysis and classification models.
Research approaches used to extract valuable knowledge from a collection of data include data preparation, cleansing, and selection; knowledge discovery; decision making based on the findings; and interpreting decisions from the observed results. One of the tasks of the preprocessing phase is preparing the training and testing datasets. Data preprocessing includes feature extraction, feature selection for relevance, and data cleaning for correctness. Feature selection is crucial because the selected features carry the information used to train the system to recognize particular patterns. Using a training dataset and machine learning classification techniques, the training phase builds a repository of models.
The second phase involves analyzing every feature to find and categorize any potential patterns. In the final step, a machine learning algorithm is used to select the model that best captures the behavior or pattern of the data. The machine learning phase applies several classification techniques to the training and validation datasets. After feature extraction and data set reduction, the analytical model is designed. This produces different classification models, which are then used to construct an analytical workflow. A validation dataset is used to validate the outcome.
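The idea of producing several models and keeping the one that performs best on a validation set can be sketched with two deliberately simple classifiers: a majority-class baseline and a nearest-centroid classifier. They stand in for the five algorithms used in the paper; the classifiers, the toy two-dimensional data, the function names, and the use of accuracy as the selection criterion are all assumptions of this sketch.

```python
from collections import Counter
from math import dist
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]           # (feature vector, class label)
Model = Callable[[Sequence[float]], str]


def fit_majority(train: List[Sample]) -> Model:
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_centroid(train: List[Sample]) -> Model:
    """Predict the label whose mean training vector is closest (Euclidean)."""
    groups: Dict[str, List[Sequence[float]]] = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: dist(x, centroids[y]))


def accuracy(model: Model, data: List[Sample]) -> float:
    return sum(model(x) == y for x, y in data) / len(data)


def select_best(train: List[Sample], validation: List[Sample],
                fitters: Dict[str, Callable[[List[Sample]], Model]]) -> Tuple[str, float]:
    """Fit every candidate on `train`; return (name, accuracy) of the best on `validation`."""
    scores = {name: accuracy(fit(train), validation) for name, fit in fitters.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


TRAIN = [((0.1, 0.2), "Normal"), ((0.2, 0.1), "Normal"), ((0.15, 0.2), "Normal"),
         ((0.9, 0.8), "COVID-19"), ((0.8, 0.9), "COVID-19")]
VALID = [((0.1, 0.1), "Normal"), ((0.85, 0.85), "COVID-19"), ((0.95, 0.9), "COVID-19")]

if __name__ == "__main__":
    print(select_best(TRAIN, VALID, {"majority": fit_majority,
                                     "nearest_centroid": fit_nearest_centroid}))
```

Keeping the validation set separate from the training set is what makes the comparison fair: each candidate is scored on data it has not seen.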
Fig. 4: Algorithm for processing medical images based on machine learning
4.2 Selecting a Dataset
The publicly accessible dataset used to train and test the proposed approach combines several datasets and contains four distinct classes: Normal, COVID-19, Pneumonia, and Lung Opacity, [20], [21], [22], [23]. Each class is produced by combining different data sets. The COVID-19 class contains 3,616 images in total, gathered from four different sources. A total of 2,473 images were taken from the Padchest dataset, [24], which is one of the largest freely accessible independent databases. The other data sets include 183 chest X-ray images from the German Medical School, [25], and 560 chest X-ray images gathered from SIRM, GitHub, Kaggle, and Twitter, [26], [27], [28], [29]. In addition, a dataset of 400 chest X-ray images, [30], is available on GitHub.
The Normal class contains 10,192 images of healthy lungs, of which 8,851 are from RSNA, [31], and 1,341 are from Kaggle, [32]. The Pneumonia class contains 1,345 images of viral pneumonia, [32], and the Lung Opacity class contains 6,012 images of non-COVID-19 lung infection, [31].
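The class sizes reported above can be checked with a few lines of arithmetic. The sketch below recomputes the COVID-19 and Normal totals from their sources and prints the share of each class; it uses only the counts given in this section, and the dictionary labels are shorthand introduced here.

```python
# Image counts per source, as reported in Section 4.2.
COVID_SOURCES = {"Padchest": 2473, "German Medical School": 183,
                 "SIRM/GitHub/Kaggle/Twitter": 560, "GitHub dataset [30]": 400}
NORMAL_SOURCES = {"RSNA": 8851, "Kaggle": 1341}

CLASS_COUNTS = {
    "COVID-19": sum(COVID_SOURCES.values()),
    "Normal": sum(NORMAL_SOURCES.values()),
    "Viral Pneumonia": 1345,
    "Lung Opacity": 6012,
}

TOTAL = sum(CLASS_COUNTS.values())

if __name__ == "__main__":
    for name, count in CLASS_COUNTS.items():
        print(f"{name:16s} {count:6d}  {count / TOTAL:6.1%}")
    print(f"{'Total':16s} {TOTAL:6d}")
```

The totals add up to 21,165 images, with Normal making up roughly 48% and Viral Pneumonia roughly 6% of the data, so the classes are noticeably imbalanced. This is worth keeping in mind when reading an overall accuracy figure, since it is dominated by the larger classes.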
4.3 Workflow for Covid-19 Images Analysis
The proposed framework is used to classify the selected medical images by applying different machine learning algorithms. Five machine learning algorithms are selected to validate the proposed approach: Random Forest, Logistic Regression, Naive Bayes, Neural Network, and SVM. The experimental workflow is implemented using the Orange Data Mining tool.
The category attribute is defined as the classification target. 66% of the data are used as the training set and the remaining 34% as the test set.
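A 66%/34% split such as the one described above can be sketched as follows. The paper does not state whether the split was stratified or repeated, so this sketch assumes a single stratified random split, in which each class keeps approximately the same proportion in the training and test sets. The function name `stratified_split` and the seed are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], train_fraction: float = 0.66,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical labels: 100 "Normal" and 50 "COVID-19" images.
    labels = ["Normal"] * 100 + ["COVID-19"] * 50
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
```

Stratifying matters for an imbalanced dataset like this one: a purely random split could leave a small class under-represented in the test set.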
The designed workflow for medical image analysis consists of five connected components and is presented in Fig. 5. It includes five stages of image classification: 1) loading the COVID-19 images with the Import Images component and visualizing them with the Image Viewer component; 2) feature extraction with the Image Embedding component using Inception v3, which provides metadata (category, image name, size, width, and height) and 2,048 features for each image; 3) distance calculation for similarity assessment using cosine distance; 4) hierarchical clustering by category (Fig. 6 presents the hierarchical clustering); and 5) performance evaluation of accuracy.
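Stages 3 and 4 can be illustrated with a small sketch: cosine distances between embedding vectors, followed by agglomerative (bottom-up) hierarchical clustering. The toy 3-dimensional vectors stand in for the 2,048 Inception v3 features, and average linkage is an assumption made for this sketch; it is not necessarily the linkage used in the Orange workflow.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for vectors pointing in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def average_linkage(vectors: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Repeatedly merge the two closest clusters until `n_clusters` remain."""
    names = list(vectors)
    d = {(a, b): cosine_distance(vectors[a], vectors[b]) for a in names for b in names}
    clusters: List[List[str]] = [[n] for n in names]

    def cluster_distance(c1: List[str], c2: List[str]) -> float:
        # Average of all pairwise distances between members of the two clusters.
        return sum(d[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs: List[Tuple[int, int]] = [
            (i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


# Hypothetical low-dimensional embeddings of four images.
EMB = {"normal_1": [1.0, 0.1, 0.0], "normal_2": [0.9, 0.2, 0.0],
       "covid_1": [0.0, 0.2, 1.0], "covid_2": [0.1, 0.1, 0.9]}

if __name__ == "__main__":
    print(average_linkage(EMB, n_clusters=2))
```

Cosine distance compares the direction of embedding vectors rather than their magnitude, which is why it is a common choice for comparing image embeddings.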
Fig. 5: Workflow for Covid-19 images analysis
The experimental results obtained by the five
classification algorithms are shown in Fig. 7.
Fig. 6: Graphical representation of hierarchical
clustering
Fig. 7: Test and score of classification models
The confusion matrix shows the number of cases for each combination of actual and predicted class; its off-diagonal cells reveal the cases that are not classified correctly. Fig. 8 (a), (b), (c), (d), and (e) show the confusion matrices generated by the Logistic Regression, Naive Bayes, SVM, Random Forest, and Neural Network models, respectively. Accuracy ranges from 0 (lowest) to 1 (highest). The Neural Network classification algorithm achieves the best result: 0.964.
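For readers reproducing the evaluation outside Orange, the sketch below shows how a multi-class confusion matrix and the classification accuracy are computed from actual and predicted labels. The label lists are hypothetical. Accuracy is the sum of the diagonal divided by the total number of cases, which is why it lies between 0 and 1.

```python
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    index: Dict[str, int] = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


CLASSES = ["Normal", "Lung Opacity", "Pneumonia", "COVID-19"]
ACTUAL = ["Normal", "Normal", "Lung Opacity", "Pneumonia", "COVID-19", "COVID-19"]
PREDICTED = ["Normal", "Lung Opacity", "Lung Opacity", "Pneumonia", "COVID-19", "Normal"]

if __name__ == "__main__":
    m = confusion_matrix(ACTUAL, PREDICTED, CLASSES)
    for name, row in zip(CLASSES, m):
        print(f"{name:13s} {row}")
    print(f"accuracy = {accuracy(m):.3f}")
```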
Fig. 8: Confusion matrices generated by the methods: (a) Logistic Regression; (b) Naïve Bayes; (c) SVM; (d) Random Forest; (e) Neural Network
ROC analysis compares the true positive rate with the false positive rate across classification thresholds. The results of testing the classification algorithms are the input for ROC analysis. The Orange ROC Analysis widget displays the ROC curves and their convex hull for each of the tested models and serves as a benchmark for comparing classification models. On the graph, the true positive rate (sensitivity) is shown on the y-axis, and the false positive rate (1 - specificity, the probability of predicting target=1 when the true value is 0) is shown on the x-axis. The closer the curve follows the left-hand border and then the top border of the ROC space, the better the classifier. The widget can also choose the optimal classifier and threshold based on the costs of false positives and false negatives. Fig. 9 presents the ROC curves, through which the classifiers can be observed and the results of the selected classification models compared.
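The construction of an ROC curve can be sketched for one target class against all other classes (one-vs-rest). The scores below are hypothetical predicted probabilities for the target class; the sketch lowers the threshold over the distinct observed scores, records the (false positive rate, true positive rate) points, and computes the area under the curve (AUC) with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the threshold over the distinct scores.

    `labels` are 1 for the target (positive) class and 0 otherwise.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical one-vs-rest data: 1 = COVID-19, 0 = any other class.
LABELS = [1, 1, 0, 1, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

if __name__ == "__main__":
    pts = roc_points(LABELS, SCORES)
    print(pts)
    print(f"AUC = {auc(pts):.3f}")
```

This quadratic-time version favors readability; sorting the scores once and sweeping through them gives the same curve in O(n log n).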
Fig. 9: ROC analysis of classification models
5 Conclusion
This paper presents an approach based on machine
learning algorithms for the classification of X-ray
images. An algorithm for X-ray medical image
processing based on machine learning is proposed.
The preprocessing phase is concerned with image
transformation, feature extraction, and the selection
of training and testing datasets.
For testing and validation purposes, a workflow was developed to classify X-ray images and to determine accuracy and probability in medical data analysis.
A publicly available X-ray image dataset, which combines several datasets with four distinct classes (Normal, COVID-19, Pneumonia, and Lung Opacity), was used to train and test the proposed approach.
Experiments were performed with the Naive Bayes, Random Forest, Logistic Regression, Neural Network, and SVM methods, focusing on accuracy and probability in the analysis of medical images.
The analysis shows that the Neural Network classification algorithm achieves the best results and can be considered the most reliable compared with Random Forest, Logistic Regression, Naive Bayes, and SVM.
Acknowledgment:
The research presented in this paper is financially
supported by the Bulgarian Ministry of Education,
National Science Fund, grant KP-06-N37/24.
References:
[1] Mallappallil M., et al., A Review of Big Data and Medical Research, SAGE Open Med, vol. 8, 2020, doi: 10.1177/2050312120934839.
[2] Belle A., et al., Big Data Analytics in Healthcare, Biomed Res Int, 2015, doi: 10.1155/2015/370194.
[3] Esfandiari N., et al., Knowledge Discovery in Medicine: Current Issue and Future Trend, Expert Systems with Applications, vol. 41, 2014, https://doi.org/10.1016/j.eswa.2014.01.011.
[4] Soffer S., et al., Convolutional Neural Networks for Radiologic Images: A Radiologist’s Guide, Radiology, vol. 290, no. 3, 2019, https://doi.org/10.1148/radiol.2018180547.
[5] Borkowski A., et al., Using Artificial Intelligence for COVID-19 Chest X-ray Diagnosis, Federal Practitioner: for the health care professionals of the VA, DoD, and PHS, 2020, https://doi.org/10.12788/fp.0045.
[6] Reshi A., et al., An Efficient CNN Model for COVID-19 Disease Detection Based on X-Ray Image Classification, Complexity, vol. 2021, https://doi.org/10.1155/2021/6621607.
[7] Wang S., et al., A Deep Learning Algorithm Using CT Images to Screen for Corona Virus Disease (COVID-19), Eur Radiol, vol. 31, pp. 6096–6104, 2021, https://doi.org/10.1007/s00330-021-07715-1.
[8] Arias-Garzón D., et al., COVID-19 Detection in X-ray Images Using Convolutional Neural Networks, Machine Learning with Applications, vol. 6, 2021, 100138, ISSN 2666-8270, https://doi.org/10.1016/j.mlwa.2021.100138.
[9] Ozturk T., et al., Automated Detection of COVID-19 Cases Using Deep Neural Networks with X-ray Images, Comput. Biol. Med., 2020, doi: 10.1016/j.compbiomed.2020.103792.
[10] Imad M., et al., COVID-19 Classification Based on Chest X-Ray Images Using Machine Learning Techniques, J. Comput. Sci. Technol. Stud., vol. 2, no. 2, pp. 01–11, 2020.
[11] Hlaing K. S., Thaw Y. M. K. K., Applications, Techniques and Trends of Data Mining and Knowledge Discovery Database, International Journal of Trend in Scientific Research and Development, vol. 3, pp. 1604–1606, 2019.
[12] Pushp, Chand S., Knowledge Discovery and Data Mining for Intelligent Business Solutions, Advances in Data and Information Sciences, Lecture Notes in Networks and Systems, vol. 318, 2022, https://doi.org/10.1007/978-981-16-5689-7_18.
[13] Pareek M., Bhari P., A Review Report on Knowledge Discovery in Databases and Various Techniques of Data Mining, Open Access International Journal of Science and Engineering, pp. 79–82, 2020.
[14] Parihar A., Sharma S., Knowledge Discovery and Data Mining Healthcare, International Journal of Information Technology Insights & Transformations, vol. 4, issue 1, 2020.
[15] Borovska P., Gancheva V., Georgiev I., Platform for Adaptive Knowledge Discovery and Decision Making Based on Big Genomics Data Analytics, Bioinformatics and Biomedical Engineering, Lecture Notes in Computer Science, vol. 11466, Springer, pp. 297–308, https://doi.org/10.1007/978-3-030-17935-9_27.
[16] Janiesch C., Zschech P., Heinrich K., Machine Learning and Deep Learning, Electronic Markets, vol. 31, pp. 685–695.
[17] Sun S., et al., A Survey of Optimization Methods from a Machine Learning Perspective, IEEE Transactions on Cybernetics, vol. 50, no. 8, pp. 3668–3681, 2019.
[18] Sarker I. H., Machine Learning: Algorithms, Real-World Applications and Research Directions, SN Computer Science, 2021, 1–21, https://doi.org/10.1007/s42979-021-00592-x.
[19] Mahesh B., Machine Learning Algorithms - a Review, International Journal of Science and Research, vol. 9, pp. 381–386, 2020.
[20] Chowdhury M., et al., Can AI Help in Screening Viral and COVID-19 Pneumonia?, IEEE Access, vol. 8, pp. 132665–132676, 2020.
[21] Khan E., et al., Chest X-Ray Classification for the Detection of COVID-19 Using Deep Learning Techniques, Sensors, 2022, doi: 10.3390/s22031211.
[22] Rahman T., et al., Exploring the Effect of Image Enhancement Techniques on COVID-19 Detection Using Chest X-Ray Images, arXiv, https://doi.org/10.48550/arXiv.2012.02238.
[23] COVID-19 Radiography Database, https://www.kaggle.com/tawsifurrahman/covid19-radiography-database (accessed 17 April 2023).
[24] BIMCV-COVID19 Database, https://bimcv.cipf.es/bimcv-projects/bimcv-covid19/ (accessed 17 April 2023).
[25] COVID-19-Image-Repository, https://github.com/ml-workgroup/covid-19-image-repository/tree/master/png (accessed 17 April 2023).
[26] Chen R., et al., Risk Factors of Fatal Outcome in Hospitalized Subjects with Coronavirus Disease 2019 from a Nationwide Analysis in China, Chest, vol. 158, pp. 97–105, 2020, doi: 10.1016/j.chest.2020.04.010.
[27] Weng Z., et al., ANDC: An Early Warning Score to Predict Mortality Risk for Patients with Coronavirus Disease 2019, Journal of Translational Medicine, 328, 2020.
[28] Liu L., et al., Neutrophil-to-Lymphocyte Ratio Predicts Severe Illness Patients with 2019 Novel Coronavirus in the Early Stage, Journal of Translational Medicine, 2020, doi: 10.1186/s12967-020-02374-0.
[29] Huang I., Pranata I., Lymphopenia in Severe Coronavirus Disease-2019 (COVID-19): Systematic Review and Meta-Analysis, Journal of Intensive Care, 2020, https://doi.org/10.1186/s40560-020-00453-4.
[30] COVID-CXNet Dataset, https://github.com/armiro/COVID-CXNet (accessed 17 April 2023).
[31] RSNA Pneumonia Detection Challenge, https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data (accessed 17 April 2023).
[32] Chest X-ray Images (Pneumonia) Dataset, https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia (accessed 17 April 2023).
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
-Veska Gancheva proposed the methodology and
algorithm.
-Ivaylo Georgiev investigated methods for data
analytics.
-Violeta Todorova executed the experiments.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
The research presented in this paper is financially
supported by the Bulgarian Ministry of Education,
National Science Fund, grant KP-06-N37/24.
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US