comparative analysis of these methodologies based
on acknowledged evaluation criteria. We
investigated the processing time of various
technologies using independent learning and
Apache Spark, given the requirements for effective
real-time analysis of smart city data, and we
propose the model that performs best in terms of
both processing time and error rate. The purpose of
this research is to use a variety of machine
learning models to provide an accurate,
comparative examination of air quality. Data
collected by sensors at a specific site are
analyzed, and the models are compared to identify
the algorithm with the highest accuracy.
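As a rough illustration of comparing models on both processing time and error rate, the following minimal sketch times the fitting step of two simple models and reports their test error. It is not the pipeline used in this paper: it runs in plain Python rather than Apache Spark, the readings are synthetic, and the names `make_synthetic_readings`, `fit_mean`, `fit_linear`, and `compare` are hypothetical helpers introduced only for this example.

```python
import math
import random
import time


def make_synthetic_readings(n, seed=0):
    """Generate toy (feature, target) pairs; NOT real sensor data."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 100.0) for _ in range(n)]
    ys = [0.5 * x + 10.0 + rng.gauss(0.0, 2.0) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    squared = sum((model(x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))


def compare(models, train, test):
    """Return {name: (fit_seconds, test_rmse)} for each fitting function."""
    results = {}
    for name, fit in models.items():
        start = time.perf_counter()
        model = fit(*train)
        elapsed = time.perf_counter() - start
        results[name] = (elapsed, rmse(model, *test))
    return results


xs, ys = make_synthetic_readings(1000)
train = (xs[:800], ys[:800])
test = (xs[800:], ys[800:])
results = compare({"mean_baseline": fit_mean, "linear": fit_linear}, train, test)

# Pick the model with the lowest test error; fit time is reported alongside.
best = min(results, key=lambda name: results[name][1])
for name, (seconds, error) in results.items():
    print(f"{name}: fit_time={seconds:.6f}s rmse={error:.3f}")
print("lowest error:", best)
```

Selecting on error alone is a simplification. When two models reach similar error, the measured fit time can serve as the tie-breaker.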
The remainder of the paper is structured as
follows. Section 2 briefly discusses related work
and how it differs from our study. Section 3
discusses the machine learning algorithms and how
the models are used. Section 4 explains the
proposed model, the empirical evaluation method,
and the data gathering process. Section 5 presents
the findings and discusses them. Finally, Section 6
summarizes the paper’s findings and outlines
future research directions.
2 Related Works
In this section, we mention some previous studies
related to our topic.
Martinez-Espana et al. [5] discussed the use of
machine learning algorithms to forecast particulate
matter concentration in atmospheric air. Machine
learning techniques such as linear regression, MLP
regressor, neighbors regressor, decision tree
regressor, and gradient boosting regressor were
used in the experiments. The authors demonstrated
that gradient boosting regression outperformed all
other techniques.
Hamami et al. [6] proposed air quality
classification using classification algorithms
including Logistic Regression, KNN, Decision Tree,
and Random Forest. Based on their experiments, the
decision tree model achieved the best accuracy in
classifying air quality levels, reaching up to 100%
after tuning several hyperparameters.
Fernando et al. [7] aimed to find the most
appropriate machine learning method for accurately
predicting the air quality index in Colombo based
on PM2.5 concentration. The PM2.5 concentration in
Colombo was predicted using four correlated air
pollutant concentrations: SO2, NO2, PM2.5, and
PM10. Machine learning techniques including
k-Nearest Neighbors, Multiple Linear Regression,
Random Forest, and Support Vector Machines were
used to train and evaluate the prediction models.
After evaluating the models, Random Forest was
identified as the most appropriate prediction
model, with an accuracy of over 85%.
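To make the idea of predicting PM2.5 from correlated pollutant concentrations concrete, here is a minimal k-nearest-neighbors regression sketch. It is not the implementation from [7]: the feature layout (SO2, NO2, PM10), the function name `knn_predict`, and the toy values are assumptions made purely for illustration and are not measurements.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets.

    Distances are Euclidean; features are assumed to be on comparable scales.
    """
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy, made-up readings: (SO2, NO2, PM10) -> PM2.5. Not real data.
train_x = [
    (5.0, 20.0, 40.0),
    (6.0, 22.0, 45.0),
    (15.0, 40.0, 90.0),
    (16.0, 42.0, 95.0),
]
train_y = [18.0, 20.0, 55.0, 57.0]

# The two nearest neighbors of this query are the first two rows.
prediction = knn_predict(train_x, train_y, (5.5, 21.0, 42.0), k=2)
print(prediction)  # 19.0
```

In practice the features would usually be standardized first, since a pollutant measured on a larger numeric scale would otherwise dominate the distance.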
Abirami et al. [8] aimed to predict the AQI
accurately by applying different ML models to the
data set with proper pre-processing. The quality of
the air is, for the most part, indicated by its AQI
(air quality index) value. They attempted to
estimate the air condition within a specific region
using machine learning techniques such as support
vector regression (SVR), decision tree regression
(DTR), multiple linear regression (MLR), and random
forest regression (RFR). RFR performed best among
all regression models.
Murugan et al. [9] implemented machine learning
algorithms to determine the accuracy of predicting
particulate matter, PM2.5, in air pollution in
smart cities. To test the application of machine
learning to this prediction, Multi-Layer Perceptron
(MLP) and Random Forest were selected and compared
using the Air Pollution dataset. The outcome of
this research is that Random Forest gave the best
accuracy in predicting particulate matter.
Sinnott et al. [10] explored the use of machine
learning algorithms for predicting pollution,
specifically PM2.5. Prediction of pollution events
is increasingly important in major cities due to
the increased urbanization of populations and the
associated effect on traffic volumes. Data from a
variety of heterogeneous sources were used, which
involved collection and cleaning for use in machine
learning algorithms. Linear regression models, ANN
models, and LSTMs were all explored. It was found
that LSTM performed best and was able to predict
high PM2.5 values with reasonable accuracy. ANN and
linear models have drawbacks in predicting high
PM2.5 values, but they provided reasonable overall
performance.
WSEAS TRANSACTIONS on COMPUTERS
DOI: 10.37394/23205.2022.21.30
Kamel Maaloul, Lejdel Brahim