
The research in [2] describes the development of an
ensemble technique combining three machine learning
models, “Random Forest, Naïve Bayes and Linear SVM”,
for crop recommendation, resulting in improved crop
productivity. The system is based on soil-specific
physical and chemical characteristics such as “NPK, pH”,
soil type, soil porosity, average rainfall, sowing season
and surface temperature. The work focused on two crop
types, “Kharif and Rabi”, and reported an accuracy of
99.91%. One limitation of this work is the small number
of crops used for recommendation, even though the
dataset for these crops is large. Secondly, although the
accuracy achieved is high, no correlation among the
features used for crop recommendation was examined.
Also, classification was performed as an ensemble
method using all of the models together. Finally, no
precision or recall was reported to show how accurately
classification is done in terms of True Positives and
True Negatives.
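The precision and recall referred to above can be derived from per-class counts of true positives, false positives and false negatives. The following is a minimal sketch of that computation, not the evaluation used in [2]; the function name `precision_recall` and the crop labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for a multiclass prediction."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correctly predicted this class
        else:
            fp[pred] += 1           # predicted class was wrong
            fn[truth] += 1          # true class was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]   # times the class was predicted
        actual = tp[label] + fn[label]      # times the class really occurred
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical labels, for illustration only
y_true = ["rice", "rice", "maize", "maize", "maize", "wheat"]
y_pred = ["rice", "maize", "maize", "maize", "wheat", "wheat"]
scores = precision_recall(y_true, y_pred)
```

Reporting these values per crop, alongside overall accuracy, shows which classes are being confused even when the headline accuracy is high.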
Researchers in [3] deployed six machine learning models,
“SVM, Decision Tree, Naïve Bayes, Random Forest, KNN,
Logistic Regression”, to select an appropriate crop. The
highest classification accuracy, 89.66%, was obtained
with “SVM”, followed by “KNN, Random Forest, Naïve
Bayes, Decision Tree and Logistic Regression”. Once a
crop is selected, the system suggests the pests that could
affect that crop, followed by recommendations for pest
control. This work covered a range of different crops.
The model did not achieve higher accuracy because the
number of samples is low for some crops and high for
others, so the number of samples per crop is not
consistent. The resulting possibility of data imbalance
was not explored, and the correlation of the soil features
contributing to crop selection was not evaluated. As in
[2], no precision or recall was reported to show how
accurately classification is done in terms of True
Positives and True Negatives.
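The uneven number of samples per crop noted above can be quantified before training. The sketch below is only an illustration under assumed data: the helper `imbalance_ratio` and the sample counts are hypothetical and are not taken from [3].

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical crop labels, for illustration only
labels = ["rice"] * 100 + ["maize"] * 50 + ["wheat"] * 10
ratio = imbalance_ratio(labels)
```

A ratio close to 1 indicates a balanced dataset; a large ratio suggests that accuracy alone may hide poor performance on the rarer crops.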
The research paper [4] proposes a methodology for
improving agricultural crop yield by accounting for
soil micro- and macronutrient levels to predict crop
suitability. To achieve this, fuzzy logic and rough set rule
induction were used to create rules for the dataset and
to evaluate different algorithms for their accuracy. The
evaluation found that the “LEM2” algorithm gave the
highest prediction accuracy, 89%, for the dataset
without fuzzy logic, while the AQ algorithm showed
better accuracy than the others with three linguistic
variables. The paper concluded that the proposed
methodology could be used in all situations, as it could
help farmers determine the crop that best suits their soil
type and could reduce soil erosion, since farmers could
shift to less water-intensive crops when water
availability is low. Although the work considered 23
crops with 16 features for crop selection, no machine
learning model was deployed to select crops with
higher accuracy.
3. Crop Recommendation Using Machine Learning
Based on the literature reviewed on machine learning for
crop recommendation from soil features and climatic
conditions, this paper works with a dataset collected for
a tropical climate, with the aim of crop recommendation
for arid land. The tropical-climate dataset was chosen in
order to train the model on soil features for crop
recommendation. The trained and evaluated model
would then be used for crop recommendation for arid
land, based on features collected in real time. No
arid-land dataset for crop recommendation is available,
as agriculture in arid land is very challenging and is
one of the primary focuses of the Kingdom of Saudi
Arabia. Before presenting the results and analysis of the
machine learning models, we describe the methodology
of the proposed work, which uses the following models:
“Support Vector Machine, Decision Tree, Random
Forest, K-Nearest Neighbor and Naïve Bayes”. Each
model is explained briefly below.
3.1 Random Forest Algorithm
A “Random Forest” builds multiple “Decision Trees” on
a variety of samples and combines them by majority
voting for classification and by averaging for regression.
It can handle categorical variables in the case of
classification. Our project addresses a multiclass
problem. “Random Forest” employs the principle of
“Bagging”, an ensemble method also known as
“Bootstrap Aggregation”. Each tree is trained
independently on its own sample and produces its own
result. The final decision is based on majority voting, in
which the results of all models are merged; this step is
termed aggregation.
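The bagging and majority-voting steps described above can be sketched as follows. This is a simplified illustration, not the implementation used in this work: the base learner is a one-feature threshold rule (a depth-1 tree) standing in for a full decision tree, and all function names and the (soil pH, crop) pairs are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows):
    """Stand-in base learner: split at the mean feature value and
    predict the majority label on each side of the split.

    rows are (feature_value, label) pairs.
    """
    threshold = sum(x for x, _ in rows) / len(rows)
    left = Counter(label for x, label in rows if x <= threshold)
    right = Counter(label for x, label in rows if x > threshold)
    overall = Counter(label for _, label in rows).most_common(1)[0][0]
    left_label = left.most_common(1)[0][0] if left else overall
    right_label = right.most_common(1)[0][0] if right else overall
    return lambda x: left_label if x <= threshold else right_label


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical (soil pH, crop) pairs, for illustration only
data = [(5.0, "rice"), (5.2, "rice"), (5.5, "rice"), (5.8, "rice"),
        (7.0, "wheat"), (7.2, "wheat"), (7.5, "wheat"), (7.8, "wheat")]
# Each model is trained independently on its own bootstrap sample
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
prediction = bagging_predict(models, 5.1)
```

A full random forest would additionally grow deeper trees and consider a random subset of features at each split; the sketch keeps only the sampling and voting steps.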
3.2 Decision Tree
In a “decision tree” algorithm, the root node provides the
optimal split. It is represented by the predictor variable
that best divides the data set into two or more subsets.
Entropy or the Gini Index is used to measure the
homogeneity of a split, and from it the information gain
obtained by splitting a node. Each subsequent node then
splits on the predictor variable that yields the best
homogeneity. Building the decision tree continues until
all the nodes are pure, meaning that all samples in each
node belong to the same class. The process stops there,
and the final tree is used to make predictions.
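The split criteria mentioned above can be written out directly. The sketch below uses the standard definitions of entropy, the Gini Index and information gain; the function names and the example labels are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini Index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node contents, for illustration only
parent = ["rice"] * 4 + ["wheat"] * 4
pure_split = [["rice"] * 4, ["wheat"] * 4]
mixed_split = [["rice", "rice", "wheat", "wheat"],
               ["rice", "rice", "wheat", "wheat"]]

gain_pure = information_gain(parent, pure_split)
gain_mixed = information_gain(parent, mixed_split)
```

A split that separates the classes completely yields the largest gain, while a split that leaves each child as mixed as the parent yields none, which is why the tree prefers the former at each node.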
3.3 Naïve Bayes
The “Naïve Bayes” algorithm is a “supervised learning”
method for classification problems. It makes predictions
based on probability and is therefore called a
“probabilistic classifier”. The model is based on “Bayes
Theorem”, which calculates the probability of a
hypothesis given the data.
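As an illustration of how Bayes' theorem leads to a classifier, the sketch below scores each class by its prior probability multiplied by the per-feature likelihoods, treating features as independent given the class. It assumes categorical features and adds Laplace (add-one) smoothing, which is an assumption of this sketch rather than something stated in this paper; the function names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    vocab = defaultdict(set)             # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_naive_bayes(model, row):
    """Return the class with the highest posterior, up to the constant P(x)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum of log P(value | class), with add-one smoothing
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical categorical features (soil type, season), for illustration only
rows = [("clay", "wet"), ("clay", "wet"), ("sandy", "dry"),
        ("sandy", "dry"), ("clay", "dry")]
labels = ["rice", "rice", "millet", "millet", "rice"]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("clay", "wet"))
```

Working in log-probabilities avoids numerical underflow when many features are multiplied together; continuous features such as pH or rainfall would need a different likelihood model, for example a Gaussian per class.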
3.4 Support Vector Machine
“Support vector machine (SVM)” is utilized for
classification and regression. “Hyperplane” in N-
International Journal of Environmental Engineering and Development
DOI: 10.37394/232033.2023.1.7
Batool Alsowaiq, Noura Almusaynid, Esra Albhnasawi,
Wadha Alfenais, Suresh Sankaranarayanan