A Review of Machine Learning Models to Detect Autism Spectrum

Disorders (ASD)

PRASENJIT MUKHERJEEa, SOURAV SADHUKHANb, MANISH GODSEc

aDept. of Technology

Vodafone Intelligent Solutions

Pune

INDIA

aDept. of Computer Science

Manipur International University

Manipur

INDIA

bDept. of Business Management

Pune Institute of Business Management

Pune

INDIA

cDept. of IT

Bizamica Software

Pune

INDIA

Abstract: - Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that can manifest in a variety

of ways. One common characteristic is difficulty with communication, which may appear as difficulty

understanding others or expressing oneself effectively. Social interaction can also be challenging, as individuals

with ASD may struggle to comprehend social cues or adapt to new situations. Many machine-learning models

have been developed or are in progress to detect ASD automatically. This paper studies three machine learning

model-based frameworks, each with a clear concept of the detection of ASD among children and adults, and

closely reviews these frameworks and their datasets for diagnosing ASD automatically. In the first framework,

deep learning models such as Xception, VGG19, and NASNetMobile are used to detect ASD. In the second

framework, models such as XGBoost, Neural Network, and Random Forest are employed to detect ASD from

a clinical standard screening dataset for toddlers. The third framework involves traditional machine learning

models trained on the UCI dataset for ASD. The accuracy of each model is discussed.

Key-Words: - Deep Learning, Autism Spectrum Disorder, Machine Learning, ASD Detection, ML-based

Framework, Traditional Machine Learning

Received: June 23, 2023. Revised: August 11, 2023. Accepted: September 14, 2023. Published: October 5, 2023.

1 Introduction

Autistic children often have difficulty understanding

and responding to social cues, so they may not know

how to start and maintain conversations.

Additionally, they may have difficulty

understanding abstract concepts and may be more

comfortable with concrete concepts. They may also

have trouble interpreting sensory information, such

as touch or sound, which can lead to sensory

overload. Finally, autistic children may be obsessed

with certain topics or routines due to their difficulty

processing changes in their environment. These

difficulties have been attributed to the lack of

reliable and valid screening instruments, the wide

range of severity of ASD symptoms, and the overlap

of symptoms with other disabilities. Additionally,

early intervention can be expensive and may not

always be available, depending on the particular

situation of the family, as in [1]. ASD is a

neurodevelopmental problem of the brain that has a

WSEAS TRANSACTIONS on COMPUTERS

DOI: 10.37394/23205.2023.22.21

Prasenjit Mukherjee, Sourav Sadhukhan, Manish Godse

E-ISSN: 2224-2872

177

Volume 22, 2023

wide range of symptoms and severity [2]. ASD has

been included in the International Statistical

Classification of Diseases and Related Health

Problems (ISCDRHP) under the category of mental

and behavioral disorders, as in [3]. Symptoms may

appear in a child's first year, such as reduced eye

contact and poor responsiveness, as in [4] [5] [6] [7].

People with autism may experience difficulties in

social communication, such as difficulty

understanding body language, facial expressions,

and the meaning of words. They may also struggle

with sensory processing, such as being over or

under-sensitive to certain sounds, textures, lights,

and tastes, as in [8]. It is characterized by

difficulties with social interaction and

communication, as well as restricted, repetitive

behaviors. It is typically diagnosed in early

childhood and can last throughout a person's

lifespan. Symptoms of autism spectrum disorder are

usually noticeable before the age of three and can

range from difficulty communicating and interacting

with others to repetitive behaviors and

hypersensitivity to certain stimuli. These symptoms

can vary greatly in severity and type between

individuals. Machine learning algorithms can be

used to analyze patterns in the behavior of children

with autism and detect any abnormalities that might

indicate the presence of autism. This can help

clinicians diagnose the condition earlier and begin

treatment sooner, which can improve the outcome

for the child. Autism Spectrum Disorder (ASD) is a

neurodevelopmental condition that affects social

communication, behavior, and sensory processing.

Early identification and intervention are crucial to

improving outcomes and quality of life for

individuals with ASD. Some of the core features of

ASD include difficulties with social interaction,

communication, and repetitive behaviors or

interests. These challenges can make it difficult for

individuals with ASD to form and maintain

relationships, understand social cues, and participate

in everyday activities. Although the exact causes of

ASD are not fully understood, research suggests that

a combination of genetic and environmental factors

may contribute to its development. While the

condition is more prevalent in males, it is important

to note that ASD affects individuals of all genders,

races, and ethnicities, as in [9]. Diagnosing ASD

can be a complex process that involves a thorough

evaluation of a person's behavior, communication,

and developmental history. However, access to a

timely and accurate diagnosis can be limited,

particularly for families in low-income

communities. This can lead to delays in accessing

appropriate services and support. Advances in

technology, such as machine learning algorithms,

have the potential to improve the accuracy and

speed of ASD diagnosis. By analyzing large

datasets, these algorithms can identify patterns and

features that are characteristic of the condition,

which may assist clinicians in making more accurate

and efficient diagnoses. While these tools are not

intended to replace clinical judgment, they may help

supplement traditional assessment methods and

increase access to diagnostic services for individuals

and families affected by ASD, as in [10].

AI techniques can be used to analyze large amounts

of data from various sources, such as genetics,

medical records, and environmental factors. With

AI, patterns can be identified and used to develop

predictive models for ASD, which can help identify

individuals at risk for the disorder and provide early

interventions. The challenge arises because high-

dimensional data has a large number of features and

variables, which can make it difficult to identify

meaningful patterns in the data. Furthermore, the

sheer size of the data can make it difficult to process

and analyze. As a result, the analysis of high-

dimensional datasets requires specialized algorithms

that can accurately identify patterns in the data.

These algorithms must also be computationally

efficient enough to process large amounts of data in

a reasonable amount of time, as in [8].

The proposed research work is a review of AI

applications to detect autism spectrum disorder

among children and adults. Three frameworks that

contain the machine learning models have been

discussed along with their datasets. The dataset plays

an important role because each model is trained on it

before making predictions on new data. In the first

framework, facial images of children diagnosed with

ASD and of general children are the primary source

of data, and deep learning models like Xception,

VGG19, and NASNetMobile are applied to detect ASD,

as in [11]. The second framework uses models like

XGBoost, Neural Network, and Random Forest to

detect ASD from the clinical standard ASD

screening dataset of toddlers, as in [12]. The third

framework uses traditional machine learning models

that are trained with the UCI dataset of ASD, as in

[13]. These frameworks and some other similar

models are elaborated on in Section 3. The results

of each model and the observations are discussed

in Section 4, and the application of the proposed

study is described in Section 5. The study ends

with a conclusion in Section 6.

2 Related Works

Autism Spectrum Disorder (ASD) is a significant

challenge for children's health today, and it has

become a key area of focus in healthcare research.

Many studies have explored the potential of

artificial intelligence (AI) to address this disorder

and other mental health-related issues. This section

highlights some notable AI-based research on

mental health that has been conducted.

By analyzing social media posts and biomedical

images, doctors can identify patterns in behavior

and physical symptoms that may be indicative of

ASD. This data can then be used to accurately

diagnose and treat the disorder. By recognizing

facial features associated with ASD, it is possible to

identify individuals with the disorder earlier in life.

This can lead to earlier diagnosis and intervention,

which can be beneficial for those affected by the

disorder. In addition, this system could also be used

to help identify individuals with ASD in social

media posts, which can help connect those with the

disorder with resources and support. Deep learning

techniques rely on accurately identifying key facial

features, such as eyes, nose, and mouth, and then

mapping those features to a template. This allows

the algorithm to recognize the face and identify the

landmarks associated with it. The Xception model

achieved the highest accuracy result of 91%,

followed by VGG19 (80%) and NASNetMobile

(78%). The dataset used had a good variety of face

images of different backgrounds, angles, and

lighting conditions, which allowed the deep learning

models to accurately perceive and recognize

patterns and features of the faces. This enabled the

three models to detect a wide variety of faces, which

is why the Xception model achieved the highest

accuracy result. The application is designed to

assess facial features from images of people's faces

and compare them to a database of images of people

with and without autism. The convolutional neural

network is trained to recognize the differences

between the two sets of images and categorize the

images accordingly. The Flask framework then

makes the application available online and allows

users to easily interact with the system, as in [11]. It

is also associated with difficulty processing sensory

information and difficulty with motor skills such as

handwriting or balancing. People with autism often

have difficulty understanding and responding to

social cues and may have difficulty forming

relationships with others. Because autism is a

spectrum disorder, it can manifest differently in

each individual. This means that the symptoms can

range from mild to severe, making it difficult to

distinguish between typical development and

autism. Furthermore, autism is often comorbid with

other mental health issues, which can make it even

more difficult to diagnose. Early screening and

treatment can help identify and address any

underlying health issues before they become severe.

This can help reduce the risk of long-term

symptoms as well as improve the overall quality of

life. The goal of this research is to develop an

automated pipeline that can quickly and accurately

identify the signs of autism in toddlers and to use

machine learning models to analyze the indicators

of autism and determine which are the most

significant for diagnosis. The dataset used for this

research was curated from the UC Irvine Autism

Spectrum Disorder dataset, which contains over

10,000 examples of autism-related features from

children aged 4-5. The neural network model was

designed to learn patterns from large datasets, while

the random forest model was designed to identify

relationships between variables. After they were

trained on the data, they were tested on a new

dataset to determine how accurately they could

identify the presence of autism. LightGBM is a

gradient-boosting algorithm that can assign an

importance score to each feature in a dataset.

We used this to identify which physical

characteristics had the highest scores, indicating that

they are most significant in giving rise to autism. To

arrive at this conclusion, the study used a

combination of genetic and physical features,

including facial features, to create a machine-

learning model to analyze the data. The model was

then tested and validated against a set of data

containing individuals with and without autism. The

results indicated that the model was highly accurate

at predicting the presence of autism, indicating the

importance of physical characteristics in identifying

autism. By catching signs of autism early, doctors

can intervene and help the patient learn coping skills

and manage the symptoms. This can help minimize

the impact of autism on their lives and increase their

quality of life, as in [12]. Machine

learning algorithms, when applied to data collected

from patients with ASD, can help identify the

features of the disorder, such as social and

communication deficits, and thus enable more

accurate and efficient diagnosis. With the help of

machine learning, doctors will be able to better

identify, diagnose, and treat patients with ASD. This

is likely due to improved awareness and diagnosis

of ASD, as well as an increase in environmental

factors that can contribute to its development.

Additionally, advances in technology and medical

care have made it easier to identify the signs of ASD

and diagnose it in a timely manner. Early diagnosis

of ASD can have a major impact on the quality of

life of individuals with ASD, as early interventions

can be more effective and provide better outcomes.

This study seeks to provide a simple and accurate

way to classify ASD data, which can help with early

diagnosis. By randomly splitting the data and

running the experiments multiple times, we were

able to identify the best method for each dataset.

This allowed us to compare the performance of the

different methods and determine which one was the

most effective for each dataset. The accuracy of

SVM and RF was compared to the other models and

shown to be the highest. Additionally, the results

indicated that SVM was better at generalizing and

was more efficient in terms of training time, while

RF was better at handling imbalanced data. This is

likely because SVM is a discriminative classifier

that tries to classify the data points by finding the

optimal hyperplane that separates the two classes,

while RF is an ensemble method that uses a

collection of decision trees to achieve better

performance than a single decision tree.

Additionally, the RF method uses randomization to

create diversity among its decision trees, which

allows it to better handle imbalanced data. Random

Forest (RF) is an ensemble machine-learning

method that uses multiple decision trees to make

predictions. Because it combines multiple models

and considers the relationships between variables, it

has been shown to outperform other machine

learning methods when it comes to diagnosing ASD,

as in [13]. Early detection and intervention are

critical for helping children with autism get the most

out of therapy and other interventions. If screening

methods are easily implemented, it will allow for

early detection, enabling families to get their

children the help that they need as soon as possible.

It is believed that ASD is the result of a combination

of genetic, environmental, and biological factors.

Research suggests that there may be distinct

differences in the brain structure and function of

individuals with ASD, which may explain their

different behaviors and abilities. The logistic

regression model is used because of its ability to

accurately predict binary outcomes, such as whether

a child has autism or not. The algorithm will be used

to quickly process large amounts of data and make

accurate predictions based on the data in the dataset.

With machine learning, doctors can detect the

disorder more quickly and accurately by using

algorithms that look for patterns in the data. This

can help them identify the disorder earlier and

provide the necessary care to the toddler in a timely

manner, improving their quality of life. These

challenges include a lack of reliable data sets and

data infrastructure, limited access to skilled

personnel, and a lack of understanding of the legal

and ethical implications of AI-powered applications,

as in [14]. This is due to increased awareness of the

condition and improved diagnostic tools, as well as

a greater understanding of the condition and its

effects on individuals' lives. More research is being

done on the topic, leading to improved treatments

and therapies. Some people with ASD may have

difficulty with communication and forming

relationships, while others may have only mild

symptoms. Additionally, some people may have

associated medical issues, such as seizures or sleep

disturbances. Other common symptoms seen in

those with autism include limited or inappropriate

social interactions, difficulty with communication,

restricted and repetitive behaviors, and sensory

sensitivities. Diagnosis of autism can be done at any

age through observation of these behaviors, physical

examinations, cognitive testing, and genetic testing.

This is to allow for a more accurate diagnosis of

ASD as well as to enable early intervention to

ensure that the symptoms do not worsen. This is

done by using ML algorithms to analyze data such

as patient records, behavior, and medical history to

identify patterns that could indicate the presence of

ASD. LR and SVM are two popular machine

learning (ML) algorithms that can be used to

classify data. The performance measure helps to

compare the accuracy of the predictions made by the

model with each algorithm. This can help users

determine which algorithm provides more accurate

results in a shorter amount of time, which can help

them determine if they are suffering from ASD or

not, as in [15]. ASD can cause difficulties in

communication, social skills, and repetitive

behaviors. It is believed to be caused by a

combination of genetic and environmental factors

and can affect people in different ways. Early

intervention is key to reducing the effects of ASD,

as it can help children learn the skills they need to

better manage their symptoms and lead more

independent lives. It also provides an opportunity

for parents and caregivers to better understand their

children and find ways to cope with and manage the

disorder. This makes it difficult for physicians to

accurately identify ASD symptoms and recognize

them as being uniquely associated with ASD. As a

result, the diagnosis is often delayed or missed

entirely. Deep learning algorithms are able to

uncover complex patterns in large amounts of data

that may be too subtle or too complicated for a

human expert to detect. By utilizing these

algorithms, medical experts can be provided with

more accurate and timely diagnoses of ASD, which

can help improve treatment and outcomes for those

affected by the disorder. With a larger dataset,

machine learning algorithms can be used to develop

better models for diagnosing ASD. The algorithms

can also take into account subtle or nuanced

symptoms that may not be easily detected by

medical professionals, which can lead to more

accurate diagnoses. The hybrid approach is

beneficial because it combines the power of deep

learning to extract complex patterns from data with

the interpretability of XAI to explain why certain

features are more important than others in predicting

ASD. This helps to reduce the bias in the predictions

and makes the results more trustworthy. The

proposed framework combines data from both

parents and clinicians to create a more

comprehensive picture of a child's development.

This data can then be used to make more accurate

predictions about which children are likely to have

ASD traits, allowing clinicians to provide earlier

interventions and support, as in [16]. People with

ASD typically have difficulty with social

interaction, communication, and understanding

language. They may also show restricted or

repetitive behaviors, such as having difficulty

transitioning from one activity to another. Children

with autism can struggle with social interactions,

body language, and understanding facial

expressions. Early diagnosis can equip families with

the necessary resources and interventions to help

their child reach their full potential. With the

prevalence of ASD increasing in recent years, it is

becoming increasingly difficult for medical

professionals to diagnose the condition in children

without the help of automated methods. Automated

methods can quickly and accurately detect signs of

ASD in children, allowing medical professionals to

make more informed decisions about diagnosis and

treatment. We selected the AutoML method because

it has the ability to automate the process of building,

optimizing, and selecting the best-performing model

with minimal manual effort. In addition, AutoML

can also be used to identify important features in the

dataset, which can then be further used to improve

the accuracy of the machine-learning models. This

is due to the fact that AutoML automates the

process of selecting optimized feature combinations

and hyperparameters, allowing us to quickly

identify the optimal settings for our model. The

combination of these techniques allowed us to

achieve the highest accuracy with minimal effort, as

in [17]. ASD is caused by a combination of genetic

and environmental factors, including gene mutations

and exposure to toxins. People with ASD may also

have trouble forming social relationships, have

difficulty with communication and language, and

struggle with sensory sensitivity. MRI imaging

modalities have the capability to detect subtle brain

abnormalities that are associated with ASD, such as

changes in the brain’s structure, connectivity, and

even chemistry. This makes it an invaluable tool for

diagnosing and monitoring ASD. fMRI uses

magnetic fields and radio waves to measure blood

flow in the brain and identify any abnormalities or

discrepancies in brain activity. sMRI uses high-

resolution images to map the structure of the brain

and detect any abnormalities in the brain's anatomy.

These two modalities work together to help

clinicians diagnose ASD with greater precision.

These systems use AI to analyze brain images, such

as MRI and fMRI scans, to assess an individual's

brain structure and connectivity. The AI algorithms

can detect subtle differences in brain structures,

which can be used to diagnose ASD more accurately

and quickly by specialists. ML algorithms are used

to analyze the image data, identify the relevant

features, and detect any abnormalities that could be

indicative of ASD. DL applications are used to

further analyze the data and identify patterns that

may be indicative of ASD. This allows for more

accurate and reliable diagnoses. Deep learning (DL)

techniques employ large datasets of MRI images

and AI algorithms to create models that can detect

patterns in the images that are associated with ASD.

These models can then be used to automate the

diagnosis of ASD and provide more accurate and

timely results. We compare the accuracy and

training times of ML and DL models to show that

DL models can learn faster and achieve higher

accuracy. We also discuss the importance of feature

selection and data pre-processing in improving the

accuracy of the models. Finally, we suggest the

potential of combining AI techniques with MRI

neuroimaging to detect ASDs, as in [18]. It is

usually diagnosed during early childhood, and

symptoms can range from mild to severe. Common

characteristics of ASD are difficulty with social

interactions, difficulty with verbal and nonverbal

communication, difficulty with sensory integration,

and an overall difficulty in adapting to change. As a

result, many healthcare providers are looking for

more cost-effective ways to diagnose ASD, such as

through the use of screening tools that can help

identify the presence of ASD symptoms in a shorter

amount of time. Additionally, research has found

that early detection and intervention of ASD can

have a significant impact on the child's

development, so it is important to identify the

disorder as quickly as possible. These methods are

designed to provide a clearer picture of what is

happening in the person's life, allowing for a more

accurate diagnosis. The AQ and M-CHAT use

standardized questions about social interaction,

communication, and behavior to assess the

individual's level of autism spectrum disorder. The

user must be knowledgeable about the various items

that need to be screened and be able to identify any

discrepancies that could lead to inaccurate results.

The screening items must be designed in such a way

that they allow for accurate and efficient screening.

ML algorithms can process large amounts of data

quickly and efficiently. By taking advantage of such

algorithms, we can greatly reduce the time needed

to detect patterns, uncover trends, and identify

anomalies in the data. These patterns and trends can

then be used to make more accurate diagnoses,

leading to improved accuracy and efficiency in the

diagnostic process. RML is based on a combination

of rule-based and ML techniques, which allows it to

detect patterns in data that traditional ML

techniques cannot. Furthermore, it provides users

with interpretable rules that can be used to gain a

better understanding of the data as well as identify

potential areas for further research. This is likely

due to RML's ability to learn from the data and

identify patterns in the data that are not visible to

traditional ML methods. Additionally, RML's ability

to handle complex data and its ability to adjust to

new data as it comes in make it a powerful tool for

classification, as in [19].

3 Machine Learning Models in ASD

Detection

Today, artificial intelligence has established its

presence in all sectors, including healthcare. Autism

Spectrum Disorder (ASD) detection is a difficult

challenge in the healthcare domain. Early detection

of ASD allows treatment to start sooner and helps

reduce its symptoms. ASD is not curable, but it is

possible to manage its symptoms. Parents play a

crucial role in detecting ASD at an early age. Many

studies on ASD detection using machine learning

models and various kinds of datasets have been

completed or are in progress. This section discusses

the detection of ASD using machine-learning models.

The discussion is organized by ASD detection case,

and each case is described with its dataset, machine

learning models, and model performance. Each

framework is discussed in the subsections below. Our

primary aim is to understand the architecture of each

system where simulation and numerical stability

have been normalized, as in [11], [12], and [13].

3.1 ASD Detection Using Facial Images

Deep learning models are used to detect ASD among

children from facial images. Xception, VGG19, and

NASNetMobile are convolutional neural network

architectures widely used for image classification.

Fig. 1. Framework of ASD Detection using ML

3.1.1 Dataset

The dataset [11] has been prepared using facial

images of autistic children and general children,

which are the main input for the machine-learning

models. It contains 2940 facial images, of which

half are of autistic children and the remaining half

are of general children. The images have been

collected from autism-related social groups on

Facebook, as in [11].

3.1.2 Framework

A clear framework to detect ASD has been given,

where each section is described in Fig. 1. According

to Fig. 1, the given framework [11] shows that data

will be read from the dataset and split into the train

and test parts in the first and second steps. The

model will be prepared to train with training data.

After the completion of training, the model will be

fine-tuned to reduce the error in the next step. After

reducing the error, the model will be fitted and

tested with test data for validation. After completion

of this step, the model will be ready to predict using

new data from the user side. The predicted result of

the test data will be utilized to calculate the

accuracy, precision, recall, and confusion matrix in

the final step. Three advanced machine-learning

models have been used to detect autism from facial

images, as in [11]. The input dataset contains the

image data for training and testing these models.

The results of these models are discussed in

Section 4.
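The final step of the framework, computing accuracy, precision, recall, and the confusion matrix from the predictions on the test data, can be illustrated with a minimal Python sketch. This is not the implementation used in [11]; the function names, the label convention (1 for autistic, 0 for general), and the example labels are assumptions made purely for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and a 2x2 confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Rows are true classes (negative, positive);
        # columns are predicted classes (negative, positive).
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Hypothetical test labels and predictions (assumed: 1 = autistic, 0 = general).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these made-up labels, three of the four positive cases and three of the four negative cases are classified correctly, so accuracy, precision, and recall all equal 0.75.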

3.2 ASD Detection of Toddlers Using

Machine Learning Models

This work [12] has been proposed to detect ASD in

toddlers, that is, children between 12 months and

3 years of age. This is a good time to detect ASD

because early detection helps to start ASD therapies

according to the child's needs. AI has been applied

to the problem of early ASD detection, and many

models have been developed for it. XGBoost, neural

network, and random forest models have been used

to detect ASD among toddlers, as in [12].

3.2.1 Dataset

The dataset [12] has been collected from

Kaggle.com, which is an open repository of machine

learning datasets. The autism dataset was prepared

by the University of California, Irvine, and contains

the screening data for toddlers. The dataset contains

1054 records with 18 variables that point to different

attributes. Ten of these variables, labeled A1 to A10,

are screening questions related to autism. If the

answer to any of the questions A1 to A9 is

"sometimes," "rarely," or "never," the value 1 is

assigned, and 0 is assigned for any other answer. If

the answer to question A10 is "always," "usually," or

"sometimes," the value 1 is assigned, and 0 is

assigned for any other answer. The scores of these

questions and other attributes have been used to

train the models for the prediction of ASD, as in

[12].
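The scoring rule for questions A1 to A10 described above can be written as a short Python sketch. The function and constant names are hypothetical, and the sample answers are invented for illustration; only the mapping of answers to 0 and 1 follows the description of the dataset.

```python
# Answers that are scored as 1, following the rule described in the text.
A1_TO_A9_POSITIVE = {"sometimes", "rarely", "never"}
A10_POSITIVE = {"always", "usually", "sometimes"}


def encode_answer(question, answer):
    """Map one screening answer to 0 or 1.

    `question` is a label from "A1" to "A10"; `answer` is the free-text
    response such as "Always" or "rarely".
    """
    if not question.startswith("A"):
        raise ValueError(f"unknown question label: {question}")
    number = int(question[1:])
    if not 1 <= number <= 10:
        raise ValueError(f"unknown question label: {question}")
    positive = A10_POSITIVE if number == 10 else A1_TO_A9_POSITIVE
    return 1 if answer.strip().lower() in positive else 0


def encode_record(answers):
    """Encode a dict of {question label: answer} into {question label: 0 or 1}."""
    return {q: encode_answer(q, a) for q, a in answers.items()}


# Hypothetical answers for three of the ten questions.
sample = {"A1": "Always", "A2": "Rarely", "A10": "Usually"}
encoded = encode_record(sample)
print(encoded)
```

Note that "sometimes" is scored as 1 for every question, while "always" and "usually" are scored as 1 only for A10.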

Fig. 2. Framework of ASD detection among

Toddlers using ML

3.2.2 Framework

Fig. 2 shows the framework of this system [12],

which uses machine learning models such as

XGBoost, neural networks, and random forests. In

the first step, the data is read from the dataset. In

the second step, preprocessing is applied where the

data requires cleaning. In the third step, the data is

split into training and testing parts. Each model

then uses the training data to learn the patterns, as

in [12]. After the completion of training, each model

is evaluated using the testing data. In the end, the

models are ready to predict results according to the

user's input. The Random Forest model has been

used both before and after optimization. XGBoost,

which stands for 'Extreme Gradient Boosting', is an

ensemble model that combines many weak models.

It is a popular machine learning model that handles

large datasets, and its overall performance is good

and stable. Neural networks are inspired by the

human brain. They are used to solve complex

machine-learning problems because of their ability

to compute and generate responses quickly. The

other model is the

Random Forest, which is a popular model for solving classification problems in machine learning. A random forest is built from decision trees, and the decision tree is the key component of the algorithm. The results of these models are discussed in Section 4.
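The steps of the framework (read, clean, split, train, evaluate) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the implementation of [12]: the records are synthetic, the 20% test fraction is assumed because the text does not give the ratio, and a trivial majority-class model stands in for XGBoost, the neural network, and the random forest.

```python
import random


def clean(records):
    """Drop records that contain a missing feature value or a missing label."""
    return [
        r for r in records
        if r["label"] is not None and None not in r["features"]
    ]


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and split them into training and testing parts.

    The 20% test fraction is an assumption; the text does not state the ratio.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class MajorityClassifier:
    """Stand-in model that always predicts the most common training label."""

    def fit(self, records):
        labels = [r["label"] for r in records]
        self.prediction = max(sorted(set(labels)), key=labels.count)
        return self

    def predict(self, record):
        return self.prediction


def accuracy(model, records):
    """Fraction of records whose label the model predicts correctly."""
    return sum(model.predict(r) == r["label"] for r in records) / len(records)


# Synthetic records (not from the toddler dataset): ten binary answers, one label.
rng = random.Random(42)
records = [
    {"features": [rng.randint(0, 1) for _ in range(10)],
     "label": rng.randint(0, 1)}
    for _ in range(50)
]
records[0]["features"][3] = None  # simulate a record that needs cleaning

cleaned = clean(records)                 # step 2: preprocessing
train, test = train_test_split(cleaned)  # step 3: split
model = MajorityClassifier().fit(train)  # step 4: training
test_accuracy = accuracy(model, test)    # step 5: evaluation
```

Any model that offers `fit` and `predict` methods of this shape could replace the stand-in without changing the surrounding steps.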

3.3 ASD Detection Using Traditional Machine Learning Models

The Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Random Forest models are used to detect autism using the UCI datasets, as in [13].

3.3.1 Dataset

Three datasets [13] have been used for the ASD detection problem, all taken from the UCI database: AQ-10-Adult for adults, AQ-10-Adolescence for adolescents, and AQ-10-Child for children. The data are divided into training and test sets of different sizes, and the samples for each set are selected randomly. The score on each subset of the data is measured by average accuracy, average sensitivity, average F-measure, and average AUC, as in [13].

3.3.2 Framework

The framework [13] uses three models: SVM, KNN, and Random Forest. All of these models are well suited to classification problems. The first model, SVM, finds the best line or decision boundary that divides the n-dimensional space into classes, so that new data points can be placed in the correct category. SVM uses the support vectors to create the hyperplane, and the optimum hyperplane separates the vectors that define the classes. KNN is another supervised machine learning algorithm that can be used for classification or regression. K is the number of nearest neighbours that the KNN algorithm considers, and a new observation is assigned to the class that receives the majority vote among those neighbours. Larger values of K give more stable decision boundaries, whereas small values of K give less stable ones. Random Forest is a popular classification model. It contains a number of decision trees built on various subsets of the dataset, and it averages their outputs to make a prediction. A greater number of decision trees leads to higher accuracy and helps to prevent overfitting, as in [13].
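The majority vote used by KNN, and the effect of K on stability, can be illustrated with a small sketch. The points and class names below are made up for illustration and are not taken from the UCI datasets. Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two clusters plus one noisy class_b point inside the class_a cluster.
points = [(0, 0), (0, 1), (1, 0), (0.4, 0.4), (5, 5), (5, 6), (6, 5)]
labels = ["class_a", "class_a", "class_a", "class_b",
          "class_b", "class_b", "class_b"]

query = (0.5, 0.5)
prediction_k1 = knn_predict(points, labels, query, k=1)
prediction_k3 = knn_predict(points, labels, query, k=3)
print(prediction_k1, prediction_k3)
```

With K = 1 the single noisy neighbour decides the class, while with K = 3 it is outvoted by the surrounding cluster, which is the sense in which a larger K gives a more stable decision.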

Fig. 3. Framework of ASD detection using Traditional ML

According to Fig. 3, the data are first read from the UCI dataset [13] and split into training and testing sets. A model is then prepared and trained on the training data. After training, the model is fine-tuned to reduce errors and increase accuracy. The fitted model is then tested on the test data. Once testing is complete, the model can be used to predict results for new data. The SVM, KNN, and Random Forest models are all trained on the UCI data in this way, as in [13]. The accuracy of each model is discussed in Section 4.

4 Results and Discussion

The frameworks and their models have been described in Section 3. This section discusses the results of each model. The first framework is ASD detection using facial images of autistic and non-autistic children, as in [11]. The second framework is ASD detection among toddlers using screening data [12], and the third framework is ASD detection from the UCI dataset, as in [13]. Each framework contains machine learning models that are used for prediction after successful training and testing.

4.1 Result of the Models of ASD Detection Using Facial Images

The first framework [11] concerns ASD detection using facial images. Three deep learning models, Xception, VGG19, and NASNetMobile, have been implemented to recognize ASD among children from their facial images. The Xception model scored 91% accuracy, with specificity and sensitivity of 94% and 88%, respectively. The VGG19 model scored 80% accuracy, with specificity and sensitivity of 83% and 78%, respectively. The NASNetMobile model scored 78% accuracy, 75% specificity, and 82% sensitivity. Sensitivity measures the ability of a model to correctly identify positive instances of a given category. It is calculated as the number of true positives divided by the sum of true positives and false negatives. Specificity, on the other hand, measures the ability of a model to correctly identify negative instances. It is calculated as the number of true negatives divided by the sum of true negatives and false positives. The accuracy, specificity, and sensitivity of each model are given in Table 1, as in [11].
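The three metrics can be written directly from these definitions. The sketch below uses hypothetical confusion-matrix counts chosen only for illustration. They are not the counts behind the results reported in [11].

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives divided by all actual positives."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negatives divided by all actual negatives."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: 50 positive and 50 negative test images.
tp, fn, tn, fp = 40, 10, 45, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
acc = accuracy(tp, tn, fp, fn)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

When the test set holds equal numbers of positive and negative samples, as in this example, accuracy is the mean of sensitivity and specificity.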

Table 1. Accuracy, Specificity, and Sensitivity of Each Model

Sl. No.   Models         Specificity   Sensitivity   Accuracy
1         Xception       0.94          0.88          0.91
2         VGG19          0.83          0.78          0.80
3         NASNetMobile   0.75          0.82          0.78

4.2 Result of ASD Detection of Toddlers Using Screening Data

The baseline XGBoost model [12] performs well in detecting ASD among toddlers. The toddler dataset contains 18 variables, of which A1 to A10 are the answers to the screening questions used to train the model. Other machine learning models are also used to detect ASD among toddlers: a neural network, and Random Forest before and after optimization. Their performance is given in Table 2, as in [12].

Table 2. Performance Scores of ASD Detection Models among Toddlers

Sl. No.   Model                                Precision   Recall   F1       Accuracy
1         Neural Network                       100%        100%     100%     100%
2         Random Forest (Pre-Optimization)     98.15%      98.10%   98.09%   98.10%
3         Random Forest (Post-Optimization)    100%        100%     100%     100%
4         XGBoost                              -           -        -        97.04% mean accuracy, 1.78% standard deviation

The performance scores of each model are given in Table 2. The neural network model has 100% accuracy, with 100% precision and 100% recall. The Random Forest (post-optimization) model has the same scores as the neural network model: 100% in precision, recall, and accuracy. The Random Forest (pre-optimization) model scored 98.15% in precision, 98.10% in recall, 98.09% in F1, and 98.10% in accuracy, whereas XGBoost has a mean accuracy of 97.04% with a standard deviation of 1.78%, as in [12].
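A score reported as a mean accuracy with a standard deviation suggests that the accuracy was averaged over several evaluation runs, although the text does not say how the runs were produced. The sketch below shows only the arithmetic. The per-run accuracies are hypothetical, and the choice of the sample standard deviation over the population one is also an assumption.

```python
import statistics

# Hypothetical accuracies from five evaluation runs (for example, five folds).
run_accuracies = [0.95, 0.97, 0.99, 0.96, 0.98]

mean_accuracy = statistics.mean(run_accuracies)
# Sample standard deviation; statistics.pstdev would give the population value.
std_accuracy = statistics.stdev(run_accuracies)

print(f"{mean_accuracy:.2%} mean accuracy, {std_accuracy:.2%} standard deviation")
```

A small standard deviation relative to the mean indicates that the model's accuracy changes little from one run to the next.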

4.3 Result of ASD Detection Using Traditional Machine Learning Models

The UCI dataset [13] is the main data source used to detect ASD among children. Three popular traditional machine learning algorithms have been used to predict ASD from new input. These models are KNN,

SVM, and Random Forest (RF). The performance of

each model has been given in Table 3 as in [13].

Table 3. Performance of Traditional Machine Learning Models to Detect ASD among Children

Sl. No.   Models   Case                                             AUC
1         KNN      AQ-10-Adults for the Case of Complete Data       0.94
2         SVM      AQ-10-Adults for the Case of Complete Data       1.00
3         RF       AQ-10-Adults for the Case of Complete Data       1.00
4         KNN      AQ-10-Adults for the Case of Missing Data        0.93
5         SVM      AQ-10-Adults for the Case of Missing Data        1.00
6         RF       AQ-10-Adults for the Case of Missing Data        1.00
7         KNN      AQ-10-Adolescence for the Case of Complete Data  0.87
8         SVM      AQ-10-Adolescence for the Case of Complete Data  0.97
9         RF       AQ-10-Adolescence for the Case of Complete Data  1.00
10        KNN      AQ-10-Adolescence for the Case of Missing Data   0.85
11        SVM      AQ-10-Adolescence for the Case of Missing Data   0.98
12        RF       AQ-10-Adolescence for the Case of Missing Data   1.00
13        KNN      AQ-10-Child for the Case of Complete Data        0.85
14        SVM      AQ-10-Child for the Case of Complete Data        0.89
15        RF       AQ-10-Child for the Case of Complete Data        0.99
16        KNN      AQ-10-Child for the Case of Missing Data         0.85
17        SVM      AQ-10-Child for the Case of Missing Data         0.91
18        RF       AQ-10-Child for the Case of Missing Data         1.00

Table 3 shows the performance of KNN, SVM, and RF based on their AUC scores. Six cases are distinguished: 1. AQ-10-Adults for the Case of Complete Data; 2. AQ-10-Adults for the Case of Missing Data; 3. AQ-10-Adolescence for the Case of Complete Data; 4. AQ-10-Adolescence for the Case of Missing Data; 5. AQ-10-Child for the Case of Complete Data; and 6. AQ-10-Child for the Case of Missing Data. The AUC score is calculated from the true positive rate and the false positive rate, as in [13]. In the first case, the AUC scores of KNN, SVM, and RF are 0.94, 1.00, and 1.00. In the second case, they are 0.93, 1.00, and 1.00. In the 3rd case they are 0.87, 0.97, and 1.00, whereas in the 4th case they are 0.85, 0.98, and 1.00. The AUC scores of KNN, SVM, and RF are

0.85, 0.89, and 0.99 in the 5th case and 0.85, 0.91, and 1.00 in the 6th case, as in [13].
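One common way to obtain an AUC score from the true positive rate and the false positive rate is to trace the ROC curve by sweeping the decision threshold and then integrate it with the trapezoidal rule. The sketch below follows that approach under stated assumptions: the labels and scores are made up, the scores are assumed to be distinct (ties are not handled), and nothing here is taken from the implementation in [13].

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold.

    Assumes binary labels (1 = positive, 0 = negative) and distinct scores.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit the samples from the highest score to the lowest.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc_score = auc(roc_points(labels, scores))
```

An AUC of 1.00, as reported for several cases in Table 3, corresponds to a scoring in which every positive sample is ranked above every negative sample.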

Each framework and its machine learning models have been discussed with figures and tables that show the procedure for ASD detection among children or adults. These models are good at detecting ASD, but it is more important that models detect the kinds of symptoms that ASD individuals show. Early detection of ASD among children is much more important than detection among adults, because therapies give good results when they start soon after early detection. If ASD is detected only after a certain age, it becomes difficult to achieve good improvement. The discussed frameworks detect ASD, but the symptoms are also needed to understand which therapies are required. Some datasets use fixed attributes, and others use questions about ASD together with their answers. However, the best data source for ASD is the parents, because an ASD individual spends most of the time with them. Facial images are also used to detect autism, but it is difficult to distinguish ASD from ADHD through images. A child may have ASD or ADHD, and may also have Global Developmental Delay (GDD), and facial images cannot separate these conditions correctly. If patterns are sought among a hundred ASD children, it is possible to find a hundred different patterns. There are no fixed patterns for detecting ASD among children, although a few may be common. The study of these three frameworks gives a clear understanding of the strong role of AI in ASD detection, where a hybrid approach can be applied.

5 Application of the Proposed Study

The application of the proposed study is to understand the various techniques of ASD detection among children. Today, autism is a major issue among children, according to the World Health Organization (WHO, https://www.who.int/news-room/fact-sheets/detail/autism-spectrum-disorders). Many applications have been developed using machine learning and natural language processing to detect autism in its early stages, but the research methods may not be cost-effective, and the data available for this problem are not of adequate quality. The proposed study can be useful in identifying space for further research, such as parent-child dialogue involving an autistic child. Deep learning models can be applied to understand the symptoms of autism at an early age from the parents' dialogue.

6 Conclusion

Three frameworks of machine learning models have been discussed for detecting ASD among children, toddlers, and adults. The first framework detects ASD using facial images of autistic and non-autistic children. Deep learning models are trained on the facial images, and the best of them, Xception, reaches 91% accuracy. The second framework concerns ASD detection among toddlers and uses screening data from toddlers to train the machine learning models. Detecting ASD at an early age makes it possible to start therapies that reduce the symptoms of ASD. Each model has been trained with the ASD dataset, and the performance scores of the predictions are high. The third framework contains traditional machine learning models that are popular for classification problems. These models are able to predict ASD among children, adolescents, and adults with high AUC scores after training with complete and missing data. Together, these three frameworks illustrate AI applications in the healthcare domain with strong results. Deep learning models can also be applied to the parent-child dialogues of an autistic child. The parents' dialogues are textual information about their children, and this data can be used to identify the symptoms of autism. After the symptoms have been detected from the parents' dialogues, they can be analyzed according to the severity of autism; this task is left as a future enhancement.

Acknowledgement:

The authors extend their appreciation to the

Manipur International University, Imphal, India for

supporting this research work on Autism.

References:

[1] Maria Lai, Jack Lee, Sally Chiu, Jessie Charm,

Wing Yee So, Fung Ping Yuen, Chloe Kwok,

Jasmine Tsoi, Yuqi Lin, Benny Zee, A

machine learning approach for retinal images

analysis as an objective screening method for

children with autism spectrum disorder,

EClinical Medicine, 2020, pp. 1-20.

[2] C. S. Paula, S. H. Ribeiro, E. Fombonne, and

M. T. Mercadante, Brief report: prevalence of

pervasive developmental disorder in Brazil: a

pilot study, Journal of Autism and

Developmental Disorders, vol. 41, no. 12,

2011, pp. 1738–1742.

[3] L. C. Nunes, P. R. Pinheiro, M. C. D. Pinheiro

et al., A Hybrid Model to Guide the

Consultation of Children with Autism

Spectrum Disorder, A. Visvizi and M. D.

Lytras, Eds., Springer International

Publishing, 2019, pp.

419–431.

[4] American Psychiatric Association (APA), Diagnostic and statistical manual of mental disorders (DSM-5), 2020, https://www.psychiatry.org/psychiatrists/practice/dsm.

[5] R. Carette, F. Cilia, G. Dequen, J. Bosche, J.-L.

Guerin, and L. Vandromme, Automatic autism

spectrum disorder detection thanks to eye-

tracking and neural network-based approach,

the International Conference on IoT

Technologies for Healthcare, Springer, Angers,

France, 2017, pp. 75–81.

[6] L. Kanner, Autistic disturbances of affective

contact, Nerv. Child, Vol. 2, 1943, pp. 217–

250.

[7] E. Fombonne, Epidemiology of pervasive

developmental disorders, Pediatric Research,

Vol. 65, no. 6, 2009, pp. 591–598.

[8] D. Aarthi, M. Udhayamoorthi, G. Lavanya,

Autism Spectrum Disorder Analysis using

Artificial Intelligence: A Survey, International

Journal of Advanced Research in Engineering

and Technology, Vol. 11(10), 2020, pp. 235-

240.

[9] N. Ajaypradeep, R. Sasikala, Child Behavioral

Analysis: Machine Learning based

Investigation for Autism Screening and Early

Diagnosis, International Journal of Early

Childhood Special Education, Vol. 13(2),

2021, pp. 1199-1208.

[10] N. V. Ganapathi Raju, Karanam Madhavi, G.

Sravan Kumar, G. Vijendar Reddy, Kunaparaju

Latha, K. Lakshmi Sushma, Prognostication of

Autism Spectrum Disorder (ASD) using

Supervised Machine Learning Models,

International Journal of Engineering and

Advanced Technology (IJEAT), Vol. 8(4),

2019, pp. 1028-1032.

[11] Fawaz Waselallah Alsaade and Mohammed

Saeed Alzahrani, Classification and Detection

of Autism Spectrum Disorder Based on Deep

Learning Algorithms, Computational

Intelligence and Neuroscience, 2022, pp. 1-10.

[12] Arjun Singh, Zoya Farooqui, Branden Sattler,

Unyime Usua, Michael Helde, Using Machine

Learning Optimization to Predict Autism in

Toddlers, 11th Annual International

Conference on Industrial Engineering and

Operations Management, Singapore, 2021, pp.

6920-6931.

[13] Uğur Erkan, Dang N.H. Thanh, Autism

Spectrum Disorder Detection with Machine

Learning Methods, Current Psychiatry

Research and Reviews, Vol. 15(4), 2019.

[14] Dr. Sherif Kamel, Rehab Al-harbi, Newly

proposed technique for autism spectrum

disorder based machine learning, International

Journal of Computer Science & Information

Technology (IJCSIT), Vol. 13(2), 2021.

[15] Sriram Dhanyatha, A. Greeshma, Gouthami,

M. Yeshwanth, Y Shobha, Prediction of

Autism Spectrum Disorder based on Machine

Learning Approach, International Research

Journal of Engineering and Technology

(IRJET), Vol. 8(7), 2021, pp. 2907-2917.

[16] Anupam Garg, Anshu Parashar, Dipto Barman,

Sahil Jain, Divya Singhal, Mehedi Masud,

Mohamed Abouhawwash, Autism Spectrum

Disorder Prediction by an Explainable Deep

Learning Approach, Computers, Materials &

Continua, Vol. 71(1), 2022, pp. 1459-1471.

[17] Basma Ramdan Gamal Elshoky, Eman M. G.

Younis, Abdelmgeid Amin Ali, Osman Ali

Sadek Ibrahim, Comparing automated and non-

automated machine learning for autism

spectrum disorders classification using facial

images, ETRI Journal, 2021, pp. 613-623.

[18] P. Moridian, N. Ghassemi, M. Jafari, S.

Salloum-Asfar, D. Sadeghi, M. Khodatars, A.

Shoeibi, A. Khosravi, S. H. Ling, A. Subasi, R.

Alizadehsani, J. M. Gorriz, Sara A Abdulla,

U. Rajendra Acharya, Automatic Autism

Spectrum Disorder Detection Using Artificial

Intelligence Methods with MRI Neuroimaging:

A Review, Frontiers in Molecular

Neuroscience, Vol. 15, 2022, pp. 1-51.

[19] Fadi Thabtah, David Peebles, A New Machine

Learning Model based on Induction of Rules

for Autism Detection, Health Informatics

Journal, 2020, pp. 1-23.

Contribution of Individual Authors to the

Creation of a Scientific Article (Ghostwriting

Policy)

The authors equally contributed in the present

research, at all stages from the formulation of the

problem to the final findings and solution.

Sources of Funding for Research Presented in a

Scientific Article or Scientific Article Itself

No funding was received for conducting this study.

Conflict of Interest

The authors have no conflicts of interest to declare

that are relevant to the content of this article.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US

WSEAS TRANSACTIONS on COMPUTERS

DOI: 10.37394/23205.2023.22.21

Prasenjit Mukherjee, Sourav Sadhukhan, Manish Godse

E-ISSN: 2224-2872

189

Volume 22, 2023