Data Mining Methods in Educational Process Management

ANNA HOVAKIMYAN, SIRANUSH SARGSYAN

Department of Programming and Information Technologies,

Yerevan State University,

1, Alek Manukyan st., Yerevan, 0025,

ARMENIA

Abstract: - The paper addresses the challenge of effectively managing the educational process by leveraging

intelligent data analysis of student performance during learning activities. It introduces an approach centered

around data clustering, specifically applied to the study of programming disciplines and languages. By utilizing

clustering techniques, the paper aims to identify the most challenging topics within a given academic subject,

track students' learning paths, evaluate and enhance teaching methodologies, and create personalized learning

plans tailored to individual students' needs. This approach enables educators to better understand and address

the diverse learning requirements of students, ultimately enhancing the overall educational experience.

Key-Words: - educational process, student’s educational path, programming languages teaching, knowledge

testing environments, data mining, data clustering, adaptive teaching scenarios.

Received: March 3, 2024. Revised: September 3, 2024. Accepted: October 5, 2024. Published: November 6, 2024.

1 Introduction

At present, educational institutions and training

centers gather and retain a vast amount of data about

the educational procedure. This information

includes records of student enrollment and

attendance to courses, outcomes of ongoing and

final exams, charts displaying the degree to which

students have attained the educational objectives

stated in educational programs, and so on.

Analyzing this data provides valuable insights that

aid educational institution staff in efficiently

arranging the educational process, [1], [2], [3], [4].

When dealing with large sets of data, data

mining methods can be very useful. Note that

traditional data mining algorithms may not be

suitable for solving the problems of interest. Therefore,

the development of new algorithms that offer

specific functionality and integrity is necessary, [1].

The process of data mining aims to extract

knowledge from large amounts of raw data and has

become a crucial aspect of decision-making

systems. Data mining techniques are used in

different research fields, such as statistics,

databases, machine learning, artificial intelligence,

data visualization, etc., [1]. Data mining has gained significant

attention and has become an important component

of the activities in various organizations. For

example, in the service sector, data mining helps

analyze customer behavior and improve a

company's performance. In the healthcare sector, it

serves as an additional tool for doctors to diagnose

diseases. In the marketing sector, it helps to study

the market and its needs and to make informed

decisions, [1].

Data mining is a continuous cyclic process that

does not stop after finding a solution. The results

generated from data mining lead to new business

goals, which can be used to create more focused

models.

As we know, the process of data mining

includes the following stages: problem

identification, data collection and preparation,

model building, evaluation, and model deployment,

[1].

In the first stage, the subject area as well as the

problem are analyzed, project goals and

requirements are identified, the data mining task is

formulated, and a preliminary implementation plan

is developed, [1], [5].

The second stage involves gathering and

researching data. This step helps to determine how

effectively the data gathered solves the problem.

One can remove unnecessary data or add more data

to improve the accuracy of the results. During this

stage, the data is also subjected to statistical

analysis. Proper preparation of the data can

significantly enhance the quality of information that

must be extracted using data mining techniques. In

addition, this stage involves visualizing and

interpreting the data for various stakeholders, [6].

During the model construction and evaluation

stage, different modeling methods are selected and

WSEAS TRANSACTIONS on ADVANCES in ENGINEERING EDUCATION

DOI: 10.37394/232010.2024.21.13

Anna Hovakimyan, Siranush Sargsyan

E-ISSN: 2224-3410

110

Volume 21, 2024

applied, and model parameters as well as

hyperparameters are adjusted to optimal values. At

this point, it is crucial to evaluate how well the

model satisfies the initially stated business goal, [6].

The final stage is the deployment of the model,

where data mining is used in the target environment.

In this stage, the trained model can be used to make

decisions based on real-time data, [7].

In data mining, various approaches and

algorithms are used to find connections between

different data characteristics. These methods include

classification, regression, cluster analysis, factor

analysis, social network analysis, association rules

searching, sequential pattern analysis, etc. The

results are used to predict future events and the

forecast results are presented via various visual

representations such as pie charts, histograms, Gantt

charts, tables, etc. This helps people to better

understand the patterns and forecast results, which

can then be used for decision-making, [6].

2 Data Mining in Education

Educational Data Mining (EDM) is a specific

research field in Data Mining that focuses on using

various studies, techniques, and tools to extract

useful information from vast amounts of data related

to educational processes, [8], [9]. This data can

include various details such as the performance of

applicants in entrance exams, the results of students'

session exams, how the independent and research

work is carried out, the academic subjects that

students frequently access during online learning, as

well as the students’ preferred format of educational

materials (text, multimedia, etc.), [10].

Data mining is widely used in education to

analyze a large amount of data produced by

information systems in schools, universities, and

educational centers. Experts from various fields

such as computer and communication technologies,

pedagogy, psychology, and statistics work together

to improve the educational process by combining

traditional and innovative teaching and learning

methods. The ultimate goal is to enhance the quality

of education through advanced data analysis

techniques (Figure 1), [11], [12].

Automated data analysis through data mining in

the field of education provides detailed results that

are difficult for a person to obtain through manual

analysis. These results can be used by the learning

process management system to establish a

correlation between a student's educational path,

final grades, achievements, and educational goals,

[9], [10], [13].

Fig. 1: Data mining infrastructure in education

Data mining can be a valuable tool for educators

and researchers in the field of education. It can

assist in analyzing curriculum to ensure high-quality

learning for students, altering the list and content of

academic disciplines and workshops according to

student and employer requirements, studying the

educational paths of students and their connection

with educational goals, identifying patterns and

anomalies in student work, suggesting the most

effective criteria for selecting courses in

asynchronous learning, and much more, [3], [4], [9],

[12].

To comprehensively evaluate the entire

educational process and identify patterns between

students' academic achievements and different

educational approaches, it is necessary to analyze

and visualize information. Research has shown that

pre-processing algorithms should be applied to

learning process data before any specific data

mining methods can be applied, [1], [3], [7].

Data mining methods are utilized in various

forms of learning, such as offline, online, and hybrid

formats. They are applied to analyze the learning

outcomes of individual subjects and the entire

educational program, [3], [11], [13]. In a single

subject, it is possible to examine the results of

mastery of individual topics and patterns that reflect

the pace of mastery of a specific topic, depending on

the degree of mastery of other topics. Ultimately,

this assessment can provide an evaluation of the

students' acquisition of knowledge, competencies,

and skills in the entire academic discipline, [9], [11],

[13].

It is possible to establish a correlation between

accessibility and the degree of mastery of specific

topics within an academic discipline. The same

method can be used to evaluate the entire

educational program and determine the extent to

which the educational goals have been met, [14].

Automated testing environments can generate

data that can be used to build machine learning

models. With a target variable in mind, regression

models can be constructed to predict expected

output values. These predictive models can be used

to develop warning and recommendation systems, to

prevent undesirable academic outcomes, [11], [15].
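The idea of a regression-based warning system can be illustrated with a minimal sketch. It assumes a single hypothetical feature (the share of tasks a student has solved by mid-term) and a hypothetical target (the final exam score); the data values, the `PASS_MARK` threshold, and the function names are illustrative assumptions, not part of the study.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical training data: share of tasks solved by mid-term
# versus the final exam score of earlier students.
solved_share = [0.2, 0.4, 0.5, 0.7, 0.9]
final_score = [35, 50, 55, 70, 90]

a, b = fit_line(solved_share, final_score)

PASS_MARK = 50  # assumed pass threshold


def predict(share):
    """Predicted final score for a given share of solved tasks."""
    return a + b * share


def needs_warning(share):
    """Flag a student whose predicted score falls below the pass mark."""
    return predict(share) < PASS_MARK


print(needs_warning(0.3), needs_warning(0.8))
```

A real system would likely use several features and a validated model; the sketch only shows how a predicted value can be turned into a warning.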

3 Evaluating the Effectiveness of

Implementing the Educational

Program

One of the key factors in evaluating the

effectiveness of an educational program is to

determine the extent to which students have

achieved the goals outlined in the program. This is

done by assessing the educational outcomes

produced by various components of the curriculum

including final and current exams, tests, independent

and research work, practice, projects, final papers,

and so on. The points obtained by the students for

these components are then used for certification

purposes. Every unit of the syllabus is linked with

indicators that show whether it covers educational

objectives in the form of knowledge, abilities,

competencies, and skills that students acquire.

Gathering, processing, and analyzing the data will

enable students to be grouped based on the extent to

which they have achieved the stated educational

objectives.

We represent the knowledge, skills,

competencies, and abilities acquired by a student

with a given ID as a vector $X_{ID} = (x_1, x_2, \ldots, x_n)$,

where $x_j$ $(j = 1, \ldots, n)$ corresponds to the

j-th knowledge/ability/competence/skill acquired by

the student identified by this ID.

Cluster analysis of data should be conducted on

both individual components and groups of

components while excluding others, [16]. Typically,

the number of clusters is set to three, corresponding

to the categories of "bad," "satisfactory," and

"good." Visualizing the analysis results can aid in

performing a SWOT analysis on the implementation

of an educational program by highlighting the

program's strengths, weaknesses, and possible risks.

The outcomes of the analysis can be used to take

corrective measures to eliminate disadvantages and

enhance educational processes.

As a part of this research, some methods have

been created to evaluate the outcomes of students'

education in programming subjects, including

programming fundamentals and programming

languages. To accomplish this, the e-judge

interactive environment was used, which enables

automated verification of code written in various

programming languages, [17].

As is known, programming is a discipline that

involves creating algorithms to solve problems

effectively. A good algorithm should be optimal in

terms of complexity. For the student, it is essential

to have the ability to create correct algorithms and

evaluate their complexity. Programming training

involves mastering the syntax of a programming

language, as well as acquiring the skills to build

effective and accurate programs in that language.

This means that the program should work correctly

for all proposed test cases.

Automated systems designed to support

programming disciplines such as e-judge, Code

Signal, etc. are equipped with tools that help check

the syntax of programs in the desired programming

language. They support the launch of the programs

on specific input data and provide results on test

cases. These results are available in a convenient

format for both manual and automated analysis,

[17].

This paper examines the use of intelligent data

analysis methods to improve education by analyzing

data from knowledge-testing environments. The

purpose of this study is to create a toolset for

exploring students' educational paths, monitoring

their progress, identifying challenging topics, and

implementing effective teaching methods. To solve

the problem, data clustering is proposed, [16].

Our research is conducted to evaluate students’

achievements in programming disciplines. The

students are given programming tasks in different

programming languages such as C, C++, and

Assembler. The students must implement the tasks

in the e-judge environment, [17]. The tasks are

provided with test cases that the e-judge system uses

to test the programs.

The e-judge system presents program execution

results on test cases in different forms. If the program

contains compilation errors, which point to gaps in

the students' knowledge of the programming language

syntax, the system returns a "Compilation Error"

message. If not all tests have been passed, an

"Execution Error" message is returned. If the

program is ineffective and exceeds the maximum

operating time, the system returns a "Maximum

Operating Time has been Exceeded" message.

Finally, if the program executes successfully on all

the test cases, the system returns an "Awaits for

Confirmation" message, [17].
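The relation between system verdicts and learning gaps described above can be sketched as a simple lookup. The verdict strings are the ones quoted in the text; the gap labels and the function name `gap_profile` are illustrative assumptions.

```python
from collections import Counter

# Verdicts as quoted in the text; the gap labels are assumed for illustration.
VERDICT_TO_GAP = {
    "Compilation Error": "language syntax",
    "Execution Error": "program correctness",
    "Maximum Operating Time has been Exceeded": "algorithm efficiency",
    "Awaits for Confirmation": None,  # all test cases passed
}


def gap_profile(verdicts):
    """Count how often each kind of gap appears in a list of verdicts."""
    gaps = (VERDICT_TO_GAP[v] for v in verdicts)
    return Counter(g for g in gaps if g is not None)


# Hypothetical sequence of verdicts for one student.
profile = gap_profile([
    "Compilation Error",
    "Compilation Error",
    "Execution Error",
    "Awaits for Confirmation",
])
print(profile)
```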

Data clustering can be used to identify how

students learn the syntax of programming

languages, whether they create complete programs

with all the necessary functionality, how effective

their programs are, and which tasks are the most

challenging for them.

The data that needs to be analyzed consists of

vectors that represent students. Each vector reflects

the progress made by the respective student in a

training module section and provides a summary of

their task completion. The student vector contains

various components such as the student's ID,

number of tasks completed, task numbers, number

of completed tasks that ended in a particular status,

and the dates of the first and last submission of

completed tasks. These vectors will be used to

cluster the data.
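One possible way to assemble such student vectors from a submission log is sketched below. The log format (student ID, task number, status, date), the class name `StudentVector`, and the sample records are assumptions made for illustration; the actual export format of the e-judge system may differ.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class StudentVector:
    student_id: str
    task_numbers: set = field(default_factory=set)           # tasks submitted
    status_counts: Counter = field(default_factory=Counter)  # submissions per status
    first_submission: Optional[date] = None
    last_submission: Optional[date] = None


def build_vectors(submissions):
    """submissions: iterable of (student_id, task_no, status, day) tuples."""
    vectors = {}
    for student_id, task_no, status, day in submissions:
        v = vectors.setdefault(student_id, StudentVector(student_id))
        v.task_numbers.add(task_no)
        v.status_counts[status] += 1
        if v.first_submission is None or day < v.first_submission:
            v.first_submission = day
        if v.last_submission is None or day > v.last_submission:
            v.last_submission = day
    return vectors


# Hypothetical log records.
log = [
    ("s01", 1, "Compilation Error", date(2024, 3, 4)),
    ("s01", 1, "Awaits for Confirmation", date(2024, 3, 5)),
    ("s01", 2, "Execution Error", date(2024, 3, 9)),
    ("s02", 1, "Awaits for Confirmation", date(2024, 3, 6)),
]
vectors = build_vectors(log)
print(vectors["s01"])
```

Before clustering, the non-numeric components (sets, dates) would still have to be converted to numbers, for example the number of tasks and the number of days between the first and last submission.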

Clustering is a process of dividing a set of

objects into groups called clusters. The main goal is

to place similar objects in one cluster while keeping

significantly different objects in separate clusters,

[16]. A cluster can be defined as a group of objects

that share common properties. Data clustering helps

us to understand the key issues related to learning

and to identify weaknesses in the educational

process. The results of data analysis can be used to

create an individualized learning approach for each

student by developing an adaptive learning scenario,

[11], [18].

The process of cluster data analysis is carried

out in several stages. First, features are selected

based on which clustering will be performed. Next,

a measure of distance between objects is chosen.

Then, a clustering method is selected. Finally, the

reliability of the clusters is interpreted and assessed,

[16].
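The first two stages (feature selection and the choice of a distance measure) interact: features measured on different scales can dominate the distance. The sketch below assumes two hypothetical features per student (tasks solved, total attempts), z-score standardization, and Euclidean distance; none of these choices is stated in the text.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # constant column: avoid division by zero
    return [
        tuple((x - m) / s for x, m, s in zip(row, mus, sigmas))
        for row in rows
    ]


# Hypothetical features per student: (tasks solved, total attempts).
raw = [(2, 40), (8, 12), (9, 10)]
scaled = standardize(raw)

# Euclidean distance between the first two students, before and after scaling.
print(dist(raw[0], raw[1]), dist(scaled[0], scaled[1]))
```

Without scaling, the attempt count (with its larger numeric range) would largely determine which students appear similar.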

In this work, two cluster analysis methods were

used: the K-means method and the DBSCAN

method, [16].

The K-means method involves selecting K

(K>=2) centroids randomly among the data. The

distance from each data point to each centroid is

then calculated, and each data point is assigned to

the cluster with the nearest centroid. After this, the

centroids are recalculated by taking the average of

all points in a given cluster. The algorithm ends

when the centroids of the newly formed clusters do

not change, when no data point moves from one

cluster to another, or when the maximum number of

iterations has been completed, [7], [15], [16].
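The K-means steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the study; the optional `init` parameter and the fixed `seed` are assumptions added only to make runs reproducible, and the sample points are hypothetical.

```python
import random
from math import dist


def k_means(points, k, max_iter=100, seed=0, init=None):
    """Return (labels, centroids) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # K data points chosen at random serve as the initial centroids.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed its cluster: stop
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical two-feature data with two visible groups.
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

Because the result depends on the initial centroids, the algorithm is commonly run several times with different seeds and the best partition is kept.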

The DBSCAN algorithm, short for Density-Based

Spatial Clustering of Applications with Noise, is

a clustering technique that takes into account the

density of the data distribution. It works

by grouping points that are located close to each

other and labeling the points that are situated in

areas of low density as noise. The algorithm

identifies neighboring points within a specific

neighborhood of a chosen point. If the number of

such points reaches a given threshold, a new cluster

is created. Otherwise, the point is marked as noise.

Then the algorithm assigns all the points from the

given neighborhood to the same cluster as the main

point and expands the cluster through those

neighbors that are themselves dense. These steps are

repeated for unvisited points; a point previously

marked as noise joins a cluster if it falls within the

neighborhood of one of the cluster's dense points, [16].
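A compact sketch of the DBSCAN procedure is given below. It is an illustration under stated assumptions, not the code used in the study: the parameter names `eps` (neighborhood radius) and `min_pts` (density threshold) follow common usage, the neighborhood of a point is taken to include the point itself, and the sample data are hypothetical.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed by a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # dense point: keep expanding through it
                queue.extend(nb)
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(1, 1), (1.1, 1), (1, 1.1), (5, 5), (5.1, 5), (5, 5.1), (9, 1)]
labels = dbscan(points, eps=0.3, min_pts=3)
print(labels)
```

Unlike K-means, the number of clusters is not fixed in advance; it follows from `eps` and `min_pts`.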

Clustering provides an easily interpretable

pattern of results by using the centroid values of

each cluster. The centroid represents the most

typical data or prototype in a cluster. However, it

does not necessarily describe any specific instance

in that cluster.
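The difference between a centroid and an actual cluster member can be seen in a small sketch. The cluster below is hypothetical, with assumed features (tasks solved, failed attempts); `nearest_member` is an illustrative helper that picks the real student closest to the centroid.

```python
from math import dist


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def nearest_member(cluster):
    """The actual data point that lies closest to the centroid."""
    c = centroid(cluster)
    return min(cluster, key=lambda p: dist(p, c))


# Hypothetical cluster: (tasks solved, failed attempts) per student.
cluster = [(4, 1), (5, 3), (6, 3)]

print(centroid(cluster))        # a "typical" profile, not a real student
print(nearest_member(cluster))  # the real student closest to that profile
```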

This work addresses the challenges of analyzing

task complexity, tracking individual student

progress, and evaluating educational paths in a

course topic.

To categorize the tasks based on their level of

difficulty and identify topics that may require

further study, K-means cluster analysis is conducted,

[16]. The problems are categorized into three

different levels of difficulty: easy, medium, and

advanced. This is accomplished by computing the

percentage of correct answers for each problem, and

then conducting cluster analysis to group the tasks

based on their level of difficulty for students (Figure

2). The DBSCAN method is utilized to assess

students' individual work on a specific task. The

data processed by the method pertains to each

student and the particular task, reflecting the amount

of effort the student put into solving the given task.

The data vector includes information about the

student's ID number, task number, number of

attempts, and the date when results such as

"compilation error," "working time exceeded," or

"incorrect answer" were recorded.

Fig. 2: Cluster analysis of tasks by complexity
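The grouping of tasks by difficulty can be sketched as one-dimensional K-means over the percentage of correct answers. The sketch makes several assumptions that are not in the text: results are given as one boolean per student and task, the three initial centroids are placed at the minimum, midpoint, and maximum of the rates (instead of at random), and the task names and data are hypothetical.

```python
def percent_correct(results):
    """results: {task: [True/False per student]} -> {task: percentage solved}."""
    return {t: 100.0 * sum(r) / len(r) for t, r in results.items()}


def cluster_1d(values, centroids, max_iter=100):
    """Plain K-means on scalars, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return centroids


def difficulty_levels(rates):
    """Label each task by the cluster its success rate falls into."""
    values = list(rates.values())
    lo, hi = min(values), max(values)
    centroids = sorted(cluster_1d(values, [lo, (lo + hi) / 2, hi]))
    names = ["advanced", "medium", "easy"]  # lowest success rate = hardest
    return {
        t: names[min(range(3), key=lambda c: abs(r - centroids[c]))]
        for t, r in rates.items()
    }


# Hypothetical outcomes for six tasks and ten students each.
results = {
    "T1": [True] * 9 + [False] * 1,
    "T2": [True] * 8 + [False] * 2,
    "T3": [True] * 5 + [False] * 5,
    "T4": [True] * 6 + [False] * 4,
    "T5": [True] * 1 + [False] * 9,
    "T6": [True] * 2 + [False] * 8,
}
levels = difficulty_levels(percent_correct(results))
print(levels)
```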

The DBSCAN method clusters a set of these

vectors, and the results are visualized (Figure 3).

Fig. 3: DBSCAN analysis of one specific task

By using statistical analysis of data provided by

the e-judge system, one can generate summary

information that includes both solved and unsolved

tasks. This information can be used to develop an

effective learning strategy (Figure 4), [11], [18].

Fig. 4: Information about solutions to problems
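A summary of solved and unsolved tasks of the kind mentioned above can be derived from a submission log with a few lines of code. The log format (student, task, verdict) and the sample records are assumptions; the verdict treated as "solved" is the one the text quotes for a program that passes all test cases.

```python
from collections import defaultdict

ACCEPTED = "Awaits for Confirmation"  # verdict quoted in the text for a passing program


def solved_summary(submissions):
    """submissions: iterable of (student, task, verdict) tuples."""
    attempted = defaultdict(set)
    solved = defaultdict(set)
    for student, task, verdict in submissions:
        attempted[task].add(student)
        if verdict == ACCEPTED:
            solved[task].add(student)
    # Per task: how many students solved it and how many tried without success.
    return {
        task: {
            "solved": len(solved[task]),
            "unsolved": len(attempted[task] - solved[task]),
        }
        for task in sorted(attempted)
    }


# Hypothetical log records.
log = [
    ("s01", "T1", "Compilation Error"),
    ("s01", "T1", "Awaits for Confirmation"),
    ("s02", "T1", "Execution Error"),
    ("s02", "T2", "Awaits for Confirmation"),
]
summary = solved_summary(log)
print(summary)
```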

The teacher can obtain information about the

student's progress regarding the specific task.

Filtering is performed by the student's name and a

task, and the student’s progress in solving the

problem is visualized (Figure 5).

Fig. 5: Filtering by the student’s name and a specific

task
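The filtering step can be sketched as a selection over the submission log followed by chronological ordering. The record layout (student, task, verdict, date), the function name `progress`, and the sample data are illustrative assumptions.

```python
from datetime import date


def progress(submissions, student, task):
    """Chronologically ordered (day, verdict) pairs for one student and one task."""
    rows = [
        (day, verdict)
        for s, t, verdict, day in submissions
        if s == student and t == task
    ]
    return sorted(rows)


# Hypothetical log records.
log = [
    ("s01", 3, "Execution Error", date(2024, 4, 2)),
    ("s01", 3, "Compilation Error", date(2024, 4, 1)),
    ("s02", 3, "Awaits for Confirmation", date(2024, 4, 1)),
    ("s01", 3, "Awaits for Confirmation", date(2024, 4, 5)),
]
timeline = progress(log, "s01", 3)
for day, verdict in timeline:
    print(day.isoformat(), verdict)
```

The resulting timeline is the kind of data a progress chart for a single student and task could be drawn from.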

4 Conclusion

Most of the studies on the use of data mining

techniques in the field of education (EDM) focus on

the use of e-learning technologies such as Moodle,

WebCT, Blackboard, Classroom, etc., and student

knowledge automated testing using tools like

Google Forms, Moodle, e-judge, Code Signal, etc.,

[17], [19].

Currently, the main challenge is to evaluate the

level of attainment of the final educational outcomes

(learning outcomes), where student performance

plays a crucial role. Predicting student performance

based on their learning path can help both students

and teachers enhance their learning and teaching

methods. Frequently, the current research is limited

by the statistical techniques utilized for processing

data. The application of data mining methods to the

educational process is an important issue that

requires attention. This includes data collection,

problem formulation, clarification of the methods

used, determination of forecasting goals, and

practical application of the results obtained.

One practical way to implement research results

is to develop recommendation systems for

personalized learning. These systems will provide

students with a learning path that is tailored to their

needs. Recent research on personalized learning

systems examines some simple features such as

learner preferences, interests, and learning and

testing behavior.

This paper demonstrates the outcomes of a

cluster analysis that was applied to data related to

teaching programming and programming languages.

The purpose of this analysis is to identify the

shortcomings in the educational process and to

develop new teaching and learning strategies that

are tailored to the needs of the students.

If more data becomes available in the future, it

will be possible to apply additional data analysis

methods that will yield more effective results. It

would also be desirable to have the ability to

automatically connect to databases of automated

testing systems to process real-time data, [17], [19].

The analysis via data mining methods provides

insights into the importance of required courses in

the syllabus. A valuable tool for teachers is an

analysis tool that predicts the degree to which a

student will achieve her/his educational goals.

We also consider the opportunity to use machine

learning technologies to predict the expected

learning outcomes for a student, as well as to create

warning and recommendation systems to avoid

undesirable academic outcomes, [6], [15].

References:

[1] A. A. Barseghyan, M. S. Kupriyanov, I. I.

Kholod, M. D. Tess, S. I. Elizarov. Analysis of

data and processes: textbook. 3rd ed., BHV-

Petersburg, 2009 (in Russian).

[2] Kovalev E. E. A System Model and Tools for

Modernization of Federal and Regional

Digital Services of Statistics and Data

Analytics in Education, Lecture Notes in

Networks and Systems (Volume 1). Germany,

Springer Nature, 2021.

[3] Fiofanova O. A. Data Analysis Competencies

in Professional Standards: From Data-Experts

to Evidence-Based Education / Advances in

Natural, Human-Made, and Coupled

Human-Natural Systems Research, Lecture

Notes in Networks and Systems (Volume 1).

Germany, Springer Nature, 2021.

[4] Ch. Fischer, Z. A. Pardos, R. Sh. Baker,

J. J. Williams, P. Smyth, R. Yu,

S. Slater, R. Baker, M. Warschauer. Mining Big

Data in Education: Affordances and

Challenges. Review of Research in Education,

Vol. 44, issue 1, 2020, pp. 130–160, doi:

https://doi.org/10.3102/0091732X20903304.

[5] Nong Ye. The Handbook of Data Mining.

CRC Press, 2004.

[6] Mohammed J. Zaki and Wagner Meira. Data

Mining and Machine Learning: Fundamental

Concepts and Algorithms. 2nd ed., Cambridge

University Press, 2020.

[7] Siranush Sargsyan, Anna Hovakimyan,

Varditer Kerobyan. An Approach to

Developing and Implementing a

Recommendation System. International

Journal of Economics and Management

Systems. ISSN: 2367-8925, Vol.7, 2022, pp.

270-273, [Online].

https://www.iaras.org/home/caijems/an-approach-to-developing-and-implementing-a-recommendation-system

(Accessed Date: October 20, 2024).

[8] A. H. Cairns, B. Gueni, M. Fhima, A. Cairns,

S. David, N. Khelifa. Process Mining in the

Education Domain. International Journal on

Advances in Intelligent Systems, Vol.8,

no.1&2, 2015, pp.219-232, [Online].

https://www.thinkmind.org/library/IntSys/IntSys_v8_n12_2015/intsys_v8_n12_2015_18.html

(Accessed Date: October 20, 2024).

[9] Ginica Mahajan, Bhavna Sahini,

Educational Data Mining: a state-of-the-art

survey on tools and techniques used in EDM,

International Journal of Computer

Applications & Information Technology,

Vol.12, No.1, 2020, pp. 310-316, [Online].

https://www.researchgate.net/publication/340983783

(Accessed Date: October 20, 2024).

[10] Boyarinov D. A. Knowledge integration maps

in the context of the issue of automated

pedagogical information processing. Problemy

sovremennogo obrazovaniya. 2019, No. 6, pp.

232–239, [Online].

http://www.pmedu.ru/images/2019-6/22.pdf

(in Russian). (Accessed Date: October 20,

2024).

[11] Khan M. A., Khojah M., Vivek V. Artificial

Intelligence and Big Data: The Advent of

New Pedagogy in the Adaptive E-Learning

System in the Higher Educational Institutions

of Saudi Arabia. Education Research

International. Vol.2, 2022, Article ID

1263555, 10p, doi:

https://doi.org/10.1155/2022/1263555.

[12] C. Romero, S. Ventura, M. Pechenizkiy and

R. Baker. Handbook of Educational Data

Mining. Taylor & Francis, 2010.

[13] V. Southavilay, K. Yacef and R. A. Calvo,

Process mining to support student’s

collaborative writing, 3rd International

Conference on Educational Data Mining

Proceedings, Pittsburgh, PA, June 11-13,

2010, pp.257-266, [Online].

https://educationaldatamining.org/EDM2010/uploads/proc/2010%20Proceedings%20Preface,%20TOC.pdf

(Accessed Date: October 20, 2024).

[14] M. Pechenizkiy, N. Trcka, E. Vasilyeva and P.

De Bra. Process mining online assessment

data, 2nd International Conference on

Educational Data Mining (EDM09)

Proceedings, Cordoba, Spain, July 1-3, 2009,

pp.279-288, [Online].

https://www.educationaldatamining.org/EDM2009/uploads/proceedings/edm-proceedings-2009.pdf

(Accessed Date: October 20, 2024).

[15] Andreas C. Müller, Sarah Guido, Introduction

to Machine Learning with Python: A Guide

for Data Scientists, O’Reilly, 2017.

[16] L. Kaufman and P. Rousseeuw, Finding

groups in data: an introduction to cluster

analysis, Wiley & Sons, Inc., 1990.

[17] eJudge, [Online]. http://ejudge-y.ispras.ru/c/,

http://ejudge-y.ispras.ru/asm/ (Accessed Date:

June 26, 2024).

[18] Boyarinov D. A. Pedagogical Model for

Creating Individual Learning Paths Based

on Educational Maps. VI International Forum

on Teacher Education, ARPHA Proceedings 3,

Kazan, 2020, pp. 277–289.

https://doi.org/10.3897/ap.2.e0277.

[19] C. Romero, S. Ventura and E. Garcia. Data

mining in course management systems:

Moodle case study and tutorial. Computers &

Education, Vol.51, No.1, 2008, pp.368-384,

https://doi.org/10.1016/j.compedu.2007.05.01

Contribution of Individual Authors to the

Creation of a Scientific Article (Ghostwriting

Policy)

- Anna Hovakimyan carried out the gathering of

information from the e-judge system.

- Siranush Sargsyan was responsible for the

implementation of clustering algorithms.

Sources of Funding for Research Presented in a

Scientific Article or Scientific Article Itself

No funding was received for conducting this study.

Conflict of Interest

The authors have no conflicts of interest to declare.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
