Visual Attention Patterns in Finding Source Code Defects
CHRISTINE LOURRINE S. TABLATIN
Information Technology Department,
Pangasinan State University,
San Vicente, Urdaneta City, Pangasinan,
PHILIPPINES
Abstract: - Existing research used visual effort metrics to determine the visual attention patterns of participants
with varying skill levels while finding source code defects. While most of the findings of these studies agree on
the results for fixation count metrics, there are differences in the results for fixation duration metrics. Therefore,
there is a need for further investigations on the use of visual effort metrics to determine the difference in the
visual effort of experts and novices between multiple programs. Thus, we aimed to identify the factors affecting
the varying results on fixation duration metrics and validate the results on fixation count metrics. We used
visual effort metrics to identify the visual attention patterns of high and low-performing students engaged in
defect-finding tasks on multiple programs. We performed statistical tests on the total fixation count, fixation
counts on the error lines, total fixation duration, and fixation duration on the error lines to determine the
difference in the visual attention patterns between the groups. Among the fixation metrics used, only the total
fixation duration metric revealed a significant difference between the high and low-performing students across
all programs. High-performing students spent less time on simple programs with simple error types but spent
more time on complex programs with logical and semantic error types. In contrast, low-performing students
focused more attention on easy programs with one or more syntax errors compared to high-performing
students. The results of this study could shed some light on the contrasting findings of previous studies
regarding fixation duration. These findings suggest that visual attention patterns of high and low-performing
students may vary on multiple programs. The amount of visual effort exerted by the group depends upon the
program’s complexity, location of errors in the source code, type of errors injected, and the number of lines of
code. This implies that the time spent finding the errors may be associated with the characteristics of the
programs and the location and type of injected errors. Therefore, researchers must provide detailed information
on these characteristics when describing differences in visual effort metrics between subjects engaged in bug-
finding tasks.
Key-Words: - Visual effort metrics, fixation, visual attention, visual attention patterns, debugging, fixation
metrics, program comprehension
Received: August 23, 2023. Revised: October 5, 2023. Accepted: October 16, 2023. Published: November 1, 2023.
1 Introduction
Program comprehension is an integral aspect of
software development since other programming
activities depend on it, [1]. Programmers spend a
considerable amount of time comprehending source
code, [2]. Unlike natural language comprehension,
source code is a structured document that might be
difficult to understand, [3]. Source code
comprehension does not only involve understanding
text structure and meaning but also understanding
code execution. Programmers, therefore, need to
master their ability to trace source code execution
along with their ability to read its words and
structures, [4].
Eye trackers have become a standard tool for
conducting empirical studies in programming by
recording eye-movement data of participants while
performing a task to capture their visual attention,
[5]. Eye trace data reveals the focus of attention and
how it travels within the stimuli, providing
important insight into the underlying cognitive
processes of a subject, [6]. The increased use of eye-
tracking data in understanding the cognitive
processes of subjects while performing program
comprehension tasks is based on theories about
comprehension and eye movements.
The first study to employ eye-tracking data was
conducted in 1990 to analyze students'
comprehension processes while reading algorithms,
[7]. Since then, researchers and computer science
educators have been actively investigating how
programmers think using eye-tracking data while
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
375
Volume 20, 2023
they complete programming tasks such as program
comprehension, [8], [9] [10], model comprehension,
[11], [12], debugging, [13], [14], [15], requirements
traceability, [16], [17], and collaborative
programming, [18], [19]. While most of these
published papers focused on program
comprehension and debugging, [20], we still have
limited knowledge about the visual strategies
employed when finding bugs on static source codes.
Several studies have attempted to identify the
differences in the visual attention patterns of experts
and novices while locating bugs in static source
codes using visual effort metrics, [21], [22], [23],
[24]. Among the visual effort metrics, fixation
counts, and fixation durations are commonly used
by previous studies to determine the visual attention
of participants while comprehending source code.
Although no visual processing occurs during
saccades, [22], used saccade lengths and fixation
durations to measure the visual attention of novice
and advanced programmers while identifying bugs
or determining the program output.
Previous studies revealed that experts tend to
focus more on areas where the error is while novices
read the codes more broadly, [21], [23]. The
findings imply that experts had higher fixation
counts and higher fixation durations on the error
lines. The study, [24], supports their findings in
terms of fixation counts on the error lines, but not
fixation durations on the error lines. Fixation
durations and saccade lengths showed that advanced
programmers had shorter fixations and saccades,
[22], indicating that they can easily understand and
see more details in the code. This result differs from
that of, [21], [23], in terms of fixation durations, but
it is consistent with, [24]. The analysis of gaze
patterns of individuals in programming pairs
characterized the more successful participants to
have higher overall fixation counts, higher fixation
counts on error lines, and longer fixation duration
per program, [25]. This result is not in accordance
with the findings of these studies, [22], [24].
While most of the reviewed studies agree on the
results for fixation count metrics, there are
differences in the findings for fixation duration
metrics. Therefore, further investigations on the use
of fixation metrics to determine the difference in the
visual effort of experts and novices between
multiple programs are required to establish a general
trend. Thus, this study aimed to identify the factors
affecting the varying results on fixation duration
metrics and validate the results on fixation count
metrics. We used visual effort metrics to determine
the visual attention patterns of high and low-
performing students engaged in defect-finding tasks
on multiple programs. This study is similar to [21],
in terms of objective, comprehension task, and
analysis employed to assess the visual effort exerted
by the students. Unlike the previous studies that
utilized programs written in C, C++, and Python, we
used programs written in Java with different
numbers of lines of code, numbers of injected bugs,
and cyclomatic complexity.
Exploring the visual strategies employed by
experts or high-performing students when
performing comprehension tasks will allow us to
identify effective strategies that can be explicitly
taught to low-performing students to enhance their
code comprehension and debugging skills, [26].
2 Methodology
This paper is an analysis of a larger eye-tracking
study on programmer tracing and debugging skills
as well as the development of higher education’s
capacity to conduct eye-tracking research. The
methods discussed here are also discussed in the
study of, [10], [27].
2.1 Participants
The participants of this study were Computer
Science and Management Information Systems
students who have at least taken a college-level
introductory programming course using Java as the
programming language. A total of 64 undergraduate
students from four universities in the Philippines: 16
from School A, 17 from School B, 16 from School
C, and 15 from School D, participated in the eye-
tracking experiment. The study used two participant
groups: high-performing and low-performing. The
scores of the participants in the debugging tasks
were used to assign them to a particular group.
High-performing group consisted of students who
scored above and equal to the mean score while the
low-performing group consisted of students who
scored lower than the mean score.
2.2 Datasets
The Ateneo Laboratory for Learning Sciences
previously collected the data set used in this study,
which is part of a larger study on programmer eye-
tracking behavior. The eye-tracking data of 64
students were collected and saved in an individual
Comma-Separated Values (CSV) file. The file is
composed of information regarding the fixation
timestamp, the location of fixation, fixation
duration, blinking count, pupil dilation, and separate
values for the left and right eye movements. For the
analysis of this study, only the fixation location, and
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
376
fixation duration were extracted from the individual
CSV file. This study used a total of 238,733 fixation
data points.
2.3 Experimental Setup and Procedure
All participants underwent a screening process.
Students were given an informed consent form to
fill out and sign. Screening questionnaires were
distributed to determine their eligibility to take part
in the study. Students who passed the initial
screening were required to undergo a nine-point
eye-tracking calibration test. A written program
comprehension test (20 minutes) was administered
after the successful calibration test to determine the
student’s prior knowledge of programming. The
actual eye-tracking experiment which was designed
for 60 minutes followed after the written pre-test.
The Gazepoint eye tracker was used in the eye-
tracking experiment with a sampling rate of 60Hz
and 0.5-1 degree of accuracy. The screen resolution
of the monitor was set to 1366 x 768 and the source
code was presented in a full-screen window. The
participants were asked to read 12 program codes
with known errors and should mark the location of
the errors using the mouse. There is no need for the
participants to correct them. Figure 1 shows the
standard setup of the eye-tracking experiment.
Fig. 1: Standard Set-up of the Eye Tracking
Experiment
2.4 Hardware/Software Setup
A slide sorter program (Figure 2) with buttons
Previous, Next, Reset, and Finish were created to
display the specifications of the program followed
by the program code with injected bugs. The
Previous and Next buttons were used to navigate
through the slides. The Reset button was used to
clear the marked error locations of a particular slide
while the Finish button was used to save the marks
and end the debugging session.
2.5 Task Stimuli and Injected Defects
The experiment had 12 program codes as stimuli
which are typically written by novice programmers.
All program codes are written in Java language,
contain intermediate syntax and constructs, and
consist of varied lines of code. All codes presented
to the participants were guaranteed to fit on the
computer screen for readability and no scrolling was
required.
Fig. 2: Screenshot of the Slide Sorter Program
Bugs were intentionally added to the program
codes. Few of these injected bugs take a minimal
number of scans to detect. But quite a number take a
considerable amount of time and may involve the
participant’s analytical skills and prior knowledge in
programming. Each program was assigned either 1
or 3 bugs, with different numbers of lines of codes,
cyclomatic complexity, and nested block depth as
shown in Table 1.
Table 1. Code Description, Line Numbers of
Injected Defects, and Metric Value of Codes
Code
Description
Lines
of
Code
Line
with
Error
Cyclom
atic
Comple
xity
Nested
Block
Depth
P01
What’s the Next
Number?
17
10
2
1
P02
Reverse of Strings
15
15
2
2
P03
Arrow
16
14
1
1
P04
Q Prime
26
15, 22,
29
6
4
P05
Parenthesis
Matching
24
13, 17,
23
6
2
P06
Palindrome
19
15, 18,
22
3
3
P07
Rock, Paper, and
Scissor
34
20, 26,
30
9.5
1.5
P08
The Diamond
Pattern
27
11, 13,
18
7
2
P09
Paralleloword
29
15, 24,
30
7
3
P10
Consecutive Words
16
11, 14,
16
3
2
P11
Earthquake’s Class
25
11, 18,
24
7
1
P12
Basic Calculator
28
11, 22,
27
5
1
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
377
Errors to locate range from the easiest to spot
(syntax errors) to the hardest (semantic and logical
errors). Table 2 presents the characteristics of the
errors consisting of the description, type, and
location of the errors injected in each program. Only
1 error was injected for programs 1-3, while
programs 4-12 had 3 errors each. The injected errors
were of different types and were injected in different
lines and sections of the programs.
2.6 Experiment Procedure and Data
Segmentation
The participant was asked to sit comfortably, and
then the eye-tracker was adjusted to detect the eyes
of the participant. Calibration of the eye-tracker on
the participant was done by asking the participant to
follow the white circle/red dot to check if the
calibration was proper. Calibration was done since
the accuracy and precision of the eye tracker data
depend on successful calibration. When the
calibration was successful, the participant started the
experiment. The Gaze Video was closed to avoid
recording the face of the participant while the
experiment was ongoing. During the experiment
process, the participant had to view all 12 short Java
programs and locate the bugs. Out of the 12 short
programs, 9 of them had 3 bugs while 3 had 1 bug
each. The participant was asked to view the
programs in one setting by showing the program
description of each program first, followed by the
program code. The eye tracker records the visual
behavior of the participant while reading the static
source code to find the bug/bugs in the program. A
red oval appears on the screen and the participant
marks the location of the errors in the program using
a mouse click.
The eye tracker system continually records all
the gaze movements of the participant and stores
them in a CSV file format. Data such as the time of
the recording (timestamp) when fixations occur, the
location of the fixations (values of x and y
coordinates), the fixation duration of each fixation,
pupil dilation, blinking counts, and separate values
for the left and right eye movements were recorded.
The data from the CSV file were segmented to
extract the basic values needed for the analysis.
2.7 Data Pre-Processing
The segmented file by stimulus viewed by each
participant was used in this study. However, not all
the data included in the CSV file were used in the
analysis. In the context of this study, only the x and
y coordinates and the fixation durations were
needed for the data analysis.
The participants viewed 12 short Java programs
consisting of various constructs and lines of code.
Areas of Interest (AOIs) of these programs were
defined based on the lines of code excluding blank
lines using the OGAMA Areas of Interest module,
[28], to get the AOI coordinates. In this study, a
simple rectangle shape was used to mark the AOIs.
The AOIs were defined per line of code to
determine the visual attention exerted on each line
of code, particularly the error lines.
The AOI coordinates extracted from OGAMA
return coordinates with respect to the setting of the
screen resolution specified when the AOIs were
defined. To map the location of fixations to the
program codes, the x and y coordinates from the
eye-tracking data were converted by multiplying the
x coordinates with 1366 and the y coordinates with
768. This was done to match the coordinates of the
program codes during the experiment since the
screen resolution used was 1366 x 768. In addition,
the fixation durations were recorded in terms of
seconds by the eye tracker and were converted into
milliseconds by multiplying the duration by 1000.
These processes were done for the eye-tracking data
of the 64 students consisting of 238,733 data points
to determine their visual attention patterns.
2.8 Data Analysis
To determine the difference in the visual attention
patterns of high-performing and low-performing
students, fixation counts and fixation durations were
used. The total fixation counts on the entire stimulus
(TotalFC), total fixation durations (ms) on the entire
stimulus (TotalFD), total fixation counts on the error
lines (FCBugLine), and total fixation durations (ms)
on the error lines (FDBugLine) were computed to
measure the visual effort exerted by the high and
low-performing students.
Statistical analysis was conducted to compare
the visual efforts of the high and low-performing
students using the visual effort metrics stated above.
Independent samples t-tests were used to determine
whether there is statistical evidence that the visual
attention patterns are significantly different between
high and low-performing students. Independent
samples t-tests were used on the total fixation
counts, total fixation durations, total fixation counts
on the error lines, and total fixation durations on the
error lines of the high and low-performing students
while performing debugging tasks.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
378
3 Results and Discussion
The eye-tracking data of 64 students from four
private universities in the Philippines were used in
the analysis. Twenty-five (25) students were
identified as high-performing while thirty-nine (39)
students were considered low-performing based on
their debugging scores. However, after data pre-
processing, one (1) eye tracking data from the high-
performing group was discarded since most of the
fixations on the stimuli were recorded with negative
x- and y- gaze coordinates and could not be mapped
to the identified AOIs. Thus, only the eye gaze data
of 63 students were analyzed to determine the
difference in the visual attention patterns of high-
performing and low-performing students.
3.1 Visual Attention Patterns using Total
Fixation Counts
The overall fixation counts on the entire stimulus
(Figure 3) show that low-performing students have
higher fixation counts on programs 1-4. Programs 1-
3 had 1 injected defect each and had the lowest
cyclomatic complexity and number of lines of code.
Further, the errors are generally simple, consisting
of missing and additional semicolons, and the use of
print instead of println. Although Program 4 had 3
injected defects, these were easy to determine since
they were all syntax errors. High-performing
students, on the other hand, had more fixation
counts on most of the programs that are
characterized by more lines of code, cyclomatic
complexity, and the number of errors. Furthermore,
based on the profile plot, we can see that there is a
big difference between the fixation counts of high
and low-performing students in programs 6, 9, and
10. These programs had 3 injected defects
consisting of syntax, logical, and semantic types of
errors. Program 9 had 3 logical errors located on the
repetition structures, had a cyclomatic complexity of
7, and had 29 lines of code. Programs 6 and 10 had
a cyclomatic complexity of 3, 19, and 16 lines of
code respectively, and had combinations of syntax,
semantic, and logical errors. High-performing
students also had higher total fixation counts in
Program 12 than the low-performing students. This
program had 3 injected defects consisting of 2
logical errors and 1 semantic error.
The visual summary of the distribution and
skewness of the data on the total fixation counts is
presented in Figure 4 to supplement the information
on the profile plot. We can see from the boxplots
that several outliers are present in both groups. The
boxplots show that 1 or 2 low-performing students
had higher fixation counts than the maximum value
of the low-performing group on several programs
while 3 high-performing students in Program 2 and
1 in Program 7 had higher fixation counts than the
maximum value of the high-performing group.
Further, the median fixation counts of the high-
performing group in Programs 6, 9, 10, and 12 show
great difference from the median fixation counts of
the low-performing group. The box plots also show
that the data of the high-performing group are more
dispersed in the majority of the programs compared
to the low-performing group. Furthermore, the
distribution of data for most of the programs is
positively skewed for both groups.
An independent samples t-test was performed to
determine if there was a significant difference in the
total fixation counts of the high and low-performing
students. The result of the analysis revealed that, if
the type of task is not considered, there is no
significant difference in the total fixation counts of
the high-performing students (M = 336.50, SD =
79.60) and low-performing students (M = 293.83,
SD = 90.47), t (61) = -1.901, p = 0.062. The result
suggests that the total fixation counts are similar for
both high and low-performing students across the
different programs. This result is not surprising
given that the tasks have varying levels of
difficulties and presumably would require varying
fixation counts as can also be inferred from the
boxplots in Figure 4.
To investigate this matter more deeply,
independent sample t-tests were conducted for each
program to check if there were programs that would
reveal statistical significance between the high and
low-performing groups. The current partition of the
high and low-performing groups was revised before
performing the statistical tests. This was done since
a student who does well overall but poorly on a
specific task may incorrectly contribute high-
performing” techniques on that particular task if the
current partition was retained for the analysis. Thus,
the high and low performers were partitioned based
on their scores in each type of task to determine the
techniques that may enable a student to do well in
each of the different problems. High-performing
students comprise those whose scores are greater
than or equal to the mean score for the given task
while low-performing students comprise those
whose scores are less than the mean score.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
379
Fig. 3: Profile Plot of the Total Fixation Counts
Fig. 4: Boxplots of the Total Fixation Counts
Fig. 5: Profile Plot of the Total Fixation Counts on the Error Lines
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
380
Table 3. Result of Statistical Tests on Total Fixation
Counts
Prog
ram
No.
Group
Mean
Standard
Deviation
t-value
p-
value
1
Low
271.54
159.04
1.965
0.054
High
204.03
113.37
2
Low
215.80
88.68
1.475
0.145
High
173.29
124.69
3
Low
195.53
151.75
1.496
0.141
High
149.00
87.66
4
Low
374.40
236.78
1.263
0.211
High
311.75
151.39
5
Low
393.52
235.78
-0.541
0.591
High
424.68
215.63
6
Low
320.35
165.40
-1.803
0.076
High
405.86
189.68
7
Low
355.87
207.60
-1.354
0.181
High
431.00
231.82
8
Low
375.89
349.03
-0.312
0.756
High
398.32
169.97
9
Low
375.20
265.72
-0.081
0.936
High
380.22
227.16
10
Low
307.30
182.08
-1.405
0.165
High
379.35
224.18
11
Low
254.47
127.67
-0.217
0.829
High
252.44
123.61
12
Low
237.70
101.69
-1.017
0.313
High
265.14
109.01
No significant difference was observed in the
total fixation counts of the high and low-performing
groups in each problem, as seen in Table 3.
However, high-performing students were associated
with lower fixation counts for Programs 1-4
compared to the low-performing students. These
programs were injected with simple error types and
have low cyclomatic complexity. Higher fixation
counts were observed in high-performing students
for Programs 5-12. These programs are more
complex given the cyclomatic complexity, number
of lines of code, number of errors, error locations,
and type of injected errors. Thus, the fixation
counts of the high-performing groups can be
associated with the characteristics of the program
codes. High-performing students exert more visual
efforts on complex programs with errors that are
difficult to identify such as logical and semantic
errors.
Advanced programmers perform at the level of
novices when the program violates the plans, and
the programming discourse rules, [29]. A high
number of fixations were observed from the high-
performing students in Programs 5-12 since the
logical and semantic errors injected in these
programs violate the plan or mental schema, making
it hard to comprehend the program. Repeated
fixations to relevant elements in the program code
are necessary to correctly identify the logical and
semantic errors in the program though, a high
number of eye fixations on program codes reflects
an inefficient approach to finding information, [30].
The results indicate that high-performing students
had more fixation counts on complex programs with
logical and semantic errors, even if there was no
significant difference across all programs between
the high and low-performing students. Further,
based on the number of fixations on all lines of code
in each program, both groups had more visual
attention on lines consisting of control structures,
variable declarations, and compound inputs and
assignment statements. This finding also suggests
that these program elements can be considered as
beacons that facilitate program comprehension and
identification of bugs.
3.2 Visual Attention Patterns Using Fixation
Counts on the Error Lines
The profile plot of the overall fixation counts on the
error lines (Figure 5) shows that high-performing
students have more fixation counts on errors
injected in Programs 2 and 6 while errors injected in
the majority of the programs have almost the same
number of fixations from both groups. Program 2
had only one error and is located in the repetition
structure. Program 6 on the other hand, had 3 errors,
and two of the errors are also located in the
repetition structure. These errors could be
repeatedly fixed if the students follow the control
flow of the program. The error in Program 2 is
located on line 15 which has an additional
semicolon in the for-loop structure. The buggy line
of code received the highest number of fixations
among all the lines in this program from both
groups. However, the high-performing group had a
higher percentage of fixations (27%) compared to
the low-performing group (20%). The same result
was observed for the number of fixations on the
error lines in Program 6 where the high-performing
group had a higher percentage of fixations (37%)
compared to the low-performing group (32%).
Figure 7 provides additional information on the
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
381
distribution and skewness of the fixation counts on
the buggy lines of code.
We can see from the boxplots (Figure 6) that
several outliers are present in both groups. The box
plots show that a greater number of extreme values
from both sides of the box plots can be observed in
the low-performing group than in the high-
performing group. Among all the boxplots between
the groups, a big difference in the median fixation
counts on the buggy lines of code of the high-
performing group can be observed in Program 6.
The box plots also show that the data of the high-
performing group are more dispersed in the majority
of the programs compared to the low-performing
group. Furthermore, the distribution of data is either
positively or negatively skewed for both groups.
An independent samples t-test was also
performed to determine if there was a significant
difference in the total fixation counts on the error
lines of the high and low-performing students. The
result of the analysis revealed that, if the type of
task is not considered, there is no significant
difference in the total fixation counts on the error
lines of the high-performing students (M = 0.19, SD
= 0.03) and low-performing students (M = 0.18, SD
= 0.03), t (42.24) = -1.644, p = 0.108. The result
suggests that the total fixation counts on the error
lines are similar for both high and low-performing
students across different programs. Additional
statistical tests were performed since there was no
statistically significant difference between the
groups across the different programs. Independent
samples t-tests were conducted for each program to
distinguish between a high-performing student
working on a difficult task versus a low-performing
student working on an easy task or vice versa. The
partition of high and low-performing students used
in the statistical tests of fixation counts per problem
was also used in this analysis.
Only Programs 1 and 6 revealed a significant
difference between the groups, as shown in Table 4.
The high-performing students were associated with
statistically significantly higher total fixation counts
on the error lines of these programs than the low-
performing students. The number of fixations on an
AOI can be linked to its importance, [31]. Although
there is no significant difference in the visual
attention of high and low-performing students in
terms of the fixation counts on the error lines across
all programs, the result suggests that high-
performing students focused more on the control
structure elements of the program code. Thus, more
visual attention can be observed from the high-
performing students on errors located in the
repetition structures, particularly in Program 6.
Further, we can also observe that high-performing
students had more fixation counts on errors in 9 out
of 12 programs. Low-performing students had
slightly higher fixation counts on errors in Programs
3, 10, and 12. This result is in line with the findings
of, [21], [23], [24], that experts or advanced
programmers have more fixation counts on the
buggy lines of code.
Table 4. Result of Statistical Tests on Total Fixation
Counts on the Error Lines
Progr
am
No.
Group
Mean
Standard
Deviation
t-value
p-
value
1
Low
0.04
0.03
-3.425
0.001
*
High
0.09
0.06
2
Low
0.22
0.10
-0.472
0.638
High
0.23
0.16
3
Low
0.13
0.06
0.402
0.689
High
0.12
0.06
4
Low
0.17
0.07
-1.147
0.256
High
0.20
0.08
5
Low
0.21
0.05
-2.252
0.802
High
0.21
0.06
6
Low
0.29
0.09
-3.076
0.003
*
High
0.37
0.09
7
Low
0.07
0.04
-0.433
0.667
High
0.08
0.03
8
Low
0.20
0.06
-0.69
0.493
High
0.21
0.08
9
Low
0.13
0.05
-1.611
0.112
High
0.15
0.04
10
Low
0.41
0.14
0.362
0.719
High
0.40
0.12
11
Low
0.16
0.04
-0.315
0.754
High
0.17
0.04
12
Low
0.09
0.03
0.51
0.612
High
0.09
0.03
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
382
Fig. 6: Boxplots of the Total Fixation Counts on the Error Lines
Fig. 7: Profile Plot of the Total Fixation Durations
Fig. 8: Boxplots of the Total Fixation Durations
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
383
Fig. 9: Profile Plot of the Total Fixation Durations on the Error Lines
Fig. 10: Boxplots of the Total Fixation Durations on the Error Lines
3.3 Visual Attention Patterns using Total
Fixation Durations
Figure 7 shows the profile plot of the total fixation
durations of the high and low-performing students
while Figure 8 shows the visual summary of the
distribution and skewness of the data on total
fixation durations. Based on the profile plot, high-
performing students spent more time on programs
with more lines of code, code complexity, number
of errors, and programs with logical and semantic
error types. Low-performing students, on the other
hand, spent more time reading Programs 1-4. We
note that Programs 1-3 had only one error each and
Program 4 had 3 syntax errors, which can be
considered as simple programs with simple error
types.
The boxplots in Figure 8 show that several
outliers are present in both groups. The boxplots
show that there are extreme values on all programs
in the low-performing group while 3 out of 12
programs had extreme values in the high-performing
group. Further, the median fixation durations of the
high-performing group in Programs 6, 9, 10, and 12
show great difference from the median fixation
durations of the low-performing group. The box
plots also show that the data of the high-performing
group are more dispersed in the majority of the
programs compared to the low-performing group.
Furthermore, the distribution of data for many of the
programs is positively skewed for both groups.
An independent samples t-test was performed to
determine if there was a significant difference in the
total fixation durations between the high and low-
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
384
performing students. The result of the analysis
revealed that there is a significant difference in the
total fixation durations of the high-performing
students (M = 151063.59, SD = 36731.93) and low-
performing students (M = 125107.42, SD =
41689.59), t (61) = -2.508, p = 0.015. The result
suggests that the total fixation durations of the high-
performing students are significantly higher than the
low-performing students across the different
programs. This result has been published in, [10], as
a preliminary analysis to check for the difference in
the fixation duration of high and low-performing
students and to analyze if the difference has a
relationship with the pattern similarity. However,
the result presented in this paper discussed how the
complexity and difficulty of the programs and the
debugging task affect the visual attention patterns of
the students.
Fixation durations are influenced by the
complexity and difficulty of the visual content and
task being performed, [22], [31]. More visual effort
may be exerted for complex programs and programs
with errors that are difficult to identify such as
logical and semantic errors. This finding is
consistent with, [22], since high-performing
students have shorter fixation durations on Programs
1-4. This could mean that high-performing students
can understand the code easily and be able to see
more details in it compared to low-performing
students. However, high-performing students spent
more time on Programs 5-12 which consisted of a
greater number of errors, lines of code, and complex
programming constructs, and had combinations of
logical, semantic, and syntax errors. The complexity
of the program and the difficulty in identifying the
errors injected in the programs may have caused
low-performing students to stop trying to answer the
problems and move to other programs, resulting in
lower fixation durations. High-performing students,
on the other hand, employ more careful viewing of
the program codes to effectively identify the errors
resulting in higher fixation durations. As shown in
Figure 7, there is a big difference in the fixation
durations of the groups on Programs 6, 9, and 10.
These programs had complex programming
constructs and were injected with logical and
semantic errors that were difficult to identify.
Therefore, the difference in the overall fixation
durations between the groups is influenced by the
program constructs and the errors injected into the
programs. High-performing students spent less time
on simple programs with simple error types but
spent more time on complex programs with logical
and semantic error types.
3.4 Visual Attention Patterns using Fixation
Durations on the Error Lines
An independent samples t-test was also performed
to determine if there was a significant difference in
the total fixation durations on the error lines
between the high and low-performing students. The
result of the analysis revealed that there is no
significant difference in the total fixation durations
on the error lines of the high-performing students
(M = 0.20, SD = 0.04) and low-performing students
(M = 0.19, SD = 0.03), t (42.24) = -1.614, p = 0.112.
The result suggests that the total fixation durations
on the error lines are similar for both high and low-
performing students across the different programs.
Additional statistical tests were performed since no
statistically significant difference was observed
across the different programs. Independent samples
t-tests were conducted for each program to check if
there were programs that would reveal statistical
significance between the high and low-performing
groups. The partition of high and low-performing
students used in the previous sections was also used
in the statistics tests. Similar to the findings on
fixation counts on the error lines, only Programs 1
and 6 revealed significant differences between the
groups (Table 5). The high-performing students
were associated with statistically significantly
higher total fixation durations on the buggy lines of
code of Programs 1 and 6 than the low-performing
students.
The increased fixation duration in AOIs of the
visual stimulus could be used to detect more
difficult-to-process components or AOIs that are
engaging the cognitive resources of the observer,
[31]. Experts tend to focus more on areas where the
errors are located while novices read the codes more
broadly, [21], [23]. In contrast with the findings of
the latter, novices spent more time on the buggy
lines of code, [22], [24]. However, based on the
profile plot in Figure 9, fixation durations on the
buggy lines of code are almost similar for both high
and low-performing students except for Programs 2
and 6. Refer also to the boxplots in Figure 10 to see
the visual summary of the distribution and skewness
of the total fixation durations on the error lines of
the high and low-performing students. The boxplots
are similar to the boxplots presented in Figure 6
show the visual summary of the distribution and
skewness of the total fixation counts on the error
lines of the high and low-performing students.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
385
Table 5. Result of Statistical Tests on Total Fixation
Durations on the Error Lines
Program
No.
Grou
p
Mean
Standar
d
Deviatio
n
t-
value
p-value
1
Low
0.04
0.03
-3.578
0.001*
High
0.10
0.09
2
Low
0.24
0.11
-0.532
0.597
High
0.26
0.18
3
Low
0.14
0.08
-0.006
0.995
High
0.14
0.08
4
Low
0.18
0.10
-1.229
0.224
High
0.22
0.11
5
Low
0.21
0.06
-0.401
0.69
High
0.22
0.07
6
Low
0.30
0.11
-3.096
0.003*
High
0.39
0.11
7
Low
0.08
0.05
0.094
0.926
High
0.08
0.03
8
Low
0.20
0.08
-0.712
0.479
High
0.21
0.08
9
Low
0.14
0.05
-1.316
0.193
High
0.15
0.04
10
Low
0.43
0.16
0.199
0.843
High
0.43
0.13
11
Low
0.17
0.05
0.26
0.795
High
0.17
0.05
12
Low
0.09
0.04
-0.157
0.876
High
0.09
0.03
High-performing students spent more time
looking at the buggy lines of 9 out of 12 programs.
Errors in the programs located in the repetition
structure (for loop) which is a complex
programming construct since it consists of
initialization, condition, and increment/decrement
operation may be difficult to process. Thus,
increased fixation duration was observed. The result
of this study supports previous study findings, [21],
[23], that high-performing students or experts spent
more time on the error lines but only for Programs 1
and 6. Overall, the result shows that there is no
significant difference in the fixation durations on the
error lines between the high and low-performing
students. The contrasting results regarding the
findings in the fixation durations on the buggy lines
of code may be associated with the characteristics of
the programs used in their studies and the type and
location of errors injected in the programs.
Therefore, when describing the difference in
fixation durations on the buggy lines of code
between groups, it might be necessary to provide
information on the characteristics of the program
and the injected errors.
The difference in the visual attention of high and
low-performing students while performing a
debugging task can be measured using fixation
count and fixation duration metrics. The findings in
the analysis of visual attention suggest that longer
fixation durations and more fixation counts indicate
difficulty in identifying the buggy lines of code.
High-performing students exert more visual
attention on complex programs with logical and
semantic errors resulting in more fixation counts
and longer fixation durations compared to low-
performing students. This visual behavior of high-
performing students may be related to the field-
independent (FI) cognitive style theory of human
cognition. The FI individuals tend to choose a more
analytical processing approach and they pay
attention to details, [32]. Conversely, the visual
behavior of low-performing students may be related
to field-dependent (FD) cognitive style wherein they
choose a more holistic way of processing visual
information and experience difficulties in
identifying details in the complex visual stimulus. FI
individuals have significantly more and longer
fixations than FD individuals and these differences
may be associated with the analytical nature of FI
individuals.
4 Conclusion
This study contributes to the evidence of the
effectiveness of eye tracking as a method to enrich
computing education research. The analysis of the
visual effort metrics provided considerable insights
into the visual attention patterns of high and low-
performing students in finding source code defects.
Results revealed that high and low-performing
students could be distinguished based on their visual
attention patterns. High-performing students spend
more time on complex programs with logical and
semantic errors to effectively find source code
defects. On the other hand, low-performing students
spend more time on simple programs with simple
error types. The result of the difference in the time
spent in finding source code defects suggests that
high-performing students prefer a more analytical
approach while low-performing students choose a
more holistic approach. Our findings also revealed
that both high and low-performing students had
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
386
more visual attention on lines consisting of control
structures, variable declarations, and compound
inputs and assignment statements. These program
elements can be considered as beacons that facilitate
program comprehension and identification of source
code defects. These findings suggest that visual
attention patterns of high and low-performing
students may vary on multiple programs. The
amount of visual effort exerted by the group
depends upon the program’s complexity, location of
errors in the source code, type of errors injected, and
the number of lines of code. This implies that the
time spent finding the errors may be associated with
the characteristics of the programs and the location
and type of injected errors. Therefore, researchers
must provide detailed information on these
characteristics when describing differences in visual
effort metrics between subjects engaged in bug-
finding tasks.
By exploring the visual strategies employed by
the high-performing students using eye-tracking
data, we could develop learning materials and
activities that could help low-performing students
improve their code reading and debugging skills.
Debugging should be taught as a program
comprehension or program understanding task
rather than a search task. Students should learn how
to identify the relevant code elements of the
program to identify source code defects.
Programming educators should teach more problem-
solving activities to develop the student’s analytical
skills. Further, teaching students to consciously
employ debugging strategies according to code size,
program structure, and type of errors would enhance
their ability to plan for the debugging task.
Acknowledgement:
The author would like to thank the Ateneo de
Manila, Ateneo de Davao, University of
Southeastern Philippines, University of San Carlos,
Private Education Assistance Committee of the
Fund for Assistance to Private Education for the
grant entitled “Analysis of Novice Programmer
Tracing and Debugging Skills using Eye Tracking
Data” and Ateneo de Manila University’s Loyola
Schools Scholarly Work Faculty grant entitled
“Building Higher Education’s Capacity to Conduct
Eye-tracking Research using the Analysis of Novice
Programmer Tracing and Debugging Skills as a
Proof of Concept”. The author also thanks Prof.
Maria Mercedes Rodrigo for being her mentor in
conducting this study.
References:
[1] Sharma, K., Jermann, P., Nüssli, M., and
Dillenbourg, P. (2012). Gaze Evidence for
Different Activities in Program
Understanding. In Proceedings of 24th
Workshop of the Psychology of
Programming Interest Group, PPIG, pp.20-
31.
[2] Schröter, I., Krüger, I., Siegmund, J., and
Leich, T. (2017). Comprehending studies on
program comprehension. In Proceedings of
the 25th International Conference on
Program Comprehension (ICPC '17), IEEE
Press, Piscataway, NJ, USA, pp.308-311.
[3] Tvarozek, J., Konopka, M., Navrat, P., and
Bielikova, M. (2016). Studying Various
Source Code Comprehension Strategies in
Programming Education. In Proceedings of
the Third International Workshop on Eye
Movements in Programming: Models to
Data, pp.25-26.
[4] Busjahn, T., Bednarik, R., Begel, A., Crosby,
M., Paterson, J. H., Schulte, C., Sharif, B.,
and Tamm, S. (2015). Eye movements in
code reading: relaxing the linear order. In
Proceedings of the 2015 IEEE 23rd
International Conference on Program
Comprehension (ICPC '15). IEEE, pp.255-
265.
[5] Sharafi, Z., Soh, Z., Guéhéneuc, Y., and
Antoniol, G. (2012). Women and men—
Different but equal: On the impact of
identifier style on source code reading. In
Proceedings of the 2012 IEEE 20th
International Conference on Program
Comprehension (ICPC), IEEE Explore,
pp.27–36.
[6] Begel, A. and Vzrakova, H. (2018). Eye
movements in code review. In Proceedings
of the Workshop on Eye Movements in
Programming (EMIP '18). (Poland, 2018),
ACM, pp.1-5.
https://doi.org/10.1145/3216723.3216727
[7] Crosby, M. E. and Stelovsky, J. (1990). How
do we read algorithms? A case study.
Computer 23, 1 (1990), pp.25-35.
[8] Jbara, A. and Feitelson, D. G. (2017). How
programmers read regular code: A controlled
experiment using eye-tracking. Empirical
Software Engineering, 22(3), pp.14440-1477.
[9] Peachock, P., Iovino, N., And Sharif, B.
(2017). Investigating Eye Movements in
Natural Language and C++ Source Code—A
Replication Experiment. In Proceedings of
the 11th International Conference on
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
387
Augmented Cognition. Neurocognition and
Machine Learning (AC’17), Springer
International Publishing, pp.206–218.
[10] Tablatin, C. L. S., & Rodrigo, M. M. T.
(2022). Identifying Code Reading Strategies
in Debugging using STA with a Tolerance
Algorithm. APSIPA Transactions on Signal
and Information Processing, Vol.11(1).
[11] Cagitlay, N. E., Tokdemir, G., Kilic, O., and
Topalli, D. (2013). Performing and analyzing
non-formal inspections of entity relationship
diagram (ERD). Journal of Systems and
Software, 86(8), pp.2184–2195.
[12] Jeanmart, S., Guéhéneuc, Y-G., Sahraoui, H.,
and Habra, N. (2009). Impact of the visitor
pattern on program comprehension and
maintenance. In Proceedings of the 2009 3rd
International Symposium on Empirical
Software Engineering and Measurement,
IEEE Computer Society, pp.69–78.
[13] Barik, T., Smith, J., Lubick, K., Holmes, E.,
Feng, J., Murphy-Hill, E., and Parnin, C.
(2017). Do developers read compiler error
messages?. In Proceedings of the 39th
International Conference on Software
Engineering, IEEE Press, pp.575–585.
[14] Lin, Y., Wu, C., Hou, T., Lin, Y., Yang, F.,
and Chang, C. (2016). Tracking students
cognitive processes during program
debugging: an eye-movement approach.
IEEE Transactions on Education, 59(3),
pp.175–186.
[15] Chen, M. and Lim, V. (2013). Eye gaze and
mouse cursor relationship in a debugging
task. In HCI International 2013Posters
Extended Abstracts, Springer, pp.468-472.
[16] Ali, N., Sharafl, Z., Gueheneuc, Y-G., and
Antoniol, G. (2012). An empirical study on
requirements traceability using eye-tracking.
28th IEEE International Conference on
Software Maintenance (ICSM), (Italy, 2012),
IEEE, pp.191-200.
[17] Walters, B., Shaffer, T., Sharif, B., And
Kagdi, H. (2014). Capturing software
traceability links from developer’s eye gazes.
In Proceedings of the 22nd International
Conference on Program Comprehension
(ICPC’14), ACM, pp.201–204.
[18] Villamor, M. And Rodrigo, M. M. (2017).
Characterizing Collaboration in the Pair
Program Tracing and Debugging Eye-
Tracking Experiment: A Preliminary
Analysis. In Proceedings of the 10th
International Conference on Educational
Data Mining, pp.174-179.
[19] Villamor, M. M., & Rodrigo, M. M. T.
(2022). Predicting Pair Success in a Pair
Programming Eye Tracking Experiment
Using Cross-Recurrence Quantification
Analysis. APSIPA Transactions on Signal
and Information Processing, Vol.11(1).
[20] Obaidellah, U., Al Haek, M., and Cheng, P.
C-H. (2018). A Survey on the Usage of Eye-
Tracking in Computer Programming. ACM
Computing Surveys, 51 (1), 5:1-5:58.
[21] Chandrika. K. R., and Amudha, J. (2017). An
Eye Tracking Study to Understand the Visual
Perception Behavior while Source Code
Comprehension. International Journal of
Control Theory and Applications.
International Science Press, vol. 10(19),
pp.169-175.
[22] Nivala, M., Hauser, F., Mottok, J., and
Gruber, H. (2016). Developing visual
expertise in software engineering: An eye
tracking study. 2016 IEEE Global
Engineering Education Conference
(EDUCON), pp.613-620.
[23] Sharif, B., Falcone, M. and Maletic, J. I.
(2012). An eye-tracking study on the role of
scan time in finding source code defects. In
Proceedings of the Symposium on Eye
Tracking Research and Applications
(ETRA’12), ACM, pp.381-384.
[24] Turner, R., Falcone, M., Sharif, B., and
Lazar, A. (2014). An eye- tracking study
assessing the comprehension of C++ and
Python source code. In Proceedings of the
Symposium on Eye Tracking Research and
Applications (ETRA '14). (Florida, 2014)
ACM, NY, USA, pp.231-234.
[25] Villamor, M. And Rodrigo, M. M. (2019).
Analyzing Gaze Patterns of Individuals in
Programming Pairs. In Proceedings of the
Philippine Computing Science Congress
2019.
[26] Bednarik, R., Busjahn, T., and Schulte, C.,
(Eds.). (2014). Eye Movements in
Programming Education: Analyzing the
Expert’s Gaze. In Proceedings of the First
International Workshop, Finland, 2014, pp.1-
3.
[27] Tablatin, C. L., & Rodrigo, M. M. (2018).
Identifying Common Code Reading Patterns
using Scanpath Trend Analysis with a
Tolerance. In Proceedings of thee 26th
International Conference for Computers in
Education (ICCE 2018), Metro Manila,
Philippines.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
388
[28] Von Mayrhauser, A., and Lang, S. (1999). A
coding scheme to support systematic analysis
of software comprehension. IEEE Trans. on
Software Engineering, 25(4), pp.526–540.
[29] Soloway, E. and Ehrlich, K. (1984).
Empirical studies of programming
knowledge. IEEE Transaction on Software
Engineering, Vol. SE-10, No. 5, 595-609.
[30] Goldberg, J. H. and Kotval, X. P. (2010).
Computer interface evaluation using eye
movements: methods and constructs,
International Journal of Industrial
Electronics, 24(6), pp.631-645.
[31] Bylinskii, Z., Borkin, M. A., Kim, N. W.,
Pfisher, H., and Oliva, A. (2015). Eye
fixation metrics for large-scale evaluation
and comparison of information
visualizations. In Eye Tracking and
Visualization, eds M. Burch, L Chuang, B.
Fisher, A. Schmidt, and D. Weiskopf (Cham:
Springer), 235-255. Doi: 10.1007/978-3-319-
47024-5_14
[32] Raptis, G. E., Katsini, K., Belk, M., Fidas,
C., Samaras, G., and Avouris, N. (2017).
Using Eye Gaze Data and Visual Activities
to Infer Human Cognitive Styles: Method
and Feasibility Studies. In Proceedings of the
25th Conference on User Modeling,
Adaptation and Personalization (UMAP '17).
(Bratislava Slovakia, 2017), ACM, NY,
USA, pp.164-173.
Contribution of Individual Authors to the
Creation of a Scientific Article
The sole author of this scientific article
independently conducted and prepared the entire
work from the formulation of the problem to the
final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
This study was supported in part by the Private
Education Assistance Committee of the Fund for
Assistance to Private Education for the grant
entitled “Analysis of Novice Programmer Tracing
and Debugging Skills using Eye Tracking Data” and
Ateneo de Manila University’s Loyola Schools
Scholarly Work Faculty grant entitled “Building
Higher Education’s Capacity to Conduct Eye-
tracking Research using the Analysis of Novice
Programmer Tracing and Debugging Skills as a
Proof of Concept”.
Conflict of Interest
The sole author declared no potential conflicts of
interest with respect to the research, authorship,
and/or publication of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.40
Christine Lourrine S. Tablatin
E-ISSN: 2224-3402
389