On the Pulling Linear Regression and Its Applications in Digital
Mammograms
NAHATAI TEPKASETKUL1, WEENAKORN IEOSANURAK1,
THANAPONG INTHARAH2, WATCHARIN KLONGDEE1
1Department of Mathematics, Faculty of Science, Khon Kaen University,
THAILAND
2Department of Statistics, Faculty of Science, Khon Kaen University,
THAILAND
Abstract: - Regression analysis is a statistical approach used to investigate the correlations between variables. Linear regression, in particular, is a simple but effective method for analysing the relationship between a dependent variable and one independent variable. Because it relies on the assumption that the mean of the noise is zero, there is still room for improvement. In this article, we introduce a novel data-fitting algorithm called the pulling linear regression, which comes in two variants: the line-pulling linear regression and the band-pulling linear regression. The method is developed from linear regression and can create a regression line for data whose noise follows various distributions. We show that the sequence of sum square errors of the pulling linear regression is convergent. Moreover, we provide a numerical example showing that the proposed algorithm performs better than linear regression when the mean of the noise is not zero. Finally, we present an application that smooths the boundary of the pectoral muscle in digital mammograms. We found that the regression line of the proposed algorithm outperforms that of linear regression when only the muscle part is to be removed.
Key-Words: - linear regression, pulling linear regression, sum square error, root mean square error
Received: April 29, 2022. Revised: January 15, 2023. Accepted: February 9, 2023. Published: March 2, 2023.
1 Introduction
Least squares linear regression was first used by
Legendre and Gauss for the prediction of planetary
movement, [1]. Regression analysis is a statistical
technique for examining correlations between
variables used in numerous domains, including
economics, engineering, physical science, biological
science, social science, and medicine, among many
others, [2].
Linear regression (LR) is a powerful and
adaptable method for dealing with regression
problems. The model definition, model estimation,
statistical inference, model diagnostics, variable
selection, and prediction are described
comprehensively, [3]-[6]. Consequently, researchers
remain strongly interested in developments of LR models. For
example, Pérez-Domínguez et al., [7], offered a
contribution using linear regression and applied
Dimensional Analysis (DA) to solve instability and
error problems of the data transformation. Jokubaitis
and Leipus, [8], studied the asymptotic normality in
a high-dimensional linear regression where the
covariance matrix of the regression variables has a
KMS structure. Al-Kandari et al., [9], introduced a
strategy for accounting for uncertainty in the
residuals of the linear regression model using fuzzy
statistics. Liu and Chen, [10], improved the value
for fuzzy linear regression analysis using symmetric
triangular fuzzy numbers and the least fuzziness
criterion. Kabán, [11], provided a new analysis of
compressive least squares regression that eliminates
a spurious component that depends on the total
number of training points. Additionally, several
researchers have developed methods related to
linear regression. For example, linear mixed models
are used by Yi and Tang, [12], Ahn, Zhang and Lu,
[13], and multiple linear regression is used by
Uyanık and Güler, [14], Liu et al., [15], Li, He and
Liu, [16].
A linear regression model is defined by
$y = ax + b + \varepsilon$, where $a$ is a scaling parameter, $b$ is a
location parameter, $y$ is the dependent or response
variable, $x$ is the independent or predictor variable,
and the random variable $\varepsilon$ is the error term in the
model, [17]-[19]. The linear regression is carried out
under the assumption that $\varepsilon$ has a normal
distribution with mean zero and variance $\sigma^2$, i.e., $E[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$.
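As a concrete illustration of this baseline model, the following MATLAB sketch fits an ordinary least squares line to data generated under the standard zero-mean assumption; the slope, intercept, and sample size below are illustrative choices, not values from this article.

    % Ordinary least squares fit under the standard assumption E[eps] = 0.
    % The slope, intercept, and sample size are illustrative values only.
    n = 50;
    x = linspace(0, 10, n)';
    noise = random('norm', 0, 1, n, 1);    % zero-mean normal noise
    y = 2*x + 1 + noise;                   % y = a*x + b + eps with a = 2, b = 1
    coef = polyfit(x, y, 1);               % coef(1) ~ a, coef(2) ~ b
    fprintf('estimated slope %.3f, intercept %.3f\n', coef(1), coef(2));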
In this article, we consider the scenario
where $\varepsilon$ has an alternative distribution or where
$E[\varepsilon] \neq 0$, and we reduce the influence of the
observations that deviate far from the regression line
$y = ax + b$. The two novel algorithms are the line-pulling
linear regression (LPR) and the band-pulling
linear regression (BPR), which are presented in the
next section.
The remainder of the article is structured as
follows: the LPR and BPR algorithms are
introduced in Section 2. Section 3 presents
mathematical proofs of some properties. Next, the
numerical results of our algorithms are illustrated
and discussed in Section 4. Section 5 shows how the
application is used to remove the pectoral muscle.
Finally, conclusions and some suggestions are
drawn and presented in Section 6.
2 Description of LPR and BPR
We introduce two novel data-fitting algorithms:
line-pulling linear regression (LPR) and band-
pulling linear regression (BPR). These algorithms
are defined as follows:
Let $D^{(0)} = \big\{(x_1, y_1^{(0)}), (x_2, y_2^{(0)}), \ldots, (x_n, y_n^{(0)})\big\}$
be an initial data set such that the $x_i$ are distinct,
and let $p_u, p_l \in [0,1]$ be given proportions. The procedure for the LPR
algorithm is the following. Set $D^{(1)} = D^{(0)}$.
(i) Consider the $k$-th iteration, $k \geq 1$. We get
the linear regression $f^{(k)}(x) = a^{(k)}x + b^{(k)}$ of
the data $D^{(k)}$, where
$(a^{(k)}, b^{(k)}) = \arg\min_{a,b} \sum_{i=1}^{n} \big(y_i^{(k)} - a x_i - b\big)^2.$
Denote $\hat{y}_i^{(k)} = f^{(k)}(x_i)$, $i = 1, 2, \ldots, n$.
(ii) Define a band
$B^{(k)} = \big\{(x, y) \,\big|\, f^{(k)}(x) - l^{(k)} \leq y \leq f^{(k)}(x) + u^{(k)}\big\},$
where $u^{(k)} = p_u \max_{1 \leq i \leq n}\big\{y_i^{(k)} - \hat{y}_i^{(k)}\big\}$, and
$l^{(k)} = p_l \max_{1 \leq i \leq n}\big\{\hat{y}_i^{(k)} - y_i^{(k)}\big\}$.
(iii) Update the data $D^{(k+1)}$ given by
$D^{(k+1)} = \big\{(x_1, y_1^{(k+1)}), (x_2, y_2^{(k+1)}), \ldots, (x_n, y_n^{(k+1)})\big\},$
such that for each $i$,
$y_i^{(k+1)} = \begin{cases} \hat{y}_i^{(k)}, & (x_i, y_i^{(k)}) \notin B^{(k)}, \\ y_i^{(k)}, & (x_i, y_i^{(k)}) \in B^{(k)}. \end{cases}$   (1)
(iv) Return to step (i) and repeat until the error value
$\sum_{i=1}^{n} \big(y_i^{(k+1)} - a^{(k)}x_i - b^{(k)}\big)^2 < \epsilon,$
where $\epsilon$ is a fixed value.
Following that, we shall introduce the BPR
algorithm. Set $D^{(1)} = D^{(0)}$. The steps of the
algorithm are defined the same as in the LPR
algorithm, except that step (iii) is replaced by step
(iii*) as follows.
(iii*) Update the data $D^{(k+1)}$ given by
$D^{(k+1)} = \big\{(x_1, y_1^{(k+1)}), (x_2, y_2^{(k+1)}), \ldots, (x_n, y_n^{(k+1)})\big\},$
such that for each $i$,
$y_i^{(k+1)} = \begin{cases} \hat{y}_i^{(k)} + u^{(k)}, & y_i^{(k)} > \hat{y}_i^{(k)} + u^{(k)}, \\ y_i^{(k)}, & \hat{y}_i^{(k)} - l^{(k)} \leq y_i^{(k)} \leq \hat{y}_i^{(k)} + u^{(k)}, \\ \hat{y}_i^{(k)} - l^{(k)}, & y_i^{(k)} < \hat{y}_i^{(k)} - l^{(k)}. \end{cases}$   (2)
The two algorithms above differ in how they
update the data. The LPR algorithm pulls the
points outside the band toward the regression line,
whereas the BPR algorithm pulls those points
toward the boundary of the band, as illustrated in
Fig. 1.
(a) LPR
(b) BPR
Fig. 1: The proposed algorithm.
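To make steps (i)-(iv) concrete, the following MATLAB sketch implements one possible reading of the two algorithms. The function name pulling_regression, the proportion parameters pu and pl, the stopping rule, and the iteration cap maxIter are our own illustrative choices and not the authors' original code; the sketch only follows the pulling idea described above.

    function [a, b, sseHist] = pulling_regression(x, y, pu, pl, mode, maxIter)
    % Pulling linear regression (illustrative sketch, not the authors' code).
    %   mode = 'LPR' pulls points outside the band onto the fitted line;
    %   mode = 'BPR' pulls them onto the nearest boundary of the band.
    %   pu and pl scale the band widths above and below the line.
        x = x(:); y = y(:);
        sseHist = zeros(maxIter, 1);
        for k = 1:maxIter
            coef = polyfit(x, y, 1);             % step (i): least squares fit
            a = coef(1); b = coef(2);
            yhat = a*x + b;
            sseHist(k) = sum((y - yhat).^2);     % SSE of the k-th iteration
            u = pu * max(y - yhat);              % step (ii): band width above the line
            l = pl * max(yhat - y);              % band width below the line
            above = y > yhat + u;                % points outside the band
            below = y < yhat - l;
            if ~any(above | below)               % nothing left to pull: stop
                sseHist = sseHist(1:k);
                return;
            end
            switch mode                          % step (iii) / (iii*): update the data
                case 'LPR'
                    y(above) = yhat(above);
                    y(below) = yhat(below);
                case 'BPR'
                    y(above) = yhat(above) + u;
                    y(below) = yhat(below) - l;
            end
        end
    end

Under this reading, pu = pl = 1 leaves every point inside the band, so the first fit is returned unchanged and the result coincides with ordinary linear regression, while smaller proportions pull the line toward the denser side of the data.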
The following is an example of how to
understand our algorithms.
Example 1. Let the initial data consist of five points
generated as $y_i^{(0)} = f(x_i) + \varepsilon_i$, $i = 1, 2, \ldots, 5$,
where $f$ is a linear function and $\varepsilon_i$ is a uniform noise.
We obtain the initial data for our example,
$D^{(0)} = \big\{(x_1, y_1^{(0)}), (x_2, y_2^{(0)}), (x_3, y_3^{(0)}), (x_4, y_4^{(0)}), (x_5, y_5^{(0)})\big\}$
(see Fig. 2).
In the first iteration, we obtain the linear least
squares regression $f^{(1)}(x) = a^{(1)}x + b^{(1)}$.
The points $(x_i, f^{(1)}(x_i))$, $i = 1, \ldots, 5$, lie on the
regression line $f^{(1)}$, and the upper and lower
boundaries of the band are obtained from the widths
$u^{(1)}$ and $l^{(1)}$. The resulting band $B^{(1)}$ is shown in
Fig. 3.
Fig. 2: Initial data.
Fig. 3: The band $B^{(1)}$ for the example.
In Fig. 3, we find that three of the points lie
outside the band $B^{(1)}$. From equations (1) and (2),
we update the data as follows:
LPR: the points outside the band are pulled onto the
regression line $f^{(1)}$, and we obtain the updated data
$D^{(2)}$ (see Fig. 4(a)).
BPR: the points outside the band are pulled onto the
nearest boundary of the band $B^{(1)}$, and we obtain
the updated data $D^{(2)}$ (see Fig. 5(a)).
The results of the LPR and BPR algorithms are
shown in Tables 1 and 2, respectively. In the 6th
iteration for LPR and the 11th iteration for BPR, we
obtain the final fitted regression lines, shown in
Fig. 4(b) and Fig. 5(b), respectively.
(a) the updated data for the first iteration
(b) the data fitting for the final iteration
Fig. 4: The result for the LPR algorithm.
(a) the updated data for the first iteration
(b) the data fitting for the final iteration
Fig. 5: The result for the BPR algorithm.
Table 1. The results of the LPR algorithm for the initial data $D^{(0)}$ of Example 1: the updated data, the fitted regression $f^{(k)}(x) = a^{(k)}x + b^{(k)}$, and the error at iterations $k = 0, 1, \ldots, 6$.
Table 2. The results of the BPR algorithm for the initial data $D^{(0)}$ of Example 1: the updated data, the fitted regression $f^{(k)}(x) = a^{(k)}x + b^{(k)}$, and the error at iterations $k = 0, 1, \ldots, 11$.
3 Main Results
Definition 1. Let $p_u, p_l \in [0,1]$ and let
$D^{(0)} = \big\{(x_1, y_1^{(0)}), (x_2, y_2^{(0)}), \ldots, (x_n, y_n^{(0)})\big\}$ be an initial
data set such that the $x_i$ are distinct. The sum square error
(SSE) of the $k$-th iteration for LPR with respect to
$D^{(0)}$ is given by
$SSE_{LPR}^{(k)} = \sum_{i=1}^{n} \big(y_i^{(k)} - a^{(k)}x_i - b^{(k)}\big)^2,$   (3)
where
$(a^{(k)}, b^{(k)}) = \arg\min_{a,b} \sum_{i=1}^{n} \big(y_i^{(k)} - a x_i - b\big)^2,$
and $y_i^{(k)}$ is an updated value as mentioned in LPR.
Similarly, the SSE of the $k$-th iteration for BPR with
respect to $D^{(0)}$ is given by
$SSE_{BPR}^{(k)} = \sum_{i=1}^{n} \big(y_i^{(k)} - a^{(k)}x_i - b^{(k)}\big)^2,$   (4)
where $(a^{(k)}, b^{(k)})$ is defined in the same way,
and $y_i^{(k)}$ is an updated value as mentioned in BPR.
In the specific case $p_u = p_l = 0$, we observe that
$u^{(k)} = l^{(k)} = 0$, i.e., $B^{(k)} = \big\{(x, f^{(k)}(x))\big\}$.
That is, each point in $D^{(k)}$ is pulled onto the line
$f^{(k)}(x)$. Therefore, $SSE_{LPR}^{(k)} = 0$
and $SSE_{BPR}^{(k)} = 0$ for all $k \geq 2$.
In the opposite case, $p_u = p_l = 1$, we then get
$u^{(k)} = \max_{1 \leq i \leq n}\big\{y_i^{(k)} - f^{(k)}(x_i)\big\}$ and
$l^{(k)} = \max_{1 \leq i \leq n}\big\{f^{(k)}(x_i) - y_i^{(k)}\big\}$.
It means that all points are in $B^{(k)}$, i.e.,
$D^{(k+1)} = D^{(k)}$. Therefore,
$SSE_{LPR}^{(k+1)} = SSE_{LPR}^{(k)} = \cdots = SSE_{LPR}^{(1)}$ and
$SSE_{BPR}^{(k+1)} = SSE_{BPR}^{(k)} = \cdots = SSE_{BPR}^{(1)}$ for all $k \geq 1$;
that is, both algorithms reduce to ordinary linear regression.
From Tables 1 and 2, we observe that the sum
square errors are decreasing for both algorithms.
This leads to the following property.
Lemma 1. Let $D^{(0)} = \big\{(x_1, y_1^{(0)}), (x_2, y_2^{(0)}), \ldots, (x_n, y_n^{(0)})\big\}$
be an initial data set such that the $x_i$ are
distinct. If $p_u, p_l \in [0,1]$, the sequences
$\big\{SSE_{LPR}^{(k)}\big\}$ and $\big\{SSE_{BPR}^{(k)}\big\}$ are
decreasing in $k$.
Proof. Consider the $k$-th iteration, where $k \geq 1$.
Let $\hat{y}_i^{(k)} = f^{(k)}(x_i)$, $i = 1, 2, \ldots, n$, where
$(a^{(k)}, b^{(k)}) = \arg\min_{a,b} \sum_{i=1}^{n} \big(y_i^{(k)} - a x_i - b\big)^2.$
Case LPR: From equation (1), we get
$y_i^{(k+1)} = \begin{cases} \hat{y}_i^{(k)}, & (x_i, y_i^{(k)}) \notin B^{(k)}, \\ y_i^{(k)}, & (x_i, y_i^{(k)}) \in B^{(k)}. \end{cases}$
Thus,
$\big(y_i^{(k+1)} - \hat{y}_i^{(k)}\big)^2 \leq \big(y_i^{(k)} - \hat{y}_i^{(k)}\big)^2$ for each $i$.
Next, consider equation (3),
$SSE_{LPR}^{(k+1)} = \sum_{i=1}^{n} \big(y_i^{(k+1)} - a^{(k+1)}x_i - b^{(k+1)}\big)^2 \leq \sum_{i=1}^{n} \big(y_i^{(k+1)} - a^{(k)}x_i - b^{(k)}\big)^2 = \sum_{i=1}^{n} \big(y_i^{(k+1)} - \hat{y}_i^{(k)}\big)^2 \leq \sum_{i=1}^{n} \big(y_i^{(k)} - \hat{y}_i^{(k)}\big)^2 = SSE_{LPR}^{(k)},$
where the first inequality follows because $(a^{(k+1)}, b^{(k+1)})$ minimises
the sum of squared deviations for the data $D^{(k+1)}$.
Case BPR: From equation (2), we get
$y_i^{(k+1)} = \begin{cases} \hat{y}_i^{(k)} + u^{(k)}, & y_i^{(k)} > \hat{y}_i^{(k)} + u^{(k)}, \\ y_i^{(k)}, & \hat{y}_i^{(k)} - l^{(k)} \leq y_i^{(k)} \leq \hat{y}_i^{(k)} + u^{(k)}, \\ \hat{y}_i^{(k)} - l^{(k)}, & y_i^{(k)} < \hat{y}_i^{(k)} - l^{(k)}. \end{cases}$
Thus,
$\big(y_i^{(k+1)} - \hat{y}_i^{(k)}\big)^2 \leq \big(y_i^{(k)} - \hat{y}_i^{(k)}\big)^2$ for each $i$.
Next, consider equation (4); by the same argument as in the LPR case,
$SSE_{BPR}^{(k+1)} \leq \sum_{i=1}^{n} \big(y_i^{(k+1)} - \hat{y}_i^{(k)}\big)^2 \leq \sum_{i=1}^{n} \big(y_i^{(k)} - \hat{y}_i^{(k)}\big)^2 = SSE_{BPR}^{(k)}.$
This completes the proof.
From equations (3) and (4), it is obvious that the
sequences $\big\{SSE_{LPR}^{(k)}\big\}$ and $\big\{SSE_{BPR}^{(k)}\big\}$ are
nonnegative, that is, they are bounded below by zero, and,
by Lemma 1, they are monotonically decreasing. This
leads to the following theorem.
Theorem 2. Let $D^{(0)}$ be an initial data set. If
$p_u, p_l \in [0,1]$, the sequences $\big\{SSE_{LPR}^{(k)}\big\}$ and
$\big\{SSE_{BPR}^{(k)}\big\}$ are convergent.
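As a quick empirical check of Lemma 1 and Theorem 2, one can inspect the SSE history returned by the hypothetical pulling_regression sketch from Section 2; the noise and proportion values below are again illustrative assumptions.

    % Empirical check that the SSE sequence is nonincreasing (Lemma 1).
    x = (1:30)';
    y = 0.5*x + 3 + random('unif', 0, 4, 30, 1);        % noise with a positive mean
    [~, ~, sse] = pulling_regression(x, y, 0.5, 0.5, 'LPR', 50);
    assert(all(diff(sse) <= 1e-8), 'SSE increased');    % monotone behaviour of equation (3)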
4 Numerical Examples
This section shows numerical examples using
initial data generated from a linear function plus
noise (a uniform distribution $U(a,b)$, a normal
distribution $N(a,b)$, and a gamma distribution
$G(a,b)$*). We also use various values of the proportions
$p_u$ and $p_l$ to compare the root mean square error and the
number of iterations of the proposed algorithms.
In Table 3, we generate the data by the equation
$y_i^{(0)} = f(x_i) + \varepsilon_i,$
where $\varepsilon_i$ is a generated noise and $f$ is the underlying
linear function. The root mean square error (RMSE) is given
by
$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big(f(x_i) - f^{(k)}(x_i)\big)^2},$
where $f^{(k)}$ is obtained from the linear least squares fit
of the $k$-th iteration of the proposed algorithms.
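A small script in the spirit of the Table 3 experiment is sketched below; the true line, the gamma-noise parameters, and the proportion pair are illustrative stand-ins, since the exact settings used to produce the table are not restated here, and pulling_regression refers to the hypothetical sketch from Section 2.

    % Mimic one cell of the Table 3 experiment (illustrative values throughout).
    a0 = 1.2;  b0 = 5;  n = 100;
    x = linspace(0, 20, n)';
    f = a0*x + b0;                                       % true underlying line
    y = f + random('gam', 2, 1.5, n, 1);                 % gamma noise with a positive mean
    coefLR = polyfit(x, y, 1);                           % ordinary least squares
    [aP, bP] = pulling_regression(x, y, 0.1, 0.9, 'LPR', 200);
    rmseLR  = sqrt(mean((f - polyval(coefLR, x)).^2));
    rmseLPR = sqrt(mean((f - (aP*x + bP)).^2));
    fprintf('RMSE  LR: %.3f   LPR: %.3f\n', rmseLR, rmseLPR);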
We summarise the results in Table 3 as follows.
- For the first noise setting, the LPR algorithm has the minimum RMSE of 0.143 with 59 iterations.
- For the second noise setting, the BPR algorithm has the minimum RMSE of 0.174 with 15 iterations.
- For the third noise setting, the LPR algorithm has the minimum RMSE of 0.323 with 11 iterations.
- For the fourth noise setting, the BPR algorithm has the minimum RMSE of 0.203 with 17 iterations.
- For the fifth noise setting, the LPR algorithm has the minimum RMSE of 0.026 with 71 iterations.
- For the sixth noise setting, the LPR and BPR algorithms both have the minimum RMSE of 2.360 with 52 iterations.
- For the seventh noise setting, the BPR algorithm has the minimum RMSE of 0.621 with 16 iterations.
- For the eighth noise setting, the BPR algorithm has the minimum RMSE of 0.362 with 18 iterations.
- For the ninth noise setting, the LPR and BPR algorithms both have the minimum RMSE of 2.007 with 65 iterations.
- For the tenth noise setting, the LPR algorithm has the minimum RMSE of 0.022 with 34 iterations.
- For the eleventh noise setting, the LPR and BPR algorithms both have the minimum RMSE of 0.299 with 72 iterations.
Table 3. The root mean square error, with the number of iterations in parentheses, of the LPR and BPR algorithms for eleven noise settings (uniform, normal, and gamma noises) and twelve proportion settings.

Noise        Algorithm   1           2           3           4           5           6            7           8           9           10           11          12
Setting 1    LPR         3.555 (2)   3.623 (5)   3.627 (10)  3.663 (17)  8.414 (64)  8.414 (64)   4.259 (7)   3.966 (11)  0.144 (58)  0.143 (59)   2.860 (9)   3.198 (11)
             BPR         3.555 (2)   3.649 (8)   3.651 (14)  3.644 (31)  8.414 (64)  8.410 (88)   5.022 (20)  4.409 (25)  0.144 (58)  0.146 (77)   2.133 (19)  2.758 (23)
Setting 2    LPR         1.563 (2)   1.562 (5)   1.509 (9)   1.269 (14)  5.819 (74)  5.819 (74)   2.202 (7)   1.676 (10)  2.357 (88)  2.357 (88)   0.401 (9)   0.889 (9)
             BPR         1.563 (2)   1.511 (8)   1.464 (14)  1.425 (34)  5.819 (74)  5.815 (93)   3.266 (16)  2.606 (22)  2.357 (88)  2.330 (122)  0.174 (15)  0.337 (19)
Setting 3    LPR         0.372 (2)   0.352 (5)   0.370 (7)   0.323 (11)  3.268 (65)  3.264 (67)   0.906 (8)   0.592 (12)  4.367 (88)  4.367 (88)   0.880 (10)  0.482 (11)
             BPR         0.372 (2)   0.374 (8)   0.378 (14)  0.362 (29)  3.268 (65)  3.271 (96)   1.545 (17)  0.886 (20)  4.367 (88)  4.351 (117)  2.170 (19)  1.419 (21)
Setting 4    LPR         1.698 (2)   1.698 (5)   1.769 (8)   1.656 (15)  1.694 (55)  1.694 (55)   1.270 (9)   1.584 (11)  6.304 (63)  6.304 (64)   2.007 (9)   1.829 (11)
             BPR         1.698 (2)   1.721 (8)   1.701 (13)  1.673 (29)  1.694 (55)  1.692 (72)   0.203 (17)  0.718 (23)  6.304 (63)  6.299 (87)   3.346 (18)  2.729 (21)
Setting 5    LPR         4.813 (2)   4.804 (5)   4.825 (9)   4.940 (15)  0.027 (71)  0.026 (71)   4.293 (9)   4.383 (9)   9.082 (98)  9.082 (98)   5.738 (9)   5.290 (13)
             BPR         4.813 (2)   4.803 (8)   4.839 (14)  4.914 (32)  0.027 (71)  0.075 (103)  3.110 (19)  3.765 (21)  9.082 (98)  9.067 (132)  6.477 (18)  5.890 (21)
Setting 6    LPR         4.073 (2)   4.074 (5)   4.107 (8)   4.053 (14)  5.670 (52)  5.670 (52)   4.356 (5)   4.183 (10)  2.360 (52)  2.360 (52)   3.724 (8)   3.954 (10)
             BPR         4.073 (2)   4.074 (7)   4.069 (12)  4.077 (26)  5.670 (52)  5.668 (70)   4.708 (15)  4.471 (19)  2.360 (52)  2.362 (69)   3.340 (16)  3.652 (19)
Setting 7    LPR         1.235 (2)   1.186 (5)   1.058 (6)   1.040 (12)  2.459 (44)  2.460 (45)   1.295 (9)   1.207 (9)   0.877 (88)  0.874 (88)   0.863 (7)   0.903 (10)
             BPR         1.235 (2)   1.279 (7)   1.172 (13)  1.176 (27)  2.459 (44)  2.456 (58)   1.741 (17)  1.469 (19)  0.877 (88)  0.937 (123)  0.621 (16)  0.839 (20)
Setting 8    LPR         1.082 (2)   1.078 (5)   1.108 (8)   1.079 (13)  1.200 (57)  1.201 (58)   0.702 (9)   0.915 (10)  3.013 (53)  3.014 (54)   1.396 (7)   1.317 (9)
             BPR         1.082 (2)   1.061 (7)   1.065 (12)  1.067 (26)  1.200 (57)  1.198 (76)   0.362 (18)  0.598 (21)  3.013 (53)  3.011 (69)   1.767 (17)  1.492 (19)
Setting 9    LPR         3.899 (2)   3.906 (6)   3.927 (8)   4.095 (14)  2.007 (65)  2.009 (66)   3.640 (9)   3.793 (10)  6.256 (57)  6.255 (57)   4.422 (7)   4.274 (12)
             BPR         3.899 (2)   3.946 (7)   3.972 (13)  3.990 (27)  2.007 (65)  2.014 (87)   3.185 (17)  3.520 (22)  6.256 (57)  6.253 (79)   4.704 (19)  4.476 (21)
Setting 10   LPR         0.788 (2)   0.788 (4)   0.788 (6)   0.739 (10)  0.023 (33)  0.022 (34)   0.637 (7)   0.693 (9)   1.662 (41)  1.663 (42)   0.873 (6)   0.828 (10)
             BPR         0.788 (2)   0.782 (7)   0.776 (13)  0.777 (26)  0.023 (33)  0.025 (44)   0.389 (13)  0.528 (19)  1.662 (41)  1.659 (54)   1.175 (13)  1.022 (19)
Setting 11   LPR         2.827 (2)   2.824 (5)   2.877 (10)  2.965 (13)  0.299 (72)  0.306 (72)   2.346 (7)   2.568 (9)   5.697 (59)  5.697 (59)   3.465 (8)   3.186 (12)
             BPR         2.827 (2)   2.863 (8)   2.875 (14)  2.865 (32)  0.299 (72)  0.332 (97)   1.809 (17)  2.172 (20)  5.697 (59)  5.693 (80)   4.176 (20)  3.683 (23)
* We generate the random noises using MATLAB version R2017a; the codes for $U(a,b)$, $N(a,b)$, and $G(a,b)$ are
random('unif',a,b,m,n), random('norm',a,b,m,n), and random('gam',a,b,m,n), respectively, where $m \times n$ is the size of the random noise array.
We observe that the number of iterations of the
BPR algorithm is always greater than or equal to
that of the LPR algorithm. Next, we discuss the
values of $p_u$ and $p_l$. In the case $p_u = p_l$, the
regression line is located at the centre of the points.
This setting is suitable for noise having a mean
of zero. When $p_u < p_l$, the bandwidth above the
regression line is narrower than that below, so more
points are pulled down than up. As a result, the
regression line gradually drops in the subsequent
iterations. This setting is suitable for noise with a mean
greater than zero. When $p_u > p_l$, the result is the
opposite of the previous case, and the regression line
gradually rises in the subsequent iterations. This setting
is appropriate for noise having a mean less than
zero.
Furthermore, when every noise value is positive, for
example when $\varepsilon$ is derived from a uniform or gamma
distribution supported on the positive axis, the appropriate
choice is $p_u < p_l$. On the other hand, if every noise value is
negative, the appropriate choice is $p_u > p_l$.
The following figures show some examples from
Table 3. Fig. 6 shows the regression of the original
function with uniform, normal, and gamma noise,
respectively. The black asterisks on the black line
are the points $(x_i, f(x_i))$ of the original function,
the pink points are the data in $D^{(0)}$, and the red,
green, and blue lines are the regression lines of
linear regression, LPR, and BPR, respectively.
(a) uniform noise
(b) normal noise
(c) gamma noise
Fig. 6: The regression lines with noise.
5 Applications
In general, the data points used to create the
regression line contain noise from the beginning. As a
result, we cannot know whether those noise values are
positive or negative, making it impossible to choose
suitable proportion values directly. Therefore, the selection
of $p_u$ and $p_l$ must be determined by the desired
outcome. For example, if the mean is to be used to
represent the data, $p_u$ and $p_l$ should be equal. If the
regression line should lie below all of the data, a small
$p_u$ and a large $p_l$ should be set. Similarly, set a large
$p_u$ and a small $p_l$ when we need a regression line above all
data points.
This section will demonstrate how the proposed
algorithm can be used to analyse mammograms. In
the mediolateral-oblique (MLO) view, the existence
of the pectoral muscle may mislead the diagnosis of
cancer due to its high similarity to the breast
body. Therefore, we cut the pectoral muscle part and
then employ LPR or BPR to smooth the boundary.
We separate the areas of the pectoral muscle and
the breast using the difference in intensity of a
mammogram. We extract the border between them and
transform it to Cartesian coordinates; these points are
referred to as the connection points and define the initial
data $D^{(0)}$. Fig. 7
depicts an example of a mammogram and shows the
connection points of the mammogram, with the
connection points marking the boundary between the
pectoral muscle and the breast region.
Since we want to remove only the muscle part, the
regression line shall lie below all points. We then define
$p_u = 0$.
The value of $p_l$ should be in the range $[1, \infty)$; we
fix a value in this range for this example. Fig. 8 shows the
regression lines derived from linear regression,
LPR, and BPR using red, green, and blue,
respectively. We get one regression line
from linear regression, one
from LPR after its final iteration, and one from BPR
after its final iteration. Moreover, the mammograms
after removing the pectoral muscle using linear
regression, LPR, and BPR, respectively, are shown
in Fig. 9.
The images obtained by finding the regression
line with the LPR and BPR algorithms are comparable
to each other and better than the image obtained with
linear regression.
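For readers who want to reproduce this behaviour, the following MATLAB sketch applies the hypothetical pulling_regression function from Section 2 with a zero upper proportion and a unit lower proportion to synthetic boundary points; the data and parameter values are assumptions for illustration, not the connection points or settings used for Fig. 8.

    % Drive the regression line toward the bottom of the connection points:
    % pu = 0 pulls every point above the line down onto it at each iteration,
    % while pl = 1 keeps every point below the line untouched, so the fitted
    % line sinks toward the lowest points. xb and yb are synthetic stand-ins.
    xb = (1:200)';
    yb = 0.8*xb + 40 + random('unif', 0, 25, 200, 1);
    [aLow, bLow] = pulling_regression(xb, yb, 0, 1, 'LPR', 300);
    plot(xb, yb, '.', xb, aLow*xb + bLow, '-');          % the line hugs the lower edge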
(a) the original mammogram
(b) the connection points plotted on the mammogram
(c) the connection points plotted on a graph
Fig. 7: An example of a mammogram.
Fig. 8: The regression lines.
(a) linear regression
(b) LPR
(c) BPR
Fig. 9: The mammograms after removing the muscle
part by different algorithms.
6 Conclusions and Discussions
In this paper, we proposed two algorithms, the
line-pulling linear regression (LPR) and the band-pulling
linear regression (BPR), that create a regression line for
data generated from an original function with noise
$\varepsilon$, where $\varepsilon$ does not necessarily follow a normal
distribution with a mean of zero. These algorithms can
place the regression line at the centre, top, or bottom of the
data points by assigning the proportion values $p_u$ and $p_l$.
If $p_u = p_l = 1$, the LPR and BPR algorithms provide the same
regression line as linear regression. When $p_u < p_l$,
the resulting line is below the linear regression line, and
when $p_u > p_l$, the resulting line is above it. However,
since we do not know the noise distribution in the data,
we determine the values of $p_u$ and $p_l$ based on user
requirements.
The numerical examples show that the results of
the LPR and BPR algorithms are similar. The
noticeable difference is the number of iterations:
the LPR algorithm converges faster than the
BPR algorithm.
One application of these algorithms is the
smoothing of the pectoral muscle's boundary. We
use $p_u = 0$ and $p_l \geq 1$ to create the regression
line at the bottom of all data points, ensuring that we
remove only the muscle part.
In addition, the LPR and BPR algorithms can be
extended to more complicated models, such as using
quadratic or cubic polynomial equations rather than
a linear equation, which is expected to bring greater
application benefits.
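A minimal way to realise this extension, under the same assumptions as the earlier pulling_regression sketch, is to raise the degree passed to polyfit and evaluate the fitted values with polyval inside the iteration; the pulling step itself is unchanged.

    % Inside the iteration of the earlier pulling_regression sketch, replace the
    % linear fit of step (i) with a higher-degree polynomial fit (degree 2 here):
    coef = polyfit(x, y, 2);        % quadratic data fitting
    yhat = polyval(coef, x);        % fitted values used for the band and the pulling step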
Acknowledgement:
The first author would like to express gratitude to
the Science Achievement Scholarship of Thailand
(SAST) for financial assistance for this paper. This
research is supported by the Department of
Mathematics, Faculty of Science, Khon Kaen
University, Fiscal Year 2022.
References:
[1] S. M. Stigler, The history of statistics: The
measurement of uncertainty before 1900,
Harvard University Press, 1986.
[2] D. C. Montgomery, E. A. Peck, G. G. Vining,
Introduction to linear regression analysis,
John Wiley & Sons, 2021.
[3] X. Su, X. Yan, C. Tsai, Linear regression,
WIREs Computational Statistics, Vol.4, No.3,
2012, pp. 275-294.
[4] W. Yao, L. Li, A New Regression Model:
Modal Linear Regression, Scandinavian
Journal of Statistics, Vol.41, 2014, pp. 656-
671.
[5] K. H. Zou, K. Tuncali, S. G. Silverman,
Correlation and simple linear regression,
Radiology, Vol.227, No.3, 2003, pp. 617-628.
[6] D. Maulud, A. M. Abdulazeez, A review on
linear regression comprehensive in machine
learning, Journal of Applied Science and
Technology Trends, Vol.1, No.4, 2020,
pp.140-147.
[7] L. Pérez-Domínguez, H. Garg, D. Luviano-
Cruz, J.L. García Alcaraz, Estimation of
Linear Regression with the Dimensional
Analysis Method, Mathematics, Vol.10,
No.10, 2022, pp. 1645.
[8] S. Jokubaitis, R. Leipus, Asymptotic
normality in linear regression with
approximately sparse structure, Mathematics,
Vol.10, No.10, 2022, pp. 1657.
[9] M. Al-Kandari, K. Adjenughwure, K.
Papadopoulos, A Fuzzy-Statistical Tolerance
Interval from Residuals of Crisp Linear
Regression Models, Mathematics, Vol.8,
No.9, 2020, pp. 1422.
[10] X. Liu, Y. Chen, A systematic approach to
optimizing value for fuzzy linear regression
with symmetric triangular fuzzy numbers,
Mathematical Problems in Engineering,
Vol.2013, 2013.
[11] A. Kabán, New bounds on compressive linear
least squares regression, Artificial intelligence
and statistics, 2014, pp. 448-456.
[12] J. Yi, N. Tang, Variational Bayesian inference
in high-dimensional linear mixed models,
Mathematics, Vol.10, No.3, 2022, pp. 463.
[13] M. Ahn, H. H. Zhang, W. Lu, Moment-based
method for random effects selection in linear
mixed models, Statistica Sinica, Vol.22, No.4,
2012, pp. 1539.
[14] G. K. Uyanık, N. Güler, A Study on Multiple
Linear Regression Analysis, Procedia - Social
and Behavioral Sciences, Vol.106, 2013, pp.
234-240.
[15] M. Liu, S. Hu, Y. Ge, G. B. Heuvelink, Z.
Ren, X. Huang, Using multiple linear
regression and random forests to identify
spatial poverty determinants in rural China,
Spatial Statistics, Vol.42, 2021, pp. 100461.
[16] Y. Li, X. He, X. Liu, Fuzzy multiple linear
least squares regression analysis, Fuzzy Sets
and Systems, 2022.
[17] S. Weisberg, Applied Linear Regression, 4th
edition, John Wiley & Sons, 2014.
[18] M. S. Paolella, Linear models and time-series
analysis: regression, ANOVA, ARMA and
GARCH, John Wiley & Sons, 2018.
[19] A. C. Rencher, G.B. Schaalje, Linear models
in statistics, John Wiley & Sons, 2008.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
-Nahatai Tepkasetkul carried out the
conceptualization, investigation, methodology,
software, validation, visualization, writing - original
draft, and writing - review & editing.
-Weenakorn Ieosanurak carried out the supervision,
validation, and writing - review & editing.
-Thanapong Intharah carried out the
conceptualization, supervision, validation, and
writing - review & editing.
-Watcharin Klongdee carried out the
conceptualization, investigation, methodology,
supervision, validation, visualization, writing -
original draft, and writing - review & editing.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
The first author would like to express gratitude to
the Science Achievement Scholarship of Thailand
(SAST) for financial assistance for this paper. This
research is supported by the Department of
Mathematics, Faculty of Science, Khon Kaen
University, Fiscal Year 2022.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US