On the Pulling Linear Regression and Its Applications in Digital
Mammograms
NAHATAI TEPKASETKUL1, WEENAKORN IEOSANURAK1,
THANAPONG INTHARAH2, WATCHARIN KLONGDEE1
1Department of Mathematics, Faculty of Science, Khon Kaen University,
THAILAND
2Department of Statistics, Faculty of Science, Khon Kaen University,
THAILAND
Abstract: - Regression analysis is a statistical approach used to investigate the correlations between variables. Linear regression, in particular, is a simple but effective method for analysing the relationship between a dependent variable and one independent variable. Because it relies on the assumption that the mean of the noise is zero, there is still room for improvement. In this article, we introduce a novel data-fitting algorithm called the pulling linear regression, which comes in two variants: the line-pulling linear regression and the band-pulling linear regression. The method is developed from linear regression and can create a regression line for data whose noise follows various distributions. We show that the sequence of sum square errors of the pulling linear regression is convergent. Moreover, we provide a numerical example showing that the proposed algorithm performs better than linear regression when the mean of the noise is not zero. Finally, we present an application that smooths the boundary of the pectoral muscle in digital mammograms. We found that the regression line of the proposed algorithm outperforms that of linear regression when only the muscle part is to be removed.
Key-Words: - linear regression, pulling linear regression, sum square error, root mean square error
Received: April 29, 2022. Revised: January 15, 2023. Accepted: February 9, 2023. Published: March 2, 2023.
1 Introduction
Least squares linear regression was first used by
Legendre and Gauss for the prediction of planetary
movement, [1]. Regression analysis is a statistical
technique for examining correlations between
variables used in numerous domains, including
economics, engineering, physical science, biological
science, social science, and medicine, among many
others, [2].
Linear regression (LR) is a powerful and
adaptable method for dealing with regression
problems. The model definition, model estimation,
statistical inference, model diagnostics, variable
selection, and prediction are described
comprehensively, [3]-[6]. Consequently, researchers
remain strongly interested in developments of LR models. For
example, Pérez-Domínguez et al., [7], offered a
contribution using linear regression and applied
Dimensional Analysis (DA) to solve instability and
error problems of the data transformation. Jokubaitis
and Leipus, [8], studied the asymptotic normality in
a high-dimensional linear regression where the
covariance matrix of the regression variables has a
KMS structure. Al-Kandari et al., [9], introduced a
strategy for accounting for uncertainty in the
residuals of the linear regression model using fuzzy
statistics. Liu and Chen, [10], improved the value
for fuzzy linear regression analysis using symmetric
triangular fuzzy numbers and the least fuzziness
criterion. Kabán, [11], provided a new analysis of
compressive least squares regression that eliminates
a spurious component that depends on the total
number of training points. Additionally, several
researchers have developed methods related to
linear regression. For example, linear mixed models
are used by Yi and Tang, [12], Ahn, Zhang and Lu,
[13], and multiple linear regression is used by
Uyanık and Güler, [14], Liu et al., [15], Li, He and
Liu, [16].
A linear regression model is defined by
$y = ax + b + \varepsilon$, where $a$ is a scaling parameter, $b$ is a
location parameter, $y$ is the dependent or response
variable, $x$ is the independent or predictor variable,
and the random variable $\varepsilon$ is the error term in the
model, [17]-[19]. The linear regression is carried out
under the assumption that $\varepsilon$ has a normal
distribution with mean zero and variance $\sigma^2$, i.e., $E[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$.
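As a concrete illustration of this baseline model, the following MATLAB sketch fits an ordinary least squares line to data generated under the standard zero-mean assumption; the slope, intercept, and sample size below are illustrative choices, not values from this article.

    % Ordinary least squares fit under the standard assumption E[eps] = 0.
    % The slope, intercept, and sample size are illustrative values only.
    n = 50;
    x = linspace(0, 10, n)';
    noise = random('norm', 0, 1, n, 1);    % zero-mean normal noise
    y = 2*x + 1 + noise;                   % y = a*x + b + eps with a = 2, b = 1
    coef = polyfit(x, y, 1);               % coef(1) ~ a, coef(2) ~ b
    fprintf('estimated slope %.3f, intercept %.3f\n', coef(1), coef(2));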
In this article, we consider the scenario
where $\varepsilon$ has an alternative distribution or where
$E[\varepsilon] \neq 0$, and we reduce the influence of the
observations that deviate far from the regression line
$y = ax + b$. The two novel algorithms are the line-pulling
linear regression (LPR) and the band-pulling
linear regression (BPR), which are presented in the
next section.
The remainder of the article is structured as
follows: the LPR and BPR algorithms are
introduced in Section 2. Section 3 presents
mathematical proofs of some properties. Next, the
numerical results of our algorithms are illustrated
and discussed in Section 4. Section 5 shows how the
application is used to remove the pectoral muscle.
Finally, conclusions and some suggestions are
drawn and presented in Section 6.
2 Description of LPR and BPR
We introduce two novel data-fitting algorithms:
line-pulling linear regression (LPR) and band-
pulling linear regression (BPR). These algorithms
are defined as follows:
Let $D^{(0)} = \big\{(x_1, y_1^{(0)}), (x_2, y_2^{(0)}), \ldots, (x_n, y_n^{(0)})\big\}$
be an initial data set such that the $x_i$ are distinct,
and let $p_u, p_l \in [0,1]$ be given proportions. The procedure for the LPR
algorithm is the following. Set $D^{(1)} = D^{(0)}$.
(i) Consider the $k$-th iteration, $k \geq 1$. We get
the linear regression $f^{(k)}(x) = a^{(k)}x + b^{(k)}$ of
the data $D^{(k)}$, where
$(a^{(k)}, b^{(k)}) = \arg\min_{a,b} \sum_{i=1}^{n} \big(y_i^{(k)} - a x_i - b\big)^2.$
Denote $\hat{y}_i^{(k)} = f^{(k)}(x_i)$, $i = 1, 2, \ldots, n$.
(ii) Define a band
$B^{(k)} = \big\{(x, y) \,\big|\, f^{(k)}(x) - l^{(k)} \leq y \leq f^{(k)}(x) + u^{(k)}\big\},$
where $u^{(k)} = p_u \max_{1 \leq i \leq n}\big\{y_i^{(k)} - \hat{y}_i^{(k)}\big\}$, and
$l^{(k)} = p_l \max_{1 \leq i \leq n}\big\{\hat{y}_i^{(k)} - y_i^{(k)}\big\}$.
(iii) Update the data $D^{(k+1)}$ given by
$D^{(k+1)} = \big\{(x_1, y_1^{(k+1)}), (x_2, y_2^{(k+1)}), \ldots, (x_n, y_n^{(k+1)})\big\},$
such that for each $i$,
$y_i^{(k+1)} = \begin{cases} \hat{y}_i^{(k)}, & (x_i, y_i^{(k)}) \notin B^{(k)}, \\ y_i^{(k)}, & (x_i, y_i^{(k)}) \in B^{(k)}. \end{cases}$   (1)
(iv) Return to step (i) and repeat until the error value
$\sum_{i=1}^{n} \big(y_i^{(k+1)} - a^{(k)}x_i - b^{(k)}\big)^2 < \epsilon,$
where $\epsilon$ is a fixed value.
Following that, we shall introduce the BPR
algorithm. Set $D^{(1)} = D^{(0)}$. The steps of the
algorithm are defined the same as in the LPR
algorithm, except that step (iii) is replaced by step
(iii*) as follows.
(iii*) Update the data $D^{(k+1)}$ given by
$D^{(k+1)} = \big\{(x_1, y_1^{(k+1)}), (x_2, y_2^{(k+1)}), \ldots, (x_n, y_n^{(k+1)})\big\},$
such that for each $i$,
$y_i^{(k+1)} = \begin{cases} \hat{y}_i^{(k)} + u^{(k)}, & y_i^{(k)} > \hat{y}_i^{(k)} + u^{(k)}, \\ y_i^{(k)}, & \hat{y}_i^{(k)} - l^{(k)} \leq y_i^{(k)} \leq \hat{y}_i^{(k)} + u^{(k)}, \\ \hat{y}_i^{(k)} - l^{(k)}, & y_i^{(k)} < \hat{y}_i^{(k)} - l^{(k)}. \end{cases}$   (2)
The two algorithms above differ in how they
update the data. The LPR algorithm pulls the
points outside the band toward the regression line,
whereas the BPR algorithm pulls those points
toward the boundary of the band, as illustrated in
Fig. 1.
(a) LPR
(b) BPR
Fig. 1: The proposed algorithm.
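To make steps (i)-(iv) concrete, the following MATLAB sketch implements one possible reading of the two algorithms. The function name pulling_regression, the proportion parameters pu and pl, the stopping rule, and the iteration cap maxIter are our own illustrative choices and not the authors' original code; the sketch only follows the pulling idea described above.

    function [a, b, sseHist] = pulling_regression(x, y, pu, pl, mode, maxIter)
    % Pulling linear regression (illustrative sketch, not the authors' code).
    %   mode = 'LPR' pulls points outside the band onto the fitted line;
    %   mode = 'BPR' pulls them onto the nearest boundary of the band.
    %   pu and pl scale the band widths above and below the line.
        x = x(:); y = y(:);
        sseHist = zeros(maxIter, 1);
        for k = 1:maxIter
            coef = polyfit(x, y, 1);             % step (i): least squares fit
            a = coef(1); b = coef(2);
            yhat = a*x + b;
            sseHist(k) = sum((y - yhat).^2);     % SSE of the k-th iteration
            u = pu * max(y - yhat);              % step (ii): band width above the line
            l = pl * max(yhat - y);              % band width below the line
            above = y > yhat + u;                % points outside the band
            below = y < yhat - l;
            if ~any(above | below)               % nothing left to pull: stop
                sseHist = sseHist(1:k);
                return;
            end
            switch mode                          % step (iii) / (iii*): update the data
                case 'LPR'
                    y(above) = yhat(above);
                    y(below) = yhat(below);
                case 'BPR'
                    y(above) = yhat(above) + u;
                    y(below) = yhat(below) - l;
            end
        end
    end

Under this reading, pu = pl = 1 leaves every point inside the band, so the first fit is returned unchanged and the result coincides with ordinary linear regression, while smaller proportions pull the line toward the denser side of the data.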
The following is an example of how to
understand our algorithms.
Example 1. Let the initial data consist of five points
generated as $y_i^{(0)} = f(x_i) + \varepsilon_i$, $i = 1, 2, \ldots, 5$,
where $f$ is a linear function and $\varepsilon_i$ is a uniform noise.
We obtain the initial data for our example,
$D^{(0)} = \big\{(x_1, y_1^{(0)}), (x_2, y_2^{(0)}), (x_3, y_3^{(0)}), (x_4, y_4^{(0)}), (x_5, y_5^{(0)})\big\}$
(see Fig. 2).
In the first iteration, we obtain the linear least
squares regression $f^{(1)}(x) = a^{(1)}x + b^{(1)}$.
The points $(x_i, f^{(1)}(x_i))$, $i = 1, \ldots, 5$, lie on the
regression line $f^{(1)}$, and the upper and lower
boundaries of the band are obtained from the widths
$u^{(1)}$ and $l^{(1)}$. The resulting band $B^{(1)}$ is shown in
Fig. 3.
Fig. 2: Initial data.
Fig. 3: The band $B^{(1)}$ for the example.
In Fig. 3, we find that three of the points lie
outside the band $B^{(1)}$. From equations (1) and (2),
we update the data as follows:
LPR: the points outside the band are pulled onto the
regression line $f^{(1)}$, and we obtain the updated data
$D^{(2)}$ (see Fig. 4(a)).
BPR: the points outside the band are pulled onto the
nearest boundary of the band $B^{(1)}$, and we obtain
the updated data $D^{(2)}$ (see Fig. 5(a)).
The results of the LPR and BPR algorithms are
shown in Tables 1 and 2, respectively. In the 6th
iteration for LPR and the 11th iteration for BPR, we
obtain the final fitted regression lines, shown in
Fig. 4(b) and Fig. 5(b), respectively.
(a) the updated data for the first iteration
(b) the data fitting for the final iteration
Fig. 4: The result for the LPR algorithm.
(a) the updated data for the first iteration
(b) the data fitting for the final iteration
Fig. 5: The result for the BPR algorithm.
Table 1. The results of the LPR algorithm for the initial data $D^{(0)}$ of Example 1: the updated data, the fitted regression $f^{(k)}(x) = a^{(k)}x + b^{(k)}$, and the error at iterations $k = 0, 1, \ldots, 6$.
Table 2. The results of the BPR algorithm for the initial data $D^{(0)}$ of Example 1: the updated data, the fitted regression $f^{(k)}(x) = a^{(k)}x + b^{(k)}$, and the error at iterations $k = 0, 1, \ldots, 11$.
3 Main Results
Definition 1. Let $p_u, p_l \in [0,1]$ and let
$D^{(0)} = \big\{(x_1, y_1^{(0)}), (x_2, y_2^{(0)}), \ldots, (x_n, y_n^{(0)})\big\}$ be an initial
data set such that the $x_i$ are distinct. The sum square error
(SSE) of the $k$-th iteration for LPR with respect to
$D^{(0)}$ is given by
$SSE_{LPR}^{(k)} = \sum_{i=1}^{n} \big(y_i^{(k)} - a^{(k)}x_i - b^{(k)}\big)^2,$   (3)
where
$(a^{(k)}, b^{(k)}) = \arg\min_{a,b} \sum_{i=1}^{n} \big(y_i^{(k)} - a x_i - b\big)^2,$
and $y_i^{(k)}$ is an updated value as mentioned in LPR.
Similarly, the SSE of the $k$-th iteration for BPR with
respect to $D^{(0)}$ is given by
$SSE_{BPR}^{(k)} = \sum_{i=1}^{n} \big(y_i^{(k)} - a^{(k)}x_i - b^{(k)}\big)^2,$   (4)
where $(a^{(k)}, b^{(k)})$ is defined in the same way,
and $y_i^{(k)}$ is an updated value as mentioned in BPR.
In the specific case $p_u = p_l = 0$, we observe that
$u^{(k)} = l^{(k)} = 0$, i.e., $B^{(k)} = \big\{(x, f^{(k)}(x))\big\}$.
That is, each point in $D^{(k)}$ is pulled onto the line
$f^{(k)}(x)$. Therefore, $SSE_{LPR}^{(k)} = 0$
and $SSE_{BPR}^{(k)} = 0$ for all $k \geq 2$.
In the opposite case, $p_u = p_l = 1$, we then get
$u^{(k)} = \max_{1 \leq i \leq n}\big\{y_i^{(k)} - f^{(k)}(x_i)\big\}$ and
$l^{(k)} = \max_{1 \leq i \leq n}\big\{f^{(k)}(x_i) - y_i^{(k)}\big\}$.
It means that all points are in $B^{(k)}$, i.e.,
$D^{(k+1)} = D^{(k)}$. Therefore,
$SSE_{LPR}^{(k+1)} = SSE_{LPR}^{(k)} = \cdots = SSE_{LPR}^{(1)}$ and
$SSE_{BPR}^{(k+1)} = SSE_{BPR}^{(k)} = \cdots = SSE_{BPR}^{(1)}$ for all $k \geq 1$;
that is, both algorithms reduce to ordinary linear regression.
From Tables 1 and 2, we observe that the sum
square errors are decreasing for both algorithms.
This leads to the following property.
Lemma 1. Let $D^{(0)} = \big\{(x_1, y_1^{(0)}), (x_2, y_2^{(0)}), \ldots, (x_n, y_n^{(0)})\big\}$
be an initial data set such that the $x_i$ are
distinct. If $p_u, p_l \in [0,1]$, the sequences
$\big\{SSE_{LPR}^{(k)}\big\}$ and $\big\{SSE_{BPR}^{(k)}\big\}$ are
decreasing in $k$.
Proof. Consider the $k$-th iteration, where $k \geq 1$.
Let $\hat{y}_i^{(k)} = f^{(k)}(x_i)$, $i = 1, 2, \ldots, n$, where
$(a^{(k)}, b^{(k)}) = \arg\min_{a,b} \sum_{i=1}^{n} \big(y_i^{(k)} - a x_i - b\big)^2.$
Case LPR: From equation (1), we get
$y_i^{(k+1)} = \begin{cases} \hat{y}_i^{(k)}, & (x_i, y_i^{(k)}) \notin B^{(k)}, \\ y_i^{(k)}, & (x_i, y_i^{(k)}) \in B^{(k)}. \end{cases}$
Thus,
$\big(y_i^{(k+1)} - \hat{y}_i^{(k)}\big)^2 \leq \big(y_i^{(k)} - \hat{y}_i^{(k)}\big)^2$ for each $i$.
Next, consider equation (3),
$SSE_{LPR}^{(k+1)} = \sum_{i=1}^{n} \big(y_i^{(k+1)} - a^{(k+1)}x_i - b^{(k+1)}\big)^2 \leq \sum_{i=1}^{n} \big(y_i^{(k+1)} - a^{(k)}x_i - b^{(k)}\big)^2 = \sum_{i=1}^{n} \big(y_i^{(k+1)} - \hat{y}_i^{(k)}\big)^2 \leq \sum_{i=1}^{n} \big(y_i^{(k)} - \hat{y}_i^{(k)}\big)^2 = SSE_{LPR}^{(k)},$
where the first inequality follows because $(a^{(k+1)}, b^{(k+1)})$ minimises
the sum of squared deviations for the data $D^{(k+1)}$.
Case BPR: From equation (2), we get
$y_i^{(k+1)} = \begin{cases} \hat{y}_i^{(k)} + u^{(k)}, & y_i^{(k)} > \hat{y}_i^{(k)} + u^{(k)}, \\ y_i^{(k)}, & \hat{y}_i^{(k)} - l^{(k)} \leq y_i^{(k)} \leq \hat{y}_i^{(k)} + u^{(k)}, \\ \hat{y}_i^{(k)} - l^{(k)}, & y_i^{(k)} < \hat{y}_i^{(k)} - l^{(k)}. \end{cases}$
Thus,
$\big(y_i^{(k+1)} - \hat{y}_i^{(k)}\big)^2 \leq \big(y_i^{(k)} - \hat{y}_i^{(k)}\big)^2$ for each $i$.
Next, consider equation (4); by the same argument as in the LPR case,
$SSE_{BPR}^{(k+1)} \leq \sum_{i=1}^{n} \big(y_i^{(k+1)} - \hat{y}_i^{(k)}\big)^2 \leq \sum_{i=1}^{n} \big(y_i^{(k)} - \hat{y}_i^{(k)}\big)^2 = SSE_{BPR}^{(k)}.$
This completes the proof.
From equations (3) and (4), it is obvious that the
sequences $\big\{SSE_{LPR}^{(k)}\big\}$ and $\big\{SSE_{BPR}^{(k)}\big\}$ are
nonnegative, that is, they are bounded below by zero, and,
by Lemma 1, they are monotonically decreasing. This
leads to the following theorem.
Theorem 2. Let $D^{(0)}$ be an initial data set. If
$p_u, p_l \in [0,1]$, the sequences $\big\{SSE_{LPR}^{(k)}\big\}$ and
$\big\{SSE_{BPR}^{(k)}\big\}$ are convergent.
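As a quick empirical check of Lemma 1 and Theorem 2, one can inspect the SSE history returned by the hypothetical pulling_regression sketch from Section 2; the noise and proportion values below are again illustrative assumptions.

    % Empirical check that the SSE sequence is nonincreasing (Lemma 1).
    x = (1:30)';
    y = 0.5*x + 3 + random('unif', 0, 4, 30, 1);        % noise with a positive mean
    [~, ~, sse] = pulling_regression(x, y, 0.5, 0.5, 'LPR', 50);
    assert(all(diff(sse) <= 1e-8), 'SSE increased');    % monotone behaviour of equation (3)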
4 Numerical Examples
This section shows numerical examples using
initial data generated from a linear function plus
noise (a uniform distribution $U(a,b)$, a normal
distribution $N(a,b)$, and a gamma distribution
$G(a,b)$*). We also use various values of the proportions
$p_u$ and $p_l$ to compare the root mean square error and the
number of iterations of the proposed algorithms.
In Table 3, we generate the data by the equation
$y_i^{(0)} = f(x_i) + \varepsilon_i,$
where $\varepsilon_i$ is a generated noise and $f$ is the underlying
linear function. The root mean square error (RMSE) is given
by
$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big(f(x_i) - f^{(k)}(x_i)\big)^2},$
where $f^{(k)}$ is obtained from the linear least squares fit
of the $k$-th iteration of the proposed algorithms.
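A small script in the spirit of the Table 3 experiment is sketched below; the true line, the gamma-noise parameters, and the proportion pair are illustrative stand-ins, since the exact settings used to produce the table are not restated here, and pulling_regression refers to the hypothetical sketch from Section 2.

    % Mimic one cell of the Table 3 experiment (illustrative values throughout).
    a0 = 1.2;  b0 = 5;  n = 100;
    x = linspace(0, 20, n)';
    f = a0*x + b0;                                       % true underlying line
    y = f + random('gam', 2, 1.5, n, 1);                 % gamma noise with a positive mean
    coefLR = polyfit(x, y, 1);                           % ordinary least squares
    [aP, bP] = pulling_regression(x, y, 0.1, 0.9, 'LPR', 200);
    rmseLR  = sqrt(mean((f - polyval(coefLR, x)).^2));
    rmseLPR = sqrt(mean((f - (aP*x + bP)).^2));
    fprintf('RMSE  LR: %.3f   LPR: %.3f\n', rmseLR, rmseLPR);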
We summarise the results in Table 3 as follows.
- For the first noise setting, the LPR algorithm has the minimum RMSE of 0.143 with 59 iterations.
- For the second noise setting, the BPR algorithm has the minimum RMSE of 0.174 with 15 iterations.
- For the third noise setting, the LPR algorithm has the minimum RMSE of 0.323 with 11 iterations.
- For the fourth noise setting, the BPR algorithm has the minimum RMSE of 0.203 with 17 iterations.
- For the fifth noise setting, the LPR algorithm has the minimum RMSE of 0.026 with 71 iterations.
- For the sixth noise setting, the LPR and BPR algorithms both have the minimum RMSE of 2.360 with 52 iterations.
- For the seventh noise setting, the BPR algorithm has the minimum RMSE of 0.621 with 16 iterations.
- For the eighth noise setting, the BPR algorithm has the minimum RMSE of 0.362 with 18 iterations.
- For the ninth noise setting, the LPR and BPR algorithms both have the minimum RMSE of 2.007 with 65 iterations.
- For the tenth noise setting, the LPR algorithm has the minimum RMSE of 0.022 with 34 iterations.
- For the eleventh noise setting, the LPR and BPR algorithms both have the minimum RMSE of 0.299 with 72 iterations.
Table 3. The root mean square error, with the number of iterations in parentheses, of the LPR and BPR algorithms for eleven noise settings (uniform, normal, and gamma noises) and twelve proportion settings.

Noise        Algorithm   1           2           3           4           5           6            7           8           9           10           11          12
Setting 1    LPR         3.555 (2)   3.623 (5)   3.627 (10)  3.663 (17)  8.414 (64)  8.414 (64)   4.259 (7)   3.966 (11)  0.144 (58)  0.143 (59)   2.860 (9)   3.198 (11)
             BPR         3.555 (2)   3.649 (8)   3.651 (14)  3.644 (31)  8.414 (64)  8.410 (88)   5.022 (20)  4.409 (25)  0.144 (58)  0.146 (77)   2.133 (19)  2.758 (23)
Setting 2    LPR         1.563 (2)   1.562 (5)   1.509 (9)   1.269 (14)  5.819 (74)  5.819 (74)   2.202 (7)   1.676 (10)  2.357 (88)  2.357 (88)   0.401 (9)   0.889 (9)
             BPR         1.563 (2)   1.511 (8)   1.464 (14)  1.425 (34)  5.819 (74)  5.815 (93)   3.266 (16)  2.606 (22)  2.357 (88)  2.330 (122)  0.174 (15)  0.337 (19)
Setting 3    LPR         0.372 (2)   0.352 (5)   0.370 (7)   0.323 (11)  3.268 (65)  3.264 (67)   0.906 (8)   0.592 (12)  4.367 (88)  4.367 (88)   0.880 (10)  0.482 (11)
             BPR         0.372 (2)   0.374 (8)   0.378 (14)  0.362 (29)  3.268 (65)  3.271 (96)   1.545 (17)  0.886 (20)  4.367 (88)  4.351 (117)  2.170 (19)  1.419 (21)
Setting 4    LPR         1.698 (2)   1.698 (5)   1.769 (8)   1.656 (15)  1.694 (55)  1.694 (55)   1.270 (9)   1.584 (11)  6.304 (63)  6.304 (64)   2.007 (9)   1.829 (11)
             BPR         1.698 (2)   1.721 (8)   1.701 (13)  1.673 (29)  1.694 (55)  1.692 (72)   0.203 (17)  0.718 (23)  6.304 (63)  6.299 (87)   3.346 (18)  2.729 (21)
Setting 5    LPR         4.813 (2)   4.804 (5)   4.825 (9)   4.940 (15)  0.027 (71)  0.026 (71)   4.293 (9)   4.383 (9)   9.082 (98)  9.082 (98)   5.738 (9)   5.290 (13)
             BPR         4.813 (2)   4.803 (8)   4.839 (14)  4.914 (32)  0.027 (71)  0.075 (103)  3.110 (19)  3.765 (21)  9.082 (98)  9.067 (132)  6.477 (18)  5.890 (21)
Setting 6    LPR         4.073 (2)   4.074 (5)   4.107 (8)   4.053 (14)  5.670 (52)  5.670 (52)   4.356 (5)   4.183 (10)  2.360 (52)  2.360 (52)   3.724 (8)   3.954 (10)
             BPR         4.073 (2)   4.074 (7)   4.069 (12)  4.077 (26)  5.670 (52)  5.668 (70)   4.708 (15)  4.471 (19)  2.360 (52)  2.362 (69)   3.340 (16)  3.652 (19)
Setting 7    LPR         1.235 (2)   1.186 (5)   1.058 (6)   1.040 (12)  2.459 (44)  2.460 (45)   1.295 (9)   1.207 (9)   0.877 (88)  0.874 (88)   0.863 (7)   0.903 (10)
             BPR         1.235 (2)   1.279 (7)   1.172 (13)  1.176 (27)  2.459 (44)  2.456 (58)   1.741 (17)  1.469 (19)  0.877 (88)  0.937 (123)  0.621 (16)  0.839 (20)
Setting 8    LPR         1.082 (2)   1.078 (5)   1.108 (8)   1.079 (13)  1.200 (57)  1.201 (58)   0.702 (9)   0.915 (10)  3.013 (53)  3.014 (54)   1.396 (7)   1.317 (9)
             BPR         1.082 (2)   1.061 (7)   1.065 (12)  1.067 (26)  1.200 (57)  1.198 (76)   0.362 (18)  0.598 (21)  3.013 (53)  3.011 (69)   1.767 (17)  1.492 (19)
Setting 9    LPR         3.899 (2)   3.906 (6)   3.927 (8)   4.095 (14)  2.007 (65)  2.009 (66)   3.640 (9)   3.793 (10)  6.256 (57)  6.255 (57)   4.422 (7)   4.274 (12)
             BPR         3.899 (2)   3.946 (7)   3.972 (13)  3.990 (27)  2.007 (65)  2.014 (87)   3.185 (17)  3.520 (22)  6.256 (57)  6.253 (79)   4.704 (19)  4.476 (21)
Setting 10   LPR         0.788 (2)   0.788 (4)   0.788 (6)   0.739 (10)  0.023 (33)  0.022 (34)   0.637 (7)   0.693 (9)   1.662 (41)  1.663 (42)   0.873 (6)   0.828 (10)
             BPR         0.788 (2)   0.782 (7)   0.776 (13)  0.777 (26)  0.023 (33)  0.025 (44)   0.389 (13)  0.528 (19)  1.662 (41)  1.659 (54)   1.175 (13)  1.022 (19)
Setting 11   LPR         2.827 (2)   2.824 (5)   2.877 (10)  2.965 (13)  0.299 (72)  0.306 (72)   2.346 (7)   2.568 (9)   5.697 (59)  5.697 (59)   3.465 (8)   3.186 (12)
             BPR         2.827 (2)   2.863 (8)   2.875 (14)  2.865 (32)  0.299 (72)  0.332 (97)   1.809 (17)  2.172 (20)  5.697 (59)  5.693 (80)   4.176 (20)  3.683 (23)
* We generate the random noises using MATLAB version R2017a; the codes for $U(a,b)$, $N(a,b)$, and $G(a,b)$ are
random('unif',a,b,m,n), random('norm',a,b,m,n), and random('gam',a,b,m,n), respectively, where $m \times n$ is the size of the random noise array.
We observe that the number of iterations of the
BPR algorithm is always greater than or equal to
that of the LPR algorithm. Next, we discuss the
values of $p_u$ and $p_l$. In the case $p_u = p_l$, the
regression line is located at the centre of the points.
This setting is suitable for noise having a mean
of zero. When $p_u < p_l$, the bandwidth above the
regression line is narrower than that below, so more
points are pulled down than up. As a result, the
regression line gradually drops in the subsequent
iterations. This setting is suitable for noise with a mean
greater than zero. When $p_u > p_l$, the result is the
opposite of the previous case, and the regression line
gradually rises in the subsequent iterations. This setting
is appropriate for noise having a mean less than
zero.
Furthermore, when every noise value is positive, for
example when $\varepsilon$ is derived from a uniform or gamma
distribution supported on the positive axis, the appropriate
choice is $p_u < p_l$. On the other hand, if every noise value is
negative, the appropriate choice is $p_u > p_l$.
The following figures show some examples from
Table 3. Fig. 6 shows the regression of the original
function with uniform, normal, and gamma noise,
respectively. The black asterisks on the black line
are the points $(x_i, f(x_i))$ of the original function,
the pink points are the data in $D^{(0)}$, and the red,
green, and blue lines are the regression lines of
linear regression, LPR, and BPR, respectively.
(a) uniform noise
(b) normal noise
(c) gamma noise
Fig. 6: The regression lines with noise.
5 Applications
In general, the data points used to create the
regression line contain noise from the beginning. As a
result, we cannot know whether those noise values are
positive or negative, making it impossible to choose
suitable proportion values directly. Therefore, the selection
of $p_u$ and $p_l$ must be determined by the desired
outcome. For example, if the mean is to be used to
represent the data, $p_u$ and $p_l$ should be equal. If the
regression line should lie below all of the data, a small
$p_u$ and a large $p_l$ should be set. Similarly, set a large
$p_u$ and a small $p_l$ when we need a regression line above all
data points.
This section will demonstrate how the proposed
algorithm can be used to analyse mammograms. In
the mediolateral-oblique (MLO) view, the existence
of the pectoral muscle may mislead the diagnosis of
cancer due to its high similarity to the breast
body. Therefore, we cut the pectoral muscle part and
then employ LPR or BPR to smooth the boundary.
We separate the areas of the pectoral muscle and
the breast using the difference in intensity of a
mammogram. We extract the border between them and
transform it to Cartesian coordinates; these points are
referred to as the connection points and define the initial
data $D^{(0)}$. Fig. 7
depicts an example of a mammogram and shows the
connection points of the mammogram, with the
connection points marking the boundary between the
pectoral muscle and the breast region.
Since we want to remove only the muscle part, the
regression line shall lie below all points. We then define
$p_u = 0$.
The value of $p_l$ should be in the range $[1, \infty)$; we
fix a value in this range for this example. Fig. 8 shows the
regression lines derived from linear regression,
LPR, and BPR using red, green, and blue,
respectively. We get one regression line
from linear regression, one
from LPR after its final iteration, and one from BPR
after its final iteration. Moreover, the mammograms
after removing the pectoral muscle using linear
regression, LPR, and BPR, respectively, are shown
in Fig. 9.
The images obtained by finding the regression
line with the LPR and BPR algorithms are comparable
to each other and better than the image obtained with
linear regression.
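For readers who want to reproduce this behaviour, the following MATLAB sketch applies the hypothetical pulling_regression function from Section 2 with a zero upper proportion and a unit lower proportion to synthetic boundary points; the data and parameter values are assumptions for illustration, not the connection points or settings used for Fig. 8.

    % Drive the regression line toward the bottom of the connection points:
    % pu = 0 pulls every point above the line down onto it at each iteration,
    % while pl = 1 keeps every point below the line untouched, so the fitted
    % line sinks toward the lowest points. xb and yb are synthetic stand-ins.
    xb = (1:200)';
    yb = 0.8*xb + 40 + random('unif', 0, 25, 200, 1);
    [aLow, bLow] = pulling_regression(xb, yb, 0, 1, 'LPR', 300);
    plot(xb, yb, '.', xb, aLow*xb + bLow, '-');          % the line hugs the lower edge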
(a) the original mammogram
(b) the connection points plotted on the mammogram
(c) the connection points plotted on a graph
Fig. 7: An example of a mammogram.
Fig. 8: The regression lines.
(a) linear regression
(b) LPR
(c) BPR
Fig. 9: The mammograms after removing the muscle
part by different algorithms.
6 Conclusions and Discussions
In this paper, we proposed two algorithms, the
line-pulling linear regression (LPR) and the band-pulling
linear regression (BPR), that create a regression line for
data generated from an original function with noise
$\varepsilon$, where $\varepsilon$ does not necessarily follow a normal
distribution with a mean of zero. These algorithms can
place the regression line at the centre, top, or bottom of the
data points by assigning the proportion values $p_u$ and $p_l$.
If $p_u = p_l = 1$, the LPR and BPR algorithms provide the same
regression line as linear regression. When $p_u < p_l$,
the resulting line is below the linear regression line, and
when $p_u > p_l$, the resulting line is above it. However,
since we do not know the noise distribution in the data,
we determine the values of $p_u$ and $p_l$ based on user
requirements.
The numerical examples show that the results of
the LPR and BPR algorithms are similar. The
noticeable difference is the number of iterations:
the LPR algorithm converges faster than the
BPR algorithm.
One application of these algorithms is the
smoothing of the pectoral muscle's boundary. We
use $p_u = 0$ and $p_l \geq 1$ to create the regression
line at the bottom of all data points, ensuring that we
remove only the muscle part.
In addition, the LPR and BPR algorithms can be
extended to more complicated models, such as using
quadratic or cubic polynomial equations rather than
a linear equation, which is expected to bring greater
application benefits.
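A minimal way to realise this extension, under the same assumptions as the earlier pulling_regression sketch, is to raise the degree passed to polyfit and evaluate the fitted values with polyval inside the iteration; the pulling step itself is unchanged.

    % Inside the iteration of the earlier pulling_regression sketch, replace the
    % linear fit of step (i) with a higher-degree polynomial fit (degree 2 here):
    coef = polyfit(x, y, 2);        % quadratic data fitting
    yhat = polyval(coef, x);        % fitted values used for the band and the pulling step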
Acknowledgement:
The first author would like to express gratitude to
the Science Achievement Scholarship of Thailand
(SAST) for financial assistance for this paper. This
research is supported by the Department of
Mathematics, Faculty of Science, Khon Kaen
University, Fiscal Year 2022.
References:
[1] S. M. Stigler, The history of statistics: The
measurement of uncertainty before 1900,
Harvard University Press, 1986.
[2] D. C. Montgomery, E. A. Peck, G. G. Vining,
Introduction to linear regression analysis,
John Wiley & Sons, 2021.
[3] X. Su, X. Yan, C. Tsai, Linear regression,
WIREs Computational Statistics, Vol.4, No.3,
2012, pp. 275-294.
[4] W. Yao, L. Li, A New Regression Model:
Modal Linear Regression, Scandinavian
Journal of Statistics, Vol.41, 2014, pp. 656-
671.
[5] K. H. Zou, K. Tuncali, S. G. Silverman,
Correlation and simple linear regression,
Radiology, Vol.227, No.3, 2003, pp. 617-628.
[6] D. Maulud, A. M. Abdulazeez, A review on
linear regression comprehensive in machine
learning, Journal of Applied Science and
Technology Trends, Vol.1, No.4, 2020,
pp.140-147.
[7] L. Pérez-Domínguez, H. Garg, D. Luviano-
Cruz, J.L. García Alcaraz, Estimation of
Linear Regression with the Dimensional
Analysis Method, Mathematics, Vol.10,
No.10, 2022, pp. 1645.
[8] S. Jokubaitis, R. Leipus, Asymptotic
normality in linear regression with
approximately sparse structure, Mathematics,
Vol.10, No.10, 2022, pp. 1657.
[9] M. Al-Kandari, K. Adjenughwure, K.
Papadopoulos, A Fuzzy-Statistical Tolerance
Interval from Residuals of Crisp Linear
Regression Models, Mathematics, Vol.8,
No.9, 2020, pp. 1422.
[10] X. Liu, Y. Chen, A systematic approach to
optimizing value for fuzzy linear regression
with symmetric triangular fuzzy numbers,
Mathematical Problems in Engineering,
Vol.2013, 2013.
[11] A. Kabán, New bounds on compressive linear
least squares regression, Artificial intelligence
and statistics, 2014, pp. 448-456.
[12] J. Yi, N. Tang, Variational Bayesian inference
in high-dimensional linear mixed models,
Mathematics, Vol.10, No.3, 2022, pp. 463.
[13] M. Ahn, H. H. Zhang, W. Lu, Moment-based
method for random effects selection in linear
mixed models, Statistica Sinica, Vol.22, No.4,
2012, pp. 1539.
[14] G. K. Uyanık, N. Güler, A Study on Multiple
Linear Regression Analysis, Procedia - Social
and Behavioral Sciences, Vol.106, 2013, pp.
234-240.
[15] M. Liu, S. Hu, Y. Ge, G. B. Heuvelink, Z.
Ren, X. Huang, Using multiple linear
regression and random forests to identify
spatial poverty determinants in rural China,
Spatial Statistics, Vol.42, 2021, pp. 100461.
[16] Y. Li, X. He, X. Liu, Fuzzy multiple linear
least squares regression analysis, Fuzzy Sets
and Systems, 2022.
[17] S. Weisberg, Applied Linear Regression, 4th
edition, John Wiley & Sons, 2014.
[18] M. S. Paolella, Linear models and time-series
analysis: regression, ANOVA, ARMA and
GARCH, John Wiley & Sons, 2018.
[19] A. C. Rencher, G.B. Schaalje, Linear models
in statistics, John Wiley & Sons, 2008.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
-Nahatai Tepkasetkul carried out the
conceptualization, investigation, methodology,
software, validation, visualization, writing - original
draft, and writing - review & editing.
-Weenakorn Ieosanurak carried out the supervision,
validation, and writing - review & editing.
-Thanapong Intharah carried out the
conceptualization, supervision, validation, and
writing - review & editing.
-Watcharin Klongdee carried out the
conceptualization, investigation, methodology,
supervision, validation, visualization, writing -
original draft, and writing - review & editing.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
The first author would like to express gratitude to
the Science Achievement Scholarship of Thailand
(SAST) for financial assistance for this paper. This
research is supported by the Department of
Mathematics, Faculty of Science, Khon Kaen
University, Fiscal Year 2022.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US