A Classification Study in High-Dimensional Data of Linear
Discriminant Analysis and Regularized Discriminant Analysis
AUTCHA ARAVEEPORN, SOMSRI BANDITVILAI
Department of Statistics, School of Science,
King Mongkut's Institute of Technology Ladkrabang
10520, Bangkok,
THAILAND
Abstract: - The objective of this work is to compare linear discriminant analysis (LDA) and regularized
discriminant analysis (RDA) for classification in high-dimensional data. The dataset consists of a binary or
dichotomous response variable and continuous explanatory variables. The LDA and RDA methods are well-known
in statistical and probabilistic learning classification. The LDA creates the decision boundary as a linear
function under the assumption that the covariance matrices of the two classes are equal. The RDA extends the
LDA to resolve the estimation of the covariance matrix when the number of explanatory variables exceeds the
number of observations, the so-called high-dimensional setting. The explanatory data are generated from the
normal distribution, contaminated normal distribution, and uniform distribution. The binary response variables
are computed from the logit function depending on the explanatory variables. The highest average accuracy
percentage is used to assess the performance of the classification methods in several situations. Through the
simulation results, the LDA was successful with large sample sizes, but the RDA performed well for most
sample sizes.
Key-Words: - high-dimensional data, linear discriminant analysis, regularized discriminant analysis
Received: September 11, 2022. Revised: April 2, 2023. Accepted: April 26, 2023. Published: May 10, 2023.
1 Introduction
Discriminant analysis is a statistical technique that helps the researcher separate response
variables, in the form of categorical data, depending on the explanatory variables. This method comprises a
discriminant function, or decision function, in the form of a linear or quadratic function to divide two
or more classes of the response variable. [1], illustrated discriminant analysis on a challenging
classification problem and demonstrated that discriminant analysis had good predictive accuracy
under the normal distribution. [2], applied a cosine similarity measure based on the decision rule in
discriminant analysis.
Linear discriminant analysis is a well-known technique for dimensionality reduction and is used as a
pre-processing step in machine learning and pattern classification applications, [3]. The technique
rests on the assumption of a common covariance matrix under the multivariate normal distribution.
A decision boundary function is created for assigning observations to populations, and the likelihood
function is maximized to estimate the observations and the proportion of each population. [4], applied
linear discriminant analysis to small-sample-size classification problems in face recognition,
bioinformatics, and text recognition. [5], developed linear discriminant analysis into neighborhood
linear discriminant analysis, in which the scatter matrices are defined on a neighborhood consisting of
reverse nearest neighbors.
When each group is assumed to have its own covariance matrix, this leads to so-called quadratic
discriminant analysis. Linear discriminant analysis is straightforward when the number of observations
is greater than the number of explanatory variables. However, a severe problem arises when the number
of explanatory variables is greater than the number of observations, which defines high-dimensional
data. The sample covariance matrix is then singular and cannot be inverted for the computation of the
discriminant function. To overcome this problem, linear discriminant analysis is adapted into a new
method called regularized discriminant analysis, [6]. [7], improved the covariance in regularized
discriminant analysis on high-dimensional, low-sample-size data for the ill-posed inverse problem. [8],
conducted a large-dimensional study of regularized discriminant analysis classifiers in its two popular
forms, known as regularized LDA and regularized QDA.
The LDA has been extended to flexible discriminant analysis (FDA), [9], a valuable multigroup
classification tool. FDA obtains nonparametric versions of discriminant analysis by replacing linear
regression with any nonparametric regression method, and this technique can improve classification
performance. [10], considered high-dimensional data with a singular within-class covariance matrix,
called penalized LDA, and evaluated the performance of the resulting methods in a simulation study.
[11], described a penalized version of LDA designed for highly correlated independent variables. [12],
fitted a Gaussian mixture to each class to facilitate effective classification in non-normal settings.
This article aims to study the binary classification of high-dimensional data by comparing LDA and
RDA. Through simulated data, we generate explanatory variables from the normal distribution,
contaminated normal distribution, and uniform distribution, while response variables are obtained from
the logit function. The maximum average accuracy percentage is used to investigate the performance of
the two methods.
This study is divided into five sections: the first section discusses the importance and background of
linear discriminant analysis and regularized discriminant analysis. Section 2 presents the general
definitions related to discriminant analysis and the theorems of these methods. Section 3 presents the
simulation study and results used to construct the response and explanatory variables in the
high-dimensional data. A discussion of our simulation results is given in Section 4. Finally, the
conclusion and recommendations are provided in Section 5.
2 Discriminant Analysis
The explanation of LDA and RDA relates to the
Bayes theory concept based on a multivariate
normal distribution. Suppose that each of the two classes follows a normal distribution,
\[
x \sim \begin{cases} N(\mu_1, \sigma_1^2), & \text{if } x \in C_1, \\ N(\mu_2, \sigma_2^2), & \text{if } x \in C_2, \end{cases}
\]
where C_1 and C_2 denote the first and the second class. The probability density functions of the two
classes are f_1(x) and f_2(x), and the prior probabilities are denoted by π_1 and π_2. According to the
Bayes theorem, the posterior distribution is written by
\[
P(X = x \in C_k \mid x) = \frac{\pi_k f_k(x)}{\sum_{j=1}^{C} \pi_j f_j(x)}, \qquad (1)
\]
where C is the number of classes. The likelihood and the prior functions of class 1 are f_1(x) and π_1.
Therefore, the posterior comparison based on (1) becomes
\[
P(X = x \in C_1 \mid x) \ge P(X = x \in C_2 \mid x)
\;\Longleftrightarrow\;
\frac{\pi_1 f_1(x)}{\sum_{j=1}^{C} \pi_j f_j(x)} \ge \frac{\pi_2 f_2(x)}{\sum_{j=1}^{C} \pi_j f_j(x)},
\]
then
\[
\pi_1 f_1(x) \ge \pi_2 f_2(x). \qquad (2)
\]
Now, consider a multivariate dataset for discriminant analysis, x = (x_1, x_2, ..., x_n), with n
observations, where x_i = (x_{i1}, x_{i2}, ..., x_{ip})^T, i = 1, 2, ..., n, in p variables. This dataset
is assumed to follow the multivariate normal distribution, x ~ N(μ, Σ). The probability density function
for x is
\[
f(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right), \qquad (3)
\]
where μ = (μ_1, μ_2, ..., μ_p) denotes the mean vector of the dataset, Σ denotes the covariance matrix,
and Σ^{-1} denotes the inverse of the covariance matrix.
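For concreteness, the density in (3) can be evaluated directly from the formula. The following is a minimal R sketch using solve() and det(); the example values of mu and Sigma are illustrative and are not taken from the study.

```r
# Multivariate normal density f(x | mu, Sigma) of equation (3),
# computed directly from the formula (illustrative values only).
dmvn <- function(x, mu, Sigma) {
  p <- length(mu)
  d <- x - mu
  q <- as.numeric(t(d) %*% solve(Sigma) %*% d)   # (x - mu)^T Sigma^{-1} (x - mu)
  exp(-0.5 * q) / sqrt((2 * pi)^p * det(Sigma))
}

mu    <- c(0, 0)                                 # hypothetical p = 2 example
Sigma <- matrix(c(25, 5, 5, 25), nrow = 2)
dmvn(c(1, -1), mu, Sigma)
```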
Therefore, for the two classes, the multivariate normal distributions in (2) and (3) become
\[
\pi_1 \frac{1}{(2\pi)^{p/2} |\Sigma_1|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_1)^{T} \Sigma_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) \right)
\ge
\pi_2 \frac{1}{(2\pi)^{p/2} |\Sigma_2|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_2)^{T} \Sigma_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) \right). \qquad (4)
\]
2.1 Linear Discriminant Analysis
Linear discriminant analysis assumes an equal covariance matrix for the two classes, Σ_1 = Σ_2 = Σ,
[13]. Therefore, the inequality in (4) becomes
\[
\pi_1 \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_1)^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) \right)
\ge
\pi_2 \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_2)^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) \right), \qquad (5)
\]
where π_1 and π_2 are the prior probabilities of the two classes, and μ_1 and μ_2 are the means of the
two classes.
Taking the natural logarithm of both sides of (5) and simplifying gives
\[
-\frac{1}{2}\mathbf{x}^{T}\Sigma^{-1}\mathbf{x} + \mathbf{x}^{T}\Sigma^{-1}\boldsymbol{\mu}_1 - \frac{1}{2}\boldsymbol{\mu}_1^{T}\Sigma^{-1}\boldsymbol{\mu}_1 + \ln(\pi_1)
\ge
-\frac{1}{2}\mathbf{x}^{T}\Sigma^{-1}\mathbf{x} + \mathbf{x}^{T}\Sigma^{-1}\boldsymbol{\mu}_2 - \frac{1}{2}\boldsymbol{\mu}_2^{T}\Sigma^{-1}\boldsymbol{\mu}_2 + \ln(\pi_2). \qquad (6)
\]
The quadratic term x^T Σ^{-1} x appears on both sides of (6) and cancels; multiplying both sides by two
then gives
\[
2\mathbf{x}^{T}\Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) - (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)^{T}\Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) + 2\ln\!\left(\frac{\pi_1}{\pi_2}\right) \ge 0. \qquad (7)
\]
Equation (7) has the form of a linear function A^T x + b = 0, which is why the method is called the LDA.
The decision boundary used to discriminate the two classes is therefore
\[
\delta(\mathbf{x}) = 2\mathbf{x}^{T}\hat{\Sigma}^{-1}(\hat{\boldsymbol{\mu}}_1 - \hat{\boldsymbol{\mu}}_2) - (\hat{\boldsymbol{\mu}}_1 + \hat{\boldsymbol{\mu}}_2)^{T}\hat{\Sigma}^{-1}(\hat{\boldsymbol{\mu}}_1 - \hat{\boldsymbol{\mu}}_2) + 2\ln\!\left(\frac{\hat{\pi}_1}{\hat{\pi}_2}\right). \qquad (8)
\]
The classification rule assigns the two classes as
\[
\mathbf{x} \in \begin{cases} C_1, & \text{if } \delta(\mathbf{x}) \ge 0, \\ C_2, & \text{if } \delta(\mathbf{x}) < 0. \end{cases} \qquad (9)
\]
The parameters in (8) and (9) are estimated from the multivariate dataset by the sample means and
covariance matrices:
\[
\hat{\boldsymbol{\mu}}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} \mathbf{x}_i, \quad k = 1, 2, \qquad
\hat{\Sigma} = \frac{(n_1 - 1)\hat{\Sigma}_1 + (n_2 - 1)\hat{\Sigma}_2}{n - 2}, \quad n = n_1 + n_2,
\]
\[
\hat{\Sigma}_k = \frac{1}{n_k - 1}\sum_{i=1}^{n_k} (\mathbf{x}_i - \hat{\boldsymbol{\mu}}_k)(\mathbf{x}_i - \hat{\boldsymbol{\mu}}_k)^{T}, \qquad
\hat{\pi}_k = \frac{n_k}{n},
\]
where the estimate \hat{\Sigma} is called the pooled covariance matrix.
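These estimates translate directly into code. The following is a minimal R sketch of the two-class rule in (8)-(9) using the pooled covariance matrix; the function names (lda_fit, lda_delta) and the data layout are assumptions for illustration, not the authors' original program.

```r
# Two-class LDA following equations (8)-(9).
# X1, X2: n1 x p and n2 x p matrices of training observations for each class.
lda_fit <- function(X1, X2) {
  n1 <- nrow(X1); n2 <- nrow(X2); n <- n1 + n2
  mu1 <- colMeans(X1); mu2 <- colMeans(X2)
  Sp  <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n - 2)   # pooled covariance
  list(mu1 = mu1, mu2 = mu2, Sigma = Sp, pi1 = n1 / n, pi2 = n2 / n)
}

# Decision function delta(x) of (8); rule (9) assigns class 1 when delta(x) >= 0.
# Note: solve() fails when p > n because the pooled covariance is singular,
# which is the motivation for the RDA of Section 2.2.
lda_delta <- function(fit, x) {
  Sinv <- solve(fit$Sigma)
  as.numeric(2 * t(x) %*% Sinv %*% (fit$mu1 - fit$mu2) -
             t(fit$mu1 + fit$mu2) %*% Sinv %*% (fit$mu1 - fit$mu2) +
             2 * log(fit$pi1 / fit$pi2))
}
```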
2.2 Regularized Discriminant Analysis
In high-dimensional data, the performance of linear discriminant analysis is far from optimal, since the
lack of observations makes the estimates unstable. Therefore, regularized discriminant analysis is
proposed to resolve the singularity problem. [14], proposed regularizing the covariance matrix by
defining
\[
\tilde{\Sigma} = \alpha \hat{\Sigma} + (1 - \alpha) I_p, \qquad (10)
\]
where α is the regularization parameter with values 0 ≤ α ≤ 1. The regularization can equally be applied
to the sample correlation matrix \hat{R} = \hat{D}^{-1/2} \hat{\Sigma} \hat{D}^{-1/2} in the same way,
\[
\tilde{R} = \alpha \hat{R} + (1 - \alpha) I_p, \qquad (11)
\]
where \hat{D} is the diagonal matrix of the pooled covariance matrix \hat{\Sigma}. Then, the regularized
covariance matrix is modified from (10) and (11) as
\[
\tilde{\Sigma} = \hat{D}^{1/2} \tilde{R} \hat{D}^{1/2}. \qquad (12)
\]
Now, the decision boundary depends on the regularized covariance matrix, and the corresponding linear
discriminant function can be defined as
\[
\delta_R(\mathbf{x}) = 2\mathbf{x}^{T}\tilde{\Sigma}^{-1}(\hat{\boldsymbol{\mu}}_1 - \hat{\boldsymbol{\mu}}_2) - (\hat{\boldsymbol{\mu}}_1 + \hat{\boldsymbol{\mu}}_2)^{T}\tilde{\Sigma}^{-1}(\hat{\boldsymbol{\mu}}_1 - \hat{\boldsymbol{\mu}}_2) + 2\ln\!\left(\frac{\hat{\pi}_1}{\hat{\pi}_2}\right), \qquad (13)
\]
where \tilde{\Sigma} is taken from (12), and the classification criterion is the same as (9).
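A minimal R sketch of the regularized covariance in (11)-(12) follows; the function name rda_cov and the choice of the regularization parameter alpha are illustrative assumptions. With this matrix in place of the pooled covariance, the rule (13) can reuse the same decision function as the LDA sketch above.

```r
# Regularized covariance matrix of equations (11)-(12).
# Sp: pooled covariance matrix; alpha: regularization parameter in [0, 1].
rda_cov <- function(Sp, alpha) {
  p  <- ncol(Sp)
  d  <- sqrt(diag(Sp))                     # square roots of the pooled variances
  R  <- Sp / outer(d, d)                   # sample correlation matrix R-hat
  Rt <- alpha * R + (1 - alpha) * diag(p)  # regularized correlation, eq. (11)
  outer(d, d) * Rt                         # back to the covariance scale, eq. (12)
}
# The direct form of eq. (10) would instead be alpha * Sp + (1 - alpha) * diag(p).
# Because the blended matrix is positive definite for alpha < 1, it can be
# inverted even when p > n, so rule (13) can reuse lda_delta() with
# fit$Sigma <- rda_cov(fit$Sigma, alpha = 0.5).
```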
3 Simulation Study and Results
The simulation study classifies the binary response variable (y) based on the explanatory variables (x)
by using linear discriminant analysis and regularized discriminant analysis. The explanatory variables
are generated from the normal distribution, contaminated normal distribution, and uniform distribution.
The normal distribution is a common data distribution with mean μ and variance σ², with density function
\[
f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \quad -\infty < x < \infty,\; -\infty < \mu < \infty,\; \sigma^2 > 0.
\]
The simulated data are generated from a normal distribution with a mean of zero and a variance of
twenty-five, N(μ, σ²) = N(0, 25), and the probability density is shown in Fig. 1.
Fig. 1: The normal probability density with mean
zero and variance twenty-five.
The contaminated normal distribution is a mixture of two normal distributions with mixing probability of
contaminated data p and 1 − p, where 0 ≤ p ≤ 0.1. The contaminated normal density is
\[
f(x; \mu, \sigma^2) = (1 - p)\,N(\mu, \sigma^2) + p\,N(\mu, c^2\sigma^2),
\]
where c is a parameter that determines the wider standard deviation. In this case, we used ten percent
contaminated data (p = 0.1) and c = 5. The mean and variance are defined as for the normal distribution,
and the histogram of the contaminated normal distribution is shown in Fig. 2.
Fig. 2: The histogram of the contaminated normal distribution with mean zero, variance twenty-five,
p = 0.1, and c = 5.
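The contaminated normal sample can be drawn as a two-component mixture. A minimal R sketch under the settings stated above (mean zero, variance twenty-five, p = 0.1, c = 5); the function name rcontam is illustrative.

```r
# Contaminated normal: with probability 1 - p draw from N(mu, sigma^2),
# with probability p draw from the wider N(mu, (c * sigma)^2).
rcontam <- function(n, mu = 0, sigma = 5, p = 0.1, c = 5) {
  wide <- runif(n) < p
  ifelse(wide, rnorm(n, mu, c * sigma), rnorm(n, mu, sigma))
}

set.seed(1)
x <- rcontam(1000)
hist(x, breaks = 40, main = "Contaminated normal sample")
```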
Finally, the uniform distribution is a symmetric distribution with parameters a and b, which are the
minimum and maximum values. The uniform density is written as
\[
f(x) = \frac{1}{b - a}, \quad a \le x \le b,
\]
where the mean is E(X) = (a + b)/2 and the variance is Var(X) = (b − a)²/12. This explanatory variable is
simulated in the range of −2 to 2, with a mean of zero and a variance of 1.333. The probability density
is exhibited in Fig. 3.
Fig. 3: The uniform probability density in the range
of -2 to 2.
Through the simulation, the number of explanatory variables is greater than the number of observations
(n), which corresponds to high-dimensional data. The number of explanatory variables is set to
30 (n = 15, 20, 25), 60 (n = 20, 30, 40, 50, 55), and 100 (n = 20, 30, 40, 50, 70, 95). The response
variable is obtained from the logit function
\[
p(\mathbf{x}_i) = \frac{e^{\mathbf{x}_i^{T}\boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^{T}\boldsymbol{\beta}}},
\]
where x are the explanatory variables and β are the coefficient parameters. If p(x_i) ≥ 0.5, the response
variable is denoted as y_i = 1, and y_i = 0 when p(x_i) < 0.5.
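A minimal R sketch of one simulated high-dimensional dataset under this design (here 30 explanatory variables and n = 20 observations in the normal case); the coefficient vector beta is an illustrative assumption, since the paper does not report its values.

```r
set.seed(2023)
p <- 30; n <- 20                                        # more variables than observations
X <- matrix(rnorm(n * p, mean = 0, sd = 5), nrow = n)   # normal case, N(0, 25)
beta <- rnorm(p, 0, 0.5)                                # illustrative coefficients
prob <- 1 / (1 + exp(-X %*% beta))                      # logit function p(x_i)
y <- as.numeric(prob >= 0.5)                            # y_i = 1 if p(x_i) >= 0.5, else 0
table(y)
```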
The R program was used to simulate the data and approximate the decision boundary to classify the
response variable. The confusion matrix was created to assess the performance of these classification
methods. The predicted data were compared with the real data using the accuracy percentage (Table 1).
Table 1. The confusion matrix of real data (y_i) and predicted data (ŷ_i).

                        Real data: y_i = 1        Real data: y_i = 0
Predicted: ŷ_i = 1      True Positive (TP)        False Positive (FP)
Predicted: ŷ_i = 0      False Negative (FN)       True Negative (TN)

\[
\text{Accuracy Percentage} = \frac{TP + TN}{TP + TN + FP + FN} \times 100.
\]
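Given the real and predicted labels, the confusion matrix and accuracy percentage of Table 1 can be computed as in the following sketch; the example label vectors are hypothetical.

```r
# Confusion matrix and accuracy percentage for 0/1 labels.
confusion    <- function(y, y_hat) table(Predicted = y_hat, Real = y)
accuracy_pct <- function(y, y_hat) mean(y_hat == y) * 100   # (TP + TN) / total * 100

y     <- c(1, 0, 1, 1, 0, 0)     # hypothetical real labels
y_hat <- c(1, 0, 0, 1, 0, 1)     # hypothetical predicted labels
confusion(y, y_hat)
accuracy_pct(y, y_hat)           # 66.67
```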
The average accuracy percentages for the classification by linear discriminant analysis and regularized
discriminant analysis are shown in Table 2, Table 3, and Table 4. Then Fig. 4, Fig. 5, and Fig. 6 show
the trend of the average accuracy percentage as the sample sizes increase.
Table 2. The average accuracy percentage of linear discriminant analysis (LDA) and regularized
discriminant analysis (RDA) under 30 independent variables.

Sample Sizes     Normal            Contaminated Normal     Uniform
(n)              LDA      RDA      LDA      RDA            LDA      RDA
15               85.14    99.60    84.21    97.13          85.69    99.72
20               92.71    99.63    90.64    97.12          93.15    99.54
25               98.44    99.36    96.30    96.52          98.44    99.27
In Table 2, the RDA gives the highest average accuracy percentage in all cases. It can be seen that
increasing the sample size barely affects the classification accuracy of the RDA, in contrast to the LDA:
when the sample sizes increase, the average accuracy percentage of the LDA increases, as shown in Fig. 4.
Fig. 4: The trend of the average accuracy percentage of linear discriminant analysis (LDA) and regularized
discriminant analysis (RDA) under 30 independent variables.
Table 3. The average accuracy percentage of linear discriminant analysis (LDA) and regularized
discriminant analysis (RDA) under 60 independent variables.

Sample Sizes     Normal            Contaminated Normal     Uniform
(n)              LDA      RDA      LDA      RDA            LDA      RDA
20               77.71    99.86    79.62    98.64          77.04    99.82
30               85.61    99.67    87.19    98.28          85.64    99.65
40               94.00    99.31    93.05    98.15          94.19    99.52
50               99.27    99.23    97.94    97.78          99.28    99.16
55               99.94    99.06    99.50    97.66          99.95    99.14
From the average accuracy percentages in Table 3, the RDA is appropriate for the small sample sizes, but
the LDA outperforms it at the large sample sizes. The average accuracy percentage of the LDA increases as
the sample sizes increase, as shown in Fig. 5.
Fig. 5: The trend of the average accuracy percentage of linear discriminant analysis (LDA) and regularized
discriminant analysis (RDA) under 60 independent variables.
Table 4. The average accuracy percentage of linear discriminant analysis (LDA) and regularized
discriminant analysis (RDA) under 100 independent variables.

Sample Sizes     Normal            Contaminated Normal     Uniform
(n)              LDA      RDA      LDA      RDA            LDA      RDA
20               70.74    99.96    73.61    99.45          70.70    99.96
30               75.60    99.87    78.69    99.14          75.33    99.89
40               80.75    99.80    83.32    99.09          80.82    99.79
50               85.92    99.54    87.51    98.83          86.16    99.62
70               96.14    99.33    94.93    98.58          96.29    99.42
95               99.99    98.99    99.94    98.19          99.99    98.89
According to the results in Table 4, the RDA performs well in most cases, but the LDA gives nearly
perfect classification at the largest sample size. The average LDA accuracy percentage increases as the
sample sizes increase, as shown in Fig. 6.
Fig. 6: The trend of the average accuracy percentage of linear discriminant analysis (LDA) and regularized
discriminant analysis (RDA) under 100 independent variables.
4 Discussion
The classification performance for the binary response variable depends on the explanatory variables
generated from the normal, contaminated normal, and uniform distributions, as shown in Table 2, Table 3,
and Table 4. Starting with the first table, with the small number of explanatory variables, the average
accuracy percentage of the RDA is higher than that of the LDA for all sample sizes. Moreover, when the
number of explanatory variables is increased to the moderate and high settings, the average accuracy
percentage of the RDA is higher than that of the LDA for most sample sizes, as shown in Table 3 and
Table 4. Meanwhile, at the largest sample sizes, the average accuracy percentage of the LDA is greater
than that of the RDA. The average accuracy percentage increases as the sample sizes increase, as shown in
Fig. 4, Fig. 5, and Fig. 6. The different distributions give the same ranking of the methods, but the
normal and uniform distributions present the highest average accuracy percentages. The choice of data
distribution plays a vital role in good classification accuracy, [15].
5 Conclusion
This paper provided a binary classification study applying linear discriminant analysis (LDA) and
regularized discriminant analysis (RDA) to high-dimensional data. We examined explanatory variables from
several distributions for predicting binary response variables. Through a simulation study, the RDA
outperformed the LDA for most sample sizes. However, the LDA worked well at the largest sample sizes.
When considering the distributions, the average accuracy percentages of the normal and uniform
distributions were only slightly different, because both are symmetric distributions. In the case of data
with outliers, the RDA still performed well for classification. These results show that the RDA was
adequate for classification based on high-dimensional data in most cases. Therefore, we conclude that the
RDA can handle situations with a sizeable number of explanatory variables relative to the sample sizes.
Furthermore, the RDA has been recommended for small sample sizes, [16], and large dimensional data, [17].
For future work, the RDA can be applied to the classification of psychological tasks, [18].
Simulated data were mainly used in this research. For future work, real high-dimensional datasets should
be considered, especially medical data such as large-scale gene expression data for classifying disease
in small groups of patients. This research focused on discriminant classification; machine learning
methods could also be applied in this setting.
Acknowledgments:
This research is supported by King Mongkut's Institute of Technology Ladkrabang.
References:
[1] T. Ramayah, N.H. Ahmad, H. A. Halim, S. R. M. Zanai, M. H. Lo, Discriminant analysis: An illustrated
example, African Journal of Business Management, Vol.4, No.9, 2010, pp. 1654-1667.
[2] C. Liu, Discriminant analysis and similarity
measure, Pattern Recognition, Vol.47, No.1,
2014, pp.359 -367.
[3] A. Tharwat, T. Gaber, A. Ibrahim, A. E. Hassanien, Linear discriminant analysis: A detailed tutorial,
AI Communications, Vol.30, No.2, 2017, pp.169-190.
[4] A. Sharma, K. K. Paliwal, Linear discriminant
analysis for the small sample size
problem: an overview, International Journal
of Machine Learning and Cybernetics, Vol.6,
2017, pp.443-454.
[5] F. Zhu, J. Gao, J. Yang, N. Ye, Neighborhood
linear discriminant analysis, Pattern
Recognition, Vol.123, No.2, 2022, Article no.
108422.
[6] J. H. Friedman, Regularized discriminant analysis, Journal of the American Statistical Association,
Vol.84, 1989, pp.165-175.
[7] S. Yang, H. Xiong, K. Xu, L. Wang, J. Bian, Z. Sun, Improving covariance-regularized discriminant
analysis for EHR-based predictive analytics of diseases, Applied Intelligence, Vol.51, 2021, pp.377-395.
[8] K. Elkhalil, A. Kammoun, R. Couillet, T. Y. Al-Naffouri, M. S. Alouini, A large dimensional study of
regularized discriminant analysis, IEEE Transactions on Signal Processing, Vol.68, 2020, pp.2464-2479.
[9] T. Hastie, R. Tibshirani, A. Buja, Flexible Discriminant Analysis by Optimal Scoring, Journal of the
American Statistical Association, Vol.89, No.428, 1994, pp.1255-1270.
[10] D.M. Witten, R. Tibshirani, Penalized Classification using Fisher's Linear Discriminant, Journal of
the Royal Statistical Society, Series B, Vol.73, No.5, 2011, pp.753-772.
[11] T. Hastie, A. Buja, R. Tibshirani, Penalized Discriminant Analysis, The Annals of Statistics,
Vol.23, No.1, 1995, pp.73-102.
[12] T. Hastie, R. Tibshirani, Discriminant Analysis by Gaussian Mixtures, Journal of the Royal
Statistical Society, Series B, Vol.58, No.1, 1996, pp.155-176.
[13] B. Ghojogh, M. Crowley, Linear and quadratic discriminant analysis: Tutorial, arXiv preprint
arXiv:1906.02590, 2019.
[14] Y.Guo, T. Hastie, R. Tibshirani, Regularized
Discriminant Analysis and Its Application in
Microarrays, Biostatistics, Vol. 1, No.1,2005,
pp. 1-8.
[15] K. Ksushbu, P. Nishad, V. Kasyap, I. Gupta.
A Classification and Distribution Model for
Data Leakage Prevention and Detection,
International Research Journal of
Modernization in Engineering Technology
and Science, Vol. 3, No.2, 2021, pp. 348-354.
[16] J. Ye, T. Wang, Regularized discriminant analysis for high dimensional, low sample size data,
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
Philadelphia, Pennsylvania, USA, 2006, pp.454-463.
[17] X. Yang, K. Elkhalil, A. Kammoun, T. Y. Al-Naffouri, M. S. Alouini, Regularized Discriminant
Analysis: A Large Dimensional Study, 2018 IEEE International Symposium on Information Theory (ISIT),
Vail, Colorado, USA, 2018, pp. 536-540.
[18] R. Fu, M. Han, Y. Tian, P. Shi, Improvement motor imagery EEG classification based on sparse common
spatial pattern and regularized discriminant analysis, Journal of Neuroscience Methods, Vol.343, 2020,
Article no. 108833.
Contribution of Individual Authors to the
Creation of a Scientific Article:
-Autcha Araveeporn has conceptualized the research and organized the simulation process through to the
discussion.
-Somsri Banditvilai has derived the results and
made the conclusion.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself:
This research is supported by King Mongkut's
Institute of Technology Ladkrabang.
Conflict of Interest
The authors have no conflict of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US