Gaussian Mixtures with common Variance

MIGUEL FELGUEIRAS 1,3,4, JOÃO MARTINS 2,3,5, RUI SANTOS 1,3

1ESTG, Polytechnic Institute of Leiria, PORTUGAL

2ESS, Polytechnic of Porto, Rua Dr. António Bernardino de Almeida, 4249-015 Porto, PORTUGAL

3CEAUL, Faculdade de Ciências, Universidade de Lisboa, PORTUGAL

4CIDMA, University of Aveiro, PORTUGAL

5CEISUC/CIBB, Coimbra, PORTUGAL

Abstract: The interest in Gaussian mixtures has grown significantly in recent years, primarily owing to their adaptability and widespread applications across various fields of knowledge. A specific category within these mixtures is Gaussian mixtures with common variance, wherein the variances of all subpopulations are assumed to be equal. This study delves into the family of Gaussian location mixtures, exploring their applications, characterizations, and the challenges associated with estimation. Following this, we introduce an approximation of the mixture by the beta distribution. For scenarios involving two subpopulations, a test for equality of variances based on this beta approximation is proposed, which is a novelty in the Gaussian mixture context. Practical applications of the proposed test are provided and discussed.

Key-Words: Gaussian location mixtures, beta distribution, variance equality test.

1 Introduction

The history of Gaussian mixture models goes back as far as the nineteenth century. In 1894, Pearson [1, 2] analysed a sample of crabs to determine the size of their foreheads. He concluded that not a single species of crabs, but a mixture of two crab species, was observed. In a remarkable work, Pearson used a Gaussian mixture to fit that data set. In this paper we consider Gaussian mixtures in Pearson’s sense, that is, a random variable $X$ is a Gaussian mixture (a convex mixture of Gaussian random variables) when its density function is

$$f_X(x) = \sum_{j=1}^{N} w_j \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{1}{2}\left(\frac{x-\mu_j}{\sigma_j}\right)^2\right), \tag{1}$$

where $\sigma_j > 0$, $w_j > 0$, $\sum_{j=1}^{N} w_j = 1$ and $N$ denotes the number of Gaussian random variables, each with mean $\mu_j$, standard deviation $\sigma_j$ and weight $w_j$. Note that some works by other authors, such as [3, 4], deal with a different kind of Gaussian mixture, namely assuming that one of the Gaussian distribution parameters is a random variable, usually the scale parameter. This is a type of infinite Gaussian mixture that will not be tackled in this work. Independently of the type of mixture considered, all kinds of mixtures are very effective when fitting real data since they can accommodate multimodality and a wide range of density shapes. For example, [5] uses deep Gaussian mixture models to describe data in a very flexible way, since at each layer the variables follow a mixture of Gaussian distributions. In a machine learning approach for communications, [6] applies Gaussian mixtures to channel estimation. However, the previous examples have a major drawback: a large number of parameters must be estimated. The increase in computational power throughout the last decades has allowed the software implementation of the expectation-maximization (EM) algorithm [7], used to numerically estimate the parameters, despite some convergence constraints [8].

In this context, variance equality is an important theme since inference procedures are usually simpler and more accurate under that assumption. Moreover, the previously indicated estimation issues become less relevant since the number of parameters to estimate diminishes. From a practical point of view, it is also relevant to decide whether the subpopulation variances can be considered equal. Hence, in this work we deal with Gaussian mixtures with common variance, presenting some results under that assumption. Furthermore, a variance equality test is developed.

2 Moments and Miscellaneous for Gaussian Mixtures

Let us consider that a random variable $X$ is a Gaussian mixture with density as defined in Equation (1). Its moments can then be obtained from the cumulant

Received: November 11, 2023. Revised: March 9, 2024. Accepted: March 21, 2024. Published: April 24, 2024.

WSEAS TRANSACTIONS on MATHEMATICS

DOI: 10.37394/23206.2024.23.30

Miguel Felgueiras, João Martins, Rui Santos

E-ISSN: 2224-2880

276

Volume 23, 2024

generating function,

$$\ln[\phi_X(t)] = \kappa_1 it - \kappa_2\frac{t^2}{2!} - \kappa_3\frac{it^3}{3!} + \kappa_4\frac{t^4}{4!} + O(t^5) \tag{2}$$

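with

$$\kappa_1 = \mu'_1; \quad \kappa_2 = \mu_{(2)}; \quad \kappa_3 = \mu_{(3)}; \quad \kappa_4 = \mu_{(4)} - 3\mu_{(2)}^2, \tag{3}$$

where $\mu_{(k)}$ stands for the $k$-th central moment, $\mu'_k$ denotes the $k$-th raw moment and $\phi_X$ is the characteristic function.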

Two standard simplifications can be considered. One corresponds to mean equality, that is, $\mu_j = \mu$ for $j = 1, \ldots, N$. Under mean equality, the mixture can be approximated by the Student's $t$ distribution. Moreover, the Student's $t$ distribution can be used to evaluate the equality of means, that is, to test [9]

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_N.$$

The other standard simplification, which we will deal with in this work, is to consider $\sigma_j^2 = \sigma^2$ for $j = 1, \ldots, N$.

Theorem 1. Let $X$ be a Gaussian mixture where all the subpopulations have equal variance $\sigma^2$. Then

$$X \overset{d}{=} V + Y$$

where $V$ and $Y$ are independent random variables, $V \sim N(0, \sigma)$ and $Y$ is such that $P(Y = \mu_j) = w_j$, for $j = 1, \ldots, N$.

Proof. Recall that for any independent random variables $V$ and $Y$ we have $\phi_{V+Y}(t) = \phi_V(t)\phi_Y(t)$ and that when $V \sim N(0, \sigma)$ then $\phi_V(t) = \exp\left(-\frac{t^2\sigma^2}{2}\right)$. Consequently, the characteristic function of the sum of the independent variables $V$ and $Y$ defined above is

$$\phi_{V+Y}(t) = \phi_V(t)\phi_Y(t) = \exp\left(-\frac{t^2\sigma^2}{2}\right)\sum_{j=1}^{N} w_j \exp(it\mu_j) = \sum_{j=1}^{N} w_j \exp\left(it\mu_j - \frac{t^2\sigma^2}{2}\right) = \phi_X(t),$$

and by the uniqueness of the characteristic function we obtain $X \overset{d}{=} V + Y$.

Thus, when all the Gaussian subpopulations share the same variance, the mixture can be seen as a convolution between Gaussian noise and a discrete random variable [10]. This leads to deconvolution problems, often studied in statistics [11, 12] but beyond the scope of the present paper. The above convolution appears in many known applications. Previous work on the amphibian nervous system [13, 14] concluded that the junction between primary afferent fibre and motoneurone provides joint electrical and chemical transmission. The mixed synapse can be fitted by binomial or Poisson convolutions with Gaussian noise. In image or signal processing, convolutions between Poisson and zero-mean Gaussian variables are also used. For example, [15] notes that astronomical images have additive uncorrelated noise: Poisson noise, due to photon arrival events, and Gaussian white noise, due to the commonly used digitized photographic plates.
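The decomposition in Theorem 1 also suggests a direct way to simulate a mixture with common variance: draw the discrete location $Y$ and add independent zero-mean Gaussian noise $V$. The following Python sketch illustrates the idea; the function name `sample_common_variance_mixture`, the seed and the parameter values are illustrative assumptions and are not taken from the original work.

```python
import random
import statistics

def sample_common_variance_mixture(weights, means, sigma, n, seed=0):
    """Draw n values of X = V + Y, with V ~ N(0, sigma) and P(Y = means[j]) = weights[j]."""
    rng = random.Random(seed)
    # Y: discrete location variable taking the subpopulation means.
    locations = rng.choices(means, weights=weights, k=n)
    # V: zero-mean Gaussian noise with standard deviation sigma, independent of Y.
    return [y + rng.gauss(0.0, sigma) for y in locations]

# Illustrative (assumed) parameters: two subpopulations with common sigma = 1.
weights, means, sigma = [0.3, 0.7], [0.0, 2.0], 1.0
xs = sample_common_variance_mixture(weights, means, sigma, n=50_000)

# Theoretical moments implied by the decomposition:
# E[X] = E[Y] and Var[X] = sigma^2 + Var[Y].
mean_y = sum(w * m for w, m in zip(weights, means))
var_y = sum(w * (m - mean_y) ** 2 for w, m in zip(weights, means))
print("sample mean:", statistics.fmean(xs), "theory:", mean_y)
print("sample variance:", statistics.variance(xs), "theory:", sigma**2 + var_y)
```

With these assumed values the sample mean and variance should be close to the theoretical ones, up to sampling error.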

Both unimodality and multimodality are possible, depending on the combination of parameters [16]. If the mixture has a unimodal density function, it can be approximated within the Pearson system, according to its $\beta_1$ and $\beta_2$ values [17], where

$$\beta_1 = \frac{\mu_{(3)}}{\mu_{(2)}^{3/2}}; \quad \beta_2 = \frac{\mu_{(4)}}{\mu_{(2)}^{2}} \tag{4}$$

are the skewness and the kurtosis coefficients. For the Pearson type I distribution (four-parameter beta), the approximation holds when

$$1.5\beta_1^2 < \beta_2 < 1.5\beta_1^2 + 3. \tag{5}$$

3 Two Subpopulations

The main goal of this work is to develop a variance equality test for Gaussian mixtures, considering only two subpopulations as the starting point. A sufficient condition for unimodality, independent of the values of $w_1$ and $w_2$, is given by [18]

$$|\mu_1 - \mu_2| \le 2\min(\sigma_1, \sigma_2). \tag{6}$$

We will now assume that the previous condition holds. Nevertheless, in practice it is often difficult to tell whether multimodality is due to the model or to a particular sample [19]. When the subpopulations share the same variance, that is, when $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we write $w_1 = w$ and $w_2 = 1 - w$. Under these circumstances, and for a wide range of values of $w$, the mixture can be approximated by a beta distribution [9, 10], using Equation (5).
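As a small illustration, the sufficient unimodality condition (6) can be checked directly from the component parameters. The Python sketch below uses a hypothetical helper name and made-up parameter values; since the condition is only sufficient, a negative result does not by itself imply multimodality.

```python
def satisfies_unimodality_condition(mu1, mu2, sigma1, sigma2):
    """Sufficient condition (6): |mu1 - mu2| <= 2 * min(sigma1, sigma2)."""
    return abs(mu1 - mu2) <= 2.0 * min(sigma1, sigma2)

# Illustrative (assumed) parameter values.
print(satisfies_unimodality_condition(0.0, 1.0, 1.0, 1.0))  # means one sigma apart
print(satisfies_unimodality_condition(0.0, 5.0, 1.0, 2.0))  # means far apart
```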

Theorem 2. Let $X$ be a finite unimodal Gaussian mixture with two components with equal variance. If

$$w \in \left[\frac{1}{2} \pm \frac{\sqrt{3}}{6}\right],$$

then the mixture can be approximated by a beta distribution.


Note that the situation where $\mu_1 = \mu_2$ (leading to a single Gaussian) corresponds to the case where the $w$ interval is tightest. In any other scenario we obtain a wider interval that contains the presented one. For example, if $|\mu_1 - \mu_2| = \sigma$ we get $w \in [0.1899; 0.8101]$. As previously stated, Theorem 2 holds for most unimodal mixtures with common variance. Mixtures with $w \notin [0.2113; 0.7887]$ correspond, roughly, to the contaminated populations problem, where a population has a few elements that do not belong to it [20]. This is also an interesting problem, for example, when dealing with infected and non-infected elements of a population affected by some disease.
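The intervals for $w$ quoted above can be checked numerically. The Python sketch below computes $\beta_1$ and $\beta_2$ for a two-component mixture with common variance, using the fact that the cumulants of the Gaussian noise and of the two-point location variable add (Theorem 1), and then evaluates condition (5). The function names are hypothetical, and the closed form used in `weight_interval`, namely $w(1-w) > 2/(12 + r)$ with $r = (\mu_1 - \mu_2)^2/\sigma^2$, is a derivation made for this sketch from the upper inequality in (5), assuming $|\mu_1 - \mu_2| \le 2\sigma$; it is not a formula stated in the original work.

```python
import math

def beta_coefficients(w, mu1, mu2, sigma):
    """Skewness beta1 and kurtosis beta2 of w*N(mu1, sigma) + (1-w)*N(mu2, sigma)."""
    d = mu1 - mu2
    p = w * (1.0 - w)
    m2 = sigma**2 + d**2 * p           # variance: Gaussian noise plus two-point location
    m3 = d**3 * p * (1.0 - 2.0 * w)    # third central moment (the Gaussian part adds none)
    k4 = d**4 * p * (1.0 - 6.0 * p)    # fourth cumulant (the Gaussian part adds none)
    return m3 / m2**1.5, 3.0 + k4 / m2**2

def condition_5(w, mu1, mu2, sigma):
    """Check 1.5*beta1^2 < beta2 < 1.5*beta1^2 + 3."""
    beta1, beta2 = beta_coefficients(w, mu1, mu2, sigma)
    return 1.5 * beta1**2 < beta2 < 1.5 * beta1**2 + 3.0

def weight_interval(mu1, mu2, sigma):
    """Endpoints of the w interval from w(1-w) > 2 / (12 + r), r = ((mu1 - mu2) / sigma)^2.

    Closed form derived for this sketch, not taken from the source.
    """
    r = ((mu1 - mu2) / sigma) ** 2
    half_width = math.sqrt(0.25 - 2.0 / (12.0 + r))
    return 0.5 - half_width, 0.5 + half_width

low, high = weight_interval(0.0, 1.0, 1.0)   # |mu1 - mu2| = sigma
print(round(low, 4), round(high, 4))         # 0.1899 0.8101
print(condition_5(0.25, 0.0, 1.0, 1.0), condition_5(0.15, 0.0, 1.0, 1.0))
```

For $|\mu_1 - \mu_2| = \sigma$ the sketch reproduces the interval $[0.1899; 0.8101]$, and letting $\mu_1 - \mu_2 \to 0$ gives the endpoints $1/2 \pm \sqrt{3}/6$ of Theorem 2.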

4 Testing Variance Equality

As previously stated, the mixture can be approximated by a beta distribution when $\sigma_1^2 = \sigma_2^2$, that is,

$$X \overset{\circ}{\sim} \mathrm{beta}(a, b, p, q) \quad \text{or} \quad Y = \frac{X - a}{b - a} \overset{\circ}{\sim} \mathrm{beta}(p, q).$$

Unfortunately, in some situations where the variances are different the approximation is still possible, even theoretically (see the example in Figure 1).

Figure 1: Region where condition (5) holds for $(w, \mu_1, \mu_2, \sigma_1, \sigma_2) = (0.35, 0, 2, \sigma_1, \sigma_2)$.

Hence, when testing $H_0$: data follows a beta distribution versus $H_1$: data does not follow a beta distribution, the rejection of $H_0$ implies that $\sigma_1^2 \neq \sigma_2^2$, but if $H_0$ is not rejected then the variances may or may not be equal. Even so, $\sigma_1^2$ and $\sigma_2^2$ should be at least close. Therefore, this test can be used to indirectly test $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$ or, in an equivalent way, to test $H_0: \sigma_1 = \sigma_2$ vs $H_1: \sigma_1 \neq \sigma_2$.

All four parameters can be estimated simultaneously, numerically, by the maximum likelihood method [21]. This method is already implemented in some software, such as the R package ExtDist, based on the work of [22] and [23]. However, [21] states that good results can only be achieved for large samples, since convergence to a global maximum is not guaranteed. Alternatively, straightforward estimators for $a$ and $b$, based on the sample minimum and maximum $(\min X_i, \max X_i)$, are

ba=min Xi−max Xi−min Xi

b=max Xi+max Xi−min Xi

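and then the moment estimators can be defined as [23]

$$\hat p = \frac{\left(\frac{\bar X - a}{b - a}\right)^2 \left(1 - \frac{\bar X - a}{b - a}\right)}{\frac{S^2}{(b-a)^2}} - \frac{\bar X - a}{b - a}$$

$$\hat q = \frac{\frac{\bar X - a}{b - a}\left(1 - \frac{\bar X - a}{b - a}\right)}{\frac{S^2}{(b-a)^2}} - 1 - \hat p,$$

where, as usual, $\bar X$ and $S^2$ represent the sample mean and sample variance, respectively.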
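The moment estimators of $p$ and $q$ are simple to compute once values for $a$ and $b$ are available. The Python sketch below is a minimal illustration under the assumption that $a$ and $b$ are supplied by the caller (for instance, from the endpoint estimators); the function name `beta_moment_estimates` and the toy sample are hypothetical.

```python
import statistics

def beta_moment_estimates(xs, a, b):
    """Moment estimates (p, q) for a beta distribution on [a, b]; a and b are given."""
    m = (statistics.fmean(xs) - a) / (b - a)      # rescaled sample mean
    v = statistics.variance(xs) / (b - a) ** 2    # rescaled sample variance S^2 / (b - a)^2
    p_hat = m**2 * (1.0 - m) / v - m
    q_hat = m * (1.0 - m) / v - 1.0 - p_hat
    return p_hat, q_hat

# Toy sample (assumed values) on the interval [0, 10].
p_hat, q_hat = beta_moment_estimates([2.5, 7.5], a=0.0, b=10.0)
print(p_hat, q_hat)
```

By construction, $\hat p/(\hat p + \hat q)$ matches the rescaled sample mean, which gives a quick sanity check on any output.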

5 Applications

In this section we apply the test to three real data sets, in order to understand whether the results can be applied to practical situations. For all the analysed data sets, the unknown parameter vector $(\mu_1, \mu_2, \sigma_1, \sigma_2, w)$, where the parameter $w$ corresponds to the weight of the first component, was estimated by the EM algorithm using MATLAB R2013b. To compare models with common and different variances, the Bayesian information criterion (BIC) [24] was also computed. This information criterion penalizes overfitting more severely than the more widely used Akaike information criterion. Smaller values of BIC indicate better fitted models.
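For reference, the comparison can be sketched with the usual definition $\mathrm{BIC} = k \ln n - 2 \ln \hat L$, where $k$ is the number of estimated parameters, $n$ the sample size and $\hat L$ the maximized likelihood. This formula is the standard one and is assumed here, since the text does not spell it out. In the Python sketch below, the function names, the toy data and the parameter values are hypothetical; a two-component mixture has $k = 5$ with different variances and $k = 4$ with a common variance.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma), with sigma the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_log_likelihood(xs, w, mu1, mu2, sigma1, sigma2):
    """Log-likelihood of a two-component Gaussian mixture at the given parameters."""
    return sum(
        math.log(w * normal_pdf(x, mu1, sigma1) + (1.0 - w) * normal_pdf(x, mu2, sigma2))
        for x in xs
    )

def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: k * ln(n) - 2 * ln(L); smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Toy data and parameter values (assumed, not estimated by EM here).
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.8, 2.2, 2.9]
ll_different = mixture_log_likelihood(data, 0.5, 0.0, 2.0, 0.8, 1.0)
ll_common = mixture_log_likelihood(data, 0.5, 0.0, 2.0, 0.9, 0.9)
print(bic(ll_different, 5, len(data)), bic(ll_common, 4, len(data)))
```

In practice the parameters plugged into `mixture_log_likelihood` would be the maximum likelihood estimates of each model, for example those returned by an EM routine.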

5.1 Applications to Financial Data

There are several applications concerning economic data linked with Gaussian mixtures. In a very recent work, [25] uses Gaussian mixture returns for portfolio construction. In this paper we consider the modelling of daily log-returns, a well-known problem in finance. Log-returns are defined as $x_t = \ln(X_t) - \ln(X_{t-1})$, where $X_t$ represents the closing index value on the $t$-th day. Previous works like [26, 27, 28] suggest a wide set of possible models, but Gaussian mixtures (with a small number of components, preferably only two, to avoid overfitting) are a common choice. Let us consider the daily log-returns from the PSI20 stock index. The data set covers the period between 2012/03/16 and 2017/03/17, roughly five years, with a total of 1278 observations. Parameter estimates are displayed in Table 1.

Table 1: Estimated Gaussian mixture for the PSI20 data set.

$\hat\mu_1$    $\hat\mu_2$    $\hat\sigma_1$    $\hat\sigma_2$    $\hat w$
0.0011         −0.0024        0.0091            0.0176            0.6417
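The log-return transformation $x_t = \ln(X_t) - \ln(X_{t-1})$ used for these data can be sketched in a few lines of Python. The function name `log_returns` and the closing values below are made up for illustration and are not PSI20 data.

```python
import math

def log_returns(closing_values):
    """Daily log-returns x_t = ln(X_t) - ln(X_{t-1}) from a series of closing values."""
    return [
        math.log(current) - math.log(previous)
        for previous, current in zip(closing_values, closing_values[1:])
    ]

# Made-up closing values; each day rises by 10%, so both log-returns equal ln(1.1).
returns = log_returns([100.0, 110.0, 121.0])
print(returns)
```

A series of $n$ closing values yields $n - 1$ log-returns.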

When analysing the histogram presented in Figure

2, the data can be considered as unimodal.

Figure 2: Histogram for PSI20 log-returns data.

When testing $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$ we obtain a p-value = 0.0110 and accordingly (remember that the test is quite conservative) we should reject $H_0$ and conclude that the variances are different. The BIC measure, which greatly penalizes overfitting, is BIC = −7515.6568. For a Gaussian mixture with the same variance we get BIC = −7510.9287 and, as expected, the model with different variances yields better results.

Next we present a similar example, for the daily log-returns from the SP500 stock index. The data set covers the same period but with a total of 1259 observations. The estimates are presented in Table 2.

Table 2: Estimated Gaussian mixture for the SP500 data set.

$\hat\mu_1$    $\hat\mu_2$    $\hat\sigma_1$    $\hat\sigma_2$    $\hat w$
0.0007         −0.0001        0.0039            0.0104            0.5263

The data is clearly unimodal, as shown by the histogram in Figure 3.

Figure 3: Histogram for SP500 log-returns data.

When testing $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$, we obtain a p-value $< 10^{-4}$ and therefore we should reject $H_0$ and conclude that the variances are different. The BIC value when $\sigma_1^2 \neq \sigma_2^2$ is BIC = −8654.4748 and when $\sigma_1^2 = \sigma_2^2$ we get BIC = −8545.2453. Again, the model with different variances yields better results.

5.2 Application to Biometric Data

Let us consider the Davis dataset, available in the R package “car”. It contains measured heights and weights of 200 adults, men (88) and women (112), engaged in regular exercise. Note that in line 12, weight and height were switched as they appear to be reversed in the original dataset. The descriptive statistics for height (in centimeters) obtained from the dataset are presented in Table 3.

Table 3: Descriptive statistics for the variable Height in the “Davis” dataset.

                      Male        Female
Mean                  178.0114    164.7143
Standard Deviation    6.4407      5.6591
Proportion            0.44        0.56

First, we tested the normality of the variable Height for both subsets (male and female) with the Lilliefors normality test [24]. For the male and female subsets we obtained, respectively, p-value = 0.8720 and p-value = 0.1818, so normality is not rejected in either case.

When testing $H_0: \sigma_M^2 = \sigma_F^2$ vs $H_1: \sigma_M^2 \neq \sigma_F^2$ with the F test, we obtain p-value = 0.1979. As a consequence, the variances can be considered equal for both sexes.

For illustrative purposes, we will consider from now on that the data is “mixed”, that is, that we do not know whether an individual is male or female. The histogram presented in Figure 4 shows a unimodal data set.

If we fit a two-component Gaussian mixture to the dataset, we get the estimates for the model components shown in Table 4.


Figure 4: Histogram for the “Davis” height data.

Table 4: Estimated Gaussian mixture for the “Davis” height dataset.

$\hat\mu_1$    $\hat\mu_2$    $\hat\sigma_1$    $\hat\sigma_2$    $\hat w$
177.6408       164.7896       6.7108            5.7621            0.4494

When testing $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$ using the K-S test for the beta distribution we obtain a p-value = 0.5638 and consequently we should not reject $H_0$. Therefore, no evidence against variance equality was found, and we cannot conclude that the variances are unequal. The BIC statistic is BIC = 1461.1 if $\sigma_1^2 \neq \sigma_2^2$ and BIC = 1457.1 if $\sigma_1^2 = \sigma_2^2$. Hence, the model with common variance for both components yields better results. This result corroborates the one obtained with the F test, that is, the variances should not be considered different.

6 Conclusion

Finite Gaussian mixtures with the same variance can be written as the convolution between a discrete variable and a zero-mean Gaussian variable. It might be possible to decompose the mixture into this kind of convolution, if we have an idea about the discrete variable that is present. As stated, common examples concern Poisson or binomial data as the discrete variable, with added Gaussian white noise.

For unimodal mixtures, Theorem 2 allows us to approximate the mixture by a beta distribution, when some conditions are fulfilled. This approximation reduces the number of unknown parameters from $2N$ to four, which can be interesting when working with a large number of subpopulations. Besides, beta distribution characterizations become available.

Finally, when only two subpopulations are considered, Theorem 2 can be used to test variance equality in unimodal mixtures. The test was applied to three different data sets with good results.

Together with the mean equality test presented in [9], the variance equality test for Gaussian mixtures can be very useful to deal with real data sets, since mean and variance equality are some of the most common hypotheses in statistics and, as far as we know, this variance equality test was not yet available for Gaussian mixtures.

References:

[1] Pearson, K (1894). Contributions to the mathematical theory of evolution, Philosophical Transactions of the Royal Society of London A, 185, 71–110.

[2] Pearson, K (1895). Contributions to the mathematical theory of evolution. II. Skew variations in homogeneous material, Philosophical Transactions of the Royal Society of London A, 186, 343–414.

[3] Andrews, D; Mallows, C (1974). Scale Mixtures of Normal Distributions, Journal of the Royal Statistical Society B, 36, 1, 99–102.

[4] Bakirov, N; Székely, G (2006). Student’s t-test for Gaussian scale mixtures, Journal of Mathematical Sciences, 139, 3, 6497–6505.

[5] Viroli, C; McLachlan, GJ (2019). Deep Gaussian mixture models, Statistics and Computing, 29, 43–51.

[6] Fesl, B; Joham, M; Hu, S; Koller, M; Turan, N; Utschick, W (2022). Channel Estimation based on Gaussian Mixture Models with Structured Covariances, 56th Asilomar Conference on Signals, Systems, and Computers, 533–537.

[7] Dempster, A; Laird, N; Rubin, D (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society B, 39, 1–37.

[8] Frühwirth-Schnatter, S (2006). Finite Mixture and Markov Switching Models, Springer, New York.

[9] Felgueiras, M; Martins, J; Santos, R (2017). Gaussian Scale Mixtures, Journal of Numerical Analysis, Industrial and Applied Mathematics, 11, 1–10.

[10] Felgueiras, M; Santos, R; Martins, J (2014). Some Results on Gaussian Mixtures, AIP Conference Proceedings, 1618, 523–526.


[11] Guerrero-Colón, J; Simoncelli, C; Portilla, J (2008). Image denoising using Mixtures of Gaussian scale mixtures, 15th IEEE International Conference on Image Processing, 565–568.

[12] Jansson, P (1997). Deconvolution of Images and Spectra, Academic Press, San Diego.

[13] Grantyn, R; Shapovalov, A; Shiriaev, B (1984). Relation between structural and release parameters at frog sensory-motor synapse, The Journal of Physiology, 349, 459–474.

[14] Shapovalov, A; Shiriaev, B (1980). Dual mode of junctional transmission at synapses between single primary afferent fibres and motoneurones in the amphibian, The Journal of Physiology, 306, 1–15.

[15] Murtagh, F; Starck, J; Bijaoui, A (1995). Image restoration with noise suppression using a multiresolution support, Astronomy and Astrophysics, Supplement Series, 112, 179–189.

[16] Eisenberger, I (1964). Genesis of Bimodal Distributions, Technometrics, 6, 357–363.

[17] Johnson, N; Kotz, S; Balakrishnan, N (1994). Continuous Univariate Distributions, Volume I, Wiley, New York.

[18] Behboodian, J (1970). On the modes of a mixture of two normal distributions, Technometrics, 12, 131–139.

[19] Everitt, B; Hand, D (1981). Finite Mixture Distributions, Chapman & Hall, London.

[20] Karlis, D; Xekalaki, E (2003). Mixtures Everywhere. In Stochastic Musings: Perspectives from the Pioneers of the Late 20th Century, 78–95, Lawrence, London.

[21] Carnahan, J (1989). Maximum Likelihood Estimation for the 4-Parameter Beta Distribution, Communications in Statistics - Simulation and Computation, 18, 2, 513–536.

[22] Bury, K (1999). Statistical Distributions in Engineering, Cambridge University Press, New York.

[23] Johnson, N; Kotz, S; Balakrishnan, N (1995). Continuous Univariate Distributions, Volume II, Wiley, New York.

[24] Sheskin, D (2002). Handbook of Parametric and Nonparametric Statistical Procedures, Chapman & Hall, Boca Raton.

[25] Luxenberg, E; Boyd, S (2024). Portfolio construction with Gaussian mixture returns and exponential utility via convex optimization, Optimization and Engineering, 25, 555–574.

[26] Behr, A; Pötter, U (2009). Alternatives to the normal model of stock returns: Gaussian mixture, generalized logF and generalized hyperbolic models, Annals of Finance, 5, 49–68.

[27] Kon, S (1984). Models of stock returns – a comparison, Journal of Finance, 39, 1, 147–165.

[28] Rachev, S; Mittnik, S (2000). Stable Paretian Models in Finance, Wiley, New York.

Contribution of Individual Authors to the Creation of a Scientific Article (Ghostwriting Policy)

The authors contributed equally to the present research, at all stages from the formulation of the problem to the final findings and solution.

Sources of Funding for Research Presented in a Scientific Article or Scientific Article Itself

This work is partially financed by national funds through FCT – Fundação para a Ciência e a Tecnologia under the project UIDB/00006/2020. DOI: 10.54499/UIDB/00006/2020 (https://doi.org/10.54499/UIDB/00006/2020).

Conflicts of Interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative Commons Attribution License 4.0 https://creativecommons.org/licenses/by/4.0/deed.en_US
