A Class of Population Mean Estimators in Stratified Random Sampling:
A Case Study on Fine Particulate Matter in the North of Thailand
NUANPAN LAWSON1*, NATTHAPAT THONGSAK2
1Department of Applied Statistics, Faculty of Applied Science,
King Mongkut’s University of Technology North Bangkok,
1518 Pracharat 1 Road, Wongsawang, Bangsue, Bangkok 10800,
THAILAND
2State Audit Office of the Kingdom of Thailand,
Bangkok, 10400,
THAILAND
Abstract: - Residents of Thailand’s upper north have for many years faced hazardous air quality, with the amount of fine particulate matter rising to several times the standards of the World Health Organization, a level classified as severely affecting public health. The dust problem is an urgent issue in Thailand that needs to be solved. Assessing pollution data in advance can help the Thai government plan to eliminate and prevent ongoing dust problems for Thai citizens. A new class of population mean estimators is proposed under stratified random sampling. The bias and mean square error of the proposed estimators are studied using a Taylor series approximation. A simulation study and an application to air pollution data in the north of Thailand are used to investigate the performance of the estimators. The results from the air pollution data in the north of Thailand show that the proposed estimators offer the highest efficiency relative to the others.
Key-Words: - Ratio estimator, Fine particulate matter, Mean square error, Stratified random sampling,
Population mean, Thailand.
Received: November 3, 2023. Revised: November 19, 2023. Accepted: December 15, 2023. Published: March 8, 2024.
1 Introduction
Thailand’s air pollution has become an increasing concern because the amount of dust exceeds both Thailand’s and the World Health Organization’s guidelines. In multiple provinces in the north, the 24-hour averages of fine particulate matter with a diameter less than 2.5 microns (PM2.5) and of fine particulate matter with a diameter less than 10 microns (PM10) exceed the standard of 50 micrograms per cubic meter, and the air quality index (AQI) exceeds the standard value of 100; these levels are classified as severely affecting human health. Many provinces in northern Thailand are tackling dust pollution caused by seasonal agricultural practices and by mountainous terrain that traps pollution passing into the area. There is a significant correlation between pollution and citizens’ health and quality of life, as shown by a worrying increase in the incidence of upper respiratory symptoms and lung cancer. Lung cancer due to PM2.5 pollution affects genetically susceptible people at a rapid and unprecedented rate, which is an increasing concern amongst the population. The Thai government needs to monitor the air quality data and assess the situation closely, including strictly implementing measures to control burning in forest areas, farmland, and areas along highways. In addition, the government must enact a strict policy requesting people’s cooperation in refraining from burning waste and agricultural residue in preparation for farming, to prevent forest fires that can lead to dust issues in the north of Thailand.
Possessing air pollution data can assist in policy planning and in preventing harm to life. Thailand’s air pollution data have been investigated by many researchers. In [1], carbon monoxide was used to estimate the average PM2.5 in Dindang, Thailand, using known prior information under simple random sampling without replacement (SRSWOR). The best estimators suggested by [1], those with the smallest mean square error (MSE), are based on the known median of the auxiliary variable and quartile
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2024.23.19
Nuanpan Lawson, Natthapat Thongsak
E-ISSN: 2224-2880
160
Volume 23, 2024
average. The estimators of [1] help save budget and time with a small sampling fraction. In [2], a transformation technique was used to create combined estimators of the average PM2.5 via nitrogen dioxide in Chiang Rai under double sampling, which improved the performance of the combined estimators relative to the single ones. Air pollution from vehicles in Selangor, Malaysia, an area with heavy traffic congestion, was studied in [3]. They found that PM10 and ozone are the key factors contributing to air pollution in Selangor, although the air pollution there seemed to decrease during the period of their study, [4], [5].
In sample surveys, the ratio and regression estimators for estimating a population mean, the average of a specified characteristic of a group, are well known for increasing the efficiency of estimators when an available auxiliary variable is positively related to the study variable. A ratio estimator was introduced in [6] for estimating the population mean of the study variable under SRSWOR when the population mean of the auxiliary variable is known. To increase the precision of the population mean estimator, some parameters of an auxiliary variable, such as the coefficient of variation, correlation, and kurtosis, have been applied to modify the usual ratio estimator, [7], [8], [9]. Five ratio estimators using a regression estimator and known parameters to estimate the population mean were proposed by [10], [11], [12], [13].
Stratified random sampling is a sampling technique that is suitable when the units in the population are homogeneous within the same stratum and heterogeneous between different strata. In stratified sampling, a population comprising N units is subdivided into distinct subpopulations, for example by geographical characteristics, age group, or gender. The subpopulations are called strata. Dividing the population into strata is useful when we want to find the size of each stratum. The samples in each stratum are selected via simple random sampling. Under stratified random sampling, there are two types of ratio estimator: a separate one and a combined one. The ratio estimators suggested under SRSWOR by [7], [8], [9] were adapted to stratified random sampling by [14], which resulted in four separate ratio estimators that were found to be more efficient than the usual separate ratio estimator. More works that modify the ratio estimator under stratified random sampling can be seen in [15], [16], [17], [18].
In the present study, we propose a new class of separate ratio estimators utilizing some parameters of an auxiliary variable under stratified random sampling. We derive the bias and mean square error of the proposed class of estimators up to the first degree of a Taylor series approximation. A simulation study and an application to air pollution data in the north of Thailand are used to demonstrate the performance of the proposed class of estimators.
2 Existing Estimators
2.1 The Usual Separate Ratio Estimator
The usual separate ratio estimator for the population
mean is
$$\hat{\bar{Y}}_{RS} = \sum_{h=1}^{L} W_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h, \tag{1}$$

where $\bar{X}_h = \sum_{i=1}^{N_h} x_{hi} / N_h$ is the population mean of an auxiliary variable in stratum $h$; $h = 1, 2, 3, \ldots, L$, $\bar{x}_h = \sum_{i=1}^{n_h} x_{hi} / n_h$ and $\bar{y}_h = \sum_{i=1}^{n_h} y_{hi} / n_h$ are the sample means of the auxiliary and study variables in stratum $h$ based on a sample of size $n_h$, respectively, $W_h = N_h / N$ is the stratum weight, and $N_h$ is the population size in stratum $h$, such that $\sum_{h=1}^{L} N_h = N$.
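As an illustration of (1), the following minimal sketch computes the separate ratio estimate from per-stratum samples. The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def separate_ratio_estimate(y_samples, x_samples, X_bar, N_h):
    """Separate ratio estimate: sum over strata of W_h * (ybar_h / xbar_h) * Xbar_h."""
    N_h = np.asarray(N_h, dtype=float)
    W = N_h / N_h.sum()  # stratum weights W_h = N_h / N
    estimate = 0.0
    for y, x, Xb, w in zip(y_samples, x_samples, X_bar, W):
        # ratio of sample means, scaled by the known stratum mean of x
        estimate += w * np.mean(y) / np.mean(x) * Xb
    return float(estimate)

# Toy data (illustrative only): two strata with known auxiliary means.
y_samples = [np.array([10.0, 12.0]), np.array([20.0, 22.0, 24.0])]
x_samples = [np.array([5.0, 6.0]), np.array([10.0, 11.0, 12.0])]
X_bar = [5.5, 11.0]
N_h = [40, 60]

estimate = separate_ratio_estimate(y_samples, x_samples, X_bar, N_h)
print(estimate)
```

In this toy case each sample mean of x equals its stratum mean, so the estimate reduces to the weighted mean of the stratum sample means of y.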
The bias and MSE of $\hat{\bar{Y}}_{RS}$ are

$$Bias(\hat{\bar{Y}}_{RS}) = \sum_{h=1}^{L} W_h \gamma_h \bar{Y}_h \left( C_{xh}^2 - \rho_h C_{xh} C_{yh} \right),$$

$$MSE(\hat{\bar{Y}}_{RS}) = \sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left( C_{yh}^2 + C_{xh}^2 - 2 \rho_h C_{xh} C_{yh} \right),$$

where $C_{xh} = S_{xh} / \bar{X}_h$ is the auxiliary variable’s population coefficient of variation in stratum $h$, $\rho_h = S_{xyh} / (S_{xh} S_{yh})$ is the population correlation coefficient between the auxiliary and study variables in stratum $h$,

$$S_{xh}^2 = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (x_{hi} - \bar{X}_h)^2, \quad S_{yh}^2 = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (y_{hi} - \bar{Y}_h)^2, \quad \text{and} \quad S_{xyh} = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (x_{hi} - \bar{X}_h)(y_{hi} - \bar{Y}_h).$$
2.2 The Adjusted Ratio, [14], Estimators
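The ratio estimators proposed under SRSWOR by [8], [9] were adapted to stratified random sampling by [14]. The estimators of [14] are:

$$\hat{\bar{Y}}_{TL1} = \sum_{h=1}^{L} W_h \bar{y}_h \frac{\bar{X}_h + C_{xh}}{\bar{x}_h + C_{xh}}, \tag{2}$$

$$\hat{\bar{Y}}_{TL2} = \sum_{h=1}^{L} W_h \bar{y}_h \frac{\bar{X}_h + \beta_{2h}(x)}{\bar{x}_h + \beta_{2h}(x)}, \tag{3}$$

$$\hat{\bar{Y}}_{TL3} = \sum_{h=1}^{L} W_h \bar{y}_h \frac{\beta_{2h}(x) \bar{X}_h + C_{xh}}{\beta_{2h}(x) \bar{x}_h + C_{xh}}, \tag{4}$$

$$\hat{\bar{Y}}_{TL4} = \sum_{h=1}^{L} W_h \bar{y}_h \frac{C_{xh} \bar{X}_h + \beta_{2h}(x)}{C_{xh} \bar{x}_h + \beta_{2h}(x)}, \tag{5}$$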
where
$$\beta_{2h}(x) = \frac{N_h (N_h + 1) \sum_{i=1}^{N_h} (x_{hi} - \bar{X}_h)^4}{(N_h - 1)(N_h - 2)(N_h - 3) S_{xh}^4} - \frac{3 (N_h - 1)^2}{(N_h - 2)(N_h - 3)}$$

is the population coefficient of kurtosis of the auxiliary variable in stratum $h$.
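The kurtosis coefficient can be computed directly from the stratum values of the auxiliary variable. The sketch below is one possible implementation of the formula; the function name is hypothetical, and the formula requires at least four units in the stratum.

```python
import numpy as np

def kurtosis_coefficient(x):
    """Coefficient of kurtosis beta_2(x) of one stratum, as in the formula above."""
    x = np.asarray(x, dtype=float)
    N = x.size
    if N < 4:
        raise ValueError("at least four units are needed")
    dev = x - x.mean()
    S2 = np.sum(dev ** 2) / (N - 1)  # S_x^2 with divisor N - 1
    first = N * (N + 1) * np.sum(dev ** 4) / ((N - 1) * (N - 2) * (N - 3) * S2 ** 2)
    second = 3 * (N - 1) ** 2 / ((N - 2) * (N - 3))
    return float(first - second)

example_kurtosis = kurtosis_coefficient([1, 2, 3, 4, 5])
print(example_kurtosis)
```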
The biases and MSEs of the [14], estimators are:
2
2
1
1
TL
ˆ,
Lhh
h h h xh h xh yh
hh xh h xh
XX
Bias Y W Y C C C
X C X C







TL2
2
2
122
ˆ,
Lhh
h h h xh h xh yh
hh h h h
XX
Bias Y W Y C C C
X x X x








2
22
2
22
TL3
1
ˆ,
Lh h h h
h h h xh h xh yh
hh h xh h h xh
x X x X
Bias Y W Y C C C
x X C x X C









T
122
4
2
2
L
ˆ,
Lxh h xh h
h h h xh h xh yh
hxh h h xh h h
C X C X
Bias Y W Y C C C
C X x C X x








2
2 2 2 2
1TL
1
ˆ2,
Lhh
h h h yh xh h xh yh
hh xh h xh
XX
MSE Y W Y C C C C
X C X C






2
2
TL2
2 2 2
122
ˆ2,
Lhh
h h h yh xh h xh yh
hh h h h
XX
MSE Y W Y C C C C
X x X x







2
22
2 2 2 2
122
TL3
ˆ2,
Lh h h h
h h h yh xh h xh yh
hh h xh h h xh
x X x X
MSE Y W Y C C C C
x X C x X C








2
2 2 2 2
2
L4
12
T
ˆ2.
Lxh h xh h
h h h yh xh h xh yh
hxh h h xh h h
C X C X
MSE Y W Y C C C C
C X x C X x







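The usual separate ratio estimator in (1) and the estimators of [14] in (2) to (5) can be written in the general form

$$\hat{\bar{Y}}_{R} = \sum_{h=1}^{L} W_h \bar{y}_h \frac{A_h \bar{X}_h + D_h}{A_h \bar{x}_h + D_h}. \tag{6}$$

The bias and MSE of $\hat{\bar{Y}}_{R}$ in general form are

$$Bias(\hat{\bar{Y}}_{R}) = \sum_{h=1}^{L} W_h \gamma_h \bar{Y}_h \left( \lambda_h^2 C_{xh}^2 - \lambda_h \rho_h C_{xh} C_{yh} \right), \tag{7}$$

$$MSE(\hat{\bar{Y}}_{R}) = \sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left( C_{yh}^2 + \lambda_h^2 C_{xh}^2 - 2 \lambda_h \rho_h C_{xh} C_{yh} \right), \tag{8}$$

where $A_h \neq 0$ and $D_h$ are constants or functions of the auxiliary variable in stratum $h$, and $\lambda_h = \dfrac{A_h \bar{X}_h}{A_h \bar{X}_h + D_h}$.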
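To make the general form concrete, the sketch below evaluates (7) and (8) for given stratum parameters and a choice of $A_h$ and $D_h$. It assumes $\gamma_h = 1/n_h - 1/N_h$, which is the usual finite population factor under SRSWOR but is an assumption here because the text does not define $\gamma_h$ explicitly. The function name is hypothetical; the numerical inputs are the three-stratum parameters of the simulation study with $n = 100$.

```python
import numpy as np

def general_class_bias_mse(N_h, n_h, Y_bar, X_bar, C_x, C_y, rho, A, D):
    """First-order bias (7) and MSE (8) of the general separate ratio class."""
    N_h, n_h, Y_bar, X_bar, C_x, C_y, rho, A, D = (
        np.asarray(v, dtype=float)
        for v in (N_h, n_h, Y_bar, X_bar, C_x, C_y, rho, A, D)
    )
    W = N_h / N_h.sum()
    gamma = 1.0 / n_h - 1.0 / N_h  # ASSUMPTION: not defined in the text
    lam = A * X_bar / (A * X_bar + D)
    bias = np.sum(W * gamma * Y_bar * (lam**2 * C_x**2 - lam * rho * C_x * C_y))
    mse = np.sum(W**2 * gamma * Y_bar**2
                 * (C_y**2 + lam**2 * C_x**2 - 2 * lam * rho * C_x * C_y))
    return float(bias), float(mse)

# Stratum parameters from the simulation study (n = 100).
params = dict(
    N_h=[1000, 600, 400], n_h=[50, 30, 20],
    Y_bar=[500, 700, 350], X_bar=[400, 550, 550],
    C_x=[0.3, 1.0, 1.1], C_y=[1.4, 1.6, 0.9], rho=[0.8, 0.6, 0.4],
)

# A_h = 1, D_h = 0 gives lambda_h = 1, the usual separate ratio estimator (1).
print(general_class_bias_mse(A=1.0, D=0.0, **params))
# A_h = 1, D_h = C_xh corresponds to the estimator in (2).
print(general_class_bias_mse(A=1.0, D=params["C_x"], **params))
```

The other estimators of [14] follow the same pattern: (3) uses $A_h = 1$, $D_h = \beta_{2h}(x)$; (4) uses $A_h = \beta_{2h}(x)$, $D_h = C_{xh}$; and (5) uses $A_h = C_{xh}$, $D_h = \beta_{2h}(x)$.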
3 Proposed Estimator
A new class of separate ratio estimators under stratified random sampling is proposed, inspired by the work of [10] under SRS:

$$\hat{\bar{Y}}_{Reg} = \sum_{h=1}^{L} W_h \left[ \bar{y}_h + b_h (\bar{X}_h - \bar{x}_h) \right] \frac{A_h \bar{X}_h + D_h}{A_h \bar{x}_h + D_h}, \tag{9}$$

where $b_h$ is the sample regression coefficient in stratum $h$.
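A minimal sketch of how an estimate of the form (9) could be computed from stratum samples is given below. It assumes that $b_h$ is the ordinary least squares slope $s_{xyh} / s_{xh}^2$ of the study variable on the auxiliary variable; the text only calls it the sample regression coefficient, so this choice, the function name, and the toy data are assumptions.

```python
import numpy as np

def proposed_reg_estimate(y_samples, x_samples, X_bar, N_h, A, D):
    """Regression-adjusted separate ratio estimate in the form of (9)."""
    N_h = np.asarray(N_h, dtype=float)
    W = N_h / N_h.sum()
    estimate = 0.0
    for y, x, Xb, w, a, d in zip(y_samples, x_samples, X_bar, W, A, D):
        y = np.asarray(y, dtype=float)
        x = np.asarray(x, dtype=float)
        ybar, xbar = y.mean(), x.mean()
        # ASSUMPTION: b_h is the least squares slope s_xy / s_x^2
        b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        regression_part = ybar + b * (Xb - xbar)
        ratio_part = (a * Xb + d) / (a * xbar + d)
        estimate += w * regression_part * ratio_part
    return float(estimate)

# Toy single-stratum example with y = 2x exactly (illustrative only).
toy_estimate = proposed_reg_estimate(
    y_samples=[[8.0, 12.0]], x_samples=[[4.0, 6.0]],
    X_bar=[5.5], N_h=[10], A=[1.0], D=[0.0],
)
print(toy_estimate)
```

Different members of the class are obtained by passing different per-stratum values of `A` and `D`, for example known values of $C_{xh}$ or $\beta_{2h}(x)$.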
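Let $\varepsilon_{0h} = \dfrac{\bar{y}_h - \bar{Y}_h}{\bar{Y}_h}$, so that $\bar{y}_h = \bar{Y}_h (1 + \varepsilon_{0h})$, and let $\varepsilon_{1h} = \dfrac{\bar{x}_h - \bar{X}_h}{\bar{X}_h}$, so that $\bar{x}_h = \bar{X}_h (1 + \varepsilon_{1h})$, such that

$$E(\varepsilon_{0h}) = E(\varepsilon_{1h}) = 0, \quad E(\varepsilon_{0h}^2) = \gamma_h C_{yh}^2, \quad E(\varepsilon_{1h}^2) = \gamma_h C_{xh}^2, \quad E(\varepsilon_{0h} \varepsilon_{1h}) = \gamma_h \rho_h C_{xh} C_{yh}.$$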
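Rewriting (9) in terms of $\varepsilon_{0h}$ and $\varepsilon_{1h}$,

$$
\begin{aligned}
\hat{\bar{Y}}_{Reg} &= \sum_{h=1}^{L} W_h \left[ \bar{Y}_h (1 + \varepsilon_{0h}) - b_h \bar{X}_h \varepsilon_{1h} \right] \frac{A_h \bar{X}_h + D_h}{A_h \bar{X}_h (1 + \varepsilon_{1h}) + D_h} \\
&= \sum_{h=1}^{L} W_h \left[ \bar{Y}_h + \bar{Y}_h \varepsilon_{0h} - b_h \bar{X}_h \varepsilon_{1h} \right] (1 + \lambda_h \varepsilon_{1h})^{-1} \\
&= \sum_{h=1}^{L} W_h \left[ \bar{Y}_h + \bar{Y}_h \varepsilon_{0h} - b_h \bar{X}_h \varepsilon_{1h} \right] \left( 1 - \lambda_h \varepsilon_{1h} + \lambda_h^2 \varepsilon_{1h}^2 - \cdots \right) \\
&\approx \sum_{h=1}^{L} W_h \left[ \bar{Y}_h + \bar{Y}_h \varepsilon_{0h} - b_h \bar{X}_h \varepsilon_{1h} - \lambda_h \bar{Y}_h \varepsilon_{1h} - \lambda_h \bar{Y}_h \varepsilon_{0h} \varepsilon_{1h} + \lambda_h b_h \bar{X}_h \varepsilon_{1h}^2 + \lambda_h^2 \bar{Y}_h \varepsilon_{1h}^2 \right].
\end{aligned}
$$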
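The estimation error of $\hat{\bar{Y}}_{Reg}$ is:

$$\hat{\bar{Y}}_{Reg} - \bar{Y} \approx \sum_{h=1}^{L} W_h \left[ \bar{Y}_h \varepsilon_{0h} - b_h \bar{X}_h \varepsilon_{1h} - \lambda_h \bar{Y}_h \varepsilon_{1h} - \lambda_h \bar{Y}_h \varepsilon_{0h} \varepsilon_{1h} + \lambda_h b_h \bar{X}_h \varepsilon_{1h}^2 + \lambda_h^2 \bar{Y}_h \varepsilon_{1h}^2 \right].$$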
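The bias of $\hat{\bar{Y}}_{Reg}$ is

$$Bias(\hat{\bar{Y}}_{Reg}) = E(\hat{\bar{Y}}_{Reg} - \bar{Y}) \approx \sum_{h=1}^{L} W_h \gamma_h \bar{Y}_h \lambda_h \left[ (\lambda_h + \beta_h K_h) C_{xh}^2 - \rho_h C_{xh} C_{yh} \right], \tag{10}$$

where, consistent with the expansion of (9), $\beta_h$ denotes the population regression coefficient that replaces $b_h$ in stratum $h$ and $K_h = \bar{X}_h / \bar{Y}_h$.

The MSE of $\hat{\bar{Y}}_{Reg}$ is

$$
\begin{aligned}
MSE(\hat{\bar{Y}}_{Reg}) &= E(\hat{\bar{Y}}_{Reg} - \bar{Y})^2 \\
&\approx E \left[ \sum_{h=1}^{L} W_h \left( \bar{Y}_h \varepsilon_{0h} - b_h \bar{X}_h \varepsilon_{1h} - \lambda_h \bar{Y}_h \varepsilon_{1h} \right) \right]^2 \\
&= \sum_{h=1}^{L} W_h^2 E \left( \bar{Y}_h \varepsilon_{0h} - b_h \bar{X}_h \varepsilon_{1h} - \lambda_h \bar{Y}_h \varepsilon_{1h} \right)^2 \\
&= \sum_{h=1}^{L} W_h^2 E \left( \bar{Y}_h^2 \varepsilon_{0h}^2 + b_h^2 \bar{X}_h^2 \varepsilon_{1h}^2 + \lambda_h^2 \bar{Y}_h^2 \varepsilon_{1h}^2 - 2 b_h \bar{X}_h \bar{Y}_h \varepsilon_{0h} \varepsilon_{1h} - 2 \lambda_h \bar{Y}_h^2 \varepsilon_{0h} \varepsilon_{1h} + 2 \lambda_h b_h \bar{X}_h \bar{Y}_h \varepsilon_{1h}^2 \right) \\
&= \sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left[ C_{yh}^2 + (\lambda_h + \beta_h K_h)^2 C_{xh}^2 - 2 (\lambda_h + \beta_h K_h) \rho_h C_{xh} C_{yh} \right].
\end{aligned} \tag{11}
$$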
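The step from the estimation error to the bias can be spelled out as follows. This is a sketch under two assumptions that the text does not state in this form: $b_h$ is treated as fixed at the population regression coefficient $\beta_h$, and $K_h = \bar{X}_h / \bar{Y}_h$. The first-order terms vanish because $E(\varepsilon_{0h}) = E(\varepsilon_{1h}) = 0$.

```latex
% Assumptions: b_h replaced by \beta_h; K_h = \bar{X}_h / \bar{Y}_h.
\begin{aligned}
E(\hat{\bar{Y}}_{Reg} - \bar{Y})
&\approx \sum_{h=1}^{L} W_h \left[ \lambda_h^2 \bar{Y}_h E(\varepsilon_{1h}^2)
   + \lambda_h \beta_h \bar{X}_h E(\varepsilon_{1h}^2)
   - \lambda_h \bar{Y}_h E(\varepsilon_{0h} \varepsilon_{1h}) \right] \\
&= \sum_{h=1}^{L} W_h \gamma_h \left[ \lambda_h^2 \bar{Y}_h C_{xh}^2
   + \lambda_h \beta_h \bar{X}_h C_{xh}^2
   - \lambda_h \bar{Y}_h \rho_h C_{xh} C_{yh} \right] \\
&= \sum_{h=1}^{L} W_h \gamma_h \bar{Y}_h \lambda_h
   \left[ (\lambda_h + \beta_h K_h) C_{xh}^2 - \rho_h C_{xh} C_{yh} \right].
\end{aligned}
```

The MSE follows the same pattern, using only the first-order part of the error, $\bar{Y}_h \left[ \varepsilon_{0h} - (\lambda_h + \beta_h K_h) \varepsilon_{1h} \right]$, in each stratum.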
Some members of $\hat{\bar{Y}}_{Reg}$ are presented in Table 1.
Table 1. Some of the proposed estimators, $\hat{\bar{Y}}_{Reg\,i}$, $i = 1, 2, \ldots, 5$
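To evaluate the first-order bias and MSE of a member of the proposed class numerically, one could use a sketch like the following. It rests on assumptions not spelled out in the text: $\gamma_h = 1/n_h - 1/N_h$, $\beta_h = S_{xyh} / S_{xh}^2$, and $K_h = \bar{X}_h / \bar{Y}_h$, so that $\beta_h K_h = \rho_h C_{yh} / C_{xh}$. The function name and the input values are hypothetical.

```python
import numpy as np

def proposed_class_bias_mse(N_h, n_h, Y_bar, X_bar, C_x, C_y, rho, A, D):
    """First-order bias and MSE of the proposed class for given A_h and D_h."""
    N_h, n_h, Y_bar, X_bar, C_x, C_y, rho, A, D = (
        np.asarray(v, dtype=float)
        for v in (N_h, n_h, Y_bar, X_bar, C_x, C_y, rho, A, D)
    )
    W = N_h / N_h.sum()
    gamma = 1.0 / n_h - 1.0 / N_h   # ASSUMPTION: finite population factor
    lam = A * X_bar / (A * X_bar + D)
    beta_K = rho * C_y / C_x        # ASSUMPTION: beta_h * K_h = rho_h C_yh / C_xh
    total = lam + beta_K
    bias = np.sum(W * gamma * Y_bar * lam * (total * C_x**2 - rho * C_x * C_y))
    mse = np.sum(W**2 * gamma * Y_bar**2
                 * (C_y**2 + total**2 * C_x**2 - 2 * total * rho * C_x * C_y))
    return float(bias), float(mse)

# Hypothetical single-stratum inputs, for illustration only.
bias_reg, mse_reg = proposed_class_bias_mse(
    N_h=[100], n_h=[50], Y_bar=[10], X_bar=[5],
    C_x=[0.2], C_y=[0.3], rho=[0.5], A=1.0, D=0.0,
)
print(bias_reg, mse_reg)
```

Passing per-stratum arrays for `A` and `D` evaluates other members of the class.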
4 Theoretical Comparison
The effectiveness of the suggested estimator is evaluated in comparison to the existing estimators: the usual separate ratio estimator and the Tailor and Lone [14] estimators. Using the MSE as the criterion, the new estimator in (11) is compared with the existing ones in the general form (8) as follows.
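The suggested $\hat{\bar{Y}}_{Reg}$ exhibits superior performance to the existing $\hat{\bar{Y}}_{R}$ if:

$$MSE(\hat{\bar{Y}}_{Reg}) < MSE(\hat{\bar{Y}}_{R})$$

$$\sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left[ C_{yh}^2 + (\lambda_h + \beta_h K_h)^2 C_{xh}^2 - 2 (\lambda_h + \beta_h K_h) \rho_h C_{xh} C_{yh} \right] < \sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left[ C_{yh}^2 + \lambda_h^2 C_{xh}^2 - 2 \lambda_h \rho_h C_{xh} C_{yh} \right]$$

$$\sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left[ (\lambda_h + \beta_h K_h)^2 C_{xh}^2 - 2 (\lambda_h + \beta_h K_h) \rho_h C_{xh} C_{yh} \right] < \sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left[ \lambda_h^2 C_{xh}^2 - 2 \lambda_h \rho_h C_{xh} C_{yh} \right]$$

$$\sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left[ (\lambda_h + \beta_h K_h)^2 C_{xh}^2 - \lambda_h^2 C_{xh}^2 \right] < \sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \left[ 2 (\lambda_h + \beta_h K_h) \rho_h C_{xh} C_{yh} - 2 \lambda_h \rho_h C_{xh} C_{yh} \right]$$

$$\sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 C_{xh}^2 \left[ (\lambda_h + \beta_h K_h)^2 - \lambda_h^2 \right] < 2 \sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \rho_h C_{xh} C_{yh} \left[ (\lambda_h + \beta_h K_h) - \lambda_h \right].$$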
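The final condition can be simplified a little further. The following is a sketch under assumptions that are not stated in the text: the same $\lambda_h$ is used in both estimators, $\beta_h = S_{xyh} / S_{xh}^2$, and $K_h = \bar{X}_h / \bar{Y}_h$, so that $\beta_h K_h = \rho_h C_{yh} / C_{xh}$.

```latex
% Assumptions: same \lambda_h on both sides; \beta_h K_h = \rho_h C_{yh} / C_{xh}.
\begin{aligned}
&\sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \, \beta_h K_h C_{xh}
   \left[ (2 \lambda_h + \beta_h K_h) C_{xh} - 2 \rho_h C_{yh} \right] < 0 \\
\Longleftrightarrow \quad
&\sum_{h=1}^{L} W_h^2 \gamma_h \bar{Y}_h^2 \, \rho_h C_{yh}
   \left( 2 \lambda_h C_{xh} - \rho_h C_{yh} \right) < 0 .
\end{aligned}
```

Under these assumptions, and when every $\rho_h > 0$, the condition holds in particular if $\rho_h C_{yh} > 2 \lambda_h C_{xh}$ in each stratum. This stratum-wise inequality is sufficient but not necessary, since only the weighted sum has to be negative.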


5 A Simulation Study
The simulation study illustrates and compares the effectiveness of the suggested estimators against the existing ones. A population of size $N = 2{,}000$ is divided into 3 strata, and the variables $(X, Y)$ are generated from a bivariate normal distribution in each stratum. The strata parameters are:
1st stratum: $N_1 = 1{,}000$, $\bar{X}_1 = 400$, $\bar{Y}_1 = 500$, $C_{x1} = 0.3$, $C_{y1} = 1.4$, $\rho_1 = 0.8$
2nd stratum: $N_2 = 600$, $\bar{X}_2 = 550$, $\bar{Y}_2 = 700$, $C_{x2} = 1.0$, $C_{y2} = 1.6$, $\rho_2 = 0.6$
3rd stratum: $N_3 = 400$, $\bar{X}_3 = 550$, $\bar{Y}_3 = 350$, $C_{x3} = 1.1$, $C_{y3} = 0.9$, $\rho_3 = 0.4$
Samples of sizes $n = 100, 200, 400,$ and $600$ are drawn from the population of $N = 2{,}000$ units using SRSWOR, and each sample size $n$ is allocated to the strata using proportional allocation. The allocated sample sizes for the 1st, 2nd, and 3rd strata are $(n_1, n_2, n_3) = (50, 30, 20)$ for $n = 100$, $(100, 60, 40)$ for $n = 200$, $(200, 120, 80)$ for $n = 400$, and $(300, 180, 120)$ for $n = 600$. The R program, [19], is used to repeat the simulation 10,000 times.
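Proportional allocation sets $n_h = n N_h / N$. The short sketch below reproduces the allocations quoted above; the function name is hypothetical, and the rounding step is an added convenience that is not needed for these particular sizes.

```python
def proportional_allocation(N_h, n):
    """Allocate a total sample size n to strata in proportion to their sizes N_h."""
    N = sum(N_h)
    # Rounding is only a convenience; in general the rounded sizes
    # may not add up to n exactly and would need adjusting.
    return [round(n * Nh / N) for Nh in N_h]

stratum_sizes = [1000, 600, 400]
for n in (100, 200, 400, 600):
    print(n, proportional_allocation(stratum_sizes, n))
```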
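The biases and MSEs of the new and existing estimators are calculated by:

$$Bias(\hat{\bar{Y}}) = \frac{1}{10{,}000} \sum_{i=1}^{10{,}000} \left( \hat{\bar{Y}}_i - \bar{Y} \right),$$

$$MSE(\hat{\bar{Y}}) = \frac{1}{10{,}000} \sum_{i=1}^{10{,}000} \left( \hat{\bar{Y}}_i - \bar{Y} \right)^2.$$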
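The simulation design can be sketched in Python as below. This is not the authors' R code: the seed, the use of 1,000 rather than 10,000 replications (to keep the run short), the least squares form of $b_h$, and the choice $A_h = 1$, $D_h = 0$ for the regression-adjusted estimator are all assumptions, so the output need not reproduce Table 2.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed (assumption)

N_h = np.array([1000, 600, 400])
n_h = np.array([50, 30, 20])       # proportional allocation for n = 100
X_mean = np.array([400.0, 550.0, 550.0])
Y_mean = np.array([500.0, 700.0, 350.0])
C_x = np.array([0.3, 1.0, 1.1])
C_y = np.array([1.4, 1.6, 0.9])
rho = np.array([0.8, 0.6, 0.4])

def make_stratum(N, mx, my, cx, cy, r):
    """Generate one stratum of (x, y) values from a bivariate normal distribution."""
    sx, sy = cx * mx, cy * my
    cov = [[sx**2, r * sx * sy], [r * sx * sy, sy**2]]
    xy = rng.multivariate_normal([mx, my], cov, size=int(N))
    return xy[:, 0], xy[:, 1]

strata = [make_stratum(*p) for p in zip(N_h, X_mean, Y_mean, C_x, C_y, rho)]
W = N_h / N_h.sum()
X_bar = np.array([x.mean() for x, _ in strata])  # realised stratum means of x
Y_bar_pop = float(np.sum(W * np.array([y.mean() for _, y in strata])))

def one_replicate():
    """Draw one stratified SRSWOR sample and return both estimates."""
    ratio_est, reg_est = 0.0, 0.0
    for (x, y), n, Xb, w in zip(strata, n_h, X_bar, W):
        idx = rng.choice(x.size, size=int(n), replace=False)
        xs, ys = x[idx], y[idx]
        xbar, ybar = xs.mean(), ys.mean()
        b = np.cov(xs, ys, ddof=1)[0, 1] / np.var(xs, ddof=1)
        ratio_est += w * ybar / xbar * Xb                       # form of (1)
        reg_est += w * (ybar + b * (Xb - xbar)) * Xb / xbar     # form of (9), A=1, D=0
    return ratio_est, reg_est

reps = 1000
draws = np.array([one_replicate() for _ in range(reps)])
bias = draws.mean(axis=0) - Y_bar_pop
mse = ((draws - Y_bar_pop) ** 2).mean(axis=0)
print("bias (ratio, regression-adjusted):", bias)
print("MSE  (ratio, regression-adjusted):", mse)
```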

The biases and MSEs are presented in Table 2.
The results in Table 2 show that the introduced estimators worked well because they gave both smaller biases and smaller MSEs than the existing ones. All proposed estimators, each using different known parameters of the auxiliary variable, gave similar results for both biases and MSEs. Larger sample sizes gave smaller biases and MSEs than smaller sample sizes. We can see that the introduced estimators utilizing the sample regression coefficient are more effective than the usual separate ratio estimator and the estimators of [14] under stratified random sampling.
Table 2. Biases and MSEs of the estimators
6 An Application to Air Pollution
Data in Northern Thailand
Northern Thailand air pollution data from 2003-2020, [20], are used to assess the effectiveness of the new estimators. The monthly average density of nitrogen dioxide, NO2 (mg per square metre), is the auxiliary variable, and the concentration of PM2.5 (micrograms per cubic metre) is the study variable, with a population of $N = 1{,}728$ units. The population parameters are summarized below.
$\bar{Y} = 39.34$, $\bar{X} = 2.48$, $C_y = 1.23$, $C_x = 0.30$, $\beta_2(x) = 3.36$, $\rho = 0.68$.
We considered eight provinces in northern Thailand as the strata to study the average amount of PM2.5. The provinces included in the study are Chiang Mai, Chiang Rai, Lamphun, Lampang, Phrae, Phayao, Nan, and Mae Hong Son ($N_h = 216$, $h = 1, 2, \ldots, 8$). A sample of size $n = 520$ is randomly drawn from the population of $N = 1{,}728$ units using SRSWOR. Samples of size $n_h = 65$, based on proportional allocation, are randomly taken from each stratum using SRSWOR. The parameters in each stratum are shown in Table 3.
Table 3. Population parameters for each province in
northern Thailand
The estimated PM2.5 values and the percentage relative efficiencies (PREs) of the estimators against the usual separate ratio estimator are shown in Table 4.
The new class of estimators performed better, producing higher PREs than all existing estimators. The proposed estimator $\hat{\bar{Y}}_{Reg5}$, which uses both $C_{xh}$ and $\beta_{2h}(x)$, gave the highest PRE in this situation and an estimate of PM2.5 closer to the population mean. The results show that the recommended estimators performed much better than the existing ones on the air pollution dataset.
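The text does not give the PRE formula. A conventional definition, assumed here, is the ratio of the MSE of the baseline estimator (the usual separate ratio estimator) to the MSE of the estimator under comparison, multiplied by 100. The function name and the numbers below are hypothetical.

```python
def percentage_relative_efficiency(mse_baseline, mse_estimator):
    """PRE of an estimator against a baseline: 100 * MSE(baseline) / MSE(estimator)."""
    if mse_estimator <= 0:
        raise ValueError("mse_estimator must be positive")
    return 100.0 * mse_baseline / mse_estimator

# Hypothetical MSE values, for illustration only.
example_pre = percentage_relative_efficiency(3.0, 1.0)
print(example_pre)
```

Under this definition the baseline itself has a PRE of 100, and a value above 100 means the estimator has a smaller MSE than the baseline.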
Table 4. Estimated values of PM2.5 and PREs of the
estimators
7 Conclusion
A new class of separate ratio estimators for the population mean is investigated in this study under stratified random sampling. Available information on the auxiliary variable has been used to increase the efficiency of the population mean estimator. The outcomes of the simulation study and the application to air pollution data in northern Thailand indicated that the suggested estimators performed more effectively than the usual separate ratio estimator and the estimators of [14] under stratified random sampling. As expected, larger sample sizes resulted in smaller MSEs in all situations. The top-performing estimator outperformed all existing estimators, delivering nearly three times the efficiency. In subsequent research, additional known auxiliary variables may be employed to estimate the population mean of the variable under study, and the new estimators can be adapted to more complex survey designs, e.g., cluster sampling and stratified single-stage cluster sampling. Moreover, the proposed estimators can be applied to data in many other areas of interest.
Acknowledgement:
We thank the anonymous referees for their valuable comments.
References:
[1] Lawson, N., Improved ratio type estimators
using some prior information in sample
surveys: a case study of fine particulate matter
in Thailand, WSEAS Transactions on
Systems, Vol. 22, 2023. pp. 538-542.
[2] Thongsak, N. and Lawson, N., A combined
family of dual to ratio estimators using a
transformed auxiliary variable, Lobachevskii
Journal of Mathematics, Vol.43, No.9, 2022,
pp. 2621-2633.
[3] Fadzil, A., Shuhaili, A., Ihsan, S. Z. and Faris,
W. F., Air pollution study of vehicles
emission in high volume traffic: Selangor,
Malaysia as a case study, WSEAS
Transactions on Systems, Vol. 12, 2013. pp.
67-84.
[4] Austin, W., Carattini, S, Gomez-Mahecha, J.
and Pesko, M.F., The effects of
contemporaneous air pollution on COVID-19
morbidity and mortality, Journal of
Environmental Economics and
Management, Vol. 119, 2023, pp. 102815.
[5] Beloconi, A. and Vounatsou, P, Long-term air
pollution exposure and COVID-19 case-
severity: An analysis of individual-level data
from Switzerland. Environmental Research,
Vol. 216, 2023, pp. 114481.
[6] Cochran, W.G. Sampling Techniques. 3rd
edition. India: Wiley Eastern Limited,
1940.
[7] Sisodia, B. V. S., and Dwivedi, V. K., A
modified ratio estimator using coefficient of
variation of auxiliary variable, Journal of
the Indian Society of Agricultural Statistics,
Vol.33, No.1, 1981, pp.13-18.
[8] Upadhyaya, L. N., and Singh, H. P., Use of
transformed auxiliary variable in estimating
the finite Population Mean, Biometrical
Journal, Vol.41, No.5, 1999, pp. 627-636.
[9] Singh, H. P., Tailor, R., Tailor, R., and
Kakran, M. S., An improved estimator of
population mean using power transformation,
Journal of the Indian Society of Agricultural
Statistics, Vol. 58, No. 2, 2004, pp. 223-230.
[10] Kadilar, C. and Cingi, H., Ratio estimators in
simple random sampling, Applied
Mathematics and Computation, 2004, Vol.
151, pp. 893-902.
[11] Kadilar, C. and Cingi, H., An improvement in
estimating the population mean by using the
correlation coefficient, Hacettepe Journal of
Mathematics and Statistics, 2006, Vol. 35, pp.
103 -109.
[12] Nangsue, N., Adjusted ratio and regression
type estimators for estimation of population
mean when some observations are missing,
International Scholarly and Scientific
Research & Innovation, Vol.3, No.5,
2009, pp. 334-337.
[13] Koç, T. and Koç, H. A new class of
quantile regression ratio-type estimators for
finite population mean in stratified
random sampling, Axioms, Vol. 12, No.
7, 2023, pp. 713.
[14] Tailor, R. and Lone, H. A., Separate ratio-type
estimators of population mean in stratified
random sampling, Journal of Modern Applied
Statistical Methods, 2014, Vol. 13, No.1,
pp.223-233.
[15] Kadilar, C. and Cingi, H., Ratio estimators in
stratified random sampling, Journal of
Modern Applied Statistical Methods, 2003,
Vol.45, No.2, pp. 218-225.
[16] Kadilar, C. and Cingi, H., A new ratio
estimator in stratified random sampling,
Communications in Statistics-Theory and
Methods, 2005, Vol. 34, No.3, pp. 597-602.
[17] Singh, R.V.K, and Ahmed, A., Ratio-type
estimators in stratified random sampling using
auxiliary attribute, Proceedings of the
International MultiConference of Engineers
and Computer Scientists 2014 Vol I, IMECS
2014, March 12 - 14, 2014, Hong Kong.
[18] Sharma, V. and Kumar, S., Simulation study
of ratio type estimators in stratified
random sampling using multi-auxiliary
information, Thailand Statistician, 2020, Vol.
18, No. 3, pp.281-289.
[19] R Core Team, R: A language and environment
for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria, 2021,
[Online], https://www.R-project.org/
(Accessed Date: November 5, 2023)
[20] Inness, A., Ades, M., Agustí-Panareda, A.,
Barré, J., Benedictow, A., Blechschmidt, A.-
M., Dominguez, J. J., Engelen, R., Eskes, H.,
Flemming, J., Huijnen, V., Jones, L., Kipling,
Z., Massart, S., Parrington, M., Peuch, V.-H.,
Razinger, M., Remy, S., Schulz, M., and
Suttie, M. The CAMS Reanalysis of
Atmospheric Composition. Atmospheric
Chemistry and Physics, 2019, Vol. 19. No. 6,
pp. 3515-3556.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of funding for research presented in a
scientific article or scientific article itself
This work was funded by King Mongkut’s
University of Technology North Bangkok, Thailand.
Contract number 671159.
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US