Improved Ratio Type Estimators using some Prior Information in Sample
Surveys: A Case Study of Fine Particulate Matter in Thailand
NUANPAN LAWSON
Department of Applied Statistics, Faculty of Applied Science,
King Mongkut’s University of Technology North Bangkok,
1518 Pracharat 1 Road, Wongsawang, Bangsue, Bangkok 10800,
THAILAND
Abstract: - Air pollution now affects the health and social life of Thai people, as it exceeds the standard levels
set by both Thailand and the World Health Organization. Estimating air pollution data can support understanding
of the problem and the design of policies to deal with it. Prior knowledge from past surveys or censuses can be
useful for increasing the efficiency of the estimation. Improved ratio estimators utilizing prior knowledge in simple
random sampling without replacement are proposed. The mean square error of the proposed class of estimators
is obtained. We applied the proposed estimators to the fine particulate matter data in Dindang in 2019. The
results from the air pollution data illustrate that the improved ratio type estimators perform better than the
existing estimators that use some prior information. Prior knowledge of the quartile average and the median of
the auxiliary variable gives rise to the best estimators, with the lowest mean square errors for estimating fine
particulate matter. Moreover, the proposed estimators are useful for small sampling fractions, which can save
both cost and time.
Key-Words: - ratio estimators, prior information, auxiliary variable, fine particulate matter, air pollution,
sample surveys
Received: August 24, 2022. Revised: April 14, 2023. Accepted: May 2, 2023. Published: May 29, 2023.
1 Introduction
A sample survey is an essential tool in statistics for making inferences about a population based on a
sample. In general, inferential statistics concerns population parameters, e.g. the mean, total, and
proportion. Each value collected from the population carries information about the parameter of interest.
Sometimes prior information is available from previous sample surveys or from conducting a small
survey, and it can support the estimation of parameters in sample surveys. A sample mean ($\bar{y}$)
of a study variable $Y$ is employed to estimate the population mean $\bar{Y}$ based on the sample.
The use of known prior information was shown by [1], who introduced utilizing the available coefficient
of variation in the population of $Y$ ($C_y$) in simple random sampling without replacement (SRSWOR).
The estimator of [1] is
$$\hat{\bar{Y}}_S = K_S^* \bar{y}, \qquad (1)$$
where $K_S^* = (1 + \gamma C_y^2)^{-1}$, $\gamma = (1 - n/N)/n$, and $n$ and $N$ are the sample and
population sizes.
The mean square error (MSE) of $\hat{\bar{Y}}_S$ is
$$MSE(\hat{\bar{Y}}_S) = \frac{1-f}{n}\,\bar{Y}^2\,\frac{C_y^2}{1 + \gamma C_y^2}, \qquad (2)$$
where $f = n/N$ is the sampling fraction, so that $\gamma = (1-f)/n$.
However, the positive correlation ($\rho$) between the auxiliary variable $X$ and $Y$ can also be used in
estimating a population mean. A popular example of an estimator that uses it is the ratio estimator
proposed by [2]. The ratio estimator is
$$\hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\,\bar{X}, \qquad (3)$$
where $\bar{x}$ is the sample mean of $X$.
The MSE of $\hat{\bar{Y}}_R$ is
$$MSE(\hat{\bar{Y}}_R) = \frac{1-f}{n}\,\bar{Y}^2\left(C_y^2 + C_x^2 - 2\rho C_x C_y\right), \qquad (4)$$
where $C_x$ is the coefficient of variation of $X$.
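As a minimal sketch of equations (1) to (4), the Python snippet below computes the Searls and ratio estimates from a sample and evaluates the corresponding MSE formulas. The function names and the numerical inputs are illustrative assumptions and are not taken from the paper's data.

```python
from statistics import mean


def fpc_gamma(n, N):
    """gamma = (1 - n/N) / n, the factor shared by equations (1)-(4)."""
    return (1.0 - n / N) / n


def searls_estimate(y_sample, N, c_y):
    """Equation (1): K_S* times the sample mean, K_S* = 1 / (1 + gamma * C_y^2)."""
    n = len(y_sample)
    k_s = 1.0 / (1.0 + fpc_gamma(n, N) * c_y ** 2)
    return k_s * mean(y_sample)


def ratio_estimate(y_sample, x_sample, x_bar_pop):
    """Equation (3): (ybar / xbar) * Xbar."""
    return mean(y_sample) / mean(x_sample) * x_bar_pop


def mse_sample_mean(n, N, y_bar_pop, c_y):
    """Variance of the sample mean under SRSWOR: gamma * Ybar^2 * C_y^2."""
    return fpc_gamma(n, N) * y_bar_pop ** 2 * c_y ** 2


def mse_searls(n, N, y_bar_pop, c_y):
    """Equation (2): the sample-mean MSE divided by (1 + gamma * C_y^2)."""
    g = fpc_gamma(n, N)
    return g * y_bar_pop ** 2 * c_y ** 2 / (1.0 + g * c_y ** 2)


def mse_ratio(n, N, y_bar_pop, c_y, c_x, rho):
    """Equation (4): first-order MSE of the ratio estimator."""
    bracket = c_y ** 2 + c_x ** 2 - 2.0 * rho * c_x * c_y
    return fpc_gamma(n, N) * y_bar_pop ** 2 * bracket


# Hypothetical inputs, chosen only to illustrate the formulas.
n, N, y_bar_pop, c_y, c_x, rho = 20, 500, 100.0, 0.5, 0.4, 0.6
print(mse_sample_mean(n, N, y_bar_pop, c_y))
print(mse_searls(n, N, y_bar_pop, c_y))
print(mse_ratio(n, N, y_bar_pop, c_y, c_x, rho))
```

In this first-order form, equation (4) falls below the sample-mean MSE exactly when $\rho > C_x/(2C_y)$, which is why a sufficiently strong positive correlation is needed for the ratio estimator to pay off.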
In [3], the author introduced estimators inspired by [1] that employ a known $C_y$. Prasad's estimators
are
$$\hat{\bar{Y}}_{P1} = K_S^*\,\frac{\bar{y}}{\bar{x}}\,\bar{X}, \qquad (5)$$
Nuanpan Lawson
E-ISSN: 2224-2678
538
Volume 22, 2023
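$$\hat{\bar{Y}}_{P2} = K_P^*\,\frac{\bar{y}}{\bar{x}}\,\bar{X}, \qquad (6)$$
where
$$K_P^* = \frac{1 + \gamma\rho C_x C_y}{1 + \gamma C_y^2} = K_S^*\left(1 + \gamma\rho C_x C_y\right),$$
and $\bar{X}$ is the population mean of $X$.
The MSEs of $\hat{\bar{Y}}_{P1}$ and $\hat{\bar{Y}}_{P2}$ are
$$MSE(\hat{\bar{Y}}_{P1}) = \frac{1-f}{n}\,\bar{Y}^2\left[C_y^2\,\frac{1 - 2\rho C_x/C_y}{1 + \gamma C_y^2} + C_x^2\right], \qquad (7)$$
$$MSE(\hat{\bar{Y}}_{P2}) = \frac{1-f}{n}\,\bar{Y}^2\left[C_y^2\,\frac{1 - \gamma\rho^2 C_x^2}{1 + \gamma C_y^2} + C_x^2 - \frac{2\rho C_x C_y}{1 + \gamma C_y^2}\right]. \qquad (8)$$
The estimator $\hat{\bar{Y}}_{P2}$ is more efficient than $\hat{\bar{Y}}_{P1}$. Many
researchers have also suggested including some known
parameters in their studies to obtain more efficient
estimators [4], [5], [6], [7], [8].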
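The comparison between Prasad's two estimators can be checked numerically. The sketch below assumes $K_P^* = (1 + \gamma\rho C_x C_y)/(1 + \gamma C_y^2)$ and the MSE forms of equations (7) and (8) as written above. The function name is hypothetical, and the inputs are the rounded summary values reported in Section 4, so the outputs only approximate the published table.

```python
def prasad_summary(n, N, y_bar_pop, c_y, c_x, rho):
    """Return (K_P*, MSE of P1 from eq. (7), MSE of P2 from eq. (8))."""
    gamma = (1.0 - n / N) / n
    denom = 1.0 + gamma * c_y ** 2
    # Assumed form of Prasad's constant: K_S* multiplied by (1 + gamma*rho*C_x*C_y).
    k_p = (1.0 + gamma * rho * c_x * c_y) / denom
    scale = gamma * y_bar_pop ** 2
    mse_p1 = scale * (c_y ** 2 * (1.0 - 2.0 * rho * c_x / c_y) / denom + c_x ** 2)
    mse_p2 = scale * (
        c_y ** 2 * (1.0 - gamma * rho ** 2 * c_x ** 2) / denom
        + c_x ** 2
        - 2.0 * rho * c_x * c_y / denom
    )
    return k_p, mse_p1, mse_p2


# Rounded summary values reported in Section 4 (n = 15 is one of the sample sizes used).
k_p, mse_p1, mse_p2 = prasad_summary(15, 900, 51.78, 0.45, 0.33, 0.4)
print(k_p, mse_p1, mse_p2)
```

Under these forms the two MSEs differ by $\gamma^2\bar{Y}^2\rho^2 C_x^2 C_y^2/(1 + \gamma C_y^2)$, which is non-negative, so the second estimator can never do worse than the first.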
Air pollution has become severe enough to affect the health of Thai people. It is an increasing
concern because the amount of fine particulate matter with a diameter smaller than 2.5 microns
(PM2.5) in Bangkok and the northern region exceeds both Thailand's standard and the world's
standard. According to the World Health Organization's criteria, the 24-hour average should
not be more than 25 micrograms per cubic meter, while Thailand's standard requires the 24-hour
average to be less than 50 micrograms per cubic meter. Continuous exposure to dust affects
health in the long term, causing chronic respiratory problems, lung cancer, and chronic
cardiovascular disease. In [9], the authors studied how to estimate PM2.5 in Bangkok, Thailand
through the density of ozone when data are missing, using a new estimator under SRSWOR.
In [10], the authors also estimated PM2.5 in Bangkok, Thailand using carbon monoxide, utilizing
the number of respondents, the sample size, and a constant. In [11], the authors suggested
utilizing a transformation of an auxiliary variable to estimate carbon monoxide through PM2.5
in Nan, Thailand. In [12], the authors studied how to estimate PM2.5 through nitrogen dioxide
in Chiang Rai using transformed combined estimators under double sampling.
Utilizing known prior information, a class of ratio type estimators under SRSWOR is suggested.
The MSE of the new estimators is obtained using the Taylor series. The pollution data from
Dindang in 2019 are used in the study, with the MSE as the criterion.
2 Proposed Estimator
Using the idea of [3], a class of ratio type estimators utilizing the known population coefficient of
variation of $Y$ is proposed. The proposed estimator is
$$\hat{\bar{Y}}_{NP} = K_P^*\,\bar{y}\,\frac{A\bar{X} + D}{A\bar{x} + D}, \qquad (9)$$
where
$$K_P^* = \frac{1 + \gamma\rho C_x C_y}{1 + \gamma C_y^2} = K_S^*\left(1 + \gamma\rho C_x C_y\right),$$
and $A$ and $D$ are available information, for instance the coefficient of variation of $X$ ($C_x$),
the coefficient of kurtosis of $X$ ($\beta_2(x)$), the coefficient of skewness of $X$ ($\beta_1(x)$),
the correlation coefficient between $X$ and $Y$ ($\rho$), the inter-quartile range of $X$ ($Q_r$),
the semi-quartile range of $X$ ($Q_d$), the quartile average of $X$ ($Q_a$), the median of $X$ ($M$),
or others.
The Taylor series approach is used to study the MSE of the estimator, which gives Theorem 1.
Theorem 1. The approximate MSE of the proposed estimator $\hat{\bar{Y}}_{NP}$ in equation (9) for the
population mean $\bar{Y}$ is
$$MSE(\hat{\bar{Y}}_{NP}) \approx \frac{1-f}{n}\,\bar{Y}^2\left[C_y^2\,\frac{1 - \gamma\rho^2 W^2 C_x^2}{1 + \gamma C_y^2} + W^2 C_x^2 - \frac{2W\rho C_x C_y}{1 + \gamma C_y^2}\right],$$
where $W = \dfrac{A\bar{X}}{A\bar{X} + D}$ and $\gamma = (1 - n/N)/n$.
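To make the role of $W$ in Theorem 1 concrete, the following sketch evaluates the approximate MSE for a chosen pair $(A, D)$. It assumes the bracketed form stated in the theorem. The helper names are hypothetical, and the inputs are the rounded summary values reported in Section 4.

```python
def weight_w(a, d, x_bar_pop):
    """W = A*Xbar / (A*Xbar + D)."""
    return a * x_bar_pop / (a * x_bar_pop + d)


def mse_proposed(n, N, y_bar_pop, c_y, c_x, rho, w):
    """Approximate MSE of Theorem 1 for a given W."""
    gamma = (1.0 - n / N) / n
    denom = 1.0 + gamma * c_y ** 2
    bracket = (
        c_y ** 2 * (1.0 - gamma * rho ** 2 * w ** 2 * c_x ** 2) / denom
        + w ** 2 * c_x ** 2
        - 2.0 * w * rho * c_x * c_y / denom
    )
    return gamma * y_bar_pop ** 2 * bracket


# Rounded summary values from Section 4; A = 1 and D = C_x is one admissible choice.
n, N, y_bar_pop, x_bar_pop = 15, 900, 51.78, 1.91
c_y, c_x, rho = 0.45, 0.33, 0.4
w = weight_w(1.0, c_x, x_bar_pop)
print(w, mse_proposed(n, N, y_bar_pop, c_y, c_x, rho, w))
print(mse_proposed(n, N, y_bar_pop, c_y, c_x, rho, 1.0))  # D = 0, i.e. W = 1
```

Setting $D = 0$ gives $W = 1$, in which case the expression coincides with the MSE of Prasad's second estimator, so the class contains that estimator as a special case.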
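Proof of Theorem 1
Let $e_0 = (\bar{y} - \bar{Y})/\bar{Y}$ and $e_1 = (\bar{x} - \bar{X})/\bar{X}$; then
$E(e_0) = E(e_1) = 0$, $E(e_0^2) = \frac{1-f}{n} C_y^2$, $E(e_1^2) = \frac{1-f}{n} C_x^2$, and
$E(e_0 e_1) = \frac{1-f}{n} C_{yx} = \frac{1-f}{n}\rho C_y C_x$. To find the MSE of
$\hat{\bar{Y}}_{NP}$, we write equation (9) in terms of the $e$'s; then
$E(\hat{\bar{Y}}_{NP} - \bar{Y})^2$ is as shown below.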
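The MSE of $\hat{\bar{Y}}_{NP}$ is
$$MSE(\hat{\bar{Y}}_{NP}) = E(\hat{\bar{Y}}_{NP} - \bar{Y})^2 \approx \frac{1-f}{n}\,\bar{Y}^2\left[C_y^2\,\frac{1 - \gamma\rho^2 W^2 C_x^2}{1 + \gamma C_y^2} + W^2 C_x^2 - \frac{2W\rho C_x C_y}{1 + \gamma C_y^2}\right].$$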
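Some of the possible proposed estimators considered in this study, $\hat{\bar{Y}}_{NPi}$, $i = 1, 2, \ldots, 10$,
together with their $A$ and $D$, are given in Table 1.
Table 1. The estimators $\hat{\bar{Y}}_{NPi}$, $i = 1, 2, \ldots, 10$.
Estimator                                                                                                          A               D
$\hat{\bar{Y}}_{NP1} = K_P^* \bar{y} \frac{\bar{X} + C_x}{\bar{x} + C_x}$                                          1               $C_x$
$\hat{\bar{Y}}_{NP2} = K_P^* \bar{y} \frac{\bar{X} + \beta_2(x)}{\bar{x} + \beta_2(x)}$                            1               $\beta_2(x)$
$\hat{\bar{Y}}_{NP3} = K_P^* \bar{y} \frac{\beta_2(x)\bar{X} + C_x}{\beta_2(x)\bar{x} + C_x}$                      $\beta_2(x)$    $C_x$
$\hat{\bar{Y}}_{NP4} = K_P^* \bar{y} \frac{C_x\bar{X} + \beta_2(x)}{C_x\bar{x} + \beta_2(x)}$                      $C_x$           $\beta_2(x)$
$\hat{\bar{Y}}_{NP5} = K_P^* \bar{y} \frac{\beta_1(x)\bar{X} + \beta_2(x)}{\beta_1(x)\bar{x} + \beta_2(x)}$        $\beta_1(x)$    $\beta_2(x)$
$\hat{\bar{Y}}_{NP6} = K_P^* \bar{y} \frac{\bar{X} + \rho}{\bar{x} + \rho}$                                        1               $\rho$
$\hat{\bar{Y}}_{NP7} = K_P^* \bar{y} \frac{\bar{X} + Q_r}{\bar{x} + Q_r}$                                          1               $Q_r$
$\hat{\bar{Y}}_{NP8} = K_P^* \bar{y} \frac{\bar{X} + Q_d}{\bar{x} + Q_d}$                                          1               $Q_d$
$\hat{\bar{Y}}_{NP9} = K_P^* \bar{y} \frac{\bar{X} + Q_a}{\bar{x} + Q_a}$                                          1               $Q_a$
$\hat{\bar{Y}}_{NP10} = K_P^* \bar{y} \frac{\bar{X} + M}{\bar{x} + M}$                                             1               $M$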
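Each row of Table 1 is the same computation with a different pair $(A, D)$. The sketch below shows one way to evaluate equation (9) on a sample, assuming $K_P^* = (1 + \gamma\rho C_x C_y)/(1 + \gamma C_y^2)$. The function name and the four-unit sample are hypothetical, while $\bar{X}$, $C_y$, $C_x$, $\rho$ and $N$ are the rounded values reported in Section 4.

```python
from statistics import mean


def proposed_estimate(y_sample, x_sample, N, x_bar_pop, c_y, c_x, rho, a=1.0, d=0.0):
    """Equation (9): K_P* * ybar * (A*Xbar + D) / (A*xbar + D)."""
    n = len(y_sample)
    gamma = (1.0 - n / N) / n
    # Assumed form of the constant K_P*.
    k_p = (1.0 + gamma * rho * c_x * c_y) / (1.0 + gamma * c_y ** 2)
    y_bar = mean(y_sample)
    x_bar = mean(x_sample)
    return k_p * y_bar * (a * x_bar_pop + d) / (a * x_bar + d)


# Hypothetical sample of four units (y = PM2.5, x = CO); not real observations.
y = [40.0, 55.0, 61.0, 48.0]
x = [1.5, 2.0, 2.3, 1.8]

# Two choices of (A, D) from Table 1 whose inputs are reported in the paper.
for label, a, d in (("A = 1, D = C_x", 1.0, 0.33), ("A = 1, D = rho", 1.0, 0.4)):
    print(label, proposed_estimate(y, x, 900, 1.91, 0.45, 0.33, 0.4, a=a, d=d))
```

The other rows of Table 1 need the kurtosis, skewness, quartile and median values of $X$, which are not listed in the text, so they are left out of this sketch.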
3 Efficiency Comparison
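The MSE is used to compare $\hat{\bar{Y}}_{NP}$ with the estimator $\hat{\bar{Y}}_{P2}$ of [3], because
$\hat{\bar{Y}}_{P2}$ is more efficient than $\hat{\bar{Y}}_{P1}$.
The estimator $\hat{\bar{Y}}_{NP}$ is more efficient than $\hat{\bar{Y}}_{P2}$ if
$$MSE(\hat{\bar{Y}}_{NP}) < MSE(\hat{\bar{Y}}_{P2}),$$
$$\frac{1-f}{n}\,\bar{Y}^2\left[C_y^2\,\frac{1 - \gamma\rho^2 W^2 C_x^2}{1 + \gamma C_y^2} + W^2 C_x^2 - \frac{2W\rho C_x C_y}{1 + \gamma C_y^2}\right] < \frac{1-f}{n}\,\bar{Y}^2\left[C_y^2\,\frac{1 - \gamma\rho^2 C_x^2}{1 + \gamma C_y^2} + C_x^2 - \frac{2\rho C_x C_y}{1 + \gamma C_y^2}\right],$$
$$C_y^2\,\frac{1 - \gamma\rho^2 W^2 C_x^2}{1 + \gamma C_y^2} + W^2 C_x^2 - \frac{2W\rho C_x C_y}{1 + \gamma C_y^2} < C_y^2\,\frac{1 - \gamma\rho^2 C_x^2}{1 + \gamma C_y^2} + C_x^2 - \frac{2\rho C_x C_y}{1 + \gamma C_y^2},$$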
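which, for $W < 1$ and neglecting the small term $\gamma\rho^2 C_y^2$, simplifies to
$$\rho < \frac{(1 + W)\,C_x}{2 K_S^* C_y}.$$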
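A quick numerical check of the efficiency condition can be sketched as follows. It assumes the bracketed MSE form of Theorem 1 and the simplified condition $\rho < (1 + W)C_x/(2K_S^* C_y)$ for $0 < W < 1$. Both function names are hypothetical, and the inputs are the rounded summary values reported in Section 4.

```python
def theorem1_bracket(w, gamma, c_y, c_x, rho):
    """Bracketed term of the Theorem 1 MSE; w = 1 gives Prasad's second estimator."""
    denom = 1.0 + gamma * c_y ** 2
    return (
        c_y ** 2 * (1.0 - gamma * rho ** 2 * w ** 2 * c_x ** 2) / denom
        + w ** 2 * c_x ** 2
        - 2.0 * w * rho * c_x * c_y / denom
    )


def simplified_condition(w, gamma, c_y, c_x, rho):
    """rho < (1 + W) C_x / (2 K_S* C_y), intended for 0 < W < 1."""
    k_s = 1.0 / (1.0 + gamma * c_y ** 2)
    return rho < (1.0 + w) * c_x / (2.0 * k_s * c_y)


n, N = 15, 900
gamma = (1.0 - n / N) / n
c_y, c_x, rho = 0.45, 0.33, 0.4  # rounded summary values reported in Section 4

for w in (0.05, 0.3, 0.6, 0.9):
    direct = theorem1_bracket(w, gamma, c_y, c_x, rho) < theorem1_bracket(1.0, gamma, c_y, c_x, rho)
    print(w, simplified_condition(w, gamma, c_y, c_x, rho), direct)
```

For these inputs the simplified condition and the direct MSE comparison agree at every listed $W$. Very small values of $W$, that is, a $D$ that is large relative to $A\bar{X}$, are the case in which the proposed estimator stops improving on Prasad's.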
4 An Empirical Study
The pollution data from Dindang, Thailand in January 2019 [13] are used to see how the
proposed estimators perform on real-world data compared with the existing estimators.
PM2.5 (micrograms per cubic meter) is taken as $Y$ and carbon monoxide, CO (ppm), is
taken as $X$. The details of the data are as follows:
$\bar{Y} = 51.78$, $C_y = 0.45$, $\bar{X} = 1.91$, $C_x = 0.33$, and $\rho = 0.4$.
Samples of sizes $n = 15$, 30, 60, 150, and 300 are drawn by SRSWOR from the population
of size $N = 900$ using the R program [14]. Figure 1 shows the scatter plot of the PM2.5
and CO data. We can see that there is a positive relationship between PM2.5 and CO.
The results are presented in Table 2 and Figure 2.
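The sampling scheme can be illustrated with a small simulation. The study itself used R. The Python sketch below is only a stand-in that draws repeated SRSWOR samples from a synthetic population (not the Dindang data) and averages the squared error of the sample mean and of the ratio estimator. The population model, the seed and the number of replications are arbitrary assumptions.

```python
import random

random.seed(2019)  # arbitrary seed so the sketch is reproducible

# Synthetic stand-in population of N (x, y) pairs; NOT the Dindang data.
N = 900
population = []
for _ in range(N):
    x = random.uniform(1.0, 3.0)
    y = 25.0 * x + random.gauss(0.0, 10.0)
    population.append((x, y))

x_bar_pop = sum(x for x, _ in population) / N
y_bar_pop = sum(y for _, y in population) / N


def sample_mean(xs, ys):
    return sum(ys) / len(ys)


def ratio(xs, ys):
    # (ybar / xbar) * Xbar; the 1/n factors cancel.
    return sum(ys) / sum(xs) * x_bar_pop


def empirical_mse(estimator, n, reps=500):
    """Average squared error of `estimator` over `reps` SRSWOR samples of size n."""
    total = 0.0
    for _ in range(reps):
        sample = random.sample(population, n)  # draws n units without replacement
        xs = [x for x, _ in sample]
        ys = [y for _, y in sample]
        total += (estimator(xs, ys) - y_bar_pop) ** 2
    return total / reps


for n in (15, 30, 60, 150, 300):
    print(n, empirical_mse(sample_mean, n), empirical_mse(ratio, n))
```

`random.sample` never selects the same unit twice, which is what makes the draw SRSWOR. Taking $n = N$ returns the whole population, so the sample mean then reproduces $\bar{Y}$ up to rounding.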
Fig. 1: A scatter plot between PM2.5 and CO in
Dindang
Table 2. The MSE of the estimators for the pollution data
Estimator                   Sample size
                            15       30       60       150      300
$\bar{y}$                   35.16    17.28    8.34     2.98     1.19
$\hat{\bar{Y}}_R$           33.40    16.42    7.93     2.83     1.13
$\hat{\bar{Y}}_S$           34.70    17.17    8.32     2.98     1.19
$\hat{\bar{Y}}_{P1}$        33.22    16.37    7.91     2.83     1.13
$\hat{\bar{Y}}_{P2}$        33.18    16.36    7.91     2.83     1.13
$\hat{\bar{Y}}_{NP1}$       30.98    15.29    7.40     2.64     1.06
$\hat{\bar{Y}}_{NP2}$       30.01    14.84    7.19     2.57     1.03
$\hat{\bar{Y}}_{NP3}$       32.48    16.02    7.75     2.77     1.11
$\hat{\bar{Y}}_{NP4}$       32.23    15.95    7.72     2.76     1.11
$\hat{\bar{Y}}_{NP5}$       30.10    14.89    7.21     2.58     1.03
$\hat{\bar{Y}}_{NP6}$       30.66    15.14    7.32     2.62     1.05
$\hat{\bar{Y}}_{NP7}$       29.55    14.60    7.06     2.53     1.01
$\hat{\bar{Y}}_{NP8}$       30.61    15.11    7.31     2.61     1.05
$\hat{\bar{Y}}_{NP9}$       29.11    14.39    6.97     2.49     1.00
$\hat{\bar{Y}}_{NP10}$      29.10    14.39    6.97     2.49     1.00
Table 2 shows that the proposed estimators work well in this situation and outperform the
existing estimators. In general, a small sample size leads to larger errors than a large
sample size. The proposed estimator $\hat{\bar{Y}}_{NP9}$, which utilizes the known quartile
average of $X$, and the proposed estimator $\hat{\bar{Y}}_{NP10}$, which utilizes the median
of $X$, performed best among the proposed estimators. Using available auxiliary information
can increase the accuracy of the estimation and reduce errors.
Fig. 2: Percentage relative efficiency of the
estimators with respect to the sample mean
estimator
Figure 2 shows that the best-performing proposed estimators are around twenty percent more
efficient than the sample mean at all sampling fractions. Increasing the sample size does
not change the relative efficiency of the estimators: larger samples lead to a smaller MSE,
as shown in Table 2, but the rate of efficiency stays essentially the same. This implies
that the proposed estimators can improve estimation even in a small survey, which reduces
the time and budget required.
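The percentage relative efficiency (PRE) can be recomputed directly from Table 2. The sketch below assumes the usual definition, PRE = 100 × MSE(sample mean) / MSE(estimator), and copies a few rows of Table 2. The dictionary keys are shorthand labels introduced here.

```python
# MSE values copied from Table 2 (columns: n = 15, 30, 60, 150, 300).
sizes = (15, 30, 60, 150, 300)
mse = {
    "sample mean": (35.16, 17.28, 8.34, 2.98, 1.19),
    "P2": (33.18, 16.36, 7.91, 2.83, 1.13),
    "NP9": (29.11, 14.39, 6.97, 2.49, 1.00),
    "NP10": (29.10, 14.39, 6.97, 2.49, 1.00),
}


def pre(name):
    """Percentage relative efficiency with respect to the sample mean, per sample size."""
    return [100.0 * m0 / m for m0, m in zip(mse["sample mean"], mse[name])]


for name in ("P2", "NP9", "NP10"):
    print(name, [round(v, 1) for v in pre(name)])
```

Because the tabulated MSEs are rounded to two decimals, the PRE values for the larger sample sizes are less precise than those for the smaller ones.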
5 Conclusion
A general form of estimator based on prior knowledge of variables is introduced in this
study for estimating the population mean. The MSE of the new estimator is derived, and its
performance is shown via an application to air pollution data in Dindang, Thailand. We can
see that prior information helps to increase the performance of the estimators, yielding
the least MSE. In this scenario, the best proposed estimators, which gave the lowest MSE,
are the ones that use prior knowledge of the quartile average and the median of $X$ for
estimating fine particulate matter. Large sample sizes give more accurate results than
smaller sample sizes in terms of MSE. However, in terms of efficiency, the proposed
estimators achieve the same relative efficiency at all sampling fractions, so they benefit
small surveys of the variable of interest and can save time and financial costs. Utilizing
known prior information reduces the MSE of the population mean estimator, which ultimately
results in greater efficiency. We can apply the proposed estimators
using other available prior information, and the approach can
also be useful in other survey designs. In future studies,
the new estimator can be applied to other real data so that the
variable of interest can be estimated.
Acknowledgment:
Thank you to the referees for the helpful comments.
References:
[1] Searls, D.T., The Utilization of a Known
Coefficient of Variation in the Estimation
Procedure, Journal of the American Statistical
Association, Vol. 59, 1964, pp. 1225-1226.
[2] Cochran, W.G., The Estimation of the Yields of
the Cereal Experiments by Sampling for the Ratio
of Grain to Total Produce, The Journal of
Agricultural Science, Vol.30, No.2, 1940, pp. 262
– 275.
[3] Prasad, B., Some Improved Ratio Type Estimators
of Population Mean and Ratio in Finite Population
Sample Surveys, Communications in Statistics –
Theory and Methods, Vol.18, No.1, 1989, pp. 379–
392.
[4] Soponviwatkul, K. and Lawson, N., New Ratio
Estimators for Estimating Population Mean in
Simple Random Sampling Using a Coefficient of
Variation, Correlation Coefficient and a
Regression Coefficient. Gazi University Journal of
Science, Vol. 30, No.4, 2017, pp. 610-621.
[5] Lawson, N., Ratio Estimators of Population
Means Using Quartile Function of Auxiliary
Variable Using Double Sampling, Songklanakarin
Journal of Science and Technology, Vol.41, No.1,
2019, pp. 117-122.
[6] Lawson, N., An Alternative Family of Combined
Estimators for Estimating Population Mean in
Finite Populations. Lobachevskii Journal of
Mathematics, Vol.42, No.13, 2021, pp. 3150-3157.
[7] Ponkaew, C. and Lawson, N., New Generalized
Regression Estimators Using a Ratio Method and
its Variance Estimation for Unequal Probability
Sampling Without Replacement in the Presence of
Nonresponse. Current Applied Science and
Technology, Vol.23, No.2, 2023, pp. 1-27.
[8] Lawson, N., An Improved Class of Population
Mean Estimators by Utilizing Some Prior
Information in Simple Random Sampling Using
Searl's approach, Lobachevskii Journal of
Mathematics, Vol.43, No.11, 2022, pp. 3376–
3383.
[9] Chodjuntug, K. and Lawson, N., Imputation for
Estimating the Population Mean in the Presence of
Nonresponse, With Application to Fine Particle
Density in Bangkok, Mathematical Population
Studies, Vol.29, No.4, 2022, pp. 204 – 22.
[10] Chodjuntug, K. and Lawson, N., The Chain
Regression Exponential Type Imputation Method
for Mean Estimation in the Presence of Missing
Data, Songklanakarin Journal of Science and
Technology, Vol.44, No.4, 2022, pp. 1109-1118.
[11] Thongsak, N. and Lawson, N., Bias and Mean
Square Error Reduction by Changing the Shape of
the Distribution of an Auxiliary Variable:
Application to Air Pollution Data in Nan,
Thailand, Mathematical Population Studies, 2022.
[12] Thongsak, N. and Lawson, N., Classes of
Combined Population Mean Estimators Utilizing
Transformed Variables Under Double Sampling:
An Application to Air Pollution in Chiang Rai,
Thailand, Songklanakarin Journal of Science and
Technology, Vol.44, No.5, 2022, pp. 1390-1398.
[13] Pollution Control Department, Thailand’s air
quality and situation reports. Bangkok, Thailand,
http://air4thai.pcd.go.th/webV2/history/, 2019.
[14] R Core Team, R: A language and environment for
statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. https://www.R-
project.org/, 2021.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The author contributed in the present research, at all
stages from the formulation of the problem to the
final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
This research was funded by the National Science,
Research and Innovation Fund (NSRF), and King
Mongkut’s University of Technology North
Bangkok Contract no. KMUTNB-FF-66-56.
Conflict of Interest
The author has no conflicts of interest to declare that
are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US