Rainfall Data Fitting based on An Improved Mixture Cosine Model
with Markov Chain
THITIPONG KANCHAI1, NAHATAI TEPKASETKUL1, TIPPATAI PONGSART2,
WATCHARIN KLONGDEE1
1Department of Mathematics, Faculty of Science, Khon Kaen University,
THAILAND
2Department of Statistics, Faculty of Science, Khon Kaen University,
THAILAND
Abstract: - This article proposes a model that adjusts the two-component mixture cosine model with a
Markov chain (MC2MC) to predict monthly rainfall, using actual data from the Khon Kaen meteorological
station (381201) in Khon Kaen province, Thailand. The data cover 31 years, from January
1991 to December 2021. Performance is evaluated by the root mean square error (RMSE) and the $R^2$ value.
We found that the mixture cosine model has RMSE and $R^2$ values of 70.72 and 52.49%, respectively, and the
MC2MC model has RMSE and $R^2$ values of 42.43 and 82.53%, respectively.
Key-Words: - Markov chain, mixture cosine, rainfall, imputation, missing data
Received: April 15, 2022. Revised: November 17, 2022. Accepted: December 19, 2022. Published: January 26, 2023.
1 Introduction
Rainfall is an essential meteorological parameter.
It significantly affects our daily lives, causing
flooding and drought, and it has a particular
impact on farming. We are currently dealing with
climate change, which affects rainfall. As a result,
agricultural yields are less predictable, and crop
insurance is designed to reduce the risk of loss
from unexpected events. Rainfall analysis is
therefore frequently carried out for a variety of
applications, including assessing the impact of
rainfall on agricultural yields, [1], and building
crop insurance in the form of weather index
insurance, [2]. Designing crop insurance is
challenging when rainfall data are missing. For this
reason, data imputation, which fills in missing
values with estimates, has attracted a lot of
attention from researchers. The traditional prediction
approaches include regression, [3], [4], [5], [6], [7],
machine learning, [8], [9], [10], and neural
networks, [11], [12], [13].
Each year, rainfall records increase significantly
during the rainy period, and this behavior repeats
on a yearly basis. The rainfall data can therefore
be regarded as a time series with a seasonal
pattern. Seasonality can be captured using sine,
cosine, or mixture cosine functions.
Researchers often choose sine and cosine functions
to estimate data that have seasonal components,
[11], [14], [15]. Moreover, the parameter vector of
the mixture cosine model can be obtained using the
differential evolution algorithm, [16].
The Markov chain is named after Andrei A. Markov,
who first published his results in 1906, [17].
A Markov chain is a stochastic process, that is, a
mathematical model of probabilistic behavior. Many
authors have used Markov chains to improve
models for fitting data. In 2014, Sous et al., [18],
improved the Grey model GM(1,1) using a Markov chain
and a middle-point matrix for forecasting gold prices. In
2019, Azizah et al., [19], applied a Markov chain
to predict rainfall data in West Java using a
data mining approach. In 2021, Yutong, [20],
presented applications of Markov chains in
weather and market share forecasts.
In this research, we propose a model that adjusts the
two-component mixture cosine model with a
Markov chain (MC2MC) to predict
monthly rainfall. The rainfall data from the Khon Kaen
meteorological station (381201) in Khon Kaen
province, Thailand, are chosen for illustration. Khon
Kaen is located in northeastern Thailand, as outlined
in red in Fig. 1. The data cover 31 years,
from 1991 to 2021.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.4
Thitipong Kanchai, Nahatai Tepkasetkul,
Tippatai Pongsart, Watcharin Klongdee
E-ISSN: 2224-3402
28
Volume 20, 2023
Fig. 1: Location of Khon Kaen province, Thailand.
2 Materials and Method
2.1 Data
The historical series of monthly rainfall runs from
January 1991 to December 2021 (372 months), of
which 346 months have complete data and 26
months are missing. Fig. 2 shows the arrangement
of the data.
Fig. 2: The arrangement of the monthly rainfall data.
Table 1 summarizes the statistics of the monthly
rainfall data from January 1991 to December 2021.
Table 1. Statistical analysis of monthly rainfall data.
Variable         Details                                                    Min    Mean     Max
Rainfall (mm.)   Monthly rainfall data from January 1991 to December 2021   0      109.7    457.1
2.2 Mixture Cosine Model
From Fig. 2, we see that the monthly rainfall data
can be represented as a time series with seasonal
behavior. We therefore treat the data as a periodic
function and choose the mixture cosine model of
$m$ components, formulated by
󰇡
󰇢
 
where  are real numbers, and
󰇝󰇞 represent the months
with the peaks of each year. Since the cosine
function has a period equal to 12 months, we choose


In this article, we shall estimate the
parameter vector 󰇛󰇜 as
the following procedure.
1. Consider ,
, and 
2. For each 󰇛󰇜use the
differential evolution (DE) algorithm
without crossover (population size 
and differential weight ) to estimate
the parameters,  with
minimizing the root mean square error
󰇛󰇜 given by
󰇛󰇜
󰇛󰇜

where 󰇡
󰇢
 
3. Choose 󰇛󰇜

󰇛󰇜󰇛󰇜
2.3 Adjust Mixture Cosine Model with
Markov Chain
We adjust the mixture cosine model with a Markov
chain to fit the monthly rainfall data. First, we
construct the transition probability matrix from the
residual error ($e_t$) between the actual data ($y_t$) and the
predicted data ($\hat{y}_t$) of the mixture cosine model, i.e.,
$$e_t = y_t - \hat{y}_t,$$
where $t = 1, 2, \ldots, n$ and $n$ is the number of data points.
We separate the residual errors into states.
Define percentile of󰇝󰇞 and 
percentile of 󰇝󰇞. The length of the interval () is
calculated by 
.
Each interval of the state is calculated as follows:
State 1 (): if 
State 2 ():
, if
State 3 ():
, if 

State ():
, if 󰇛󰇜
󰇛󰇜.
State ():
, if
󰇛󰇜

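Let $r$ be the number of states and let $N = [n_{ij}]$ be the $r \times r$ matrix
in which $n_{ij}$ is the number of indices $t$ such that $e_t$ is in state $i$ and $e_{t+1}$ is in
state $j$, counting only pairs for which neither value is missing, with
$t = 1, 2, \ldots, n-1$. Next, let $n_i$ be the number of data
belonging to state $i$, such that
$$n_i = \sum_{j=1}^{r} n_{ij}.$$
Therefore, the transition probability of moving
one step from the $i$-th state to the $j$-th state is given
by
$$p_{ij} = \frac{n_{ij}}{n_i},$$
where $i, j = 1, 2, \ldots, r$. Thus, the transition
probability matrix is denoted by $P = [p_{ij}]$.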
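The construction of the states and of the transition probability matrix can be sketched in Python as follows. This is only an illustration under stated assumptions: the percentile orders `lo_pct` and `hi_pct`, the default number of states `r`, the convention that the first state lies below the lower percentile, the last state lies at or above the upper percentile, and the states in between have equal width, and all function names are hypothetical choices made for the sketch, not values or code taken from this article.

```python
import numpy as np


def assign_states(residuals, r=8, lo_pct=5.0, hi_pct=95.0):
    """Map each residual to a state 1..r; missing residuals get state 0.

    lo_pct and hi_pct are assumed percentile orders, not the paper's values.
    """
    e = np.asarray(residuals, dtype=float)
    valid = ~np.isnan(e)
    lo, hi = np.percentile(e[valid], [lo_pct, hi_pct])
    # r - 1 cut points: lo, lo + l, ..., hi, with l = (hi - lo) / (r - 2).
    edges = np.linspace(lo, hi, r - 1)
    states = np.zeros(e.shape, dtype=int)
    # digitize returns 0 below the first edge and r - 1 at or above the last.
    states[valid] = np.digitize(e[valid], edges) + 1
    return states, edges


def transition_matrix(states, r=8):
    """Count one-step transitions between consecutive non-missing states."""
    counts = np.zeros((r, r), dtype=float)
    for s_now, s_next in zip(states[:-1], states[1:]):
        if s_now > 0 and s_next > 0:  # skip pairs touching a missing month
            counts[s_now - 1, s_next - 1] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros.
    probs = np.divide(counts, row_sums,
                      out=np.zeros_like(counts), where=row_sums > 0)
    return counts, probs
```

Marking missing months with state 0 keeps the time index intact, so a transition is counted only when two consecutive months are both observed.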
Let 󰇟 󰇠󰆒 where is the
represented value of state  given by
󰇛󰇜
for all 
Therefore, we can adjust the mixture cosine
model of $m$ components with the Markov chain, abbreviated
as the MCmMC model, which is formulated by
where , 󰇟󰇠, and is
predicted value of the mixture cosine model.
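One possible way to apply a Markov chain correction of this kind is sketched below in Python. The details are assumptions made for illustration and are not the formulas of this article: the representative value of a state is taken here as the mean residual observed in that state, and the adjusted prediction adds the expected residual given the state of the previous month. The names `state_values` and `markov_adjust` are hypothetical, and state 0 is assumed to mark a missing residual.

```python
import numpy as np


def state_values(residuals, states, r):
    """Assumed representative value per state: the mean residual in it."""
    e = np.asarray(residuals, dtype=float)
    s = np.asarray(states)
    return np.array([e[s == k].mean() if np.any(s == k) else 0.0
                     for k in range(1, r + 1)])


def markov_adjust(y_hat, states, probs, values):
    """Add the expected next-step residual given the previous month's state."""
    y_hat = np.asarray(y_hat, dtype=float)
    # Entry i is the expected residual one step after being in state i + 1.
    expected = np.asarray(probs, dtype=float) @ np.asarray(values, dtype=float)
    adjusted = y_hat.copy()
    for t in range(1, len(y_hat)):
        prev = states[t - 1]
        if prev > 0:  # state 0 marks a missing previous residual
            adjusted[t] += expected[prev - 1]
    return adjusted
```

With this convention, a month whose predecessor is missing keeps the unadjusted mixture cosine prediction.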
3 Results
The mixture cosine model is fitted to the 346
months of observed rainfall data. We determine the
parameters of the mixture cosine model that give the
smallest sum of squared errors with respect to the
actual data. The model is fitted via differential
evolution, and the resulting root mean square error
values are displayed in Table 2.
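The fitting procedure of Section 2.2 (differential evolution without crossover, run once for every pair of peak months, keeping the pair with the smallest RMSE) can be sketched in Python. The sketch rests on assumptions that are not taken from this article: the model form `two_cosine` (an offset plus two cosines with a 12-month period), the population size `pop_size`, the differential weight `f_weight`, the number of generations `n_gen`, and the parameter bounds are all hypothetical stand-ins.

```python
import numpy as np


def two_cosine(t, params, c1, c2):
    """Hypothetical two-component periodic model with peaks at months c1, c2."""
    a0, a1, a2 = params
    w = 2.0 * np.pi / 12.0  # 12-month period
    return a0 + a1 * np.cos(w * (t - c1)) + a2 * np.cos(w * (t - c2))


def rmse(y, y_hat):
    """Root mean square error over the non-missing observations."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    mask = ~np.isnan(y)
    return float(np.sqrt(np.mean((y[mask] - y_hat[mask]) ** 2)))


def de_no_crossover(cost, bounds, pop_size=30, f_weight=0.5, n_gen=200, seed=0):
    """Differential evolution using mutation and greedy selection only."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    costs = np.array([cost(p) for p in pop])
    history = [float(costs.min())]
    for _ in range(n_gen):
        for i in range(pop_size):
            # Pick three distinct members other than the target i.
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            trial = np.clip(pop[r1] + f_weight * (pop[r2] - pop[r3]), low, high)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # keep the trial only if it is no worse
                pop[i], costs[i] = trial, trial_cost
        history.append(float(costs.min()))
    best = int(np.argmin(costs))
    return pop[best], float(costs[best]), history


def fit_all_peak_pairs(t, y, bounds, **de_kwargs):
    """Run DE for every pair c1 <= c2 and keep the pair with the smallest RMSE."""
    results = {}
    for c1 in range(1, 13):
        for c2 in range(c1, 13):
            def cost(p, c1=c1, c2=c2):
                return rmse(y, two_cosine(t, p, c1, c2))
            params, err, _ = de_no_crossover(cost, bounds, **de_kwargs)
            results[(c1, c2)] = (params, err)
    best_pair = min(results, key=lambda pair: results[pair][1])
    return best_pair, results
```

Because selection is greedy, the best RMSE in the population never increases from one generation to the next; the dictionary returned by `fit_all_peak_pairs` has the same upper-triangular layout as Table 2.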
We obtain the best mixture cosine model for
fitting the monthly rainfall data when 
, and .
It follows that:
 and .
We then have

󰇛󰇜

󰇛󰇜.
A comparison of the mixture cosine model with
actual data is presented in Fig. 3.
Fig. 3: Graph of mixture cosine model.
The residual error of the mixture cosine model
and actual data is shown in Fig. 4.
Fig. 4: The residual for mixture cosine model.
As mentioned in section 2.3, we separate the
residual errors into states. We have
, , and 
Each interval of the state is calculated as follows:
State 1 (): if 
State 2 (): , if 
State 3 (): , if 
State 4 (): , if 
State 5 (): , if 
State 6 (): , if 
State (): , if .
State (): , if 
The matrix of represented value for each state is
obtained by:








Table 2. The fitting results in terms of the root mean square error for each pair of peak months (row and column indices).
          1        2        3        6        7        8        9       10       11       12
 1     230.31   225.80   212.81   125.80   107.11   115.77   143.10   174.81   202.41   220.32
 2       -      229.72   221.54   141.31   113.87   108.16   125.08   157.73   187.33   212.21
 3       -        -      216.91   149.81   118.17    99.42   106.05   132.48   162.29   190.54
 4       -        -        -      149.75   117.67    91.88    85.31   103.33   132.08   160.77
 5       -        -        -      141.14   114.77    89.23    73.16    77.64    99.24   125.06
 6       -        -        -      125.14   107.76    90.20    76.62    70.72    76.66    92.84
 7       -        -        -        -      102.22    96.40    92.93    89.01    86.15    84.83
 8       -        -        -        -        -      107.06   117.03   118.11   116.04   109.74
 9       -        -        -        -        -        -      137.30   147.73   150.36   145.18
10       -        -        -        -        -        -        -      170.14   178.51   177.06
11       -        -        -        -        -        -        -        -      196.00   203.47
12       -        -        -        -        -        -        -        -        -      215.72
Therefore, we have the matrix () as mentioned
in Section 2.3 shown below:
 

 
 
 



The transition probability matrix is obtained by:


 
 
 
 
 
 
 
Therefore, we obtain the MC2MC model
illustrated in the equation below:

where and
󰇧
󰇛󰇜󰇨
󰇧
󰇛󰇜󰇨

Fig. 5 shows the graphs of the actual data, the
mixture cosine model, and the MC2MC model,
drawn in blue, red, and purple, respectively.
The $x$-axis and $y$-axis of each graph in Fig. 5
are the month number and the amount of
rainfall (mm.), respectively.
Fig. 5: Comparison of the actual data, the mixture
cosine model, and the MC2MC model.
Fig. 6: Actual and simulated rainfall for MC2MC
model.
Fig. 6 shows the actual and generated monthly
rainfall data in Khon Kaen province, Thailand. The
red and blue lines show the actual data and the
MC2MC model, respectively. The month number
and the amount of rainfall (mm.) are on the
$x$-axis and $y$-axis of Fig. 6, respectively.
4 Measuring the Quality of Fitting
To evaluate the performance of a statistical learning
method on a given data set, we would like to
measure how well its predictions match the actual
data. We evaluate the fit using the root mean
square error and the R-square, which indicates how
well the model explains the observed data.
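The root mean square error (RMSE) is defined
by
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2},$$
and the R-square ($R^2$) is defined by:
$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2},$$
where $y_t$, $\hat{y}_t$, and $\bar{y}$ are the actual value, the predicted value
of the model, and the mean of the actual values,
respectively, and $t = 1, 2, \ldots, n$. The RMSE and $R^2$
of the models are given in Table 3.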
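The two measures can be computed with a few lines of Python. The function names `rmse` and `r_square` are chosen for this sketch, and skipping missing months (stored as NaN) is an assumption about how gaps in the record are handled.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean square error over the non-missing observations."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    mask = ~np.isnan(y)
    return float(np.sqrt(np.mean((y[mask] - y_hat[mask]) ** 2)))


def r_square(y, y_hat):
    """R-square: one minus residual sum of squares over total sum of squares."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    mask = ~np.isnan(y)
    sse = np.sum((y[mask] - y_hat[mask]) ** 2)
    sst = np.sum((y[mask] - y[mask].mean()) ** 2)
    return float(1.0 - sse / sst)
```

For example, with actual values 1, 2, 3, 4 and predictions 1, 2, 3, 6, the squared errors sum to 4, so the RMSE is 1 and the R-square is 1 - 4/5 = 0.2.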
Table 3. Evaluation values of the models
Model             RMSE     $R^2$
Mixture cosine    70.72    52.49%
MC2MC             42.43    82.53%
Table 3 shows the performance of the mixture
cosine model and the MC2MC model for fitting the
actual data.
5 Conclusion
The proposed model adjusts the two-component
mixture cosine model with a Markov chain
(MC2MC) to predict the monthly rainfall data
from the Khon Kaen meteorological station (381201) in
Khon Kaen province, Thailand. The data cover
31 years, from January 1991 to
December 2021. We found that the mixture cosine
model has RMSE and $R^2$ values of 70.72 and
52.49%, respectively, and the MC2MC model has
RMSE and $R^2$ values of 42.43 and 82.53%,
respectively. According to these findings, the
MC2MC model has an RMSE 40.00% lower than that of the
mixture cosine model. The MC2MC model can
describe the monthly rainfall data since it has an
acceptance rate of 
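The 40.00% figure can be checked directly from the two RMSE values in Table 3, as the relative reduction in RMSE:

```latex
\frac{\mathrm{RMSE}_{\text{mixture cosine}} - \mathrm{RMSE}_{\text{MC2MC}}}
     {\mathrm{RMSE}_{\text{mixture cosine}}}
  = \frac{70.72 - 42.43}{70.72}
  = \frac{28.29}{70.72}
  \approx 0.4000
```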
This work can be applied to impute missing
values or to predict periodic data such as annual
rainfall, daily temperature, or the number of
tourists visiting a popular destination.
Acknowledgement:
The first author would like to express gratitude to
the Science Achievement Scholarship of Thailand
(SAST) for financial assistance for this paper.
References:
[1] Verón, Santiago R., Diego de Abelleyra, and
David B. Lobell, Impacts of Precipitation and
Temperature on Crop Yields in the Pampas,
Climatic Change, Vol.130, 2015, pp. 235-245.
[2] Kath, Jarrod, Shahbaz Mushtaq, Ross Henry,
Adewuyi Adeyinka, and Roger Stone, Index
Insurance Benefits Agricultural Producers
Exposed to Excessive Rainfall Risk, Weather
and Climate Extremes, Vol.22, 2018, pp. 1-9.
[3] S. Prabakaran, P. N. Kumar, and P. S. M.
Tarun, Rainfall Prediction Using Modified
Linear Regression, ARPN Journal of
Engineering and Applied Sciences,
Vol.12, No.12, 2017, pp. 3715-3718.
[4] J. Refonaa, M. Lakshmi, Raza Abbas, and
Mohammad Raziullha, Rainfall Prediction
using Regression Model, IJRTE, Vol.8, No.2S3,
2019, pp. 543-546.
[5] R. E. Chandler and H. S. Wheater, Analysis of
rainfall variability using generalized linear
models: A case study from the west of
Ireland, Water Resour. Res., Vol.38, No.10,
2002, pp. 10-1-10-11.
[6] R. Coe and R. D. Stern, Fitting Models to
Daily Rainfall Data, J. Appl. Meteor., Vol.21,
No.7, 1982, pp. 1024-1031.
[7] N. Sethi and K. Garg, Exploiting Data Mining
Technique for Rainfall Prediction, IJCSIT,
Vol.5, No.3, 2014, pp. 3982-3984.
[8] C. M. Liyew and H. A. Melese, Machine
learning techniques to predict daily rainfall
amount, J Big Data, Vol.8, No.153, 2021, pp.
1-11.
[9] N. Oswal, Predicting Rainfall using Machine
Learning Techniques, Atmospheric and
Oceanic Physics, 2021, pp. 1-23.
[10] N. Salaeh et al., Long-Short Term Memory
Technique for Monthly Rainfall Prediction in
Thale Sap Songkhla River Basin, Thailand,
Symmetry, Vol.14, No.8, 2022, pp. 1-24.
[11] P. Chan Chiu, A. Selamat, O. Krejcar, K.
Kuok Kuok, E. Herrera-Viedma, and G.
Fenza, Imputation of Rainfall Data Using the
Sine Cosine Function Fitting Neural Network,
International Journal of Interactive
Multimedia and Artificial Intelligence, Vol.6,
No.7, 2021, pp. 39-48.
[12] Shakib Badarpura, Abhishek Jain, Aniket
Gupta, and Deepali Patil,
Rainfall Prediction using Linear approach
Neural Networks and Crop Recommendation
based on Decision Tree, IJERT, Vol.9, No.4,
2020, pp. 394-399.
[13] R. Venkata Ramana, B. Krishna, S. R. Kumar,
and N. G. Pandey, Monthly Rainfall
Prediction Using Wavelet Neural Network
Analysis, Water Resour Manage, Vol.27,
No.10, 2013, pp. 3697-3711.
[14] H. Jin, Q. Shao, and S. Crimp, Daily rainfall
data infilling with a stochastic model, 23rd
International Congress on Modelling and
Simulation, Canberra, ACT, Australia, 2019.
[15] K. Mammas and D. Lekkas, Rainfall
Generation Using Markov Chain Models;
Case Study: Central Aegean Sea, Water,
Vol.10, No.7, 2018, pp. 856-866.
[16] J. Ilonen, J.-K. Kamarainen, and J. Lampinen,
Differential Evolution Training Algorithm for
Feed-Forward Neural Networks, Neural
Processing Letters, Vol.17, 2003, pp. 93-105.
[17] W. K. Ching and M. K. Ng, Markov chains:
models, algorithms and applications. New
York, N.Y: Springer, 2006.
[18] S. Sous, T. Thongjunthug, and W. Klongdee,
Gold Price Forecasting Based on the
Improved GM (1,1) Model with Markov
Chain by Average of Middle Point, KKU Sci.
J., Vol.42, No.3, 2014, pp. 693-699.
[19] A. Azizah, R. Welastika, A. N. Falah, B. N.
Ruchjana, and A. S. Abdullah, An Application
of Markov Chain for Predicting Rainfall Data
at West Java using Data Mining Approach,
IOP Conf. Ser.: Earth Environ. Sci. 303,
2019, pp. 1-10.
[20] X. Yutong, Applications of Markov Chain in
Forecast, Journal of Physics:
Conference Series, 2021.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
-Thitipong Kanchai carried out the
conceptualization, investigation, methodology,
software, writing-original draft, and writing-review
& editing.
-Nahatai Tepkasetkul provided guidance on the differential
evolution in Matlab and carried out writing-review & editing.
-Tippatai Pongsart carried out the conceptualization,
investigation, methodology, and writing-review &
editing.
-Watcharin Klongdee carried out the
conceptualization, investigation, methodology,
writing-original draft, and writing-review & editing.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
The first author would like to express gratitude to
the Science Achievement Scholarship of Thailand
(SAST) for financial assistance for this paper.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US