The Development of Forecasting Models for Life Insurance Data by Employing Time-series Analysis and Machine Learning Technique

SUPIKA HUADSRI¹, WIKANDA PHAPHAN¹,²

¹Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, THAILAND

²Research Group in Statistical Learning and Inference, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, THAILAND

Abstract: - This article investigates and compares several forecasting models in order to identify the most suitable model for life insurance data. We use a dataset of monthly direct premiums from the Thai life insurance sector spanning January 2003 to December 2022. We first develop time-series models of direct premiums using the SARIMAX framework, and then introduce a combined forecasting model, referred to as SVR-SARIMAX, that merges the forecasts of Support Vector Regression (SVR) and SARIMAX. The models are compared using the Mean Absolute Percentage Error (MAPE), the Root Mean Square Error (RMSE), and the Coefficient of Determination (R²). The SARIMAX model outperforms both the SVR and the SVR-SARIMAX models, which we attribute primarily to the largely linear relationship between the independent and dependent variables. Nevertheless, the proposed SVR-SARIMAX model is more accurate than the standalone non-linear model (SVR), although the linear model (SARIMAX) remains the most accurate.

Key-Words: - Combined Model, Hybrid Model, Support Vector Regression, SARIMAX, Time Series Forecasting, Life Insurance Business Growth.

Received: October 8, 2023. Revised: December 15, 2023. Accepted: February 17, 2024. Published: March 22, 2024.

1 Introduction

In the contemporary context, heightened uncertainty in people's lives underscores the importance of life insurance as a means of mitigating emerging risks. Life insurance companies play a pivotal role: they collect premiums from individuals seeking coverage and, in return, provide financial protection. Life insurance is currently offered in three main types: ordinary life insurance for middle- to high-income earners, industrial life insurance for middle- to low-income earners, and group life insurance, which mainly covers a company's employees at favorable premium rates. Life insurance products are further divided into five categories: term, whole life, endowment, annuity, and unit-linked (investment-linked) life insurance.

Premiums make up a substantial portion of life insurers' revenue, so the financial sustainability of the business depends heavily on them, and premium trends are therefore key indicators of the growth of life insurance businesses. However, the statistical data published on the Office of Insurance Commission (OIC) website, [1], posed challenges: the premium data for some years were incomplete or inaccurate. These problems may stem from errors in the data collection process or in the data analysis software, or from other factors.

The growth of life insurance depends mainly on the risk of the insured. Hiwase and Agrawal, [2], analyze hidden correlations among variables and use them to calculate the risk of individual customers in the life insurance business. Widely used data mining techniques have been employed to identify fraudulent claims in automobile insurance, [3], and Emamgholipour et al., [4], analyze the factors affecting the demand for life insurance using descriptive statistics and a panel data model. In addition, Tan et al., [5], investigate the socio-demographic determinants of household expenditure on life insurance in Malaysia using Cragg's two-part regression model; Azzone et al., [6], predict policyholders' lapse decisions on life insurance contracts using random forest and logistic models; and Boodhun and Jayabalan, [7], predict risk in the life insurance business with supervised learning algorithms. These studies show that statistical methods and machine learning are promising tools for analyzing insurance data.

WSEAS TRANSACTIONS on MATHEMATICS

DOI: 10.37394/23206.2024.23.23

Supika Huadsri, Wikanda Phaphan

E-ISSN: 2224-2880

196

Volume 23, 2024

Our research forecasts direct premiums using monthly time series data on life insurance premiums in Thailand from January 2003 to December 2022. Because these data are chronological and relatively scarce, this article explores suitable time series forecasting models and machine learning techniques for predicting future direct premiums in the life insurance business, with the ultimate goal of forecasting the growth of the Thai life insurance business through its direct premiums.

2 Materials and Methods

2.1 Related Research

Srisuradetchai and Panichkitkosolkul, [8], used ensemble machine learning models and SARIMAX to forecast particulate matter (PM2.5) in Bangkok, Thailand. Their methods included support vector regression (SVR), XGBoost, k-nearest neighbors, random forests, artificial neural networks, and several other machine learning models. On the training data, the random forest model achieved the highest Pearson correlation coefficient (PCC) and the lowest mean absolute error (MAE) and root mean squared error (RMSE); on the test data, however, the Prophet and gradient boosting models outperformed the other candidate models.

Nontapa et al., [9], developed a decomposition method with SARIMA and a decomposition method with SARIMAX, and found that the decomposition method with SARIMAX outperformed the one with SARIMA, achieving the lowest mean absolute percentage error (MAPE).

Nontapa et al., [10], introduced DEC-SARIMAX-ANN, which combines the decomposition method, the SARIMAX model, and an Artificial Neural Network (ANN). In a comparison with ANN, SARIMA, SARIMAX, DEC-SARIMA, DEC-SARIMAX, and DEC-SARIMA-ANN, DEC-SARIMAX-ANN performed best, with the lowest MAPE. The authors conclude that a combined model achieves more accurate forecasts than a single forecasting method.

Pannakkong et al., [11], forecast daily electricity consumption in Thailand using a multiple regression model, an ANN, and SVR, and found SVR to be the best model for this task.

2.2 Data Description and Preparation

All data were collected from online sources, namely the Office of Insurance Commission (OIC), [1], and the Office of the National Economic and Social Development Council (NESDC). The dataset contains monthly data on Thailand's life insurance business from January 2003 to December 2022. The variables are described in Table 1, and Table 2 gives their descriptive statistics.

Because the dataset is a time series with relatively few observations, the data are split chronologically into two parts for modelling. The training set contains direct insurance premiums in the life insurance business from January 2003 to December 2021, and the test set covers January 2022 to December 2022.

The training set is used to build the forecasting models, and the test set is used to evaluate their performance and identify the most suitable model for predicting the growth of the life insurance business in Thailand through direct premiums.
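The chronological split can be sketched in Python, the language used for the article's analysis. This is a minimal illustration, assuming the 240 monthly values sit in an array ordered from January 2003 to December 2022 with no gaps; the helper `chronological_split` is hypothetical and not part of the authors' code. Unlike a random split, every test month lies after every training month, so no future information leaks into model fitting.

```python
import numpy as np

def chronological_split(y, start_year=2003, test_year=2022):
    """Split a monthly series into an earlier training block and a later test block.

    Hypothetical helper: assumes y[0] is January of start_year and that the
    observations are consecutive months with no gaps.
    """
    y = np.asarray(y, dtype=float)
    n_train = (test_year - start_year) * 12  # months before January of test_year
    return y[:n_train], y[n_train:]

# Placeholder values standing in for Jan 2003 .. Dec 2022 (240 months).
y = np.arange(240, dtype=float)
train, test = chronological_split(y)
print(len(train), len(test))  # 228 12
```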

Fig. 1: Time series plots for the life insurance data (January 2003 to December 2022)


Figure 1 shows a time series plot of direct premiums for life insurance businesses in Thailand. The series exhibits multiplicative seasonal variation: the size of the seasonal swings grows with the level of the series.
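One way to check for multiplicative seasonality is a classical multiplicative decomposition: divide the series by a centred 12-month moving average and average the ratios month by month. If the seasonality is multiplicative, these seasonal indices stay roughly constant even as the level of the series grows. The sketch below is an illustration on synthetic data, not the authors' procedure, and the function names are hypothetical; statsmodels offers a comparable routine, `seasonal_decompose(..., model="multiplicative")`.

```python
import numpy as np

def centred_ma_12(y):
    """2x12 centred moving average; the first and last six values are NaN."""
    y = np.asarray(y, dtype=float)
    weights = np.r_[0.5, np.ones(11), 0.5] / 12.0   # 13 symmetric weights
    trend = np.full_like(y, np.nan)
    trend[6:-6] = np.convolve(y, weights, mode="valid")
    return trend

def multiplicative_seasonal_indices(y):
    """Average y / trend for each calendar month; rescale so the 12 indices average 1."""
    ratios = np.asarray(y, dtype=float) / centred_ma_12(y)
    idx = np.array([np.nanmean(ratios[m::12]) for m in range(12)])
    return idx / idx.mean()

# Synthetic check: a rising trend multiplied by fixed monthly factors.
true_idx = 1 + 0.2 * np.sin(2 * np.pi * np.arange(12) / 12)
months = np.arange(120)
y = np.linspace(100, 200, 120) * true_idx[months % 12]
est = multiplicative_seasonal_indices(y)
print(np.round(est, 2))
```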

2.3 SARIMAX

The SARIMAX model, [12], [13], which stands for Seasonal Autoregressive Integrated Moving Average with Exogenous Variables, is used to forecast time series data with seasonal variation once suitable model orders have been selected. It models the autocorrelation structure of the series and handles non-stationary data by differencing them until they are stationary. In addition, SARIMAX incorporates external influences, which can help account for anomalous data or outliers. These external factors, known as exogenous variables, enter the model through a Multiple Linear Regression (MLR):

$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_k x_{k,t} + w_t, \tag{1}$$

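where $w_t$ is the stochastic residual, a proxy for variables that may affect $Y$ but are not included in the regression model. The residual series $w_t$ is modelled as a seasonal ARIMA process driven by a white-noise error $\varepsilon_t$. Here $B$ is the backshift operator ($B w_t = w_{t-1}$), $s$ is the seasonal period, $\phi_p(B)$ and $\Phi_P(B^s)$ are the non-seasonal and seasonal autoregressive polynomials of orders $p$ and $P$, $\theta_q(B)$ and $\Theta_Q(B^s)$ are the corresponding moving-average polynomials of orders $q$ and $Q$, and $d$ and $D$ are the orders of non-seasonal and seasonal differencing: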

$$w_t = \frac{\theta_q(B)\,\Theta_Q(B^s)}{\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D}\,\varepsilon_t. \tag{2}$$
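The differencing factors $(1-B)^d(1-B^s)^D$ in Equation (2) can be applied directly: each factor $(1-B^k)$ replaces a value by its difference from the value $k$ periods earlier. The helper below is a hypothetical illustration, not taken from the article. For the SARIMAX(2, 1, 2)x(1, 0, 2, 3) model reported in Section 4, $d = 1$ and $D = 0$, so only the ordinary first difference would be taken.

```python
import numpy as np

def sarima_difference(y, d=1, D=0, s=12):
    """Apply (1 - B)^d (1 - B^s)^D to a series, B being the backshift operator.

    Each factor (1 - B^k) maps y_t to y_t - y_{t-k} and drops the first k
    values, which have no lagged partner.
    """
    z = np.asarray(y, dtype=float)
    for _ in range(D):
        z = z[s:] - z[:-s]   # seasonal difference (1 - B^s)
    for _ in range(d):
        z = z[1:] - z[:-1]   # ordinary difference (1 - B)
    return z

# A linear trend plus a repeating pattern with period s = 4.
t = np.arange(20)
y = 2.0 * t + np.array([5.0, -1.0, 3.0, 0.0])[t % 4]
print(sarima_difference(y, d=0, D=1, s=4))  # every value equals 8.0
print(sarima_difference(y, d=1, D=1, s=4))  # every value equals 0.0
```

The seasonal difference removes the repeating pattern and leaves a constant, and the extra ordinary difference removes that constant as well.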

Substituting Equation (2) into Equation (1) gives the SARIMAX$(p, d, q)(P, D, Q)_s$ model:

$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_k x_{k,t} + \frac{\theta_q(B)\,\Theta_Q(B^s)}{\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D}\,\varepsilon_t. \tag{3}$$
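To make the two parts of Equation (3) concrete, the sketch below estimates the regression coefficients of Equation (1) by ordinary least squares and returns the residual series $w_t$ that Equation (2) then models. This two-stage version is a simplification for exposition only. The statsmodels SARIMAX class estimates the regression coefficients and the ARIMA parameters jointly by maximum likelihood. The function name `regression_part` is hypothetical.

```python
import numpy as np

def regression_part(y, X):
    """Fit y_t = beta_0 + sum_k beta_k x_{k,t} + w_t by OLS; return (beta, w).

    X holds one column per exogenous variable; an intercept column is added
    so that beta[0] corresponds to beta_0 in Equation (1).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    w = y - design @ beta   # residual series w_t, left for the seasonal ARIMA part
    return beta, w

# Noise-free example: y = 3 + 2*x1 - 1*x2, so the residuals should vanish.
x1 = np.arange(10.0)
x2 = x1 ** 2
y = 3 + 2 * x1 - x2
beta, w = regression_part(y, np.column_stack([x1, x2]))
print(np.round(beta, 6))
```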

2.4 SVR

Support Vector Regression (SVR), [14], extends the Support Vector Machine (SVM), [15], from classification to regression. Instead of separating classes, it models the relationship between input vectors and a continuous output variable, which makes it applicable to time series forecasting, [16]. The goal is to find a function, linear in a feature space, that relates the n-dimensional input vector $x \in \mathbb{R}^{n}$ to the output variable $y \in \mathbb{R}$. Because SVR is derived from SVM, its regression equation has the same form as the SVM hyperplane, with the inputs mapped into a high-dimensional feature space by $\varphi(\cdot)$:

$$f(x) = w^{T}\varphi(x) + b. \tag{4}$$

The coefficients $w$ and $b$ are estimated by minimizing the regularized risk function

$$R(C) = C\,\frac{1}{N}\sum_{i=1}^{N} L_{\varepsilon}\big(f(x_i), y_i\big) + \frac{1}{2}\lVert w\rVert^{2} \tag{5}$$

or, equivalently,

$$R(C) = C\,\frac{1}{N}\sum_{i=1}^{N} L_{\varepsilon}\big(f(x_i), y_i\big) + \frac{1}{2}w^{T}w \tag{6}$$

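In this article, the ε-insensitive loss function $L_{\varepsilon}$, [17], is used to fit the hyperplane in the high-dimensional feature space. Deviations smaller than ε are ignored, and larger deviations are penalized linearly:

$$L_{\varepsilon}\big(f(x), y\big) = \begin{cases} |f(x) - y| - \varepsilon, & |f(x) - y| \ge \varepsilon, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$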
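Equations (6) and (7) translate directly into NumPy. The sketch below is an illustration only. The names `eps_insensitive_loss` and `regularized_risk` are hypothetical, and the values of C and ε are arbitrary. It shows that a deviation smaller than ε costs nothing, while a larger deviation is penalized only by the amount by which it exceeds ε.

```python
import numpy as np

def eps_insensitive_loss(f_x, y, eps):
    """Equation (7): zero inside the eps-tube, |f(x) - y| - eps outside it."""
    err = np.abs(np.asarray(f_x, dtype=float) - np.asarray(y, dtype=float))
    return np.where(err >= eps, err - eps, 0.0)

def regularized_risk(w, f_x, y, C, eps):
    """Equation (6): C times the mean eps-insensitive loss plus 0.5 * w^T w."""
    w = np.asarray(w, dtype=float)
    return C * eps_insensitive_loss(f_x, y, eps).mean() + 0.5 * w @ w

f_x = np.array([1.0, 2.0, 3.0])   # model outputs f(x_i)
y = np.array([1.05, 2.5, 2.0])    # observed values y_i
loss = eps_insensitive_loss(f_x, y, eps=0.1)
risk = regularized_risk(w=[1.0, 1.0], f_x=f_x, y=y, C=3.0, eps=0.1)
print(loss, risk)
```

The first observation lies inside the tube and contributes zero loss; the other two contribute 0.4 and 0.9.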

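Introducing non-negative slack variables $\xi_i$ and $\xi_i^{*}$, which measure how far an observation lies above or below the ε-tube, SVR minimizes

$$R(C) = C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) + \frac{1}{2}\lVert w\rVert^{2} \tag{8}$$

subject to $y_i - f(x_i) \le \varepsilon + \xi_i$, $f(x_i) - y_i \le \varepsilon + \xi_i^{*}$, and $\xi_i, \xi_i^{*} \ge 0$ for $i = 1, \ldots, N$.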

The problem in Equation (8), under the constraints implied by the loss in Equation (7), is solved with Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$. The solution expresses the weight vector $w$ as in Equation (9). Substituting it into Equation (4) gives the SVR function estimated from the training data, which is then used to predict outputs for new input vectors:

$$w = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right)\varphi(x_i), \tag{9}$$

for the training inputs $x_i$, $i = 1, 2, \ldots, N$.

The parameter ε acts as a threshold: it sets the width of the insensitive tube within which errors are ignored. The parameter C controls the trade-off between the empirical risk and the flatness of $f$. Both C and ε are predefined and have a significant impact on predictive performance. Applying the Lagrange multipliers and the Karush-Kuhn-Tucker conditions yields the general form of the SVR function:

$$f(x_j) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x_j) + b. \tag{10}$$

Here $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$ is the kernel function. It equals the inner product of the vectors $x_i$ and $x_j$ after they are mapped into the feature space as $\varphi(x_i)$ and $\varphi(x_j)$. The Gaussian radial basis function (RBF), the most common kernel function, is written as

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right), \tag{11}$$

where σ > 0 controls the width of the kernel.
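Equations (10) and (11) can be sketched in NumPy as below. This is an illustration under stated assumptions. The dual coefficients $\alpha_i - \alpha_i^{*}$ would normally come from a quadratic-programming solver; here they are supplied by hand, and the function names are hypothetical. For comparison, scikit-learn's SVR writes the same kernel as $\exp(-\gamma\lVert x_i - x_j\rVert^{2})$, so $\gamma = 1/(2\sigma^{2})$. A fitted model stores the differences $\alpha_i - \alpha_i^{*}$ in its `dual_coef_` attribute, alongside `support_vectors_` and `intercept_`.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Equation (11) for every pair of rows: exp(-||a - b||^2 / (2 sigma^2))."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(X_train, dual_coef, b, X_new, sigma=1.0):
    """Equation (10): f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b.

    dual_coef holds the differences alpha_i - alpha_i^*; in practice a solver
    produces them, here they are given by hand for illustration.
    """
    K = rbf_kernel(X_train, X_new, sigma)          # shape (N_train, N_new)
    return np.asarray(dual_coef, dtype=float) @ K + b

print(rbf_kernel([[0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]]))   # values 1 and exp(-1)
pred = svr_predict([[0.0], [1.0]], dual_coef=[1.0, -1.0], b=0.5, X_new=[[0.0]])
print(pred)
```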

SVR is widely used in research because it achieves high accuracy and can model both linear and non-linear relationships. In addition, it trains quickly and performs well on small data sets such as the one used here. This article therefore selects SVR for predicting life insurance data.
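A minimal scikit-learn sketch of fitting an RBF-kernel SVR with twelve regressors follows. The data are random stand-ins. The hyperparameters C = 10, ε = 0.1 and γ = "scale" are assumptions, since the article does not report the values it used. Both the inputs and the target are standardized because C and ε are scale-dependent: with raw premiums in the tens of millions of baht, an ε of 0.1 would be negligible.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Random stand-in data: 240 months x 12 regressors (in place of X1..X12) and a target.
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 12))
y = X @ rng.normal(size=12) + rng.normal(scale=0.1, size=240)

X_train, X_test = X[:228], X[228:]   # Jan 2003-Dec 2021 vs Jan-Dec 2022
y_train, y_test = y[:228], y[228:]

# Assumed hyperparameters; the inputs are scaled inside the pipeline,
# and the target is scaled by TransformedTargetRegressor.
model = TransformedTargetRegressor(
    regressor=make_pipeline(
        StandardScaler(),
        SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale"),
    ),
    transformer=StandardScaler(),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred.shape)  # (12,)
```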

2.5 Performance Criterion

This article evaluates the forecasting models with the mean absolute percentage error (MAPE), [18], the coefficient of determination (R²), and the root mean squared error (RMSE), defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^{2}}, \tag{12}$$

$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|, \tag{13}$$

$$R^{2} = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^{2}}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^{2}}. \tag{14}$$

Here $y_t$ is the actual value at time $t$, $\hat{y}_t$ is the predicted value at time $t$, $\bar{y}$ is the mean of the actual values, and $n$ is the number of observations. A better model has a higher R², a lower RMSE, and a lower MAPE.
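Equations (12) to (14) can be written in a few lines of NumPy; the function names are illustrative. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage, so its result must be multiplied by 100 to match Equation (13). RMSE can be obtained as the square root of `mean_squared_error`.

```python
import numpy as np

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))          # Equation (12)

def mape(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))    # Equation (13), in percent

def r2(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot                        # Equation (14)

y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
print(rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```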

2.6 Software Utilized

The results presented in this article were obtained with Python in Google Colab. The main libraries and functions used in this research are:

- pmdarima: auto_arima, for automatic selection of the model orders.

- statsmodels.tsa.statespace.sarimax: the SARIMAX class, for fitting and forecasting with the SARIMAX model.

- statsmodels.api: estimation of various other statistical models.

- sklearn.svm: the SVR class, for forecasting with the SVR model.

- sklearn.metrics: calculation of RMSE, MAPE, and R².

3 Combined Forecast Model

Time-series models such as SARIMAX are classic statistical models that capture linear structure, and SARIMAX can incorporate many independent variables. Non-linear patterns in time series data, however, are better captured by machine learning algorithms such as SVR. To capture both linear and non-linear patterns, this article proposes combining the predictions of a machine learning technique with those of a time-series model. The combined forecast model assigns a weight to each forecasting technique:

$$\hat{y}_t^{\text{SVR-SARIMAX}} = \sum_{j=1}^{m} w_j\,\hat{y}_{jt}, \tag{15}$$

where $\hat{y}_{jt}$ is the forecast of technique $j$ at time $t$, $w_j$ is its weight, and $m$ is the number of techniques combined (here $m = 2$).

This article weights the two forecasting techniques with the simple average method, which gives each technique equal weight ($w_j = 1/2$). The combined forecast model therefore becomes

$$\hat{y}_t^{\text{SVR-SARIMAX}} = \frac{1}{2}\sum_{j=1}^{2}\hat{y}_{jt}. \tag{16}$$

Each combined forecast is therefore the average of the forecasts from the two techniques, and its accuracy depends on both. For example, at t = 1 (January 2022), combining the SVR and SARIMAX forecasts gives

$$\hat{y}_1^{\text{SVR-SARIMAX}} = \frac{\hat{y}_1^{\text{SVR}} + \hat{y}_1^{\text{SARIMAX}}}{2} = \frac{45{,}883{,}424 + 47{,}992{,}221}{2} = 46{,}937{,}822.5. \tag{17}$$

The overall procedure of the combined forecast model is shown in Figure 2.
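Equations (15) and (16) amount to a weighted sum of forecast series. The sketch below uses plain Python; `combine_forecasts` is a hypothetical helper. The inputs are the SVR and SARIMAX forecasts for January and February 2022 from Table 5, and the results agree with the SVR-SARIMAX column of that table up to rounding. Passing other weights that sum to one gives the general form of Equation (15).

```python
def combine_forecasts(*forecasts, weights=None):
    """Equation (15): weighted sum of m forecast series; equal weights give Equation (16)."""
    m = len(forecasts)
    if weights is None:
        weights = [1.0 / m] * m          # simple average
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]

# SVR and SARIMAX forecasts for January and February 2022 (Table 5).
svr = [45_883_424, 30_977_795]
sarimax = [47_992_221, 47_574_761]
combined = combine_forecasts(svr, sarimax)
print(combined)  # [46937822.5, 39276278.0]
```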

4 Results and Discussion

This article assesses how well the Combined Model, Support Vector Regression, and SARIMAX models forecast insurance data. All analyses were conducted using Python in Google Colab.


The findings are reported below for each of these methods.

Fig. 2: Diagram for forecasting with the combined model

Figure 3 shows the Pearson correlation between each pair of variables. The Pearson correlation measures the strength of a linear association and ranges from -1 to 1. X2, X4, X6, X7, X9, X10, X11, and X12 are highly associated with Y, X3 and X5 are moderately associated with Y, and X1 and X8 show only a low association.
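The correlations summarized in Figure 3 can be computed with `numpy.corrcoef`. The grouping into high, moderate, and low association below uses cut-offs of 0.7 and 0.4 on |r|. These thresholds are hypothetical, since the article does not state the ones it used, and the data are a toy example.

```python
import numpy as np

def correlation_with_target(X, y):
    """Pearson correlation of each column of X with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.array([np.corrcoef(X[:, k], y)[0, 1] for k in range(X.shape[1])])

def association_label(r, high=0.7, moderate=0.4):
    """Hypothetical cut-offs on |r|; not taken from the article."""
    a = abs(r)
    if a >= high:
        return "high"
    return "moderate" if a >= moderate else "low"

y = np.arange(10.0)
X = np.column_stack([2 * y, -y, np.tile([1.0, 0.0], 5)])
r = correlation_with_target(X, y)
labels = [association_label(v) for v in r]
print(labels)  # ['high', 'high', 'low']
```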

Table 3 evaluates the forecast accuracy of the SARIMAX models fitted to the life insurance data. The best-performing model was SARIMAX(2, 1, 2)x(1, 0, 2, 3), which had the lowest errors (RMSE = 3,508,870.19, MAPE = 5.62%) and the highest R² (0.78). The R² values of the three candidate models differ only slightly (0.73 to 0.78), so all three are competitive. Nevertheless, SARIMAX(2, 1, 2)x(1, 0, 2, 3) is the best choice for forecasting these insurance data with the SARIMAX approach.

Figure 4 checks the error assumptions for the dependent variable of the time series model. The residuals appear to have constant variance and a mean of approximately zero, and they appear to be normally distributed.

Table 4 shows that the R² values of the SVR, SARIMAX, and SVR-SARIMAX models are 0.1820, 0.7802, and 0.6007, respectively. The RMSE and MAPE values likewise show that SARIMAX outperforms both SVR and SVR-SARIMAX. Table 5 and Figure 5 show that the SARIMAX predictions are closer to the actual values than those of SVR and SVR-SARIMAX.

Table 6 presents forecasts for the 12 months of 2023 from the SVR, SARIMAX, and SVR-SARIMAX models, where the future value of each independent variable is taken as its average over the previous three months.
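The construction of future regressor values can be sketched as follows. The text does not say whether the three-month average is fixed at the last observed months or rolled forward. This hypothetical helper assumes the rolling reading, in which each new month is the mean of the three months before it, including months already filled in. The resulting rows would then be passed as exogenous inputs to the fitted models.

```python
import numpy as np

def extend_exog_by_3month_mean(exog_hist, steps=12, window=3):
    """Fill future regressor rows with the mean of the preceding `window` months.

    Hypothetical helper (rolling reading): each new month averages the three
    months before it, including months that were themselves filled in.
    """
    rows = list(np.asarray(exog_hist, dtype=float)[-window:])
    future = []
    for _ in range(steps):
        nxt = np.mean(rows[-window:], axis=0)
        future.append(nxt)
        rows.append(nxt)
    return np.vstack(future)

# Four observed months of two regressors.
hist = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [6.0, 60.0]])
future = extend_exog_by_3month_mean(hist)
print(future[:2])
```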

5 Conclusion

In conclusion, the results presented in this article suggest that a linear forecasting model (time series model) may be more suitable than non-linear forecasting models (machine learning) for life insurance data, as the independent variables exhibit a high association with the dependent variable. Moreover, the proposed model, which combines a linear forecasting model with a non-linear one, improves prediction accuracy compared with the standalone non-linear model, even though the linear model is still more accurate. SVR may not be the most appropriate non-linear model, and other non-linear forecasting models could be better suited to life insurance data.

For future research, we recommend exploring other machine learning techniques, such as the Multilayer Perceptron and Recurrent Neural Networks, to further improve the accuracy of the predictive models. Interval forecasting, as in [19] and [20], could also be incorporated.

Acknowledgement:

The authors gratefully acknowledge funding from King Mongkut’s University of Technology North Bangkok under Contract no. KMUTNB-67-BASIC-16.

References:

[1] Office of Insurance Commission (OIC), March 30, 2023. [online]. Available: https://www.oic.or.th/th/industry/statistic/.

[2] V. Hiwase and Avinash Agrawal, Review on application of data mining in life insurance, International Journal of Engineering & Technology, Vol. 7, 2018, pp. 159-162.

[3] Teerawat Simmachan, Weerapong Manopa, Pailin Neamhom, Achiraya Poothong and Wikanda Phaphan, Detecting Fraudulent Claims in Automobile Insurance Policies by Data Mining Techniques, Thailand Statistician, Vol. 21, No. 3, 2023, pp. 552-568.


[4] Sara Emamgholipour, Mohammad Arab, Zahra Mohajerzadeh, Life insurance demand: Middle East and North Africa, International Journal of Social Economics, Vol. 44, 2017, pp. 521-529.

[5] Andrew Tan, Steven Yen, Abdul Hasan, and Kamarudin Muhamed, Demand for Life Insurance in Malaysia: An Ethnic Comparison Using Household Expenditure Survey Data, Asia-Pacific Journal of Risk and Insurance, Vol. 8, No. 2, 2014, pp. 179-204.

[6] Michele Azzone, Emilio Barucci, Giancarlo Moncayo, and Daniele Marazzina, A machine learning model for lapse prediction in life insurance contracts, Expert Systems with Applications, Vol. 191, 2022, pp. 116261.

[7] Noorhannah Boodhun, and Manoj Jayabalan, Risk prediction in life insurance industry using supervised learning algorithms, Complex & Intelligent Systems, Vol. 4, 2018, pp. 145-154.

[8] Patchanok Srisuradetchai, and Wararit Panichkitkosolkul, Using Ensemble Machine Learning Methods to Forecast Particulate Matter (PM2.5) in Bangkok, Thailand, In Surinta, O., Kam Fung Yuen, K. (eds) Multi-disciplinary Trends in Artificial Intelligence, Lecture Notes in Computer Science, Springer, 2022, pp. 204-215.

[9] Chalermrat Nontapa, Chainarong Kesamoon, Nicha Kaewhawong, and Peerasak Intrapaiboon, A New Time Series Forecasting Using Decomposition Method with SARIMAX Model, In Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing, Communications in Computer and Information Science, Springer, 2020.

[10] Chalermrat Nontapa, Chainarong Kesamoon, Nicha Kaewhawong and Peerasak Intrapaiboon, A New Hybrid Forecasting Using Decomposition Method with SARIMAX Model and Artificial Neural Network, International Journal of Mathematics and Computer Science, Vol. 16, No. 4, 2021, pp. 1341-1354.

[11] Warut Pannakkong, Thanyaporn Harncharnchai, Jirachai Buddhakulsomsiri, Forecasting Daily Electricity Consumption in Thailand Using Regression, Artificial Neural Network, Support Vector Machine, and Hybrid Models, Energies, Vol. 15, No. 9, 2022, pp. 3105.

[12] Yupaporn Areepong, and Rapin Sunthornwat, Forecasting modeling of the number of cumulative COVID-19 cases with deaths and recoveries removal in Thailand, Science, Engineering and Health Studies, Vol. 15, 2021, pp. 21020004.

[13] Ahmed Elshewey, Mahmoud Shams, Abdelghafar Elhady, Samaa Shohieb, Abdelaziz Abdelhamid, Abdelhameed Ibrahim and Zahraa Tarek, A Novel WD-SARIMAX Model for Temperature Forecasting Using Daily Delhi Climate Dataset, Sustainability, Vol. 15, No. 1, 2023, pp. 757.

[14] Haiying Huang, Wuyi Zhang, Gaochao Deng, James Chen, Predicting Stock Trend Using Fourier Transform and Support Vector Regression, In Proceedings of 2014 IEEE 17th International Conference on Computational Science and Engineering, Chengdu, China, 2014, pp. 213-216.

[15] Corinna Cortes and Vladimir Vapnik, Support-vector networks, Machine Learning, Vol. 20, No. 3, 1995, pp. 273-297.

[16] Hanifah Muthiah, Umu Sa’adah, and Achmad Efendi, Support Vector Regression (SVR) Model for Seasonal Time Series Data, In Proceedings of the Second Asia Pacific International Conference on Industrial Engineering and Operations Management, Surakarta, Indonesia, 2021, pp. 3191-3200.

[17] Esperanza Gonzalo, Zulima Muñiz, Paulino Nieto, Antonio Sánchez and Marta Fernández, Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers, Materials, Vol. 9, No. 7, 2016, pp. 531.

[18] Arnaud Myttenaere, Boris Golden, Bénédicte Grand, and Fabrice Rossi, Mean absolute percentage error for regression models, Neurocomputing, Vol. 192, 2016, pp. 38-48.

[19] Patchanok Srisuradetchai, A Novel Interval Forecast for K-Nearest Neighbor Time Series: A Case Study of Durian Export in Thailand, IEEE Access, Vol. 12, 2024, pp. 2032-2044.

[20] Patchanok Srisuradetchai, Wikanda Phaphan, Using Monte-Carlo Dropout in Deep Neural Networks for Interval Forecasting of Durian Export, WSEAS Transactions on Systems and Control, Vol. 19, 2024, pp. 10-21.


Table 1. Description of Variables

Variable | Measurement level | Description | Unit
Y | Ratio scale | Direct premiums received by the life insurance business in Thailand at month t | baht
X1 | Ratio scale | Policy interest rate at month t | percentage
X2 | Ratio scale | Sum insured as of month t | thousand baht
X3 | Ratio scale | Number of life insurance companies as of month t | companies
X4 | Ratio scale | Number of life insurance policies as of month t | policies
X5 | Ratio scale | Gross domestic product at month t | billions of baht
X6 | Ratio scale | Advertising expenditures of life insurance companies | thousand baht
X7 | Ratio scale | First-year premiums of ordinary life insurance policies at month t | thousand baht
X8 | Ratio scale | First-year premiums of industrial life insurance policies as of month t | thousand baht
X9 | Ratio scale | First-year premiums of group life insurance policies at month t | thousand baht
X10 | Ratio scale | Subsequent-year (renewal) premiums of ordinary life insurance policies at month t | thousand baht
X11 | Ratio scale | Subsequent-year (renewal) premiums of industrial life insurance policies as of month t | thousand baht
X12 | Ratio scale | Subsequent-year (renewal) premiums of group life insurance policies at month t | thousand baht

Table 2. Descriptive statistics of life insurance data from 1 January 2003 to 31 December 2022

Variable | Range | Mean | S.D. | Min | Max
Y | 619,864,190 | 175,497,200 | 154,368,300 | 8,764,067 | 628,628,300
X1 | 5 | 2 | 1 | 0 | 5
X2 | 3,428,145,960 | 959,894,500 | 791,486,800 | 47,732,210 | 3,475,878,000
X3 | 3 | 23 | 1 | 21 | 24
X4 | 4,196,430 | 1,403,044 | 989,971 | 109,870 | 4,306,300
X5 | 3,098 | 2,964 | 926 | 1,432 | 4,530
X6 | 6,660,806 | 1,111,564 | 1,033,270 | 18,390 | 6,679,196
X7 | 93,637,950 | 26,873,540 | 23,217,760 | 982,630 | 94,620,580
X8 | 1,527,540 | 465,044 | 361,715 | 14,967 | 1,542,507
X9 | 11,142,343 | 2,771,273 | 2,403,750 | 155,289 | 11,297,630
X10 | 394,847,817 | 103,999,500 | 94,079,360 | 5,492,579 | 400,340,400
X11 | 7,322,318 | 3,388,361 | 2,149,980 | 341,733 | 7,664,051
X12 | 20,195,549 | 5,603,258 | 4,558,032 | 340,678 | 20,536,230


Fig. 3: Correlation heatmap of the features

Table 3. Evaluating forecast accuracy of SARIMAX models

Model | RMSE | MAPE (%) | R²
SARIMAX(2, 1, 2)x(1, 0, 2, 3) | 3,508,870.19 | 5.62 | 0.78
SARIMAX(0, 1, 0)x(2, 0, 0, 6) | 3,843,282.55 | 6.40 | 0.73
SARIMAX(3, 1, 2)x(2, 0, 2, 9) | 3,599,253.03 | 5.80 | 0.76

Fig. 4: Standardized residuals for the training set of Y


Table 4. Comparison of three methods on the test dataset

Criterion | SVR | SARIMAX | SVR-SARIMAX
RMSE | 8,138,648.99 | 3,508,870.19 | 4,729,896.98
MAPE (%) | 12.2349 | 5.6287 | 7.3210
R² | 0.1820 | 0.7802 | 0.6007

Table 5. Actual values and forecasts for the 12 months of 2022 (test set)

Month | Actual | SVR | SARIMAX | SVR-SARIMAX
Jan | 49,519,730.95 | 45,883,424 | 47,992,221 | 46,937,822
Feb | 45,345,018.00 | 30,977,795 | 47,574,761 | 39,276,278
Mar | 54,127,046.22 | 55,990,949 | 59,759,573 | 57,875,261
Apr | 40,895,672.55 | 46,729,081 | 41,344,472 | 44,036,776
May | 45,810,000.37 | 39,035,039 | 50,083,351 | 44,559,195
Jun | 52,133,255.38 | 52,571,441 | 55,912,274 | 54,241,857
Jul | 44,210,264.68 | 39,184,606 | 46,458,020 | 42,821,313
Aug | 51,026,476.78 | 52,571,441 | 52,060,688 | 52,316,064
Sep | 53,524,523.88 | 39,035,039 | 53,265,264 | 46,150,151
Oct | 48,611,505.07 | 47,571,569 | 43,697,959 | 45,634,764
Nov | 52,847,154.98 | 47,571,569 | 50,118,001 | 48,844,785
Dec | 71,902,982.29 | 56,899,184 | 65,723,074 | 61,311,129

Fig. 5: Time Series Forecasting

Table 6. Forecasts for the 12 months of 2023

Month | Actual | SVR | SARIMAX | SVR-SARIMAX
Jan | - | 69,016,819 | 61,509,332 | 65,263,075
Feb | - | 45,980,880 | 57,807,382 | 51,894,131
Mar | - | 48,906,920 | 54,470,340 | 51,688,630
Apr | - | 49,074,751 | 51,227,676 | 50,151,213
May | - | 47,571,569 | 49,029,952 | 48,300,760
Jun | - | 55,543,190 | 49,801,474 | 52,672,332
Jul | - | 39,184,606 | 48,842,697 | 44,013,651
Aug | - | 39,184,606 | 50,721,533 | 44,953,069
Sep | - | 55,543,190 | 51,473,279 | 53,508,234
Oct | - | 39,184,606 | 51,225,199 | 45,204,902
Nov | - | 39,035,039 | 50,958,835 | 44,996,937
Dec | - | 47,571,569 | 49,029,760 | 48,300,664


Contribution of Individual Authors to the Creation of a Scientific Article (Ghostwriting Policy)

Both authors made equal contributions to the current study.

Sources of Funding for Research Presented in a Scientific Article or Scientific Article Itself

This research was funded by King Mongkut’s University of Technology North Bangkok. Contract no. KMUTNB-67-BASIC-16.

Conflict of Interest

The authors have no conflicts of interest to declare.

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US
