Bootstrap Methods for Claims Reserving:

R Language Approach

ORIANA ZAÇAJ, ENDRI RAÇO, KLEIDA HAXHI, ETLEVA LLAGAMI, KOSTAQ HILA

Faculty of Mathematical Engineering and Physics Engineering,

Polytechnic University of Tirana,

Tirana, 1069,

ALBANIA

Abstract: Bootstrap methods have long been used by actuaries to predict future claims cash flows and their variability. This work illustrates the use of bootstrap methods in practice, taking as an example the claims development data of the personal accident portfolio of the largest insurance company in Albania over a period of 10 years. The objective is not to provide a theoretical analysis of bootstrap methods; rather, the work highlights the benefits of using them to predict the distribution of future claims development and to estimate the standard error, for a better risk assessment of liabilities within insurance companies. The work is divided into two distinct phases. The first selects the theoretical probability distribution that best fits the available claims dataset. The comparison of distributions is facilitated by the tools available in the R programming language. Both maximum likelihood parameter estimation and the chi-square goodness-of-fit test are used to identify the probability distribution that best fits the data among a family of predefined distributions. The results show that the Gamma distribution best describes the claim development data. The second phase uses bootstrap methods, based on the selected distribution, to estimate the ultimate value of claims, the claims reserve, and their standard error.

Key-Words: Claim Reserving, Bootstrapping, R, Distribution Fitting

Received: June 17, 2021. Revised: March 19, 2022. Accepted: April 18, 2022. Published: May 20, 2022.

1 Introduction

Claim modeling is a very important part of estimating the liabilities of an insurance company. It gives the company an estimate of the capital needed to fulfill its obligations. A better assessment of their liabilities and assets helps insurance companies analyze their different insurance portfolios, plan new products, maintain an adequate solvency position, and estimate the need for additional reinsurance cover.

After a claim occurs, some time is needed to reach its final settlement. This period is known as the development period of the claim [1]. The aim of claim reserving is to estimate that period and the ultimate value of the claim at the settlement date. In addition, it is necessary to estimate the variability and the Value at Risk (VaR) of the estimated reserve.

When we analyze outstanding claims at different points in time, we divide them into two main groups: claims that have been incurred but not yet reported (IBNR), and claims that have been reported but not yet settled at the date of calculation (RBNS). For the second group, we have some information about the accident date and the claim reporting dates, together with the estimated values at different reporting dates, but the ultimate claim amount and the final settlement date are still unknown. For the first group, the only information we can use is past data on the development of claims from the accident date through the reporting date until the settlement date. In both cases, it is necessary to use the available information on current and past claims to project the future cash flows of these claims.

The most traditional method of reserving outstanding claims, especially in long-term business, is the Chain Ladder method [1] [2] [3]. It is a distribution-free method that gives a point estimate of the reserves. This means that it gives no information on the risk that the estimated reserves will differ from the actual reserves.

To analyze the variability of reserves, the Mack model [4] [5] is the most common choice. The method calculates the standard error and confidence intervals of the reserves estimated by the chain ladder method.

Meanwhile, bootstrap techniques [4] [6] [7] are a very good tool for predicting the distribution of claims and claim reserves. These techniques also estimate the standard error of these predictions. They are a very good alternative to the Mack model.

WSEAS TRANSACTIONS on MATHEMATICS

DOI: 10.37394/23206.2022.21.30

Oriana Zaçaj, Endri Raço, Kleida Haxhi, Etleva Llagami, Kostaq Hila

E-ISSN: 2224-2880

252

Volume 21, 2022

Neither method changes the estimate of the claim reserves, but the variability and standard error are calculated under different assumptions. In the Mack model [5], the distribution of the underlying data is not specified; only its first two moments are. In the bootstrap model [6], the standard error is calculated as the standard deviation of the expected cash flows that would be obtained if we could repeat the experiment many times, going back in time and repeating the claim experience, each time evaluating the mean reserves. Unlike the Mack model, the bootstrap model predicts the fitted distribution of the data, and the difference between the fitted values and the actual incurred data signals how far the actual data deviate from the model framework [4].

We will introduce these techniques using personal accident claims from an insurance company in Albania. We will start by finding which distributions best fit these claims [4] [8] [9] [10]. Then we will estimate the reserves and their variability for the claim development triangles using bootstrap models [4] [6] [11] as an alternative to the Mack Chain Ladder method [4] [5] [11]. We will highlight the differences between the models in the estimated reserves and in their variability at different confidence levels, so that we can find the best assessment for those claims.

2 Materials and Methods

2.1 Claim Distribution

We considered 915 claims from the personal accident portfolio of an Albanian insurance company, incurred during the period 2005 – 2021. The claim amounts range from Albanian lek (ALL) 5,000 to ALL 3,000,000, with an average value of ALL 578,140. Fig. 1 shows the empirical density of these claims. The positive skewness (1.59) produces a tail on the right of the empirical density.

Fig. 1: The empirical density of claims incurred

during years 2005 – 2021

Different theoretical distribution functions [4] [8] [9] [10] were fitted to find the distributions that best fit the claim amounts. Fig. 2 and Fig. 3 below give a first impression of which distribution best fits our data.

Fig. 2: Fitting Distributions to claims data

Fig. 3: Empirical density vs theoretical density functions

According to the results in Fig. 3, the Gamma distribution is the closest to the available data.

Moreover, the fit statistics for each theoretical distribution in Table 1 show that the Gamma distribution has the lowest sum of squared errors and the lowest Bayesian Information Criterion (BIC) among all the distributions, and one of the lowest Akaike Information Criterion (AIC) values. Although the AIC of the Gamma distribution is not the lowest, it is lower than that of the Burr distribution, which is the next closest by BIC and sum of squared errors.

Table 1. Estimation results of statistics from different theoretical distributions for claim amounts

| Distribution | Sum square error | AIC | BIC |
|---|---|---|---|
| Gamma | 2.607959 × 10^-12 | 3160.272483 | -30624.153540 |
| Burr | 3.221322 × 10^-12 | 3219.839974 | -30424.064648 |
| Beta | 3.800095 × 10^-12 | 3132.843308 | -30272.875312 |
| Normal | 2.055142 × 10^-11 | 3230.413043 | -28742.081182 |
| Lognormal | 2.465234 × 10^-11 | 3569.305206 | -28568.785330 |
| Exponential law | 3.642174 × 10^-12 | 3144.688824 | -30318.531627 |
| Exponential | 4.606323 × 10^-12 | 3171.578434 | -30110.463552 |
| Power law | 7.580380 × 10^-12 | 3047.870805 | -29647.852603 |
| Cauchy | 1.136920 × 10^-11 | 3308.384372 | -29283.781086 |
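
To make the information criteria concrete, the following minimal sketch computes AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L for a Gamma and an Exponential fit. It is written in Python rather than R, and it rests on assumptions that are not from the paper: the sample is synthetic (the portfolio data are not reproduced here), the Gamma parameters are fitted by the method of moments instead of maximum likelihood, and all function names are illustrative. The resulting values are therefore not comparable with Table 1, whose software may use a different convention.

```python
import math
import random
from statistics import fmean, pvariance


def gamma_loglik(xs, shape, scale):
    """Log-likelihood of a Gamma(shape, scale) model."""
    return sum((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale) for x in xs)


def exponential_loglik(xs, mean):
    """Log-likelihood of an Exponential model with the given mean."""
    return sum(-math.log(mean) - x / mean for x in xs)


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


# Synthetic stand-in for the 915 claim amounts (NOT the paper's data).
rng = random.Random(42)
claims = [rng.gammavariate(2.0, 300_000.0) for _ in range(915)]
n = len(claims)

# Method-of-moments Gamma fit (a simplification of maximum likelihood).
mean, var = fmean(claims), pvariance(claims)
shape, scale = mean ** 2 / var, var / mean

ll_gamma = gamma_loglik(claims, shape, scale)
ll_exp = exponential_loglik(claims, mean)  # the sample mean is the MLE

aic_gamma, bic_gamma = aic(ll_gamma, 2), bic(ll_gamma, 2, n)
aic_exp, bic_exp = aic(ll_exp, 1), bic(ll_exp, 1, n)
print(f"Gamma:       AIC={aic_gamma:.1f}  BIC={bic_gamma:.1f}")
print(f"Exponential: AIC={aic_exp:.1f}  BIC={bic_exp:.1f}")
```

Under both criteria the lower value indicates the preferred model, which is how Table 1 is read.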

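If we perform the Chi-squared test in R [12], as in Table 2, the closest distribution fitting our data is again the Gamma distribution. It has the smallest Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling statistics among the three candidates.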

Table 2. Results of the Chi-squared test

| Goodness-of-fit statistics | Gamma | Weibull | Lognormal |
|---|---|---|---|
| Kolmogorov-Smirnov | 0.04643869 | 0.04765527 | 0.07975269 |
| Cramer-von Mises | 0.52057243 | 0.56061974 | 1.06145749 |
| Anderson-Darling | 3.61644563 | 3.81781521 | 8.55914819 |
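
As an illustration of the first statistic in Table 2, the sketch below computes the Kolmogorov-Smirnov distance, i.e. the largest gap between the empirical distribution function and a fitted theoretical one. It is a Python sketch under stated assumptions: the sample is synthetic, and the two candidates (Lognormal and Exponential) were chosen only because their distribution functions are available in the standard library. It does not reproduce the values in Table 2.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF and a fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d


# Synthetic stand-in for the claim amounts (NOT the paper's data).
rng = random.Random(7)
claims = [rng.lognormvariate(13.0, 0.8) for _ in range(915)]

# Lognormal fit: a normal distribution fitted to the log-claims.
logs = [math.log(x) for x in claims]
fitted_normal = NormalDist(fmean(logs), pstdev(logs))
d_lognormal = ks_statistic(claims, lambda x: fitted_normal.cdf(math.log(x)))

# Exponential fit with the sample mean.
exp_mean = fmean(claims)
d_exponential = ks_statistic(claims, lambda x: 1 - math.exp(-x / exp_mean))

print(f"KS lognormal:   {d_lognormal:.4f}")
print(f"KS exponential: {d_exponential:.4f}")
```

A smaller distance means a closer fit, which is the sense in which the Gamma column of Table 2 is preferred.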

All the graphical and statistical tests indicate the Gamma distribution as the best fit for the distribution of the claim data. We expect that this will lead to the same estimates as the standard distribution-free chain ladder method.

2.2 Claim Reserves

For the same set of data, we consider two models: the Mack model [5] and the bootstrap model [6].

After calculating the reserves with the chain ladder method, we estimate and analyze the variability of those reserves with both models.

2.2.1 Mack Model

The triangles of incremental and cumulative incurred claims, in thousands of Albanian lek, for the period 2012 - 2021 are shown in Table 3 and Table 4, respectively.

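Table 3. Incurred and reported claim amounts from the year of the accident to the reporting year (incremental data) in thousands of Albanian lek

| Year | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2012 | 35,132 | 2,437 | 691 | 1,512 | 128 | 150 | 84 | 13 | 3 | 1 |
| 2013 | 17,034 | 5,008 | 41 | 926 | 610 | 346 | 267 | 150 | 27 | |
| 2014 | 18,874 | 1,336 | 600 | 384 | 320 | 208 | 164 | 106 | | |
| 2015 | 26,956 | 3,211 | 876 | 691 | 140 | 175 | 146 | | | |
| 2016 | 50,844 | 10,915 | 552 | 389 | 300 | 129 | | | | |
| 2017 | 46,010 | 6,515 | 257 | 103 | 180 | | | | | |
| 2018 | 31,274 | 7,540 | 1,841 | 386 | | | | | | |
| 2019 | 47,524 | 8,915 | 1,206 | | | | | | | |
| 2020 | 56,289 | 8,455 | | | | | | | | |
| 2021 | 65,901 | | | | | | | | | |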

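Table 4. Incurred and reported claim amounts from the year of the accident to the reporting year (cumulative data) in thousands of Albanian lek

| Year | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2012 | 35,132 | 37,569 | 38,260 | 39,772 | 39,900 | 40,050 | 40,134 | 40,147 | 40,150 | 40,151 |
| 2013 | 17,034 | 22,042 | 22,083 | 23,009 | 23,619 | 23,965 | 24,232 | 24,382 | 24,409 | |
| 2014 | 18,874 | 20,210 | 20,810 | 21,194 | 21,514 | 21,722 | 21,886 | 21,992 | | |
| 2015 | 26,956 | 30,167 | 31,043 | 31,734 | 31,874 | 32,049 | 32,195 | | | |
| 2016 | 50,844 | 61,759 | 62,311 | 62,700 | 63,000 | 63,129 | | | | |
| 2017 | 46,010 | 52,525 | 52,782 | 52,885 | 53,065 | | | | | |
| 2018 | 31,274 | 38,814 | 40,655 | 41,041 | | | | | | |
| 2019 | 47,524 | 56,439 | 57,645 | | | | | | | |
| 2020 | 56,289 | 64,744 | | | | | | | | |
| 2021 | 65,901 | | | | | | | | | |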

Table 5 gives the predicted development of the claims from the accident date to final settlement. The predicted values in Table 5 are obtained by multiplying the latest incurred cumulative value by the development factor between every two subsequent periods [3], where each factor is the ratio of the column sums of the two periods over the accident years observed in both. For example, the value of 22,002 (in thousands of ALL) that the claims incurred in year 2014 are predicted to reach in year 2022 is the cumulative value accumulated up to year 2021 multiplied by the ratio between column 9 and column 8:

22,002 ≈ 21,992 × (40,150 + 24,409) / (40,147 + 24,382)
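
The calculation can be sketched in a few lines of Python (the paper itself works in R). The sketch assumes the standard volume-weighted chain ladder factors and uses the rounded cumulative triangle of Table 4, in thousands of ALL, so its results agree with the paper's figures only up to rounding; the function names are ours.

```python
# Cumulative triangle from Table 4 (thousands of ALL), one row per accident year.
CUM = [
    [35132, 37569, 38260, 39772, 39900, 40050, 40134, 40147, 40150, 40151],
    [17034, 22042, 22083, 23009, 23619, 23965, 24232, 24382, 24409],
    [18874, 20210, 20810, 21194, 21514, 21722, 21886, 21992],
    [26956, 30167, 31043, 31734, 31874, 32049, 32195],
    [50844, 61759, 62311, 62700, 63000, 63129],
    [46010, 52525, 52782, 52885, 53065],
    [31274, 38814, 40655, 41041],
    [47524, 56439, 57645],
    [56289, 64744],
    [65901],
]


def development_factors(cum):
    """Ratio of column sums over the accident years observed in both columns."""
    factors = []
    for k in range(len(cum) - 1):
        rows = [r for r in cum if len(r) > k + 1]
        factors.append(sum(r[k + 1] for r in rows) / sum(r[k] for r in rows))
    return factors


def project(cum, factors):
    """Complete each row by applying the remaining development factors."""
    full = []
    for row in cum:
        out = list(row)
        for k in range(len(row) - 1, len(factors)):
            out.append(out[-1] * factors[k])
        full.append(out)
    return full


factors = development_factors(CUM)
full = project(CUM, factors)
# Reserve per accident year = projected ultimate minus latest observed value.
reserves = [proj[-1] - obs[-1] for proj, obs in zip(full, CUM)]

print(round(full[2][8]))      # accident year 2014, development period 9
print(round(sum(reserves)))   # total reserve in thousands of ALL
```

The first printed value is the 22,002 of the worked example; the total reserve is close to, but not identical with, the figure in Table 6 because the triangle is rounded to thousands.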

Table 6 summarizes the results for the ultimate value of the claims, the claim reserve, and the variability of the reserves under the Mack model. The ultimate value of the claims and the claim reserve in Table 6 are obtained from Table 5. We calculate the claim reserve for each accident year as the difference between the ultimate claim value and the latest reported cumulative value for that accident year in Table 5. For example, the claim reserve of ALL 3,803,519 in Table 6 for the accident year 2020 is calculated as the difference (68,547,057 – 64,743,538) in Table 5. As a result, the estimated ultimate claims using the Chain Ladder technique are ALL 488,105,288, giving an estimated claim reserve of ALL 23,832,820.

Table 5. Ultimate claims in thousands of Albanian Lek

| Year | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2012 | 35,132 | 37,569 | 38,260 | 39,772 | 39,900 | 40,050 | 40,134 | 40,147 | 40,150 | 40,151 |
| 2013 | 17,034 | 22,042 | 22,083 | 23,009 | 23,619 | 23,965 | 24,232 | 24,382 | 24,409 | 24,411 |
| 2014 | 18,874 | 20,210 | 20,810 | 21,194 | 21,514 | 21,722 | 21,886 | 21,992 | 22,002 | 22,004 |
| 2015 | 26,956 | 30,167 | 31,043 | 31,734 | 31,874 | 32,049 | 32,195 | 32,295 | 32,310 | 32,313 |
| 2016 | 50,844 | 61,759 | 62,311 | 62,700 | 63,000 | 63,129 | 63,483 | 63,681 | 63,711 | 63,716 |
| 2017 | 46,010 | 52,525 | 52,782 | 52,885 | 53,065 | 53,362 | 53,662 | 53,829 | 53,854 | 53,858 |
| 2018 | 31,274 | 38,814 | 40,655 | 41,041 | 41,339 | 41,570 | 41,804 | 41,934 | 41,954 | 41,957 |
| 2019 | 47,524 | 56,439 | 57,645 | 58,590 | 59,015 | 59,345 | 59,678 | 59,865 | 59,892 | 59,897 |
| 2020 | 56,289 | 64,744 | 65,973 | 67,054 | 67,540 | 67,919 | 68,300 | 68,513 | 68,545 | 68,550 |
| 2021 | 65,901 | 76,753 | 78,210 | 79,491 | 80,068 | 80,517 | 80,969 | 81,221 | 81,259 | 81,265 |

Table 6. Process and parameter variables

| Year | Ultimate | Reserve | Process SD | Process CV | Parameter SD | Parameter CV | Total SD | Total CV |
|---|---|---|---|---|---|---|---|---|
| 2012 | 40,151,086 | 0 | | | | | | |
| 2013 | 24,410,145 | 958 | 4,908 | 513% | 3,827 | 400% | 6,224 | 650% |
| 2014 | 22,003,293 | 10,907 | 20,038 | 184% | 11,889 | 109% | 23,299 | 214% |
| 2015 | 32,311,438 | 116,832 | 101,619 | 87% | 62,763 | 54% | 119,439 | 102% |
| 2016 | 63,713,298 | 584,089 | 221,294 | 38% | 175,071 | 30% | 282,172 | 48% |
| 2017 | 53,856,734 | 791,433 | 291,280 | 37% | 186,322 | 24% | 345,775 | 44% |
| 2018 | 41,955,234 | 914,185 | 408,629 | 45% | 197,405 | 22% | 453,813 | 50% |
| 2019 | 59,894,530 | 2,249,596 | 891,103 | 40% | 446,058 | 20% | 996,510 | 44% |
| 2020 | 68,547,057 | 3,803,519 | 1,228,770 | 32% | 618,388 | 16% | 1,375,602 | 36% |
| 2021 | 81,262,472 | 15,361,301 | 3,583,797 | 23% | 1,656,889 | 11% | 3,948,276 | 26% |
| Total | 488,105,288 | 23,832,820 | 3,931,805 | 16% | 2,537,855 | 11% | 4,679,722 | 20% |

From Table 6, we can also observe a total coefficient of variation (CV) of 20% and a total standard deviation (SD) of ALL 4,679,722, showing a considerable variation around the estimated reserve. The coefficient of variation is higher for the earliest accident years because their reserves are small. The standard error increases over the years 2019 – 2021 and is largest for the year 2021. These results can be summarized through graphs, as shown in Fig. 4 and Fig. 5.
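
For readers who want to trace where the per-year standard errors come from, the following Python sketch applies Mack's formula [5] for the mean squared error of each accident year's reserve to the rounded triangle of Table 4 (thousands of ALL). It is an illustration under assumptions: the function name is ours, the input is rounded, and the total across accident years is not computed because it needs an additional covariance term. The output is therefore only expected to be close to the Total SD column of Table 6.

```python
# Cumulative triangle from Table 4 (thousands of ALL).
CUM = [
    [35132, 37569, 38260, 39772, 39900, 40050, 40134, 40147, 40150, 40151],
    [17034, 22042, 22083, 23009, 23619, 23965, 24232, 24382, 24409],
    [18874, 20210, 20810, 21194, 21514, 21722, 21886, 21992],
    [26956, 30167, 31043, 31734, 31874, 32049, 32195],
    [50844, 61759, 62311, 62700, 63000, 63129],
    [46010, 52525, 52782, 52885, 53065],
    [31274, 38814, 40655, 41041],
    [47524, 56439, 57645],
    [56289, 64744],
    [65901],
]


def mack_by_origin(cum):
    """Return (reserve, standard error) per accident year, after Mack (1993)."""
    n = len(cum)
    f, sigma2 = [], []
    for k in range(n - 1):
        rows = [r for r in cum if len(r) > k + 1]
        fk = sum(r[k + 1] for r in rows) / sum(r[k] for r in rows)
        f.append(fk)
        if len(rows) > 1:
            s2 = sum(r[k] * (r[k + 1] / r[k] - fk) ** 2
                     for r in rows) / (len(rows) - 1)
        else:
            # Only one observed ratio: Mack's extrapolation rule.
            s2 = min(sigma2[-1] ** 2 / sigma2[-2], sigma2[-2], sigma2[-1])
        sigma2.append(s2)

    results = []
    for row in cum:
        level, acc = float(row[-1]), 0.0
        for k in range(len(row) - 1, n - 1):
            col_sum = sum(r[k] for r in cum if len(r) > k + 1)
            # Process part (1/level) plus parameter part (1/col_sum).
            acc += sigma2[k] / f[k] ** 2 * (1 / level + 1 / col_sum)
            level *= f[k]
        results.append((level - row[-1], level * acc ** 0.5))
    return results


mack = mack_by_origin(CUM)
for year, (reserve, se) in zip(range(2012, 2022), mack):
    print(year, round(reserve), round(se))
```

The two terms inside the loop correspond to the process and parameter components reported separately in Table 6.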

Fig. 4: Chain ladder developments by origin period

Fig. 5: Development of reserve and its standard error for the Mack model

The graphs in Fig. 4 and Fig. 5 show no trends in the residuals of the Mack model. As shown in Table 6 and Fig. 4, the standard error (SE) is barely visible for the period 2018 – 2020 and more noticeable for the last year (2021). The confidence intervals for the estimated reserve at different confidence levels are given in Table 7:

Table 7. Confidence intervals for claim reserves at different confidence levels

| Confidence level | Confidence interval |
|---|---|
| 0.75 | (18,451,140: 29,214,501) |
| 0.95 | (14,660,565: 33,005,075) |
| 0.99 | (11,782,536: 35,883,105) |
| 0.995 | (10,682,801: 36,982,839) |
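
The source does not state how these intervals were computed, but they are consistent with a symmetric normal approximation, reserve ± z × SD, using the total reserve and total SD of Table 6. The Python sketch below makes that assumption explicit; the function name is ours. With exact normal quantiles the bounds differ slightly from the table, whose values are reproduced by z rounded to 1.15, 1.96, 2.575, and 2.81.

```python
from statistics import NormalDist

RESERVE = 23_832_820   # total reserve from Table 6 (ALL)
TOTAL_SD = 4_679_722   # total Mack standard deviation from Table 6 (ALL)


def normal_interval(level, mean=RESERVE, sd=TOTAL_SD):
    """Two-sided interval mean ± z * sd under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


intervals = {level: normal_interval(level)
             for level in (0.75, 0.95, 0.99, 0.995)}
for level, (low, high) in intervals.items():
    print(f"{level}: ({low:,.0f}: {high:,.0f})")
```

A normal interval is symmetric by construction, so it does not reflect the right skewness that the bootstrap distribution of the reserve can capture.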

2.2.2 Bootstrapping

The bootstrap technique with 999 simulations is applied to the claim data incurred during the period 2012 – 2021 from Table 3 and Table 4. The Gamma distribution, as the best-fitting distribution for the dataset, was used as the process distribution.
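
As a rough illustration of the procedure, the sketch below implements a simplified residual bootstrap of the chain ladder in Python rather than R. It assumes the over-dispersed Poisson scheme of England and Verrall [6]: Pearson residuals are resampled with replacement to build pseudo triangles, the chain ladder is refitted to each, and a gamma process error is added to every projected increment. The source does not list these steps, the function names are ours, and 500 replications with a fixed seed are used only to keep the example fast, so the output will not reproduce the paper's figures.

```python
import random
from statistics import fmean, stdev

# Incremental triangle from Table 3 (thousands of ALL).
INCR = [
    [35132, 2437, 691, 1512, 128, 150, 84, 13, 3, 1],
    [17034, 5008, 41, 926, 610, 346, 267, 150, 27],
    [18874, 1336, 600, 384, 320, 208, 164, 106],
    [26956, 3211, 876, 691, 140, 175, 146],
    [50844, 10915, 552, 389, 300, 129],
    [46010, 6515, 257, 103, 180],
    [31274, 7540, 1841, 386],
    [47524, 8915, 1206],
    [56289, 8455],
    [65901],
]


def cumulate(incr):
    out = []
    for row in incr:
        total, cum = 0.0, []
        for x in row:
            total += x
            cum.append(total)
        out.append(cum)
    return out


def dev_factors(cum):
    factors = []
    for k in range(len(cum) - 1):
        rows = [r for r in cum if len(r) > k + 1]
        factors.append(sum(r[k + 1] for r in rows) / sum(r[k] for r in rows))
    return factors


def fitted_incr(cum, f):
    """Back-cast each row from its latest value to get fitted increments."""
    fit = []
    for row in cum:
        back = [row[-1]]
        for k in range(len(row) - 2, -1, -1):
            back.insert(0, back[0] / f[k])
        fit.append([back[0]] + [back[k] - back[k - 1]
                                for k in range(1, len(back))])
    return fit


def bootstrap_reserves(incr, n_sims=500, seed=1):
    rng = random.Random(seed)
    fit = fitted_incr(cumulate(incr), dev_factors(cumulate(incr)))
    # Unscaled Pearson residuals of the observed increments.
    resid = [(o - m) / m ** 0.5
             for ro, rm in zip(incr, fit) for o, m in zip(ro, rm)]
    n_obs, n_par = len(resid), 2 * len(incr) - 1
    phi = sum(r * r for r in resid) / (n_obs - n_par)   # scale parameter
    adj = (n_obs / (n_obs - n_par)) ** 0.5              # bias adjustment
    totals = []
    for _ in range(n_sims):
        pseudo = [[m + adj * rng.choice(resid) * m ** 0.5 for m in rm]
                  for rm in fit]
        pc = cumulate(pseudo)
        pf = dev_factors(pc)
        total = 0.0
        for row in pc:
            level = row[-1]
            for k in range(len(row) - 1, len(pf)):
                nxt = level * pf[k]
                mean = nxt - level
                if mean != 0:
                    # Gamma process error with mean |mean|, variance phi*|mean|.
                    draw = rng.gammavariate(abs(mean) / phi, phi)
                    total += draw if mean > 0 else -draw
                level = nxt
        totals.append(total)
    return totals


totals = bootstrap_reserves(INCR)
print(round(fmean(totals)), round(stdev(totals)))
```

The mean and standard deviation of `totals` play the role of the bootstrap mean IBNR and standard error; production implementations add refinements, such as excluding the zero residuals in the corners of the triangle, that are omitted here.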

The bootstrap results are compared with those of the Mack model in Table 8. The reserve calculated with bootstrapping (ALL 24,802,264) is slightly higher than the chain ladder reserve: (24,802,264 − 23,832,820)/23,832,820 = 0.0407, or about 4%. The standard error and coefficient of variation from bootstrapping are much higher than under the Mack model for the accident years 2013 – 2018 and moderately higher for 2019 and 2020, but the standard error is lower for the year 2021.

Table 8. Data Summary for Mack and Bootstrap

| Year | Mean IBNR (Mack) | Mean IBNR (bootstrap) | Mack SE | Bootstrap SE | CV (Mack) | CV (Bootstrap) |
|---|---|---|---|---|---|---|
| 2012 | - | | | | | |
| 2013 | 958 | 2,864 | 6,224 | 57,658 | 650% | 2013% |
| 2014 | 10,907 | 16,692 | 23,299 | 152,148 | 214% | 912% |
| 2015 | 116,832 | 131,812 | 119,439 | 315,712 | 102% | 240% |
| 2016 | 584,089 | 632,954 | 282,172 | 722,245 | 48% | 114% |
| 2017 | 791,433 | 869,364 | 345,775 | 774,275 | 44% | 89% |
| 2018 | 914,185 | 947,746 | 453,813 | 767,830 | 50% | 81% |
| 2019 | 2,249,596 | 2,429,705 | 996,510 | 1,271,573 | 44% | 52% |
| 2020 | 3,803,519 | 4,000,632 | 1,375,602 | 1,503,605 | 36% | 38% |
| 2021 | 15,361,301 | 15,770,495 | 3,948,276 | 3,272,987 | 26% | 21% |
| Total | 23,832,820 | 24,802,264 | 4,679,722 | 4,983,026 | 20% | |

Graphically, the result of the bootstrapping is shown in Fig. 6. The best-fitting distribution is the gamma distribution with parameters α = 25.36386 and β = 1.015098 × 10^-3. Apart from the year 2021, there is not much difference between the real and simulated values.

Fig. 6: Summarizing Results for the Bootstrap model

To calculate the Value at Risk (VaR), we use the bootstrap IBNR quantiles at 75%, 95%, 99%, and 99.5%, as in Table 9. The highest VaR is recorded for the last year (2021).
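
Reading VaR as a quantile of the simulated reserve distribution can be sketched as follows in Python. Since the bootstrap replications themselves are not available here, the sketch draws a stand-in sample from the fitted gamma distribution reported above. Two assumptions are ours and not stated in the source: β is interpreted as a rate parameter, and the resulting amounts are taken to be in thousands of ALL, because α/β ≈ 24,987 is then close to the bootstrap reserve of about ALL 24.8 million. The sketch concerns the total reserve only and is not expected to match the per-year rows of the table.

```python
import random
from statistics import quantiles

ALPHA = 25.36386       # shape parameter reported with Fig. 6
RATE = 1.015098e-3     # beta, ASSUMED to be a rate (scale = 1 / beta)

# Stand-in for the simulated total reserves (assumed thousands of ALL).
rng = random.Random(2022)
simulated = [rng.gammavariate(ALPHA, 1 / RATE) for _ in range(10_000)]

# 999 cut points: cuts[k - 1] is the empirical k/1000 quantile.
cuts = quantiles(simulated, n=1000, method="inclusive")
var_levels = {k: cuts[k - 1] for k in (750, 950, 990, 995)}

for k, value in var_levels.items():
    print(f"VaR {k / 10:.1f}%: {value:,.0f}")
```

With the actual bootstrap output, the same quantile call would be applied to the vector of simulated reserves for each accident year and for the total.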

Table 9. Value at Risk (VaR) at different confidence levels

| Year | IBNR 75% | IBNR 95% | IBNR 99% | IBNR 99.5% |
|---|---|---|---|---|
| 2012 | - | | | |
| 2013 | 0 | 7,376 | 152,917 | 275,180 |
| 2014 | 530 | 128,366 | 493,993 | 636,302 |
| 2015 | 184,111 | 793,624 | 1,297,922 | 1,471,080 |
| 2016 | 998,648 | 1,990,579 | 2,993,631 | 3,348,337 |
| 2017 | 1,267,387 | 2,259,353 | 3,057,382 | 3,455,701 |
| 2018 | 1,383,022 | 2,448,951 | 3,683,693 | 4,028,342 |
| 2019 | 3,113,885 | 4,728,319 | 5,780,345 | 6,505,809 |
| 2020 | 4,973,068 | 6,814,957 | 8,390,076 | 8,860,049 |
| 2021 | 17,754,400 | 21,063,575 | 24,601,077 | 25,908,415 |
| Total | 29,675,051 | 40,235,101 | 50,451,035 | 54,489,215 |

4 Conclusion

Claims distribution and claims reserving play a very important role in the solvency and operations of an insurance company. Actuaries must be able to assess the risk reserve at the level of prudence required by the legal framework and by their own risk assessment criteria within the insurance company.

The aim of this study is divided into two parts. The first is to analyze the empirical data and find the best-fitting distribution. Fitting several theoretical distributions and applying different diagnostic techniques in the R statistical language showed that the gamma distribution was the best fit to the claim data.

The second is to analyze and compare the claim reserves and their variability under the Mack chain ladder model and under the bootstrap technique, using real data from an Albanian insurance company. With the Mack model, the value of the claim reserves is the same as in the chain ladder method. The coefficient of variation is 20%, which shows a considerable variability of the estimated claim reserves. When applying the bootstrap technique with 999 iterations, the gamma distribution was used, as it gave the best fit among the theoretical distributions considered. With the bootstrap model, the estimated reserve differs by only 4% from the claim reserve calculated with the chain ladder model, but the prediction errors were much higher. This is mainly because the Mack model is a distribution-free technique, whereas the bootstrap model used the Gamma distribution, the best-fitting distribution for our dataset, for the predictions, giving a more accurate estimate of the variability of the claim reserves. Value at Risk was estimated at different confidence levels and is highest for the accident year 2021, which still has to develop in the succeeding years.

References:

[1] "Claims Reserving Manual", The Institute of Actuaries, London, 1989.

[2] T. Verdonck, M. Van Wouwe, J. Dhaene, "A Robustification of the Chain-Ladder Method", North American Actuarial Journal, vol. 13, no. 2, pp. 280–298, 2009.

[3] B. Weindorfer, "A Practical Guide to Use of the Chain-ladder method for determining technical provisions for outstanding reported claims in non-life insurance", University of Applied Science of Vienna, 2012.

[4] E.A. Valdez, "International Actuarial Association, Stochastic Modeling: Theory and Reality from an Actuarial Perspective", Annals of Actuarial Science, vol. 5, no. 2, pp. 313-315, 2011.

[5] T. Mack, "Distribution-free Calculation of the Standard Error of Chain Ladder Reserve Estimates", ASTIN Bulletin: The Journal of the IAA, vol. 23, no. 2, pp. 213-225, 1993, doi:10.2143/AST.23.2.2005092.

[6] P. England, R. Verrall, "Stochastic claim reserving in general insurance", British Actuarial Journal, vol. 8, no. 3, pp. 443-518, 2002, doi:10.1017/S1357321700003809.

[7] P. England, R. Verrall, "More on stochastic reserving in general insurance", GIRO Convention, 2004.

[8] M. Boenn, "fitteR: Fit Hundreds of Theoretical Distributions to Empirical Data", R package version 0.1.0, 2017.

[9] M.L. Delignette-Muller, C. Dutang, "fitdistrplus: An R Package for Fitting Distributions", Journal of Statistical Software, vol. 64, no. 4, pp. 1–34, 2015.

[10] S.A. Klugman, H. Panjer, G.E. Willmot, "Loss Models: From Data to Decisions", Wiley Series in Probability and Statistics, vol. 715, 3rd ed., 2012, ISBN: 0470391332, 9780470391334, doi:10.1002/9780470391341.

[11] M. Gesmann, D. Murphy, Y. Zhang, A. Carrato, M. Wuthrich, F. Concina, E. Dal Moro, "ChainLadder: Statistical Methods and Models for Claims Reserving in General Insurance", R package version 0.2.15, 2002.

[12] R Core Team, "R: A Language and Environment for Statistical Computing", R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/

Contribution of Individual Authors to the

Creation of a Scientific Article (Ghostwriting

Policy)

Oriana Zaçaj collected the statistics, analysed the statistical data and the outputs of the statistical tests, and compared them with similar papers.

Endri Raço applied the algorithms [9] [11] [12] to the statistical data, finding their best distributional features.

Kleida Haxhi, Etleva Llagami, and Kostaq Hila reviewed the results and statistical analyses.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
