Bootstrap Methods for Claims Reserving:
R Language Approach
ORIANA ZAÇAJ, ENDRI RAÇO, KLEIDA HAXHI, ETLEVA LLAGAMI, KOSTAQ HILA
Faculty of Mathematical Engineering and Physics Engineering,
Polytechnic University of Tirana,
Tirana, 1069,
ALBANIA
Abstract: Bootstrap methods have long been used by actuaries to predict future claims cash flows and
their variability. This work illustrates the use of bootstrap methods in practice, taking as an example the
claims development data of the personal accident portfolio of the largest insurance company in Albania over
a period of 10 years. The objective is not to provide a theoretical analysis of bootstrap
methods; rather, this work highlights the benefits of using them to predict the
distribution of future claims development and to estimate its standard error, for a better risk assessment of
liabilities within insurance companies. The work is divided into two distinct phases. The first is to
select the theoretical probability distribution that best fits the available claims dataset. Comparison of
distributions is facilitated by the tools offered by the R programming language. Both maximum
likelihood parameter estimation and the chi-square goodness-of-fit test are used to identify the
probability distribution that best fits the data among a family of predefined distributions. The results show that
the Gamma distribution best describes the claim development data. The second phase is to use bootstrap
methods, based on the selected distribution, to estimate the ultimate value of claims, the claims reserve, and
their standard error.
Key-Words: - Claim Reserving, Bootstrapping, R, Distribution Fitting
Received: June 17, 2021. Revised: March 19, 2022. Accepted: April 18, 2022. Published: May 20, 2022.
1 Introduction
Claim modeling is a very important part of
estimating the liabilities of an insurance company. It
gives the company an estimate of the capital
needed to fulfill its obligations. A better assessment
of the liabilities and assets of insurance companies
helps them analyze their different insurance
portfolios, plan new products, maintain an
adequate solvency position, and estimate the need
for additional reinsurance cover.
After a claim occurs, a period is needed to reach
its final settlement. This period is known as the
development period of the claim [1]. The aim of
claim reserving is to estimate that period and the
ultimate value of the claim at the settlement date. In
addition, it is necessary to estimate the
variability and the Value at Risk (VaR) of the
estimated reserve.
When we analyze outstanding claims at a given
point in time, we divide them into two main
groups: claims that have been incurred but
not yet reported (IBNR), and claims that have
been reported but not yet settled at the
calculation date (RBNS). For the second group, we have
some information: the accident date and the claim
reporting dates, together with the estimated values at
different reporting dates. The ultimate claim
amount and the final settlement date, however, are still
unknown. For the first group, the only information
we can use is past data on the development of
claims from the accident date through the reporting
date until the settlement date. In both cases, it is
necessary to use the available information on current
and past claims to project the future cash flows of
these claims.
The most traditional method of reserving
outstanding claims, especially in long-term
business, is the Chain Ladder method [1] [2] [3]. It
is a distribution-free method that gives a point
estimator of reserves. This means that it doesn’t
give information on the risk that the estimated
reserves will differ from the real reserves.
To analyze the variability of reserves, the Mack
model [4] [5] is the most common choice. It
calculates the standard error and confidence
intervals of the reserves estimated by the
chain ladder method.
Meanwhile, bootstrap techniques [4] [6] [7] are a
very good tool for predicting the distribution of
claims and claim reserves. These techniques also
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2022.21.30
Oriana Zaçaj, Endri Raço,
Kleida Haxhi, Etleva Llagami, Kostaq Hila
E-ISSN: 2224-2880
252
Volume 21, 2022
estimate the standard error of these predictions.
They are a very good alternative to the Mack model.
Neither method changes the estimate of
claim reserves, but variability and standard error are
calculated under different assumptions. In the Mack
model [5], no distribution is assumed for the
underlying data; only the first two moments are
specified. In the bootstrap model [6], we calculate
the standard error as the standard deviation of the
expected cash flows that would be obtained if we
could go back in time and repeat the claim
experience many times, evaluating the mean
reserve each time.
Unlike the Mack model, the bootstrap
model specifies a distribution for the data,
and the difference between the fitted values and the
actual incurred data signals how far the actual data
deviate from the model framework [4].
We will introduce these techniques using
personal accident claims from an insurance
company in Albania. We will start by finding which
distributions best fit these claims [4] [8] [9] [10].
Then we will estimate reserves and their variability
using bootstrap models [4] [6] [11] as an alternative
to the Mack Chain Ladder method [4] [5] [11] for
the claim development triangles. We will highlight
the differences between the two models in the
estimated reserves and in their variability at
different confidence levels, so that we can find the best
assessment for those claims.
2 Materials and Methods
2.1 Claim Distribution
A total of 915 claims from the personal accident portfolio of
an Albanian insurance company, incurred during the
period 2005 – 2021, were taken into consideration.
Claim amounts range from Albanian lek
(ALL) 5,000 to ALL 3,000,000. The average value
of these claims was ALL 578,140. Fig. 1 shows the
empirical density of these claims. The positive
skewness (1.59) produces a right tail in
the empirical density of the claims.
Fig. 1: The empirical density of claims incurred
during years 2005 – 2021
Different theoretical distribution functions [4] [8]
[9] [10] were fitted to find the distributions that best
fit the claim amounts. Fig. 2 and Fig. 3 below
give a first impression of which distribution fits
our data best.
Fig. 2: Fitting Distributions to claims data
Fig. 3: Empirical density vs theoretical density functions
According to the results indicated in Fig. 3, the
Gamma distribution is the closest distribution to the
available data.
Moreover, if we compare the fit statistics of
each theoretical distribution, as shown in
Table 1, the Gamma distribution has the lowest
sum of squared errors, one of the lowest Akaike
Information Criterion (AIC) values, and the lowest
Bayesian Information Criterion (BIC) value among
the candidate distributions. Although the AIC for the
gamma distribution is not the lowest, it is lower than
that of the other distributions with the closest BIC
or sum of squared errors.
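The comparison by information criteria can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the claim amounts below are synthetic (drawn from a gamma distribution with hypothetical parameters), and the gamma parameters are estimated by the method of moments as a simplification of the maximum likelihood estimation used in the paper. It only shows how AIC and BIC are computed from a fitted log-likelihood and compared across candidates.

```python
import math
import random


def gamma_loglik(data, shape, scale):
    """Log-likelihood of i.i.d. data under a Gamma(shape, scale) distribution."""
    n = len(data)
    return (
        (shape - 1.0) * sum(math.log(x) for x in data)
        - sum(data) / scale
        - n * (shape * math.log(scale) + math.lgamma(shape))
    )


def exponential_loglik(data, rate):
    """Log-likelihood of i.i.d. data under an Exponential(rate) distribution."""
    return len(data) * math.log(rate) - rate * sum(data)


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * loglik


random.seed(1)
# Hypothetical claim amounts in ALL (NOT the paper's dataset).
claims = [random.gammavariate(3.0, 200_000.0) for _ in range(500)]

mean = sum(claims) / len(claims)
var = sum((x - mean) ** 2 for x in claims) / (len(claims) - 1)
shape_hat = mean ** 2 / var  # method-of-moments estimate (assumption, not MLE)
scale_hat = var / mean

ll_gamma = gamma_loglik(claims, shape_hat, scale_hat)
ll_exp = exponential_loglik(claims, 1.0 / mean)  # MLE of the exponential rate

aic_gamma, aic_exp = aic(ll_gamma, 2), aic(ll_exp, 1)
bic_gamma, bic_exp = bic(ll_gamma, 2, len(claims)), bic(ll_exp, 1, len(claims))
print(f"Gamma:       AIC={aic_gamma:.1f}  BIC={bic_gamma:.1f}")
print(f"Exponential: AIC={aic_exp:.1f}  BIC={bic_exp:.1f}")
```

The candidate with the lower criterion value is preferred; both criteria penalize the number of fitted parameters, BIC more heavily for large samples.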
Table 1. Estimation results of statistics from
different theoretical distributions for claim amounts
Distribution       Sum square error      AIC
Gamma              2.607959 × 10^-12     3160.272483
Burr               3.221322 × 10^-12     3219.839974
Beta               3.800095 × 10^-12     3132.843308
Normal             2.055142 × 10^-11     3230.413043
Lognormal          2.465234 × 10^-11     3569.305206
Exponential law    3.642174 × 10^-12     3144.688824
Exponential        4.606323 × 10^-12     3171.578434
Power law          7.580380 × 10^-12     3047.870805
Cauchy             1.136920 × 10^-11     3308.384372
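If we perform the Chi-squared test in R [12],
the goodness-of-fit statistics in Table 2 again point
to the Gamma distribution, which has the lowest
Kolmogorov-Smirnov, Cramer-von Mises, and
Anderson-Darling statistics among the three
distributions compared.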
Table 2. Results of the Chi-squared test
Goodness-of-fit statistics    Gamma         Weibull       Lognormal
Kolmogorov-Smirnov            0.04643869    0.04765527    0.07975269
Cramer-von Mises              0.52057243    0.56061974    1.06145749
Anderson-Darling              3.61644563    3.81781521    8.55914819
Across all graphical and statistical tests, the
Gamma distribution emerges as the best fit for the
distribution of the claim data. We expect that using it will
lead to the same estimates as the standard
distribution-free chain ladder method.
2.2 Claim Reserves
For the same set of data, we consider
two models: the Mack model [5] and the
bootstrap model [6].
After calculating the reserves with the chain
ladder method, we estimate and analyze the
variability of those reserves with both models.
2.2.1 Mack Model
The triangles of incremental and cumulative
incurred claims, in thousands of Albanian lek,
for the period 2012 - 2021 are shown in Table 3
and Table 4, respectively.
Table 3. Incurred and reported claim amounts from the year of the accident to the reporting year (incremental
data) in thousands of Albanian lek
Year        1        2       3       4     5     6     7     8    9   10
2012   35,132    2,437     691   1,512   128   150    84    13    3    1
2013   17,034    5,008      41     926   610   346   267   150   27
2014   18,874    1,336     600     384   320   208   164   106
2015   26,956    3,211     876     691   140   175   146
2016   50,844   10,915     552     389   300   129
2017   46,010    6,515     257     103   180
2018   31,274    7,540   1,841     386
2019   47,524    8,915   1,206
2020   56,289    8,455
2021   65,901
Table 4. Incurred and reported claim amounts from the year of the accident to the reporting year (cumulative
data) in thousands of Albanian lek
Year        1        2        3        4        5        6        7        8        9       10
2012   35,132   37,569   38,260   39,772   39,900   40,050   40,134   40,147   40,150   40,151
2013   17,034   22,042   22,083   23,009   23,619   23,965   24,232   24,382   24,409
2014   18,874   20,210   20,810   21,194   21,514   21,722   21,886   21,992
2015   26,956   30,167   31,043   31,734   31,874   32,049   32,195
2016   50,844   61,759   62,311   62,700   63,000   63,129
2017   46,010   52,525   52,782   52,885   53,065
2018   31,274   38,814   40,655   41,041
2019   47,524   56,439   57,645
2020   56,289   64,744
2021   65,901
Table 5 gives the prediction of the claim
development from the accident date to its final
settlement. The predicted values in Table 5 are
obtained from the incurred values by applying, for
each pair of consecutive development periods, the
volume-weighted average development factor [3]. For
example, the value ALL 22,002 thousand that claims
incurred in 2014 are predicted to reach in
2022 is obtained by multiplying the amount
accumulated up to 2021 (ALL 21,992 thousand) by the
ratio of the column 9 total to the column 8 total, taken
over the accident years for which both columns are
observed:
22,002 ≈ 21,992 × (40,150 + 24,409) / (40,147 + 24,382)
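The projection step can be sketched in a few lines of Python. This is a minimal illustration of the standard volume-weighted chain ladder calculation applied to the cumulative triangle of Table 4 (in thousands of ALL); the function names are hypothetical, and because the triangle is rounded to thousands the results can differ slightly from the published tables.

```python
# Cumulative incurred claims from Table 4 (thousands of ALL), accident years 2012-2021.
cumulative = [
    [35132, 37569, 38260, 39772, 39900, 40050, 40134, 40147, 40150, 40151],  # 2012
    [17034, 22042, 22083, 23009, 23619, 23965, 24232, 24382, 24409],         # 2013
    [18874, 20210, 20810, 21194, 21514, 21722, 21886, 21992],                # 2014
    [26956, 30167, 31043, 31734, 31874, 32049, 32195],                       # 2015
    [50844, 61759, 62311, 62700, 63000, 63129],                              # 2016
    [46010, 52525, 52782, 52885, 53065],                                     # 2017
    [31274, 38814, 40655, 41041],                                            # 2018
    [47524, 56439, 57645],                                                   # 2019
    [56289, 64744],                                                          # 2020
    [65901],                                                                 # 2021
]


def development_factors(tri):
    """Volume-weighted factor for each pair of consecutive development periods."""
    n = len(tri)
    factors = []
    for j in range(n - 1):
        rows = range(n - 1 - j)  # accident years with both columns observed
        factors.append(sum(tri[i][j + 1] for i in rows) / sum(tri[i][j] for i in rows))
    return factors


def complete_triangle(tri, factors):
    """Fill the lower-right part of the triangle by applying the factors in turn."""
    full = []
    for row in tri:
        filled = [float(v) for v in row]
        for j in range(len(row) - 1, len(factors)):
            filled.append(filled[-1] * factors[j])
        full.append(filled)
    return full


factors = development_factors(cumulative)
full = complete_triangle(cumulative, factors)

# Accident year 2014 (row index 2), development period 9 (column index 8).
print(round(full[2][8]))
# Reserve = ultimate minus latest observed cumulative value, summed over accident years.
reserve = sum(f_row[-1] - row[-1] for f_row, row in zip(full, cumulative))
print(round(reserve))
```

The first printed value reproduces the worked example for accident year 2014; the second is the total chain ladder reserve in thousands of ALL computed from the rounded triangle.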
Table 6 summarizes the ultimate value of the
claims, the claim reserve, and the variability of the
reserves under the Mack model. The ultimate values
and the claim reserves in Table 6 are
obtained from Table 5. The claim reserve for each
accident year is the
difference between the ultimate claim value and the
latest reported cumulative value for that accident year. For
example, the claim reserve of ALL 3,803,519
in Table 6 for accident year 2020 is calculated as
the difference 68,547,057 - 64,743,538.
As a result, the ultimate claims estimated with the
chain ladder technique amount to ALL 488,105,288,
giving an estimated claim reserve of ALL
23,832,820.
Table 5. Ultimate claims in thousands of Albanian Lek
Year        1        2        3        4        5        6        7        8        9       10
2012   35,132   37,569   38,260   39,772   39,900   40,050   40,134   40,147   40,150   40,151
2013   17,034   22,042   22,083   23,009   23,619   23,965   24,232   24,382   24,409   24,411
2014   18,874   20,210   20,810   21,194   21,514   21,722   21,886   21,992   22,002   22,004
2015   26,956   30,167   31,043   31,734   31,874   32,049   32,195   32,295   32,310   32,313
2016   50,844   61,759   62,311   62,700   63,000   63,129   63,483   63,681   63,711   63,716
2017   46,010   52,525   52,782   52,885   53,065   53,362   53,662   53,829   53,854   53,858
2018   31,274   38,814   40,655   41,041   41,339   41,570   41,804   41,934   41,954   41,957
2019   47,524   56,439   57,645   58,590   59,015   59,345   59,678   59,865   59,892   59,897
2020   56,289   64,744   65,973   67,054   67,540   67,919   68,300   68,513   68,545   68,550
2021   65,901   76,753   78,210   79,491   80,068   80,517   80,969   81,221   81,259   81,265
Table 6. Process and parameter variables
Year    Ultimate       Reserve      Process SD   Process CV   Parameter SD   Parameter CV   Total SD    Total CV
2012     40,151,086             0
2013     24,410,145           958        4,908         513%          3,827           400%       6,224       650%
2014     22,003,293        10,907       20,038         184%         11,889           109%      23,299       214%
2015     32,311,438       116,832      101,619          87%         62,763            54%     119,439       102%
2016     63,713,298       584,089      221,294          38%        175,071            30%     282,172        48%
2017     53,856,734       791,433      291,280          37%        186,322            24%     345,775        44%
2018     41,955,234       914,185      408,629          45%        197,405            22%     453,813        50%
2019     59,894,530     2,249,596      891,103          40%        446,058            20%     996,510        44%
2020     68,547,057     3,803,519    1,228,770          32%        618,388            16%   1,375,602        36%
2021     81,262,472    15,361,301    3,583,797          23%      1,656,889            11%   3,948,276        26%
Total   488,105,288    23,832,820    3,931,805          16%      2,537,855            11%   4,679,722        20%
From Table 6, we can also observe a total
coefficient of variation (CV) of 20% and a
standard deviation (SD) of ALL 4,679,722, showing
considerable variation around the estimated reserve.
The coefficient of variation is higher for the earliest
accident years because their reserves are small. The
standard error increases over accident years 2019 - 2021
and is largest for 2021. These
results can be summarized through graphs, as shown
in Fig. 4 and Fig. 5.
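In Mack's model [5] the mean squared error of prediction is the sum of a process variance and a parameter (estimation) variance, so the total SD is the square root of the sum of the two squared components. The short Python check below uses only the figures in the Total and 2021 rows of Table 6; the variable names are illustrative.

```python
import math

# Figures taken from the "Total" row of Table 6 (in ALL).
reserve_total = 23_832_820
process_sd_total = 3_931_805
parameter_sd_total = 2_537_855

# Total SD combines the two components in quadrature.
total_sd = math.sqrt(process_sd_total ** 2 + parameter_sd_total ** 2)
# Coefficient of variation = total SD relative to the estimated reserve.
total_cv = total_sd / reserve_total

# Same check for accident year 2021.
total_sd_2021 = math.sqrt(3_583_797 ** 2 + 1_656_889 ** 2)
cv_2021 = total_sd_2021 / 15_361_301

print(f"Total SD ~ {total_sd:,.0f}, CV ~ {total_cv:.1%}")
print(f"2021  SD ~ {total_sd_2021:,.0f}, CV ~ {cv_2021:.1%}")
```

Both results agree with the Total SD and CV columns of Table 6 up to rounding.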
Fig. 4: Chain ladder developments by origin period
Fig. 5: Development of reserve and its standard error for the Mack model
The graphs in Fig. 4 and Fig. 5 show no
trends in the residuals of the Mack model. As
shown in Table 6 and Fig. 4, the standard error (SE)
is barely visible for the period 2018 - 2020 and
more noticeable for the last year (2021). The
confidence intervals for the estimated reserve at
different confidence levels are given in Table 7:
Table 7. Confidence intervals for claim reserves at
different confidence levels
Confidence level    Lower bound    Upper bound
0.75                18,451,140     29,214,501
0.95                14,660,565     33,005,075
0.99                11,782,536     35,883,105
0.995               10,682,801     36,982,839
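The intervals in Table 7 appear consistent with a symmetric normal approximation centred on the Mack reserve, that is, reserve ± z × total SD. The Python sketch below makes that assumption explicit; it uses exact normal quantiles from the standard library, so its bounds differ slightly from the published ones, which seem to rely on rounded z-values.

```python
from statistics import NormalDist

# Total reserve and total standard deviation from Table 6 (in ALL).
RESERVE = 23_832_820
TOTAL_SD = 4_679_722


def normal_interval(level, mean=RESERVE, sd=TOTAL_SD):
    """Two-sided interval under a normal approximation (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sd, mean + z * sd


for level in (0.75, 0.95, 0.99, 0.995):
    low, high = normal_interval(level)
    print(f"{level:>5}: ({low:,.0f}; {high:,.0f})")
```

A skewed alternative, such as a lognormal approximation with the same mean and SD, would give asymmetric bounds with a non-negative lower limit; the normal form is used here only because it matches the symmetric intervals in the table.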
2.2.2 Bootstrapping
The bootstrap technique with 999 simulations is
applied to the claim data incurred during
the period 2012 - 2021 from Table 3 and Table 4.
The Gamma distribution, as the best-fitting distribution
for the dataset, was used as the process distribution.
The bootstrap results are compared with those of the
Mack model in Table 8. The reserve
calculated with bootstrapping (ALL 24,802,264) is
slightly higher than the chain ladder reserve:
(24,802,264 - 23,832,820) / 23,832,820 = 0.0407 ≈
4%. The standard error and coefficient of variation from
bootstrapping are much higher than those of the Mack
model for the period 2012 - 2018, but lower for
2021.
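The bootstrap procedure can be sketched as follows. This is one common variant, a residual bootstrap of the over-dispersed Poisson chain ladder with gamma process error in the spirit of [6], written in plain Python with hypothetical function names. It is not necessarily the exact procedure or software configuration behind Table 8, so its output should not be expected to reproduce the published figures.

```python
import math
import random
import statistics

# Cumulative incurred claims from Table 4 (thousands of ALL), accident years 2012-2021.
CUM = [
    [35132, 37569, 38260, 39772, 39900, 40050, 40134, 40147, 40150, 40151],
    [17034, 22042, 22083, 23009, 23619, 23965, 24232, 24382, 24409],
    [18874, 20210, 20810, 21194, 21514, 21722, 21886, 21992],
    [26956, 30167, 31043, 31734, 31874, 32049, 32195],
    [50844, 61759, 62311, 62700, 63000, 63129],
    [46010, 52525, 52782, 52885, 53065],
    [31274, 38814, 40655, 41041],
    [47524, 56439, 57645],
    [56289, 64744],
    [65901],
]


def factors(cum):
    """Volume-weighted chain ladder development factors."""
    n = len(cum)
    return [sum(cum[i][j + 1] for i in range(n - 1 - j)) /
            sum(cum[i][j] for i in range(n - 1 - j)) for j in range(n - 1)]


def future_means(cum, f):
    """Expected future incremental claims, one list per accident year."""
    n, out = len(cum), []
    for i in range(n):
        level, row = cum[i][-1], []
        for j in range(n - 1 - i, n - 1):
            row.append(level * (f[j] - 1.0))
            level *= f[j]
        out.append(row)
    return out


def fitted_increments(cum, f):
    """Back-cast from the latest diagonal to get fitted past increments."""
    fitted = []
    for row in cum:
        fit = [0.0] * len(row)
        fit[-1] = float(row[-1])
        for j in range(len(row) - 2, -1, -1):
            fit[j] = fit[j + 1] / f[j]
        fitted.append([fit[0]] + [fit[j] - fit[j - 1] for j in range(1, len(row))])
    return fitted


def bootstrap_reserves(cum, n_sims=999, seed=0):
    rng = random.Random(seed)
    fit = fitted_increments(cum, factors(cum))
    obs = [[r[0]] + [r[j] - r[j - 1] for j in range(1, len(r))] for r in cum]
    # Unscaled Pearson residuals and the over-dispersion (scale) parameter.
    resid = [(x - m) / math.sqrt(m) for xs, ms in zip(obs, fit) for x, m in zip(xs, ms)]
    phi = sum(r * r for r in resid) / (len(resid) - (2 * len(cum) - 1))
    sims = []
    for _ in range(n_sims):
        # Pseudo triangle: fitted increment plus a resampled residual, then accumulated.
        pseudo = []
        for ms in fit:
            running, row = 0.0, []
            for m in ms:
                running += m + rng.choice(resid) * math.sqrt(m)
                row.append(running)
            pseudo.append(row)
        # Refit the chain ladder and add gamma process error (mean m, variance phi * m).
        total = 0.0
        for row in future_means(pseudo, factors(pseudo)):
            for m in row:
                total += rng.gammavariate(m / phi, phi) if m > 0 else m
        sims.append(total)
    return sims


cl_reserve = sum(sum(row) for row in future_means(CUM, factors(CUM)))
sims = bootstrap_reserves(CUM)
print(f"chain ladder reserve: {cl_reserve:,.0f}")
print(f"bootstrap mean: {statistics.mean(sims):,.0f}  SE: {statistics.stdev(sims):,.0f}")
```

The standard deviation of the simulated reserves plays the role of the bootstrap standard error, and the full list of simulated values is the predictive distribution from which quantiles can be read.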
Table 8. Data Summary for Mack and Bootstrap
Year    Mean IBNR (Mack)   Mean IBNR (bootstrap)   Mack SE     Bootstrap SE   CV (Mack)   CV (Bootstrap)
2012                   -                       -
2013                 958                   2,864       6,224         57,658        650%            2013%
2014              10,907                  16,692      23,299        152,148        214%             912%
2015             116,832                 131,812     119,439        315,712        102%             240%
2016             584,089                 632,954     282,172        722,245         48%             114%
2017             791,433                 869,364     345,775        774,275         44%              89%
2018             914,185                 947,746     453,813        767,830         50%              81%
2019           2,249,596               2,429,705     996,510      1,271,573         44%              52%
2020           3,803,519               4,000,632   1,375,602      1,503,605         36%              38%
2021          15,361,301              15,770,495   3,948,276      3,272,987         26%              21%
Total         23,832,820              24,802,264   4,679,722      4,983,026         20%              20%
Graphically, the results of the bootstrapping are
shown in Fig. 6. The best-fitting distribution is
the gamma distribution with parameters α = 25.36386
and β = 1.015098 × 10^-3. Apart from the year 2021,
there is not much difference between the real and
simulated values.
Fig. 6: Summarizing Results for the Bootstrap model
To calculate the Value at Risk (VaR), we use the
bootstrap IBNR quantiles at 75%, 95%, 99%, and
99.5%, as in Table 9. The highest VaR is recorded for
the last year (2021).
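Reading a VaR off the simulated reserves amounts to taking an empirical quantile. The Python sketch below uses a simple nearest-rank quantile on hypothetical simulated IBNR values (drawn here from a gamma distribution with made-up parameters, not from the paper's bootstrap output); the quantile convention used by the authors' software may differ.

```python
import math
import random


def empirical_quantile(samples, p):
    """Nearest-rank quantile: smallest value with at least a share p of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


rng = random.Random(42)
# Hypothetical simulated total IBNR values in ALL (999 draws, illustration only).
simulated_ibnr = [rng.gammavariate(25.0, 1_000_000.0) for _ in range(999)]

var_levels = {p: empirical_quantile(simulated_ibnr, p) for p in (0.75, 0.95, 0.99, 0.995)}
for p, value in var_levels.items():
    print(f"VaR {p:.1%}: {value:,.0f}")
```

With 999 simulations the 99.5% quantile rests on only the few largest simulated values, so estimates at that level are noticeably less stable than those at 75% or 95%.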
Table 9. Value at Risk (VaR) at different confidence
levels
Year    IBNR 75%      IBNR 95%      IBNR 99%      IBNR 99.5%
2012           -             -             -               -
2013           0         7,376       152,917         275,180
2014         530       128,366       493,993         636,302
2015     184,111       793,624     1,297,922       1,471,080
2016     998,648     1,990,579     2,993,631       3,348,337
2017   1,267,387     2,259,353     3,057,382       3,455,701
2018   1,383,022     2,448,951     3,683,693       4,028,342
2019   3,113,885     4,728,319     5,780,345       6,505,809
2020   4,973,068     6,814,957     8,390,076       8,860,049
2021  17,754,400    21,063,575    24,601,077      25,908,415
Total 29,675,051    40,235,101    50,451,035      54,489,215
4 Conclusion
Claims distribution and claims reserving play a very
important role in the solvency and operations of an
insurance company. Actuaries must be able to assess
reserves at the level of prudence required by
the legal framework and by their own risk
assessment criteria within the insurance company.
The aim of this study is twofold.
The first part is to analyze the empirical data and
find the best-fitting distribution. After fitting
several theoretical distributions and applying
different diagnostic techniques in the R
statistical language, the gamma
distribution emerged as the best fit for the
claim data.
The second part is to analyze claim reserves and
their variability, comparing the Mack
chain ladder model with the
bootstrap technique on real data from an
Albanian insurance company. With the Mack
model, the value of the claim reserves is the same as in
the chain ladder method. The coefficient of
variation is 20%, which shows considerable
variability in the estimated claim reserves. When
applying the bootstrap technique with 999 iterations,
the gamma distribution was used as the process
distribution, as it was the
best fit among the theoretical distributions considered.
With the bootstrap model, the
estimated reserve differs by only 4% from the
claim reserve calculated with the chain ladder
model, but the prediction errors were higher for all
accident years except 2021.
This is mainly because the Mack model is a
distribution-free technique, whereas
the bootstrap model uses the Gamma distribution,
selected as the best fit for our dataset,
for the predictions, giving a more
accurate estimate of the variability of the claim
reserves. Value at Risk was estimated at different
confidence levels, showing higher values
especially for the year 2021, which still has to develop in
the succeeding years.
References:
[1] “Claims Reserving Manual”, The Institute of
Actuaries, London, 1989
[2] T. Verdonck, M. Van Wouwe, J. Dhaene, “A
Robustification of the Chain-Ladder Method”,
North American Actuarial Journal 13(2), 280–
298, 2009
[3] B. Weindorfer, “A Practical Guide to Use of
the Chain-ladder method for determining
technical provisions for outstanding reported
claims in non-life insurance”, University of
Applied Science of Vienna, 2012
[4] E.A. Valdez, “International Actuarial
Association, Stochastic Modeling: Theory and
Reality from an Actuarial Perspective”, Annals
of Actuarial Science, vol. 5, no. 2, pp. 313-315,
2011
[5] T. Mack, “Distribution-free Calculation of the
Standard Error of Chain Ladder Reserve
Estimates”, ASTIN Bulletin: The Journal of the
IAA, vol. 23, no. 2, pp. 213-225,
1993, doi:10.2143/AST.23.2.2005092
[6] P. England, R. Verrall, “Stochastic claim
reserving in general insurance”, British
Actuarial Journal, vol. 8, no. 3, pp. 443-518,
2002, doi:10.1017/S1357321700003809
[7] P. England, R. Verrall, “More on stochastic
reserving in general insurance”, GIRO
Convention, 2004
[8] M. Boenn, “fitteR: Fit Hundreds of Theoretical
Distributions to Empirical Data”, R
package version 0.1.0, 2017
[9] M.L. Delignette-Muller, C. Dutang,
“fitdistrplus: An R Package for Fitting
Distributions”, Journal of Statistical Software,
vol. 64, no. 4, pp. 1–34, 2015
[10] S.A. Klugman, H. Panjer, G.E. Willmot, “Loss
Models: From Data to Decisions”, Wiley Series
in Probability and Statistics, vol. 715, 3rd ed.,
2012, ISBN: 0470391332, 9780470391334,
doi:10.1002/9780470391341
[11] M. Gesmann, D. Murphy, Y. Zhang, A.
Carrato, M. Wuthrich, F. Concina, E. Dal
Moro, “ChainLadder: Statistical Methods and
Models for Claims Reserving in General
Insurance”, R package version 0.2.15,
2002
[12] R Core Team, “R: A Language and Environment
for Statistical Computing”, R Foundation for
Statistical Computing, Vienna, Austria,
https://www.R-project.org/
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
Oriana Zaçaj collected the statistics and analysed the
statistical data and the outputs of the statistical
tests, comparing them with similar papers.
Endri Raço applied the algorithms [9] [11] [12] to
the statistical data, finding their best distributional
features.
Kleida Haxhi, Etleva Llagami, and Kostaq Hila
reviewed the results and statistical analyses.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US