
3.3 Robust FGLS Estimator
The problems mentioned above, namely violated
model assumptions and atypical observations, also
commonly occur in real panel data. Classical
estimation methods for PDM, in particular FGLS,
may be seriously affected by the presence of
outliers. It is therefore important to detect outliers in
panel data; when they are present, robust estimation
methods can improve the accuracy of the results,
because they are less affected by outlying
observations.
Some robust procedures have been proposed for
PDM, such as [13], [14], [15], [16], where the
authors adapted robust regression methods [7] to
PDM. These earlier methods are robust against
casewise outliers, but not against cellwise outliers.
The new robust estimation method proposed here is
robust against both types of outliers, and constitutes
a good alternative for estimating PDM when the
data include casewise or cellwise outliers.
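The distinction between the two outlier types can be illustrated with a minimal sketch. It is written in Python with the standard library only, purely for illustration; the function names and the 240 x 3 matrix size are assumptions chosen for the example and are not part of the proposed method. Casewise contamination marks whole rows as outlying, whereas cellwise contamination marks individual cells, so the same contamination fraction can touch a larger share of rows.

```python
import random


def casewise_mask(n_rows, n_cols, frac, rng):
    """Mark whole rows (cases) as outlying: every cell of a bad row is flagged."""
    n_bad = round(frac * n_rows)
    bad_rows = set(rng.sample(range(n_rows), n_bad))
    return [[i in bad_rows for _ in range(n_cols)] for i in range(n_rows)]


def cellwise_mask(n_rows, n_cols, frac, rng):
    """Mark individual cells as outlying, independently of the row they sit in."""
    n_cells = n_rows * n_cols
    n_bad = round(frac * n_cells)
    bad_cells = set(rng.sample(range(n_cells), n_bad))
    return [[(i * n_cols + j) in bad_cells for j in range(n_cols)]
            for i in range(n_rows)]


def affected_row_share(mask):
    """Share of rows that contain at least one flagged cell."""
    return sum(any(row) for row in mask) / len(mask)


rng = random.Random(0)  # fixed seed so the illustration is reproducible
case_mask = casewise_mask(240, 3, 0.05, rng)
cell_mask = cellwise_mask(240, 3, 0.05, rng)

print("rows touched, casewise:", affected_row_share(case_mask))
print("rows touched, cellwise:", affected_row_share(cell_mask))
```

With 5% contamination, the casewise mask touches exactly 5% of the rows, while the cellwise mask touches at least that share, because the flagged cells are spread over more rows.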
In this paper, the authors propose an RFGLS
(Robust Feasible Generalized Least Squares)
estimator, which results from implementing robust
procedures in the three steps that support the
computation of the FGLS estimator. This proposal
refines the method proposed in [17] by applying
more robust methods in the RFGLS computation
process. Specifically, the authors replace the outliers
identified by the univariate-and-bivariate filter
(UBF) with the median of the corresponding
variable, and use the cellMCD method, which is
robust to both cellwise and casewise outliers.
The RFGLS algorithm is summarized in the
following subsection.
3.3.1 Robust Feasible Generalized Least Squares Algorithm
1. Estimate the Pooled Model parameters using the
Least Trimmed Squares (LTS) estimator and
compute the residuals.
2. Estimate the error covariance matrix by applying
the robust covariance matrix estimator cellMCD
to the residuals obtained in the previous step.
3. Filter the original data matrix using the
univariate-and-bivariate filter (UBF).
4. Obtain the cleaned data matrix by replacing each
outlier identified by UBF in the previous step
with the median of the corresponding variable.
5. Estimate the model parameters by FGLS from
the cleaned data matrix obtained in the fourth
step, using the robust covariance matrix
estimate obtained in the second step.
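The cleaning stage of the algorithm (steps 3 and 4) can be sketched as follows, in Python with the standard library only. This is a simplified illustration under stated assumptions and not the implementation used in this paper: the UBF is replaced here by a plain univariate rule based on the median and the MAD, with a hypothetical cutoff of 3, and the helper names `flag_cells` and `replace_by_median` are introduced only for this example. The LTS, cellMCD and FGLS steps are not reproduced.

```python
import statistics


def flag_cells(data, cutoff=3.0):
    """Flag cells with a large robust z-score, column by column.

    Assumption: this univariate median/MAD rule is only a stand-in for the
    univariate-and-bivariate filter (UBF), which is more elaborate.
    """
    n_cols = len(data[0])
    flags = [[False] * n_cols for _ in data]
    for j in range(n_cols):
        col = [row[j] for row in data]
        med = statistics.median(col)
        # 1.4826 makes the MAD consistent with the standard deviation
        # under normality.
        mad = 1.4826 * statistics.median(abs(v - med) for v in col)
        if mad == 0:
            continue  # degenerate column: nothing can be flagged
        for i, v in enumerate(col):
            flags[i][j] = abs(v - med) / mad > cutoff
    return flags


def replace_by_median(data, flags):
    """Step 4: replace each flagged cell by the median of its variable."""
    n_cols = len(data[0])
    medians = [statistics.median(row[j] for row in data) for j in range(n_cols)]
    return [[medians[j] if flags[i][j] else data[i][j] for j in range(n_cols)]
            for i in range(len(data))]


# Toy data matrix (hypothetical values): one cellwise outlier in column 1.
data = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.9], [1.1, 50.0], [1.0, 2.2]]
flags = flag_cells(data)
cleaned = replace_by_median(data, flags)
print(cleaned[3])
```

Only the outlying cell is replaced; the remaining cell of the same row is kept, which is the practical advantage of cellwise cleaning over discarding the whole observation.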
4 Simulation Study
To evaluate the performance of the proposed
robust estimator, RFGLS, the authors ran a
simulation study. A data set was randomly
generated and then contaminated by including a
number of outliers in the simulated samples. The
contamination was introduced in two distinct ways
and with different percentages of contamination. The
simulation settings follow the papers [13], [14] and
[18].
All the calculations were carried out in R [19].
The authors used the package plm, which is specific
to panel data analysis, and the packages robustbase,
GSE and cellWise for some of the robust methods
implemented and for cellwise outlier detection.
4.1 Settings
The values of the explanatory variables were
generated from a three-dimensional standard normal
distribution. The parameter vector was set to
β = (−1, 0, 1), and the individual effects µi were
generated from a N(0,1) distribution. The error
values were also generated from a N(0,1)
distribution. The values of the dependent variable
were obtained according to the RE model, defined
e.g. in [1].
The panel data sets were generated with 240
observations, resulting from two scenarios for the
dimensions: (N = 8 and T = 30), and (N = 12 and
T = 20).
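The data-generating process described above can be sketched as follows. The simulations in this paper were run in R; the Python code below, which uses only the standard library, is an illustrative sketch under stated assumptions. It assumes that β holds the slopes of the three explanatory variables, that the RE model has no separate intercept, and that the effects and errors are independent. The function name `simulate_re_panel` and the row layout are hypothetical.

```python
import random


def simulate_re_panel(n_units, n_periods, beta=(-1.0, 0.0, 1.0), seed=0):
    """Simulate a balanced panel from y_it = x_it' beta + mu_i + eps_it.

    Assumptions for this sketch: x_it, mu_i and eps_it are independent
    standard normal draws, and no intercept is included.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        mu_i = rng.gauss(0.0, 1.0)  # individual random effect, fixed over time
        for t in range(n_periods):
            x = [rng.gauss(0.0, 1.0) for _ in beta]  # three regressors
            eps = rng.gauss(0.0, 1.0)                # idiosyncratic error
            y = sum(b * xj for b, xj in zip(beta, x)) + mu_i + eps
            rows.append({"unit": i, "time": t, "x": x, "y": y,
                         "mu": mu_i, "eps": eps})
    return rows


# The two dimension scenarios of the study, both with 240 observations.
panel_a = simulate_re_panel(n_units=8, n_periods=30)
panel_b = simulate_re_panel(n_units=12, n_periods=20)
print(len(panel_a), len(panel_b))
```

Each individual effect is drawn once per unit and reused across its time periods, which is what induces the within-unit error correlation that FGLS accounts for.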
4.2 Contamination
In the sample generation process, three
contamination percentages were considered: 0%,
5%, and 10%. The contaminated observations were
selected completely at random over all observations
of the panel data, and the outliers were introduced in
two different ways, as follows:
1. the contamination is applied only to y (to
generate vertical outliers), by adding to some of
the initially generated y values a term generated
from a N(50,1) distribution;
2. the contamination is applied to y and x (to
generate bad leverage points), by replacing the
values of the explanatory variables that
correspond to the observations already
contaminated in y with points drawn from a
k-variate N(10, I) distribution, with k = 3.
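The two contamination schemes above can be sketched as follows, in Python with the standard library only. This is a hedged illustration rather than the code used in the study: the function name `contaminate` and the row layout are hypothetical, and N(10, I) is read here as a mean of 10 in every component with identity covariance.

```python
import random


def contaminate(rows, frac, leverage=False, seed=0):
    """Return a contaminated copy of `rows` and the contaminated indices.

    Scheme 1 (leverage=False): add a N(50, 1) term to y (vertical outliers).
    Scheme 2 (leverage=True): additionally replace x by a draw from a
    k-variate normal with mean 10 in each component and identity
    covariance (bad leverage points).
    """
    rng = random.Random(seed)
    n_bad = round(frac * len(rows))
    bad_idx = sorted(rng.sample(range(len(rows)), n_bad))  # random positions
    out = [{"x": list(r["x"]), "y": r["y"]} for r in rows]  # copy the input
    for i in bad_idx:
        out[i]["y"] += rng.gauss(50.0, 1.0)
        if leverage:
            out[i]["x"] = [rng.gauss(10.0, 1.0) for _ in out[i]["x"]]
    return out, bad_idx


# Placeholder clean sample with 240 observations and k = 3 regressors.
clean = [{"x": [0.0, 0.0, 0.0], "y": 0.0} for _ in range(240)]

vertical, idx_v = contaminate(clean, frac=0.05)
leverage, idx_l = contaminate(clean, frac=0.10, leverage=True)
print(len(idx_v), len(idx_l))
```

In the second scheme the same observations are contaminated in y and in x, matching the description that the replaced explanatory values correspond to observations already contaminated in y.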
A total of M = 100 replications was conducted for
each of the 10 sampling schemes, resulting in a total
of 20 scenarios and 200 runs.
WSEAS TRANSACTIONS on SYSTEMS
DOI: 10.37394/23202.2024.23.34
Anabela Rocha, M. Cristina Miranda,
Manuela Souto De Miranda