Improving Visual Object Detection using General UFIR Filtering
ELI G. PALE-RAMON1, LUIS J. MORALES-MENDOZA2, OSCAR G. IBARRA-MANZANO1,
RENE FABIÁN VÁZQUEZ-BAUTISTA2, JORGE A. ORTEGA-CONTRERAS1,
YURIY S. SHMALIY1
1Department of Electronics Engineering
Universidad de Guanajuato
Salamanca, 36855
MEXICO
2Faculty of Electronics and Telecommunications Engineering
Universidad de Guanajuato
Poza Rica
MEXICO
Abstract: Object detection is a fundamental task in computer vision, which involves the identification and local-
ization of objects within image frames or video sequences. The problem is complicated by large variations in the
video camera bounding box, which can be thought of as colored measurement noise (CMN). In this paper, we use
the general unbiased finite impulse response (GUFIR) approach to improve detection performance under CMN.
The results are compared to the general Kalman filter (GKF) and two detection methods: “Faster-RCNN” and
“Tensorflow PASCAL Visual Object Classes (VOC)”. Experimental testing is carried out using the benchmark
data “Car4”. It is shown that GUFIR significantly improves the detection accuracy and demonstrates the properties of an effective tool for visual object tracking.
Key-Words: Object detection, colored measurement noise, precision, relative error, estimation.
Received: March 17, 2024. Revised: August 11, 2024. Accepted: September 8, 2024. Published: November 13, 2024.
1 Introduction
Object detection is a key task in computer vision, [1],
[2], [3], [4], [5], that involves identifying objects
and their locations in image frames or video se-
quences, [6], [7], [8], [9]. The problem arises in var-
ious research areas, including autonomous driving,
surveillance, medical imaging, and robotics, among
others, [10], [11], [12], [13]. A persistent difficulty is that discrepancies usually arise between the estimated positions of objects and the true positions, primarily due to environmental factors, [14], [15], [16], [17]. Since these discrepancies are not white, they can be treated as colored measurement noise (CMN), making accurate detection difficult, [18], [19], [20], [21], [22].
The goal is thus to discuss the potential for refining object detections through post-processing with filtering methods to improve detection or tracking accuracy under CMN. We do this by using the unbiased finite impulse response (UFIR) filtering approach, [23], [24]. We use the general Kalman filter (GKF) as a benchmark and perform a comparative
analysis with two widely-used detection tools called
“Faster R-CNN in CNTK” and “Tensorflow PASCAL
Visual Object Classes (VOC)”, [25], [26], [27], [28],
[29], [30]. Testing is provided using video sequences,
utilizing classic methods of object initialization and a
combination of region labeling and contour search for
object detection. The object position is represented
by a bounding box (BB), whose coordinates serve as
inputs for filtering algorithms. For the “Faster R-
CNN in CNTK” and “Tensorflow PASCAL VOC” al-
gorithms, the Visual Object Tagging Tool (VOTT) is
employed. The VOTT facilitates the gathering of BB
values from each frame. These assets are then con-
verted into the “Faster R-CNN in CNTK” and “Ten-
sorflow PASCAL VOC” formats for model training
and obtaining object detection information.
2 Object Detection Process
The object detection process utilizes information extracted from images or video sequences to identify and locate objects. The pose information used for this purpose is collected in the video camera BB, [31], [32], and its extraction is one of the most important tasks, [33], [34], [35]. The process starts with region labeling, which divides the image into regions and identifies the boundaries between them. The object is then described through its properties, requiring the extraction of parameters and properties for its representation. Next, an object detection algorithm is applied to analyze the image and predict the location of objects in the scene. Post-
processing techniques, like detection refinement and
filtering algorithms such as Kalman, can be used to
improve detection precision, [36], [37], [38], [39],
[40]. Finally, the results of object detection are eval-
uated against the ground truth data. Evaluation metrics
include precision, Root Mean Square Error (RMSE),
and Center of the Rectangle (CoR), among others,
[41], [42].
2.1 Bounding Box Information
The BB is a rectangular frame that describes the loca-
tion of a detected object within an image or a video
sequence. The BB is represented by four coordi-
nates: the x-coordinate and y-coordinate of the top-
left corner of the box, and the width and height of the
box, [43]. The BBs are commonly used to locate ob-
jects and provide spatial information about their posi-
tions.
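For illustration only, the BB can be held in a small structure such as the sketch below; the field and method names are our own and do not correspond to any particular detection tool's API.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Axis-aligned bounding box: top-left corner plus width and height (pixels)."""
    x: float  # x-coordinate of the top-left corner
    y: float  # y-coordinate of the top-left corner
    w: float  # box width
    h: float  # box height

    def centroid(self) -> tuple[float, float]:
        """Center of the rectangle, used later for the CoR metric."""
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)
```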
2.2 Ground truth
The ground truth (GT) is the actual, real, or correct ob-
ject position in a scene, [44]. The GT can be obtained
by manual annotation, through a reference algorithm,
or by automatic annotation using specialized software
tools. Generating a reliable and valid reference anno-
tation can be a time-consuming and complicated pro-
cedure. The ground truth includes information such as the object locations and their coordinates in the image or video sequence. The GT serves as a benchmark for evaluating the accuracy and effectiveness of detection. It is also used to evaluate tracking algo-
rithms, [45].
3 Performance Evaluation
The detection performance can be evaluated using the
precision metric, RMSE, and CoR, [46]. The RMSE
is a measure of the variation between truth values and
estimated values, [47]. It is computed by taking the
square root of the average of the squared differences
between each truth value y_i and its corresponding estimated value ŷ_i for the N observations, where i represents the i-th measurement out of N total observations. Precision is a measure of how well the esti-
mated positions align with the ground truth, [42], [48].
Precision can be quantified using intersection over
union (IoU), which indicates the percentage of over-
lap of the predicted BB over the true BB (TBB). To compute the precision, the IoU results are compared with an established threshold, [42], [48], [49], [50], which can be done as follows:

IoU = IA(TBB ∩ EBB) / UA(TBB ∪ EBB) ,    (1)

Precision = ΣTP / (ΣTP + ΣFP) ,    (2)

where IA is the area of intersection between the BB of the target object, the TBB, and the estimated BB (EBB), UA is the area of their union, TP is a true positive, and FP is a false posi-
tive. The center of the rectangle is a metric to measure
the distance between the estimated and ground truth
BB (GT BB). This metric determines whether the de-
tection is true or false. It is positive when the center of
EBB is within the geometrical limit of the GT BB. The
results can be presented by the percentage of CoR of
estimated BB that is within the ground truth BB, [41].
The estimation error in object detection indicates the
difference between the EBB and the ground truth po-
sition. The estimation error can be measured using
diverse metrics. The relative error is a statistical metric calculated as the difference between the estimated value and the GT value divided by the GT value, [51], and is often expressed as a percentage.
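As a minimal sketch of how these metrics can be computed for boxes in the (x, y, width, height) form of Subsection 2.1, the following Python functions implement IoU, precision at a threshold, the CoR test, RMSE, and the relative error; the function names and the box format are our own choices, not code from the compared tools.

```python
import numpy as np

def iou(boxA, boxB):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    xa, ya, wa, ha = boxA
    xb, yb, wb, hb = boxB
    # Intersection rectangle extents (zero if the boxes do not overlap)
    ix = max(0.0, min(xa + wa, xb + wb) - max(xa, xb))
    iy = max(0.0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = ix * iy
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0

def precision_at_threshold(est_boxes, gt_boxes, thr=0.5):
    """Fraction of detections whose IoU with the ground truth reaches thr: TP / (TP + FP)."""
    tp = sum(iou(e, g) >= thr for e, g in zip(est_boxes, gt_boxes))
    fp = len(est_boxes) - tp
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def center_in_gt(est_box, gt_box):
    """CoR test: is the centroid of the estimated box inside the ground-truth box?"""
    cx, cy = est_box[0] + est_box[2] / 2, est_box[1] + est_box[3] / 2
    gx, gy, gw, gh = gt_box
    return gx <= cx <= gx + gw and gy <= cy <= gy + gh

def rmse(est, gt):
    """Root mean square error between estimated and true coordinate sequences."""
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    return float(np.sqrt(np.mean((est - gt) ** 2)))

def relative_error(est, gt):
    """Mean absolute relative error |estimate - truth| / |truth|."""
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    return float(np.mean(np.abs(est - gt) / np.abs(gt)))
```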
4 State-Space Model with
Measurement Disturbances
Input data are the object position information saved
in the BB. Since these data are heavily disturbed by
the environment, we consider the measurement dis-
turbances as CMN. The object coordinates are stored as the BB coordinates X_c, Y_c, X_w, and Y_h. To detect the object at the discrete-time index n, we need measurements of the BB coordinates at every n. There-
fore, we represent the object dynamics in discrete-
time state-space by the following state and observa-
tion equations:
x_n = F x_{n-1} + B w_n ,    (3)
v_n = Ψ v_{n-1} + ξ_n ,    (4)
y_n = H x_n + v_n ,    (5)
where x_n ∈ R^K, K = 8, is the partitioned state vector

x_n = [ X_c^T  Y_c^T  X_w^T  Y_h^T ]^T ,    (6)

in which the vector components are defined by

X_c = [ x_c  V_{X_c} ]^T ,  Y_c = [ y_c  V_{Y_c} ]^T ,  X_w = [ x_w  V_{X_w} ]^T ,  Y_h = [ y_h  V_{Y_h} ]^T .

The corresponding velocities V_{X_c}, V_{Y_c}, V_{X_w}, and V_{Y_h}
are considered constant, following, [52]. The system matrix F is block diagonal with the components

F̄ = [ 1  τ ; 0  1 ] ,    (7)

where the block is repeated for the “x”, “y”, “width”, and “height” spatial dimensions, and τ is the sampling time. The system noise matrix B̄ is defined for each of the states by

B̄ = [ τ²/2  τ ]^T ,    (8)
and the observation matrix is

H = [ 1 0 0 0 0 0 0 0 ;
      0 0 1 0 0 0 0 0 ;
      0 0 0 0 1 0 0 0 ;
      0 0 0 0 0 0 1 0 ] .    (9)
We suppose that the object noise w_n ~ N(0, Q) ∈ R^4,

w_n = [ w_{x_c n}  w_{y_c n}  w_{x_w n}  w_{y_h n} ]^T ,

has the known covariance Q. Since the measurement data are heavily affected by CMN, we treat the measurement noise v_n ∈ R^4,

v_n = [ v_{x_c n}  v_{y_c n}  v_{x_w n}  v_{y_h n} ]^T ,

as colored and represent it by the Gauss-Markov model (4), where the components of the diagonal coloredness factor matrix

Ψ = diag[ ψ_{x_c}  ψ_{y_c}  ψ_{x_w}  ψ_{y_h} ]

are chosen such that v_n remains stationary. The driving zero-mean white Gaussian noise ξ_n ~ N(0, R) ∈ R^4 in (4),

ξ_n = [ ξ_{x_c n}  ξ_{y_c n}  ξ_{x_w n}  ξ_{y_h n} ]^T ,    (10)

has the known covariance R. The noise vectors w_n and ξ_n are mutually uncorrelated, so the property E{w_n ξ_k^T} = 0 holds for all n and k.
The standard UFIR and KF algorithms cannot be
applied to models with CMN. Therefore, we first con-
vert the model (3)–(5) to the standard form with white
Gaussian noise components. To this end, we will
use the measurement differencing approach proposed
in [53], in the form developed for the Euler forward
method-based state model (3) in [54].
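To make the model concrete, the following numpy sketch assembles F, B, H, Q, R, and Ψ according to (3)-(9); the default numerical values anticipate the tuning of Section 6, and treating Q, R, and Ψ as scaled identity matrices is our simplifying assumption for illustration.

```python
import numpy as np

def build_model(tau=1.0, sigma_w=3.0, sigma_v=2.0, psi=0.3):
    """Assemble F, B, H, Q, R and the coloredness matrix Psi of model (3)-(5)."""
    Fblk = np.array([[1.0, tau],
                     [0.0, 1.0]])            # per-coordinate block (7)
    Bblk = np.array([[tau**2 / 2.0],
                     [tau]])                 # per-coordinate noise input (8)
    F = np.kron(np.eye(4), Fblk)             # 8x8 block-diagonal system matrix
    B = np.kron(np.eye(4), Bblk)             # 8x4 system noise matrix
    H = np.zeros((4, 8))
    H[[0, 1, 2, 3], [0, 2, 4, 6]] = 1.0      # observation matrix (9): positions only
    Q = sigma_w**2 * np.eye(4)               # covariance of the object noise w_n
    R = sigma_v**2 * np.eye(4)               # covariance of the driving noise xi_n
    Psi = psi * np.eye(4)                    # diagonal coloredness factor matrix
    return F, B, H, Q, R, Psi
```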
4.1 State-Space Model Transformation
To avoid CMN in the observation equation (5), we use
the measurement differencing approach, [53], [55],
and introduce a new observation vector z_n = y_n - Ψ y_{n-1} as

z_n = H x_n + v_n - Ψ H x_{n-1} - Ψ v_{n-1} ,    (11)
z_n = H̄ x_n + v̄_n ,    (12)

where H̄ = H - Γ, Γ = Ψ H F^{-1}, and v̄_n = Γ B w_n + ξ_n. In the new observation equation (12), the noise v̄_n is white Gaussian with the properties

E{v̄_n v̄_n^T} = ΓΦ + R ,    (13)
E{v̄_n w_n^T} = ΓBQ ,    (14)
E{w_n v̄_n^T} = QB^TΓ^T ,    (15)

where Φ = BQB^TΓ^T. So, v̄_n and w_n are time-correlated.
To implement the robust UFIR filter and optimal
KF for the new state space model (3) and (12), we
need new bias correction gains for the time-correlated v̄_n and w_n. Our transformations will be based on the following measures: x̂_n^- ≡ x̂_{n|n-1} is the a priori estimate, x̂_n ≡ x̂_{n|n} is the a posteriori estimate, ϵ_n^- = x_n - x̂_n^- is the a priori estimation error, ϵ_n = x_n - x̂_n is the a posteriori estimation error, P_n^- ≡ P_{n|n-1} = E{ϵ_n^- ϵ_n^{-T}} is the a priori error covariance, and P_n ≡ P_{n|n} = E{ϵ_n ϵ_n^T} is the a posteriori error covariance.
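A short sketch of the transformation above: it forms Γ, H̄, and Φ and produces the differenced measurement z_n of (11)-(12). The helper names are our own.

```python
import numpy as np

def transform_model(F, B, H, Q, Psi):
    """Measurement-differencing quantities of (11)-(15)."""
    Gamma = Psi @ H @ np.linalg.inv(F)     # Gamma = Psi * H * F^{-1}
    Hbar = H - Gamma                       # new observation matrix of (12)
    Phi = B @ Q @ B.T @ Gamma.T            # Phi = B Q B^T Gamma^T
    return Gamma, Hbar, Phi

def difference_measurement(y_n, y_prev, Psi):
    """New observation z_n = y_n - Psi * y_{n-1} that removes the CMN recursion (4)."""
    return y_n - Psi @ y_prev
```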
5 General Filters for CMN
In this section, we will follow, [24], and develop the
GKF and GUFIR algorithms for CMN using mea-
sured information contained in BBs.
5.1 General KF for CMN
There are two options to develop the GKF under time-correlated w_n and v̄_n, [24]: 1) derive a new bias correction gain or 2) de-correlate the noise vectors. Since the resulting algorithms are equivalent and have no significant advantages over each other, we base our development on the first option, which implies a new bias correction gain. A pseudo code of the GKF developed for object detection under Gauss-Markov CMN with time-correlated w_n and v̄_n is listed as Algorithm 1, [21], [24].
Algorithm 1: GKF for Object Detection under CMN with Time-Correlated w_n and v̄_n
Data: y_n, x̂_0, P_0, Q, R
Result: x̂_n, P_n
1  begin
2    Γ = ΨHF^{-1}; H̄ = H - Γ; Φ = BQB^TΓ^T;
3    for n = 1, 2, ... do
4      z_n = y_n - Ψy_{n-1};
5      P_n^- = FP_{n-1}F^T + BQB^T;
6      S_n = H̄P_n^-H̄^T + ΓΦ + R + H̄Φ + Φ^TH̄^T;
7      K_n = (P_n^-H̄^T + Φ)S_n^{-1};
8      x̂_n^- = Fx̂_{n-1};
9      x̂_n = x̂_n^- + K_n(z_n - H̄x̂_n^-);
10     P_n = (I - K_nH̄)P_n^- - K_nΦ^T;
11   end for
12 end
In the predict phase, this algorithm computes the a priori state estimate x̂_n^- and the a priori state estimation error covariance P_n^-. In the update phase, it uses the new observation z_n,
computes the innovation covariance S_n, and updates the bias correction gain K_n, the a posteriori estimate x̂_n, and the a posteriori error covariance P_n. The subsequent minimization of the trace of P_n by K_n gives the optimal gain K_n for the GKF. Note that the zero coloredness factor Ψ = 0 makes Γ = 0 and Φ = 0 and converts the GKF to the standard KF.
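A compact numpy sketch of Algorithm 1 is given below; it is our own illustration rather than the authors' reference implementation, and it assumes model matrices built as in the Section 4 sketch.

```python
import numpy as np

def gkf_cmn(y, F, B, H, Q, R, Psi, x0, P0):
    """GKF for CMN (Algorithm 1): y is a (T, 4) array of BB measurements."""
    Gamma = Psi @ H @ np.linalg.inv(F)
    Hbar = H - Gamma
    Phi = B @ Q @ B.T @ Gamma.T
    x, P = x0.copy(), P0.copy()
    estimates = [x0.copy()]
    for n in range(1, len(y)):
        z = y[n] - Psi @ y[n - 1]                          # differenced measurement (line 4)
        P_prior = F @ P @ F.T + B @ Q @ B.T                # a priori covariance (line 5)
        S = (Hbar @ P_prior @ Hbar.T + Gamma @ Phi + R
             + Hbar @ Phi + Phi.T @ Hbar.T)                # innovation covariance (line 6)
        K = (P_prior @ Hbar.T + Phi) @ np.linalg.inv(S)    # bias correction gain (line 7)
        x_prior = F @ x                                    # a priori estimate (line 8)
        x = x_prior + K @ (z - Hbar @ x_prior)             # a posteriori estimate (line 9)
        P = (np.eye(len(x)) - K @ Hbar) @ P_prior - K @ Phi.T
        estimates.append(x.copy())
    return np.asarray(estimates)
```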
5.2 General UFIR Filter for CMN
Unlike the GKF, the general UFIR (GUFIR) filter does not require any prior knowledge about the noise, except for the zero-mean assumption, nor about the initial values. Therefore, w_n and v̄_n can be ignored in the model (3) and (12). This means that the GUFIR filter is invariant to the time-correlation between w_n and v̄_n, [24], [56]. The GUFIR filter, being of the FIR type, operates without feedback. The unbiasedness condition assumes orthogonality between the linear estimator and the observation. In this sense, the structure of the GUFIR filter resembles the Gaussian least squares. Moreover, a GUFIR filter does not require initial conditions, [57]. However, the GUFIR filter cannot ignore CMN, which violates the zero-mean assumption at short horizons, [24]. Also, the GUFIR filter processes data over the averaging horizon [m, n] of N points, from m = n - N + 1 to n, and minimizes the MSE when the horizon is set optimally as N_opt, [58]. Note that the FIR filter theory is given in [24].
A pseudo code of the GUFIR filter developed for
CMN in [21], [24], and modified for object detection
is listed as Algorithm 2. The GUFIR filter operation
is divided into two parts: 1) batch initial values and 2)
iterative update. The short batch forms are used to ini-
tialize the iterations. Accordingly, the algorithm requires a short measurement vector Y_{m,s} = [ y_m ... y_s ]^T, where s = n - N + K, and an auxiliary partitioned matrix C_N computed by, [24], [57],

C_N = [ H̄F^{-(N-1)} ; ... ; H̄F^{-1} ; H̄ ] .    (16)

The initial state x̄_s is also computed in batch form (line 7).
Similarly to the GKF, iterations in the GUFIR filter are performed in two phases: predict and update. In the predict phase, only the a priori state estimate x̄_l^- is computed. Recall that the GUFIR filter does not require the noise statistics. In the update phase, the a priori estimate is combined with the actual observation to refine the state. The a posteriori state estimate is updated iteratively using the generalized noise power gain (GNPG) G_l, the new observation z_l, and the bias correction gain K_l. Finally, the a posteriori state estimate x̄_n goes to the GUFIR filter output.
Algorithm 2: GUFIR Filtering Algorithm for Object Detection under CMN
Data: y_n
Result: x̂_n
1  begin
2    Γ = ΨHF^{-1}; H̄ = H - Γ; Φ = BQB^TΓ^T;
3    for n = N - 1, N, ... do
4      m = n - N + 1, s = n - N + K;
5      G_s = (C_N^T C_N)^{-1};
6      Y_{m,s} = [ y_m  y_{m+1}  ...  y_s ]^T;
7      x̄_s = G_s C_N^T Y_{m,s};
8      for l = s + 1 : n do
9        z_l = y_l - Ψy_{l-1};
10       G_l = [ H̄^TH̄ + (FG_{l-1}F^T)^{-1} ]^{-1};
11       K_l = G_lH̄^T;
12       x̄_l^- = Fx̄_{l-1};
13       x̄_l = x̄_l^- + K_l(z_l - H̄x̄_l^-);
14     end for
15     x̂_n = x̄_n;
16   end for
17 end
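A corresponding numpy sketch of Algorithm 2 is given below. Note that, for the dimensions of line 7 to match, the batch matrix here stacks one H̄F^{-j} block per sample of the short vector Y_{m,s}; as in the algorithm, no noise statistics are required. Variable names are our own.

```python
import numpy as np

def gufir_cmn(y, F, H, Psi, N):
    """GUFIR filter for CMN (Algorithm 2): y is a (T, 4) array of BB measurements."""
    K_dim = F.shape[0]                                   # number of states, K = 8
    Gamma = Psi @ H @ np.linalg.inv(F)
    Hbar = H - Gamma
    Finv = np.linalg.inv(F)
    # Batch matrix: one Hbar * F^{-j} block per sample of Y_{m,s}, j = K-1, ..., 0
    C = np.vstack([Hbar @ np.linalg.matrix_power(Finv, K_dim - 1 - i)
                   for i in range(K_dim)])
    x_hat = []
    for n in range(N - 1, len(y)):
        m, s = n - N + 1, n - N + K_dim
        G = np.linalg.inv(C.T @ C)                       # batch GNPG (line 5)
        Y = y[m:s + 1].ravel()                           # short measurement vector (line 6)
        x = G @ C.T @ Y                                  # batch initial estimate (line 7)
        for l in range(s + 1, n + 1):                    # iterative update (lines 8-14)
            z = y[l] - Psi @ y[l - 1]                    # differenced measurement
            G = np.linalg.inv(Hbar.T @ Hbar + np.linalg.inv(F @ G @ F.T))
            K_gain = G @ Hbar.T                          # bias correction gain
            x_prior = F @ x                              # a priori estimate
            x = x_prior + K_gain @ (z - Hbar @ x_prior)  # a posteriori estimate
        x_hat.append(x)
    return np.asarray(x_hat)
```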
6 Experimental Results
To perform testing of the proposed algorithms, we
choose the benchmark data “Car4” available from
[59]. Before detecting objects using the GUFIR filter and the GKF, we tune them under the following assumptions. By analyzing the car trajectory, we compute the standard deviation of the acceleration noise to be σ_w = 3 m/s² and suppose that the CMN originates from white Gaussian noise with the standard deviation of σ_v = 2 m. To obtain the ground truth, we manually annotate the positions of the object using VOTT, [60], for the sampling time of τ = 1 s and the coloredness factor of Ψ = 0.3. This procedure gives Q = σ_w², R = σ_v², and N_opt = 20. The object dynamics is described by (3) as shown in Section 4.
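Under these settings, a hypothetical end-to-end run could be wired as follows, assuming the build_model, gkf_cmn, and gufir_cmn sketches given earlier; load_car4_bbs is a placeholder for whatever reader extracts the measured BB values from the “Car4” sequence and is replaced here by synthetic data.

```python
import numpy as np

def load_car4_bbs():
    """Placeholder loader: returns a (T, 4) array of measured (xc, yc, xw, yh) values.
    Replace with your own reader for the "Car4" annotations; synthetic data stand in here."""
    rng = np.random.default_rng(0)
    return np.cumsum(rng.normal(size=(700, 4)), axis=0) + 100.0

y = load_car4_bbs()
F, B, H, Q, R, Psi = build_model(tau=1.0, sigma_w=3.0, sigma_v=2.0, psi=0.3)

x0 = np.zeros(8)
x0[[0, 2, 4, 6]] = y[0]                    # initialize positions from the first measurement
P0 = 10.0 * np.eye(8)

xs_kf = gkf_cmn(y, F, B, H, Q, R, Psi, x0, P0)
xs_ufir = gufir_cmn(y, F, H, Psi, N=20)    # N_opt = 20 from the tuning above

# Compare the filtered position traces (states 0 and 2 are xc and yc)
print(xs_kf[:5, [0, 2]])
print(xs_ufir[:5, [0, 2]])
```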
Figure 1 illustrates the trajectories based on the centroids of the BBs detected by the GKF, GUFIR, faster-RCNN, and Tensorflow PASCAL VOC algorithms.
[Figure 1: coordinate y versus coordinate x, showing the measurement, ground truth, GKF, GUFIR, faster-RCNN, and Tensorflow P VOC traces]
Fig. 1: Measured and estimated positions of the “Car4” benchmark trajectory
The ground truth trajectory and the measured tra-
jectory are also shown here. Inherently, the measure-
ments produce the highest level of noise, while all
filtering algorithms reduce variations with respect to
the GT trajectory. It also follows that GUFIR effectively smooths the data and reduces noise much better than the GKF, which exhibits greater variation. Meanwhile, both the “Faster-RCNN” and “Tensorflow PASCAL VOC” algorithms perform in between. So, we see that the GUFIR algorithm outperforms the others. To evaluate the performance, the precision metrics were calculated as shown in Figure 2.
As can be seen, the GKF demonstrates high precision up to an IoU threshold of 0.8, with precision decreasing to around 96% between the IoU thresholds of 0.8 and 0.9. In the meantime, the remaining algorithms exhibit the highest precision between the IoU thresholds of 0 and 0.9. Even so, all algorithms demonstrate low precision at the IoU threshold equal to 1. In other words, no algorithm is able to estimate a BB that overlaps 100% with the GT BB.
Although the GUFIR, “Faster-RCNN” and “Ten-
sorflow PASCAL VOC” algorithms exhibit similar
performances, as can be seen in Figure 1, GUFIR is
much more successful in noise reduction. The con-
sistency observed in the precision metric might be at-
tributed to the confinement of values within a certain
range. If we set a commonly employed IoU threshold
of 0.5, [42], then the estimated BB overlaps at least
50% with the GT BB. In this case, the precision be-
comes the same as for the levels of 53% or 59%.
The relative errors, illustrating the estimation error of each algorithm compared to the ground truth, are shown in Figure 3. Here, each line represents the difference between the centroid of the EBB and the centroid of the GT BB.
[Figure 2: precision versus IoU threshold (0 to 1) for the GKF, GUFIR, Tensorflow P VOC, and faster-RCNN algorithms]
Fig. 2: Precision of the filtering algorithms
[Figure 3: relative error of the estimated BB over frames 0 to 700 for the GKF, GUFIR, Tensorflow P VOC, and faster-RCNN algorithms]
Fig. 3: The relative error of the estimated BB
While all algorithms exhibit similar overall behaviors, GUFIR produces smaller errors in each object detection. On the contrary, the GKF gives larger errors, and the “Faster-RCNN” and “Tensorflow PASCAL VOC” algorithms give intermediate error values.
In Table 1, we list the performance metrics, Root Mean Square Errors (RMSEs) and relative errors, for the GKF, GUFIR, “Faster-RCNN”, and “Tensorflow PASCAL VOC” algorithms.
Table 1. RMSE and Relative Errors Produced by the Algorithms
Data                     RMSE      Average relative error
GKF                      4.1381    0.4053
GUFIR                    2.8707    0.3936
faster-RCNN              3.2883    0.4024
Tensorflow Pascal VOC    2.1054    0.4024

Table 2. Percentage of Centroids within the Ground Truth
Data                     Percentage
GKF                      19%
GUFIR                    23%
faster-RCNN              22%
Tensorflow Pascal VOC    22%
As can be seen, the GKF has the highest RMSE
of 4.1381, indicating a significant deviation from the
GT values. Moreover, its relative error of 0.4053 suggests that, on average, the GKF predictions deviate
from the ground truth by approximately 40.53%. The
GUFIR has the best performance, having the RMSE
of 2.8707 and reducing the relative prediction error to
0.3936. This means a slightly lower average devia-
tion from GT values, around 39.36%. In the mean-
time, the standard “Faster-RCNN” and “Tensorflow PASCAL VOC” algorithms exhibit intermediate overall performance.
Additionally, we calculated another metric, the Center of the Rectangle (CoR), listed in Table 2. This metric evalu-
ates the accuracy of detections by determining if the
center of EBB is within the geometrical limits of the
GT BB. The results are presented as the percentage
of CoR of the estimated BB that is within the GT BB
for each of the proposed algorithms. GUFIR exhibits
slightly better performance than the “Faster-RCNN”,
“Tensorflow PASCAL VOC” and GKF algorithms,
while GKF shows the lowest performance. These re-
sults confirm that GUFIR demonstrates superior per-
formance, effectively improving the detection process
with lower estimation error and high precision.
7 Conclusion
The GUFIR filtering algorithm developed in this paper for visual object tracking using information about the bounding box coordinates has demonstrated precision superior to that of the GKF and comparable to that of the standard “Faster-RCNN” and “Tensorflow PASCAL VOC” algorithms, especially in the 0 to 0.9 threshold range. This has become possible by treating the environmental disturbances as Gauss-Markov colored noise. The GUFIR algorithm effectively mitigates estimation errors and improves detection accuracy. Therefore, it can be recommended as a useful tool for visual object detection.
We are now exploring the integration of the GUFIR filter into real-time detection systems and its adaptation to different environmental conditions for robust performance. The results will be reported in the near future.
References:
[1] B. J. Scholl, Z. W. Pylyshyn, and J. Feldman,
“What is a visual object? Evidence from target
merging in multiple object tracking,” Cognition,
vol. 80, no. 1-2, pp. 159–177, 2001.
[2] P. Zhang, D. Wang, and H. Lu, “Multi-modal
visual tracking: Review and experimental com-
parison,” Computational Visual Media, vol. 10,
pp. 193–214, 2024.
[3] M. Dunnhofer, A. Furnari, G. M. Farinella, and
C. Micheloni, “Visual object tracking in first
person vision,” Int. J. Comput. Vision, vol. 131,
pp. 259–283, 2023.
[4] T. I. Amosa, P. Sebastian, L. I. Izhar, O. Ibrahim,
L. S. Ayinla, A. A. Bahashwan, A. Bala,
and Y. A. Samaila, “Multi-camera multi-object
tracking: A review of current trends and fu-
ture advances,” Neurocomputing, vol. 552, p.
126558, 2023.
[5] Z. Tang, T. Xu, H. Li, X.-J. Wu, X.-F. Zhu, and
J. Kittler, “Exploring fusion strategies for ac-
curate rgbt visual object tracking,” Information
Fusion, vol. 99, p. 101881, 2023.
[6] A. S. Jalal, “The state-of-the-art in visual ob-
ject tracking,” Informatica, vol. 36, pp. 227–
248, 2012.
[7] F. Chen, X. Wang, Y. Zhao, S. Lv, and X. Niu,
“Visual object tracking: A survey,” Comput. Vi-
sion Image Understand., vol. 222, p. 103508,
2022.
[8] E. Araujo, C. R. Silva, and D. J. B. S. Sampaio,
“Video target tracking by using competitive neu-
ral networks,” WSEAS Trans. Signal Process.,
vol. 8, no. 4, pp. 420–431, 2008.
[9] K. Sundaraj, “Real-time face detection using dy-
namic background subtraction,” WSEAS Infor-
mat. Sci. Appl., vol. 11, no. 5, pp. 420–431,
2008.
[10] A. Yilmaz, O. Javed, and M. Shah, “Object
tracking: A survey,” Acm Computing Surveys
(CSUR), vol. 38, no. 4, pp. 1–45, 2006.
[11] J. Viitanen, M. Happonen, P. Patama, and J. Ra-
jamäki, “Near border procedures for tracking
information,” WSEAS Trans. Systems, vol. 3,
no. 9, pp. 223–232, 2010.
[12] Z. Li, M. Dong, S. Wen, X. Hu, P. Zhou, and
Z. Zeng, “Clu-cnns: Object detection for medi-
cal images,” Neurocomputing, vol. 350, pp. 53–
59, 2019.
[13] G. Xu, A. S. Khan, A. J. Moshayedi, X. Zhang,
and Y. Shuxin, “The object detection, perspec-
tive and obstacles in robotic: a review,” EAI En-
dorsed Trans. AI Robot., vol. 1, no. 1, 2022.
[14] B.-F. Wu, Y.-H. Chen, and P.-C. Huang,
“A demand-driven architecture for web-based
tracking systems,” WSEAS Trans. Informat. Sci.
Appl., vol. 12, no. 8, pp. 477–486, 2011.
[15] Y. Xu, Y. S. Shmaliy, X. Chen, and Y. Li,
“UWB-based indoor human localization with
time-delayed data using EFIR filtering,” IEEE
Access, vol. 5, pp. 16 676–16 683, 2017.
[16] A. J. Frhan, “Detection and tracking of real-
world events from online social media user
data using hierarchical agglomerative clustering
based system,” WSEAS Trans. Comput., vol. 16,
pp. 355–365, 2017.
[17] D. Lokesh and N. V. Uma Reddy, “Energy effi-
cient routing design for target tracking in wire-
less sensor network,” WSEAS Trans. Informat.
Sci. Appl., vol. 19, pp. 132–137, 2022.
[18] Y. Yoon, A. Kosaka, and A. C. Kak, “A new
Kalman-filter-based framework for fast and ac-
curate visual tracking of rigid objects,” IEEE
Trans. Robotics, vol. 24, no. 5, pp. 1238–1251,
2008.
[19] M. K. Tyagi, M. Srinivasan, and L. S. S.
Reddy, “Design of traditional/hybrid software
project tracking technique: State space ap-
proach,” WSEAS Trans. Informat. Sci. Appl.,
vol. 11, no. 10, pp. 345–355, 2013.
[20] R. Haider, F. Mandreoli, and R. Martoglia, “Ef-
fective aggregation and querying of probabilis-
tic RFID data in a location tracking context,”
WSEAS Trans. Informat. Sci. Appl., vol. 12, pp.
148–160, 2015.
[21] E. G. Pale-Ramon, L. J. Morales-Mendoza,
M. González-Lee, O. G. Ibarra-Manzano, J. A.
Ortega-Contreras, and Y. S. Shmaliy, “Improv-
ing visual object tracking using general ufir and
kalman filters under disturbances in bounding
boxes,” IEEE Access, 2023.
[22] A. İftar, “Robust tracking and disturbance rejec-
tion for decentralized neutral distributed-time-
delay systems,” WSEAS Trans. Syst. Contr.,
vol. 18, pp. 307–315, 2023.
[23] Y. S. Shmaliy, “An iterative Kalman-like al-
gorithm ignoring noise and initial conditions,”
IEEE Trans. Signal Process., vol. 59, no. 6, pp.
2465–2473, 2011.
[24] Y. S. Shmaliy and S. Zhao, Optimal and Ro-
bust State Estimation: Finite Impulse Response
(FIR) and Kalman Approaches. John Wiley &
Sons, 2022.
[25] S. Vasuhi and V. Vaidehi, “Target detection and
tracking for video surveillance,” WSEAS Trans.
Signal Process., vol. 10, pp. 168–117, 2014.
[26] S. Ren, K. He, R. Girshick, and J. Sun, “Faster
r-cnn: Towards real-time object detection with
region proposal networks,” Advances Neural In-
format. Process. Syst., vol. 28, 2015.
[27] M. Abadi, A. Agarwal, P. Barham, E. Brevdo,
Z. Chen, C. Citro, G. S. Corrado, A. Davis,
J. Dean, M. Devin et al., “Tensorflow:
Large-scale machine learning on heteroge-
neous distributed systems,” arXiv preprint
arXiv:1603.04467, 2016.
[28] M. Everingham, S. M. A. Eslami, L. Van Gool,
C. K. I. Williams, J. Winn, and A. Zisserman,
“The pascal visual object classes challenge: A
retrospective,” International Journal of Com-
puter Vision, vol. 111, no. 1, pp. 98–136, Jan.
2015.
[29] L. Konwar, A. K. Talukdar, and K. K. Sarma,
“Robust real time multiple human detection and
tracking for automatic visual surveillance sys-
tem,” WSEAS Trans. Signal Process., vol. 17,
pp. 93–98, 2021.
[30] M. Benvenuti, M. G. Colantonio, S. Di Bono,
G. Pieri, and O. Salvetti, “Tracking of moving
targets in video sequences,” in Proc. 6th WSEAS
Int. Conf. on Neural Networks, Lisbon, June 16-
18, 2005, pp. 20–25.
[31] Y. Amit, P. Felzenszwalb, and R. Girshick, “Ob-
ject detection,” in Computer Vision: A Refer-
ence Guide. Springer, 2021, pp. 875–883.
[32] F. Jalled and I. Voronkov, “Object detec-
tion using image processing,” arXiv preprint
arXiv:1611.07791, 2016.
[33] W. Burger and M. J. Burge, Principles of Digital Image Processing. Springer, 2009, vol. 54.
[34] B. Jahne, Practical Handbook on Image Pro-
cessing for Scientific and Technical Applica-
tions. CRC press, 2004.
[35] R. Szeliski, Computer vision: algorithms and
applications. Springer Nature, 2022.
[36] S.-Y. Hou, H.-S. Hung, Y.-C. Chang, and S.-H.
Chang, “Multitarget tracking algorithms using
angle innovations and extended Kalman filter,”
WSEAS Trans. Syst., vol. 3, no. 8, pp. 420–429,
2009.
[37] X. Sun, H. Qin, and J. Niu, “Comparison and
analysis of GNSS signal tracking performance
based on Kalman filter and traditional loop,”
WSEAS Trans. Signal Process., vol. 3, no. 9, pp.
99–108, 2013.
[38] I. Vasilev, D. Slater, G. Spacagna, P. Roelants,
and V. Zocca, Python Deep Learning: Explor-
ing deep learning techniques and neural net-
work architectures with Pytorch, Keras, and
TensorFlow. Packt, 2019.
[39] M. H. Assaf, V. Groza, and E. M. Petriu, “The
use of Kalman filter techniques for ship track es-
timation,” WSEAS Trans. Systems, vol. 19, pp.
7–13, 2020.
[40] S. Chen and C. Shao, “Efficient online tracking-
by-detection with kalman filter,” IEEE Access,
vol. 9, pp. 147 570–147 578, 2021.
[41] S. Brenton, “Overview of two performance metrics for object detection algorithms evaluation.”
[42] R. Padilla, W. L. Passos, T. L. Dias, S. L. Netto,
and E. A. da Silva, “A comparative analysis of
object detection metrics with a companion open-
source toolkit,” Electronics, vol. 10, no. 3, p.
279, 2021.
[43] K. Choeychuen, P. Kumhom, and K. Cham-
nongthai, “An efficient implementation of the
nearest neighbor based visual objects tracking,”
in 2006 Int. Symp. Intell. Signal Process. Com-
mun., 2006, pp. 574–577.
[44] Y. Xu, Y. S. Shmaliy, W. Ma, X. Jiang,
T. Shen, S. Bi, and H. Guo, “Improv-
ing tightly LiDAR/Compass/Encoder-integrated
mobile robot localization with uncertain sam-
pling period utilizing EFIR filter,” Mobile Net-
works Appl., vol. 26, pp. 440–448, 2021.
[45] M. Everingham, L. Van Gool, C. K. Williams,
J. Winn, and A. Zisserman, “The pascal visual
object classes (voc) challenge,” Int. J. Comput.
Vision, vol. 88, pp. 303–338, 2010.
[46] L. Čehovin, A. Leonardis, and M. Kristan, “Vi-
sual object tracking performance measures re-
visited,” IEEE Trans. Image Process., vol. 25,
no. 3, pp. 1261–1274, 2016.
[47] A. Barnston, “Correspondence among the corre-
lation [root mean square error] and heidke verifi-
cation measures; refinement of the heidke score
notes and correspondence, climate analysis cen-
ter 1992,” 2020.
[48] B. Karasulu and S. Korukoglu, “A software for
performance evaluation and comparison of peo-
ple detection and tracking methods in video pro-
cessing,” Multimed. Tools Appl., vol. 55, no. 3,
pp. 677–723, 2011.
[49] A. W. Smeulders, D. M. Chu, R. Cucchiara,
S. Calderara, A. Dehghan, and M. Shah, “Visual
tracking: An experimental survey,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 36, no. 7, pp.
1442–1468, 2013.
[50] D. L. Olson and D. Delen, Advanced Data Min-
ing Techniques. Springer Science & Business
Media, 2008.
[51] L. Fan, C. Kang, X. Zhang, and S. Wan, “Real-
time tracking method for a magnetic target using
total geomagnetic field intensity,” Pure Appl.
Geophys., vol. 173, pp. 2065–2071, 2016.
[52] X. R. Li and V. P. Jilkov, “Survey of maneu-
vering target tracking. Part I. Dynamic models,”
IEEE Trans. Aero. Electron. Syst., vol. 39, no. 4,
pp. 1333–1364, 2003.
[53] A. Bryson Jr and L. Henrikson, “Estimation us-
ing sampled data containing sequentially corre-
lated noise,” J. Spacecraft Rockets, vol. 5, no. 6,
pp. 662–665, 1968.
[54] Y. S. Shmaliy, S. Zhao, and C. K. Ahn, “Kalman
and UFIR state estimation with coloured mea-
surement noise using backward Euler method,”
IET Signal Process., vol. 14, no. 2, pp. 64–71,
2020.
[55] A. Bryson and D. Johansen, “Linear filtering for
time-varying systems using measurements con-
taining colored noise,” IEEE Trans. Automat.
Contr., vol. 10, no. 1, pp. 4–10, 1965.
[56] S. Zhao, Y. S. Shmaliy, and C. K. Ahn, “Bias-
constrained optimal fusion filtering for decen-
tralized WSN with correlated noise sources,”
IEEE Trans. Signal Inform. Process. Netw.,
vol. 4, no. 4, pp. 727–735, 2018.
[57] Y. S. Shmaliy, S. Zhao, and C. K. Ahn, “Un-
biased finite impluse response filtering: An it-
erative alternative to Kalman filtering ignoring
noise and initial conditions,” IEEE Contr. Syst.
Mag., vol. 37, no. 5, pp. 70–89, 2017.
[58] F. Ramirez-Echeverria, A. Sarr, and Y. S.
Shmaliy, “Optimal memory for discrete-time
FIR filters in state-space,” IEEE Trans. Signal
Process., vol. 62, no. 3, pp. 557–561, 2014.
[59] (2015) Datasets-visual tracker benchmark. [On-
line]. Available: http://www.visual-tracking.net
[60] Microsoft, “Visual object tagging tool: An elec-
tron app for building end to end object detection
models from images and videos.”
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present re-
search, at all stages from the formulation of the prob-
lem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflicts of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International , CC BY 4.0)
This article is published under the terms of the Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US
Declaration of Generative AI and AI-
assisted Technologies in the Writing Process
The authors wrote, reviewed, and edited the content as needed; they have not utilised artificial intelligence (AI) tools. The authors take full responsibility for the content of the publication.