Improving Visual Object Detection using General UFIR Filtering
ELI G. PALE-RAMON1, LUIS J. MORALES-MENDOZA2, OSCAR G. IBARRA-MANZANO1,
RENE FABIÁN VÁZQUEZ-BAUTISTA2, JORGE A. ORTEGA-CONTRERAS1,
YURIY S. SHMALIY1
1Department of Electronics Engineering
Universidad de Guanajuato
Salamanca, 36855
MEXICO
2Faculty of Electronics and Telecommunications Engineering
Universidad de Guanajuato
Poza Rica
MEXICO
Abstract: Object detection is a fundamental task in computer vision, which involves the identification and local-
ization of objects within image frames or video sequences. The problem is complicated by large variations in the
video camera bounding box, which can be thought of as colored measurement noise (CMN). In this paper, we use
the general unbiased finite impulse response (GUFIR) approach to improve detection performance under CMN.
The results are compared to the general Kalman filter (GKF) and two detection methods: “Faster-RCNN” and
“Tensorflow PASCAL Visual Object Classes (VOC)”. Experimental testing is carried out using the benchmark
data “Car4”. It is shown that GUFIR significantly improves the detection accuracy and demonstrates the properties of an effective tool for visual object tracking.
Key-Words: Object detection, colored measurement noise, precision, relative error, estimation.
Received: March 17, 2024. Revised: August 11, 2024. Accepted: September 8, 2024. Published: November 13, 2024.
1 Introduction
Object detection is a key task in computer vision, [1],
[2], [3], [4], [5], that involves identifying objects
and their locations in image frames or video se-
quences, [6], [7], [8], [9]. The problem arises in var-
ious research areas, including autonomous driving,
surveillance, medical imaging, and robotics, among
others, [10], [11], [12], [13]. A persistent difficulty is that discrepancies usually arise between the estimated positions of objects and the true positions, primarily due to environmental factors, [14], [15], [16], [17]. Since these discrepancies are not white, they can be treated as colored measurement noise (CMN), making accurate detection difficult, [18], [19], [20], [21], [22].
The goal is thus to discuss the potential for refining object detections through post-processing with filtering methods to improve detection or tracking accuracy under CMN. We do this by using the unbiased finite impulse response (UFIR) filtering approach, [23], [24]. We use the general Kalman filter (GKF) as a benchmark and perform a comparative
analysis with two widely-used detection tools called
“Faster R-CNN in CNTK” and “Tensorflow PASCAL
Visual Object Classes (VOC)”, [25], [26], [27], [28],
[29], [30]. Testing is provided using video sequences,
utilizing classic methods of object initialization and a
combination of region labeling and contour search for
object detection. The object position is represented
by a bounding box (BB), whose coordinates serve as
inputs for filtering algorithms. For the “Faster R-
CNN in CNTK” and “Tensorflow PASCAL VOC” al-
gorithms, the Visual Object Tagging Tool (VOTT) is
employed. The VOTT facilitates the gathering of BB
values from each frame. These assets are then con-
verted into the “Faster R-CNN in CNTK” and “Ten-
sorflow PASCAL VOC” formats for model training
and obtaining object detection information.
2 Object Detection Process
The object detection process utilizes information extracted from images or video sequences to identify and locate objects. The pose information used for this purpose is collected in the video camera BB, [31], [32], and its extraction is one of the most important tasks, [33], [34], [35]. The process starts with region labeling, which divides the image into regions and identifies the boundaries between them. The object is then described through its properties, requiring the extraction of parameters and properties for its representation. Next, an object detection algorithm is applied to analyze the image and predict the location of objects in the scene. Post-
processing techniques, like detection refinement and
filtering algorithms such as Kalman, can be used to
improve detection precision, [36], [37], [38], [39],
[40]. Finally, the results of object detection are eval-
uated against the ground truth data. Evaluation metrics
include precision, Root Mean Square Error (RMSE),
and Center of the Rectangle (CoR), among others,
[41], [42].
2.1 Bounding Box Information
The BB is a rectangular frame that describes the loca-
tion of a detected object within an image or a video
sequence. The BB is represented by four coordi-
nates: the x-coordinate and y-coordinate of the top-
left corner of the box, and the width and height of the
box, [43]. The BBs are commonly used to locate ob-
jects and provide spatial information about their posi-
tions.
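For illustration only, the BB can be held in a small structure such as the sketch below; the field and method names are our own and do not correspond to any particular detection tool's API.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Axis-aligned bounding box: top-left corner plus width and height (pixels)."""
    x: float  # x-coordinate of the top-left corner
    y: float  # y-coordinate of the top-left corner
    w: float  # box width
    h: float  # box height

    def centroid(self) -> tuple[float, float]:
        """Center of the rectangle, used later for the CoR metric."""
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)
```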
2.2 Ground truth
The ground truth (GT) is the actual, real, or correct ob-
ject position in a scene, [44]. The GT can be obtained
by manual annotation, through a reference algorithm,
or by automatic annotation using specialized software
tools. Generating a reliable and valid reference anno-
tation can be a time-consuming and complicated pro-
cedure. The ground truth includes information such as the object locations and their coordinates in the image or video sequence. The GT serves as a benchmark for evaluating the accuracy and effectiveness of detection. It is also used to evaluate tracking algo-
rithms, [45].
3 Performance Evaluation
The detection performance can be evaluated using the
precision metric, RMSE, and CoR, [46]. The RMSE
is a measure of the variation between truth values and
estimated values, [47]. It is computed by taking the
square root of the average of the squared differences
between each truth value y_i and its corresponding estimated value ŷ_i for the N observations, where i represents the i-th measurement out of N total observations. Precision is a measure of how well the esti-
mated positions align with the ground truth, [42], [48].
Precision can be quantified using intersection over
union (IoU), which indicates the percentage of over-
lap of the predicted BB over the true BB (TBB). To compute the precision, the IoU results are compared with an established threshold, [42], [48], [49], [50], which can be done as follows:

IoU = IA(TBB ∩ EBB) / UA(TBB ∪ EBB) ,    (1)

Precision = ΣTP / (ΣTP + ΣFP) ,    (2)

where IA is the area of intersection between the BB of the target object, the TBB, and the estimated BB (EBB), UA is the area of their union, TP is a true positive, and FP is a false posi-
tive. The center of the rectangle is a metric to measure
the distance between the estimated and ground truth
BB (GT BB). This metric determines whether the de-
tection is true or false. It is positive when the center of
EBB is within the geometrical limit of the GT BB. The
results can be presented by the percentage of CoR of
estimated BB that is within the ground truth BB, [41].
The estimation error in object detection indicates the
difference between the EBB and the ground truth po-
sition. The estimation error can be measured using
diverse metrics. The relative error is a statistical metric calculated as the difference between the estimated value and the GT value divided by the GT value, [51], and is often expressed as a percentage.
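As a minimal sketch of how these metrics can be computed for boxes in the (x, y, width, height) form of Subsection 2.1, the following Python functions implement IoU, precision at a threshold, the CoR test, RMSE, and the relative error; the function names and the box format are our own choices, not code from the compared tools.

```python
import numpy as np

def iou(boxA, boxB):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    xa, ya, wa, ha = boxA
    xb, yb, wb, hb = boxB
    # Intersection rectangle extents (zero if the boxes do not overlap)
    ix = max(0.0, min(xa + wa, xb + wb) - max(xa, xb))
    iy = max(0.0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = ix * iy
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0

def precision_at_threshold(est_boxes, gt_boxes, thr=0.5):
    """Fraction of detections whose IoU with the ground truth reaches thr: TP / (TP + FP)."""
    tp = sum(iou(e, g) >= thr for e, g in zip(est_boxes, gt_boxes))
    fp = len(est_boxes) - tp
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def center_in_gt(est_box, gt_box):
    """CoR test: is the centroid of the estimated box inside the ground-truth box?"""
    cx, cy = est_box[0] + est_box[2] / 2, est_box[1] + est_box[3] / 2
    gx, gy, gw, gh = gt_box
    return gx <= cx <= gx + gw and gy <= cy <= gy + gh

def rmse(est, gt):
    """Root mean square error between estimated and true coordinate sequences."""
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    return float(np.sqrt(np.mean((est - gt) ** 2)))

def relative_error(est, gt):
    """Mean absolute relative error |estimate - truth| / |truth|."""
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    return float(np.mean(np.abs(est - gt) / np.abs(gt)))
```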
4 State-Space Model with
Measurement Disturbances
Input data are the object position information saved
in the BB. Since these data are heavily disturbed by
the environment, we consider the measurement dis-
turbances as CMN. The object coordinates are stored as the BB coordinates X_c, Y_c, X_w, and Y_h. To detect the object at the discrete-time index n, we need measurements of the BB coordinates at every n. There-
fore, we represent the object dynamics in discrete-
time state-space by the following state and observa-
tion equations:
x_n = F x_{n-1} + B w_n ,    (3)
v_n = Ψ v_{n-1} + ξ_n ,    (4)
y_n = H x_n + v_n ,    (5)
where x_n ∈ R^K, K = 8, is the partitioned state vector

x_n = [ X_c^T  Y_c^T  X_w^T  Y_h^T ]^T ,    (6)

in which the vector components are defined by

X_c = [ x_c  V_{X_c} ]^T ,  Y_c = [ y_c  V_{Y_c} ]^T ,  X_w = [ x_w  V_{X_w} ]^T ,  Y_h = [ y_h  V_{Y_h} ]^T .

The corresponding velocities V_{X_c}, V_{Y_c}, V_{X_w}, and V_{Y_h}
are considered constant, following, [52]. The system matrix F is block diagonal with the components

F̄ = [ 1  τ ; 0  1 ] ,    (7)

where the block is repeated for the “x”, “y”, “width”, and “height” spatial dimensions, and τ is the sampling time. The system noise matrix B̄ is defined for each of the states by

B̄ = [ τ²/2  τ ]^T ,    (8)
and the observation matrix is

H = [ 1 0 0 0 0 0 0 0 ;
      0 0 1 0 0 0 0 0 ;
      0 0 0 0 1 0 0 0 ;
      0 0 0 0 0 0 1 0 ] .    (9)
We suppose that the object noise w_n ~ N(0, Q) ∈ R^4,

w_n = [ w_{x_c n}  w_{y_c n}  w_{x_w n}  w_{y_h n} ]^T ,

has the known covariance Q. Since the measurement data are heavily affected by CMN, we treat the measurement noise v_n ∈ R^4,

v_n = [ v_{x_c n}  v_{y_c n}  v_{x_w n}  v_{y_h n} ]^T ,

as colored and represent it by the Gauss-Markov model (4), where the components of the diagonal coloredness factor matrix

Ψ = diag[ ψ_{x_c}  ψ_{y_c}  ψ_{x_w}  ψ_{y_h} ]

are chosen such that v_n remains stationary. The driving zero-mean white Gaussian noise ξ_n ~ N(0, R) ∈ R^4 in (4),

ξ_n = [ ξ_{x_c n}  ξ_{y_c n}  ξ_{x_w n}  ξ_{y_h n} ]^T ,    (10)

has the known covariance R. The noise vectors w_n and ξ_n are mutually uncorrelated, so the property E{w_n ξ_k^T} = 0 holds for all n and k.
The standard UFIR and KF algorithms cannot be
applied to models with CMN. Therefore, we first con-
vert the model (3)–(5) to the standard form with white
Gaussian noise components. To this end, we will
use the measurement differencing approach proposed
in [53], in the form developed for the Euler forward
method-based state model (3) in [54].
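To make the model concrete, the following numpy sketch assembles F, B, H, Q, R, and Ψ according to (3)-(9); the default numerical values anticipate the tuning of Section 6, and treating Q, R, and Ψ as scaled identity matrices is our simplifying assumption for illustration.

```python
import numpy as np

def build_model(tau=1.0, sigma_w=3.0, sigma_v=2.0, psi=0.3):
    """Assemble F, B, H, Q, R and the coloredness matrix Psi of model (3)-(5)."""
    Fblk = np.array([[1.0, tau],
                     [0.0, 1.0]])            # per-coordinate block (7)
    Bblk = np.array([[tau**2 / 2.0],
                     [tau]])                 # per-coordinate noise input (8)
    F = np.kron(np.eye(4), Fblk)             # 8x8 block-diagonal system matrix
    B = np.kron(np.eye(4), Bblk)             # 8x4 system noise matrix
    H = np.zeros((4, 8))
    H[[0, 1, 2, 3], [0, 2, 4, 6]] = 1.0      # observation matrix (9): positions only
    Q = sigma_w**2 * np.eye(4)               # covariance of the object noise w_n
    R = sigma_v**2 * np.eye(4)               # covariance of the driving noise xi_n
    Psi = psi * np.eye(4)                    # diagonal coloredness factor matrix
    return F, B, H, Q, R, Psi
```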
4.1 State-Space Model Transformation
To avoid CMN in the observation equation (5), we use
the measurement differencing approach, [53], [55],
and introduce a new observation vector z_n = y_n - Ψ y_{n-1} as

z_n = H x_n + v_n - Ψ H x_{n-1} - Ψ v_{n-1} ,    (11)
z_n = H̄ x_n + v̄_n ,    (12)

where H̄ = H - Γ, Γ = Ψ H F^{-1}, and v̄_n = Γ B w_n + ξ_n. In the new observation equation (12), the noise v̄_n is white Gaussian with the properties

E{v̄_n v̄_n^T} = ΓΦ + R ,    (13)
E{v̄_n w_n^T} = ΓBQ ,    (14)
E{w_n v̄_n^T} = QB^TΓ^T ,    (15)

where Φ = BQB^TΓ^T. So, v̄_n and w_n are time-correlated.
To implement the robust UFIR filter and optimal
KF for the new state space model (3) and (12), we
need new bias correction gains for the time-correlated v̄_n and w_n. Our transformations will be based on the following measures: x̂_n^- ≡ x̂_{n|n-1} is the a priori estimate, x̂_n ≡ x̂_{n|n} is the a posteriori estimate, ϵ_n^- = x_n - x̂_n^- is the a priori estimation error, ϵ_n = x_n - x̂_n is the a posteriori estimation error, P_n^- ≡ P_{n|n-1} = E{ϵ_n^- ϵ_n^{-T}} is the a priori error covariance, and P_n ≡ P_{n|n} = E{ϵ_n ϵ_n^T} is the a posteriori error covariance.
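A short sketch of the transformation above: it forms Γ, H̄, and Φ and produces the differenced measurement z_n of (11)-(12). The helper names are our own.

```python
import numpy as np

def transform_model(F, B, H, Q, Psi):
    """Measurement-differencing quantities of (11)-(15)."""
    Gamma = Psi @ H @ np.linalg.inv(F)     # Gamma = Psi * H * F^{-1}
    Hbar = H - Gamma                       # new observation matrix of (12)
    Phi = B @ Q @ B.T @ Gamma.T            # Phi = B Q B^T Gamma^T
    return Gamma, Hbar, Phi

def difference_measurement(y_n, y_prev, Psi):
    """New observation z_n = y_n - Psi * y_{n-1} that removes the CMN recursion (4)."""
    return y_n - Psi @ y_prev
```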
5 General Filters for CMN
In this section, we will follow, [24], and develop the
GKF and GUFIR algorithms for CMN using mea-
sured information contained in BBs.
5.1 General KF for CMN
There are two options to develop the GKF under time-correlated w_n and v̄_n, [24]: 1) derive a new bias correction gain or 2) de-correlate the noise vectors. Since the resulting algorithms are equivalent and have no significant advantages over each other, we base our development on the first option, which implies a new bias correction gain. A pseudo code of the GKF developed for object detection under Gauss-Markov CMN with time-correlated w_n and v̄_n is listed as Algorithm 1, [21], [24].
Algorithm 1: GKF for Object Detection under CMN with Time-Correlated w_n and v̄_n
Data: y_n, x̂_0, P_0, Q, R
Result: x̂_n, P_n
1  begin
2    Γ = ΨHF^{-1}; H̄ = H - Γ; Φ = BQB^TΓ^T;
3    for n = 1, 2, ... do
4      z_n = y_n - Ψy_{n-1};
5      P_n^- = FP_{n-1}F^T + BQB^T;
6      S_n = H̄P_n^-H̄^T + ΓΦ + R + H̄Φ + Φ^TH̄^T;
7      K_n = (P_n^-H̄^T + Φ)S_n^{-1};
8      x̂_n^- = Fx̂_{n-1};
9      x̂_n = x̂_n^- + K_n(z_n - H̄x̂_n^-);
10     P_n = (I - K_nH̄)P_n^- - K_nΦ^T;
11   end for
12 end
In the predict phase, this algorithm computes the a priori state estimate x̂_n^- and the a priori state estimation error covariance P_n^-. In the update phase, it uses the new observation z_n,
computes the innovation covariance S_n, and updates the bias correction gain K_n, the a posteriori estimate x̂_n, and the a posteriori error covariance P_n. The subsequent minimization of the trace of P_n by K_n gives the optimal gain K_n for the GKF. Note that the zero coloredness factor Ψ = 0 makes Γ = 0 and Φ = 0 and converts the GKF to the standard KF.
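A compact numpy sketch of Algorithm 1 is given below; it is our own illustration rather than the authors' reference implementation, and it assumes model matrices built as in the Section 4 sketch.

```python
import numpy as np

def gkf_cmn(y, F, B, H, Q, R, Psi, x0, P0):
    """GKF for CMN (Algorithm 1): y is a (T, 4) array of BB measurements."""
    Gamma = Psi @ H @ np.linalg.inv(F)
    Hbar = H - Gamma
    Phi = B @ Q @ B.T @ Gamma.T
    x, P = x0.copy(), P0.copy()
    estimates = [x0.copy()]
    for n in range(1, len(y)):
        z = y[n] - Psi @ y[n - 1]                          # differenced measurement (line 4)
        P_prior = F @ P @ F.T + B @ Q @ B.T                # a priori covariance (line 5)
        S = (Hbar @ P_prior @ Hbar.T + Gamma @ Phi + R
             + Hbar @ Phi + Phi.T @ Hbar.T)                # innovation covariance (line 6)
        K = (P_prior @ Hbar.T + Phi) @ np.linalg.inv(S)    # bias correction gain (line 7)
        x_prior = F @ x                                    # a priori estimate (line 8)
        x = x_prior + K @ (z - Hbar @ x_prior)             # a posteriori estimate (line 9)
        P = (np.eye(len(x)) - K @ Hbar) @ P_prior - K @ Phi.T
        estimates.append(x.copy())
    return np.asarray(estimates)
```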
5.2 General UFIR Filter for CMN
Unlike the GKF, the general UFIR (GUFIR) filter does not require any prior knowledge about the noise, except for the zero-mean assumption, nor about the initial values. Therefore, w_n and v̄_n can be ignored in the model (3) and (12). This means that the GUFIR filter is invariant to the time-correlation between w_n and v̄_n, [24], [56]. The GUFIR filter, being of the FIR type, operates without feedback. The unbiasedness condition assumes orthogonality between the linear estimator and the observation. In this sense, the structure of the GUFIR filter resembles the Gaussian least squares. Moreover, a GUFIR filter does not require initial conditions, [57]. However, the GUFIR filter cannot ignore CMN, which violates the zero-mean assumption at short horizons, [24]. Also, the GUFIR filter processes data over the averaging horizon [m, n] of N points, from m = n - N + 1 to n, and minimizes the MSE when the horizon is set optimally as N_opt, [58]. Note that the FIR filter theory is given in [24].
A pseudo code of the GUFIR filter developed for
CMN in [21], [24], and modified for object detection
is listed as Algorithm 2. The GUFIR filter operation
is divided into two parts: 1) batch initial values and 2)
iterative update. The short batch forms are used to ini-
tialize the iterations. Accordingly, the algorithm requires a short measurement vector Y_{m,s} = [ y_m ... y_s ]^T, where s = n - N + K, and an auxiliary partitioned matrix C_N computed by, [24], [57],

C_N = [ H̄F^{-(N-1)} ; ... ; H̄F^{-1} ; H̄ ] .    (16)

The initial state x̄_s is also computed in batch form (line 7).
Similarly to the GKF, iterations in the GUFIR filter are performed in two phases: predict and update. In the predict phase, only the a priori state estimate x̄_l^- is computed. Recall that the GUFIR filter does not require the noise statistics. In the update phase, the a priori estimate is combined with the actual observation to refine the state. The a posteriori state estimate is updated iteratively using the generalized noise power gain (GNPG) G_l, the new observation z_l, and the bias correction gain K_l. Finally, the a posteriori state estimate x̄_n goes to the GUFIR filter output.
Algorithm 2: GUFIR Filtering Algorithm for Object Detection under CMN
Data: y_n
Result: x̂_n
1  begin
2    Γ = ΨHF^{-1}; H̄ = H - Γ; Φ = BQB^TΓ^T;
3    for n = N - 1, N, ... do
4      m = n - N + 1, s = n - N + K;
5      G_s = (C_N^T C_N)^{-1};
6      Y_{m,s} = [ y_m  y_{m+1}  ...  y_s ]^T;
7      x̄_s = G_s C_N^T Y_{m,s};
8      for l = s + 1 : n do
9        z_l = y_l - Ψy_{l-1};
10       G_l = [ H̄^TH̄ + (FG_{l-1}F^T)^{-1} ]^{-1};
11       K_l = G_lH̄^T;
12       x̄_l^- = Fx̄_{l-1};
13       x̄_l = x̄_l^- + K_l(z_l - H̄x̄_l^-);
14     end for
15     x̂_n = x̄_n;
16   end for
17 end
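A corresponding numpy sketch of Algorithm 2 is given below. Note that, for the dimensions of line 7 to match, the batch matrix here stacks one H̄F^{-j} block per sample of the short vector Y_{m,s}; as in the algorithm, no noise statistics are required. Variable names are our own.

```python
import numpy as np

def gufir_cmn(y, F, H, Psi, N):
    """GUFIR filter for CMN (Algorithm 2): y is a (T, 4) array of BB measurements."""
    K_dim = F.shape[0]                                   # number of states, K = 8
    Gamma = Psi @ H @ np.linalg.inv(F)
    Hbar = H - Gamma
    Finv = np.linalg.inv(F)
    # Batch matrix: one Hbar * F^{-j} block per sample of Y_{m,s}, j = K-1, ..., 0
    C = np.vstack([Hbar @ np.linalg.matrix_power(Finv, K_dim - 1 - i)
                   for i in range(K_dim)])
    x_hat = []
    for n in range(N - 1, len(y)):
        m, s = n - N + 1, n - N + K_dim
        G = np.linalg.inv(C.T @ C)                       # batch GNPG (line 5)
        Y = y[m:s + 1].ravel()                           # short measurement vector (line 6)
        x = G @ C.T @ Y                                  # batch initial estimate (line 7)
        for l in range(s + 1, n + 1):                    # iterative update (lines 8-14)
            z = y[l] - Psi @ y[l - 1]                    # differenced measurement
            G = np.linalg.inv(Hbar.T @ Hbar + np.linalg.inv(F @ G @ F.T))
            K_gain = G @ Hbar.T                          # bias correction gain
            x_prior = F @ x                              # a priori estimate
            x = x_prior + K_gain @ (z - Hbar @ x_prior)  # a posteriori estimate
        x_hat.append(x)
    return np.asarray(x_hat)
```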
6 Experimental Results
To perform testing of the proposed algorithms, we
choose the benchmark data “Car4” available from
[59]. Before detecting objects using the GUFIR filter and the GKF, we tune them under the following assumptions. By analyzing the car trajectory, we compute the standard deviation of the acceleration noise to be σ_w = 3 m/s² and suppose that the CMN originates from white Gaussian noise with the standard deviation of σ_v = 2 m. To obtain the ground truth, we manually annotate the positions of the object using VOTT, [60], for the sampling time of τ = 1 s and the coloredness factor of Ψ = 0.3. This procedure gives Q = σ_w², R = σ_v², and N_opt = 20. The object dynamics is described by (3) as shown in Section 4.
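Under these settings, a hypothetical end-to-end run could be wired as follows, assuming the build_model, gkf_cmn, and gufir_cmn sketches given earlier; load_car4_bbs is a placeholder for whatever reader extracts the measured BB values from the “Car4” sequence and is replaced here by synthetic data.

```python
import numpy as np

def load_car4_bbs():
    """Placeholder loader: returns a (T, 4) array of measured (xc, yc, xw, yh) values.
    Replace with your own reader for the "Car4" annotations; synthetic data stand in here."""
    rng = np.random.default_rng(0)
    return np.cumsum(rng.normal(size=(700, 4)), axis=0) + 100.0

y = load_car4_bbs()
F, B, H, Q, R, Psi = build_model(tau=1.0, sigma_w=3.0, sigma_v=2.0, psi=0.3)

x0 = np.zeros(8)
x0[[0, 2, 4, 6]] = y[0]                    # initialize positions from the first measurement
P0 = 10.0 * np.eye(8)

xs_kf = gkf_cmn(y, F, B, H, Q, R, Psi, x0, P0)
xs_ufir = gufir_cmn(y, F, H, Psi, N=20)    # N_opt = 20 from the tuning above

# Compare the filtered position traces (states 0 and 2 are xc and yc)
print(xs_kf[:5, [0, 2]])
print(xs_ufir[:5, [0, 2]])
```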
Figure 1 illustrates the trajectories based on the centroids of the BBs detected by the GKF, GUFIR, faster-RCNN, and Tensorflow PASCAL VOC algorithms.
[Figure 1: coordinate y versus coordinate x, showing the measurement, ground truth, GKF, GUFIR, faster-RCNN, and Tensorflow P VOC traces]
Fig. 1: Measured and estimated positions of the “Car4” benchmark trajectory
The ground truth trajectory and the measured tra-
jectory are also shown here. Inherently, the measure-
ments produce the highest level of noise, while all
filtering algorithms reduce variations with respect to
the GT trajectory. It also follows that GUFIR effectively smooths the data and reduces noise much better than the GKF, which exhibits greater variation. Meanwhile, both the “Faster-RCNN” and “Tensorflow PASCAL VOC” algorithms perform in between. So, we see that the GUFIR algorithm outperforms the others. To evaluate the performance, the precision metrics were calculated as shown in Figure 2.
As can be seen, the GKF demonstrates high precision up to an IoU threshold of 0.8, with precision decreasing to around 96% between the IoU thresholds of 0.8 and 0.9. In the meantime, the remaining algorithms exhibit the highest precision between the IoU thresholds of 0 and 0.9. Even so, all algorithms demonstrate low precision at the IoU threshold equal to 1. In other words, no algorithm is able to estimate a BB that overlaps 100% with the GT BB.
Although the GUFIR, “Faster-RCNN” and “Ten-
sorflow PASCAL VOC” algorithms exhibit similar
performances, as can be seen in Figure 1, GUFIR is
much more successful in noise reduction. The con-
sistency observed in the precision metric might be at-
tributed to the confinement of values within a certain
range. If we set a commonly employed IoU threshold
of 0.5, [42], then the estimated BB overlaps at least
50% with the GT BB. In this case, the precision be-
comes the same as for the levels of 53% or 59%.
The relative errors, illustrating the estimation error of each algorithm compared to the ground truth, are shown in Figure 3. Here, each line represents the difference between the centroid of the EBB and the centroid of the GT BB.
[Figure 2: precision versus IoU threshold (0 to 1) for the GKF, GUFIR, Tensorflow P VOC, and faster-RCNN algorithms]
Fig. 2: Precision of the filtering algorithms
[Figure 3: relative error of the estimated BB over frames 0 to 700 for the GKF, GUFIR, Tensorflow P VOC, and faster-RCNN algorithms]
Fig. 3: The relative error of the estimated BB
While all algorithms exhibit similar overall behaviors, GUFIR produces smaller errors in each object detection. On the contrary, the GKF gives larger errors, and the “Faster-RCNN” and “Tensorflow PASCAL VOC” algorithms give intermediate error values.
In Table 1, we list the performance metrics, Root Mean Square Errors (RMSEs) and relative errors, for the GKF, GUFIR, “Faster-RCNN”, and “Tensorflow PASCAL VOC” algorithms.
Table 1. RMSE and Relative Errors Produced by the Algorithms
Data                     RMSE      Average relative error
GKF                      4.1381    0.4053
GUFIR                    2.8707    0.3936
faster-RCNN              3.2883    0.4024
Tensorflow Pascal VOC    2.1054    0.4024

Table 2. Percentage of Centroids within the Ground Truth
Data                     Percentage
GKF                      19%
GUFIR                    23%
faster-RCNN              22%
Tensorflow Pascal VOC    22%
As can be seen, the GKF has the highest RMSE
of 4.1381, indicating a significant deviation from the
GT values. Moreover, its relative error of 0.4053 suggests that, on average, the GKF predictions deviate
from the ground truth by approximately 40.53%. The
GUFIR has the best performance, having the RMSE
of 2.8707 and reducing the relative prediction error to
0.3936. This means a slightly lower average devia-
tion from GT values, around 39.36%. In the mean-
time, the standard “Faster-RCNN” and “Tensorflow PASCAL VOC” algorithms exhibit intermediate overall performance.
Additionally, we calculated another metric, the Center of the Rectangle (CoR), listed in Table 2. This metric evalu-
ates the accuracy of detections by determining if the
center of EBB is within the geometrical limits of the
GT BB. The results are presented as the percentage
of CoR of the estimated BB that is within the GT BB
for each of the proposed algorithms. GUFIR exhibits
slightly better performance than the “Faster-RCNN”,
“Tensorflow PASCAL VOC” and GKF algorithms,
while GKF shows the lowest performance. These re-
sults confirm that GUFIR demonstrates superior per-
formance, effectively improving the detection process
with lower estimation error and high precision.
7 Conclusion
The GUFIR filtering algorithm developed in this paper for visual object tracking using information about the bounding box coordinates has demonstrated precision superior to that of the GKF and comparable to that of the standard “Faster-RCNN” and “Tensorflow PASCAL VOC” algorithms, especially in the 0 to 0.9 threshold range. This has become possible by treating the environmental disturbances as Gauss-Markov colored noise. The GUFIR algorithm effectively mitigates estimation errors and improves detection accuracy. Therefore, it can be recommended as a useful tool for visual object detection.
We are now exploring the integration of the GUFIR filter into real-time detection systems and its adaptation to different environmental conditions for robust performance. The results will be reported in the near future.
References:
[1] B. J. Scholl, Z. W. Pylyshyn, and J. Feldman,
“What is a visual object? Evidence from target
merging in multiple object tracking,” Cognition,
vol. 80, no. 1-2, pp. 159–177, 2001.
[2] P. Zhang, D. Wang, and H. Lu, “Multi-modal
visual tracking: Review and experimental com-
parison,” Computational Visual Media, vol. 10,
pp. 193–214, 2024.
[3] M. Dunnhofer, A. Furnari, G. M. Farinella, and
C. Micheloni, “Visual object tracking in first
person vision,” Int. J. Comput. Vision, vol. 131,
pp. 259–283, 2023.
[4] T. I. Amosa, P. Sebastian, L. I. Izhar, O. Ibrahim,
L. S. Ayinla, A. A. Bahashwan, A. Bala,
and Y. A. Samaila, “Multi-camera multi-object
tracking: A review of current trends and fu-
ture advances,” Neurocomputing, vol. 552, p.
126558, 2023.
[5] Z. Tang, T. Xu, H. Li, X.-J. Wu, X.-F. Zhu, and
J. Kittler, “Exploring fusion strategies for ac-
curate rgbt visual object tracking,” Information
Fusion, vol. 99, p. 101881, 2023.
[6] A. S. Jalal, “The state-of-the-art in visual ob-
ject tracking,” Informatica, vol. 36, pp. 227–
248, 2012.
[7] F. Chen, X. Wang, Y. Zhao, S. Lv, and X. Niu,
“Visual object tracking: A survey,” Comput. Vi-
sion Image Understand., vol. 222, p. 103508,
2022.
[8] E. Araujo, C. R. Silva, and D. J. B. S. Sampaio,
“Video target tracking by using competitive neu-
ral networks,” WSEAS Trans. Signal Process.,
vol. 8, no. 4, pp. 420–431, 2008.
[9] K. Sundaraj, “Real-time face detection using dy-
namic background subtraction,” WSEAS Infor-
mat. Sci. Appl., vol. 11, no. 5, pp. 420–431,
2008.
[10] A. Yilmaz, O. Javed, and M. Shah, “Object
tracking: A survey,” Acm Computing Surveys
(CSUR), vol. 38, no. 4, pp. 1–45, 2006.
[11] J. Viitanen, M. Happonen, P. Patama, and J. Ra-
jamäki, “Near border procedures for tracking
information,” WSEAS Trans. Systems, vol. 3,
no. 9, pp. 223–232, 2010.
[12] Z. Li, M. Dong, S. Wen, X. Hu, P. Zhou, and
Z. Zeng, “Clu-cnns: Object detection for medi-
cal images,” Neurocomputing, vol. 350, pp. 53–
59, 2019.
[13] G. Xu, A. S. Khan, A. J. Moshayedi, X. Zhang,
and Y. Shuxin, “The object detection, perspec-
tive and obstacles in robotic: a review,” EAI En-
dorsed Trans. AI Robot., vol. 1, no. 1, 2022.
[14] B.-F. Wu, Y.-H. Chen, and P.-C. Huang,
“A demand-driven architecture for web-based
tracking systems,” WSEAS Trans. Informat. Sci.
Appl., vol. 12, no. 8, pp. 477–486, 2011.
[15] Y. Xu, Y. S. Shmaliy, X. Chen, and Y. Li,
“UWB-based indoor human localization with
time-delayed data using EFIR filtering,” IEEE
Access, vol. 5, pp. 16 676–16 683, 2017.
[16] A. J. Frhan, “Detection and tracking of real-
world events from online social media user
data using hierarchical agglomerative clustering
based system,” WSEAS Trans. Comput., vol. 16,
pp. 355–365, 2017.
[17] D. Lokesh and N. V. Uma Reddy, “Energy effi-
cient routing design for target tracking in wire-
less sensor network,” WSEAS Trans. Informat.
Sci. Appl., vol. 19, pp. 132–137, 2022.
[18] Y. Yoon, A. Kosaka, and A. C. Kak, “A new
Kalman-filter-based framework for fast and ac-
curate visual tracking of rigid objects,” IEEE
Trans. Robotics, vol. 24, no. 5, pp. 1238–1251,
2008.
[19] M. K. Tyagi, M. Srinivasan, and L. S. S.
Reddy, “Design of traditional/hybrid software
project tracking technique: State space ap-
proach,” WSEAS Trans. Informat. Sci. Appl.,
vol. 11, no. 10, pp. 345–355, 2013.
[20] R. Haider, F. Mandreoli, and R. Martoglia, “Ef-
fective aggregation and querying of probabilis-
tic RFID data in a location tracking context,”
WSEAS Trans. Informat. Sci. Appl., vol. 12, pp.
148–160, 2015.
[21] E. G. Pale-Ramon, L. J. Morales-Mendoza,
M. González-Lee, O. G. Ibarra-Manzano, J. A.
Ortega-Contreras, and Y. S. Shmaliy, “Improv-
ing visual object tracking using general ufir and
kalman filters under disturbances in bounding
boxes,” IEEE Access, 2023.
[22] A. İftar, “Robust tracking and disturbance rejec-
tion for decentralized neutral distributed-time-
delay systems,” WSEAS Trans. Syst. Contr.,
vol. 18, pp. 307–315, 2023.
[23] Y. S. Shmaliy, “An iterative Kalman-like al-
gorithm ignoring noise and initial conditions,”
IEEE Trans. Signal Process., vol. 59, no. 6, pp.
2465–2473, 2011.
[24] Y. S. Shmaliy and S. Zhao, Optimal and Ro-
bust State Estimation: Finite Impulse Response
(FIR) and Kalman Approaches. John Wiley &
Sons, 2022.
[25] S. Vasuhi and V. Vaidehi, “Target detection and
tracking for video surveillance,” WSEAS Trans.
Signal Process., vol. 10, pp. 168–117, 2014.
[26] S. Ren, K. He, R. Girshick, and J. Sun, “Faster
r-cnn: Towards real-time object detection with
region proposal networks,” Advances Neural In-
format. Process. Syst., vol. 28, 2015.
[27] M. Abadi, A. Agarwal, P. Barham, E. Brevdo,
Z. Chen, C. Citro, G. S. Corrado, A. Davis,
J. Dean, M. Devin et al., “Tensorflow:
Large-scale machine learning on heteroge-
neous distributed systems,” arXiv preprint
arXiv:1603.04467, 2016.
[28] M. Everingham, S. M. A. Eslami, L. Van Gool,
C. K. I. Williams, J. Winn, and A. Zisserman,
“The pascal visual object classes challenge: A
retrospective,” International Journal of Com-
puter Vision, vol. 111, no. 1, pp. 98–136, Jan.
2015.
[29] L. Konwar, A. K. Talukdar, and K. K. Sarma,
“Robust real time multiple human detection and
tracking for automatic visual surveillance sys-
tem,” WSEAS Trans. Signal Process., vol. 17,
pp. 93–98, 2021.
[30] M. Benvenuti, M. G. Colantonio, S. Di Bono,
G. Pieri, and O. Salvetti, “Tracking of moving
targets in video sequences,” in Proc. 6th WSEAS
Int. Conf. on Neural Networks, Lisbon, June 16-
18, 2005, pp. 20–25.
[31] Y. Amit, P. Felzenszwalb, and R. Girshick, “Ob-
ject detection,” in Computer Vision: A Refer-
ence Guide. Springer, 2021, pp. 875–883.
[32] F. Jalled and I. Voronkov, “Object detec-
tion using image processing,” arXiv preprint
arXiv:1611.07791, 2016.
[33] W. Burger and M. J. Burge, Principles of Digital Image Processing. Springer, 2009, vol. 54.
[34] B. Jahne, Practical Handbook on Image Pro-
cessing for Scientific and Technical Applica-
tions. CRC press, 2004.
[35] R. Szeliski, Computer vision: algorithms and
applications. Springer Nature, 2022.
[36] S.-Y. Hou, H.-S. Hung, Y.-C. Chang, and S.-H.
Chang, “Multitarget tracking algorithms using
angle innovations and extended Kalman filter,”
WSEAS Trans. Syst., vol. 3, no. 8, pp. 420–429,
2009.
[37] X. Sun, H. Qin, and J. Niu, “Comparison and
analysis of GNSS signal tracking performance
based on Kalman filter and traditional loop,”
WSEAS Trans. Signal Process., vol. 3, no. 9, pp.
99–108, 2013.
[38] I. Vasilev, D. Slater, G. Spacagna, P. Roelants,
and V. Zocca, Python Deep Learning: Explor-
ing deep learning techniques and neural net-
work architectures with Pytorch, Keras, and
TensorFlow. Packt, 2019.
[39] M. H. Assaf, V. Groza, and E. M. Petriu, “The
use of Kalman filter techniques for ship track es-
timation,” WSEAS Trans. Systems, vol. 19, pp.
7–13, 2020.
[40] S. Chen and C. Shao, “Efficient online tracking-
by-detection with kalman filter,” IEEE Access,
vol. 9, pp. 147 570–147 578, 2021.
[41] S. Brenton, “Overview of two performance metrics for object detection algorithms evaluation.”
[42] R. Padilla, W. L. Passos, T. L. Dias, S. L. Netto,
and E. A. da Silva, “A comparative analysis of
object detection metrics with a companion open-
source toolkit,” Electronics, vol. 10, no. 3, p.
279, 2021.
[43] K. Choeychuen, P. Kumhom, and K. Cham-
nongthai, “An efficient implementation of the
nearest neighbor based visual objects tracking,”
in 2006 Int. Symp. Intell. Signal Process. Com-
mun., 2006, pp. 574–577.
[44] Y. Xu, Y. S. Shmaliy, W. Ma, X. Jiang,
T. Shen, S. Bi, and H. Guo, “Improv-
ing tightly LiDAR/Compass/Encoder-integrated
mobile robot localization with uncertain sam-
pling period utilizing EFIR filter,” Mobile Net-
works Appl., vol. 26, pp. 440–448, 2021.
[45] M. Everingham, L. Van Gool, C. K. Williams,
J. Winn, and A. Zisserman, “The pascal visual
object classes (voc) challenge,” Int. J. Comput.
Vision, vol. 88, pp. 303–338, 2010.
[46] L. Čehovin, A. Leonardis, and M. Kristan, “Vi-
sual object tracking performance measures re-
visited,” IEEE Trans. Image Process., vol. 25,
no. 3, pp. 1261–1274, 2016.
[47] A. Barnston, “Correspondence among the corre-
lation [root mean square error] and heidke verifi-
cation measures; refinement of the heidke score
notes and correspondence, climate analysis cen-
ter 1992,” 2020.
[48] B. Karasulu and S. Korukoglu, “A software for
performance evaluation and comparison of peo-
ple detection and tracking methods in video pro-
cessing,” Multimed. Tools Appl., vol. 55, no. 3,
pp. 677–723, 2011.
[49] A. W. Smeulders, D. M. Chu, R. Cucchiara,
S. Calderara, A. Dehghan, and M. Shah, “Visual
tracking: An experimental survey,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 36, no. 7, pp.
1442–1468, 2013.
[50] D. L. Olson and D. Delen, Advanced Data Min-
ing Techniques. Springer Science & Business
Media, 2008.
[51] L. Fan, C. Kang, X. Zhang, and S. Wan, “Real-
time tracking method for a magnetic target using
total geomagnetic field intensity,” Pure Appl.
Geophys., vol. 173, pp. 2065–2071, 2016.
[52] X. R. Li and V. P. Jilkov, “Survey of maneu-
vering target tracking. Part I. Dynamic models,”
IEEE Trans. Aero. Electron. Syst., vol. 39, no. 4,
pp. 1333–1364, 2003.
[53] A. Bryson Jr and L. Henrikson, “Estimation us-
ing sampled data containing sequentially corre-
lated noise,” J. Spacecraft Rockets, vol. 5, no. 6,
pp. 662–665, 1968.
[54] Y. S. Shmaliy, S. Zhao, and C. K. Ahn, “Kalman
and UFIR state estimation with coloured mea-
surement noise using backward Euler method,”
IET Signal Process., vol. 14, no. 2, pp. 64–71,
2020.
[55] A. Bryson and D. Johansen, “Linear filtering for
time-varying systems using measurements con-
taining colored noise,” IEEE Trans. Automat.
Contr., vol. 10, no. 1, pp. 4–10, 1965.
[56] S. Zhao, Y. S. Shmaliy, and C. K. Ahn, “Bias-
constrained optimal fusion filtering for decen-
tralized WSN with correlated noise sources,”
IEEE Trans. Signal Inform. Process. Netw.,
vol. 4, no. 4, pp. 727–735, 2018.
[57] Y. S. Shmaliy, S. Zhao, and C. K. Ahn, “Un-
biased finite impluse response filtering: An it-
erative alternative to Kalman filtering ignoring
noise and initial conditions,” IEEE Contr. Syst.
Mag., vol. 37, no. 5, pp. 70–89, 2017.
[58] F. Ramirez-Echeverria, A. Sarr, and Y. S.
Shmaliy, “Optimal memory for discrete-time
FIR filters in state-space,” IEEE Trans. Signal
Process., vol. 62, no. 3, pp. 557–561, 2014.
[59] (2015) Datasets-visual tracker benchmark. [On-
line]. Available: http://www.visual-tracking.net
[60] Microsoft, “Visual object tagging tool: An elec-
tron app for building end to end object detection
models from images and videos.”
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present re-
search, at all stages from the formulation of the prob-
lem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflicts of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International , CC BY 4.0)
This article is published under the terms of the Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US
Declaration of Generative AI and AI-
assisted Technologies in the Writing Process
The authors wrote, reviewed, and edited the content as needed; they have not utilised artificial intelligence (AI) tools. The authors take full responsibility for the content of the publication.