ORB Visual and WiFi Online RSSI fusion SLAM
YI-HSIEN LU1, CHIA-CHIHUANG2, CHIH-CHUNG CHOU, CHENG-FU CHOU1,2
1Department of Graduate Institute of Network and Multimedia,
National Taiwan University,
TAIWAN
2Department of Computer Science and Information Engineering,
National Taiwan University,
TAIWAN
Abstract: - Simultaneous Localization and Mapping (SLAM) technologies are indispensable for indoor service
robots, enabling them to navigate through and interact with environments. Visual SLAM systems often
encounter significant challenges such as dynamic obstacles, variable lighting, feature scarcity, and perceptual
aliasing in real-world scenarios. By merging the precise environmental mapping capabilities of visual SLAM
with the ubiquity and stability of WiFi signals, our method effectively addresses the limitations typically
associated with visual SLAM. Notably, our fusion technique leverages existing WiFi infrastructure, thus
providing a cost-effective improvement in spatial awareness without the extensive offline database
requirements of WiFi RSSI-based localization. Comparative performance evaluations show that our graph
optimization-based approach not only surpasses the original ORB-SLAM3 method but also significantly
outperforms the Extended Kalman Filter (EKF) in terms of accuracy, particularly in environments characterized
by poor lighting, featureless scenes, and significant occlusions. This is evidenced by a reduced Root Mean
Square Error (RMSE) in localization: 3.09m for our method versus 4.02m for EKF. This enhancement in
precision underscores the potential of our integrated system to advance indoor navigation technologies, making
it a crucial development in the field of robotics and automated systems.
Key-Words: - Real-Time, Visual SLAM, WiFi Localization, Robotic Navigation, Spatial Awareness, Sensor
Fusion.
Received: August 18, 2023. Revised: May 27, 2024. Accepted: July 13, 2024. Published: September 3, 2024.
1 Introduction
Visual SLAM (Simultaneous Localization and
Mapping) is a popular solution for indoor robot
localization. It extracts and matches feature points
to perform positioning and mapping, and it is
attractive for its high accuracy, light weight, low
cost, and low power consumption. A typical
framework is shown in Figure 1.
Consider some previous work. ORB-SLAM3,
[1], one of the state-of-the-art (SoTA) Visual SLAM
methods, provides short-term, mid-term, and
long-term data association based on ORB
descriptors, with adjusted methods for feature
extraction and feature matching. ORB-SLAM3 is
efficient and precise, but it still loses track, and
therefore accuracy, when a robot or device enters a
featureless environment. YOLO-SLAM, an
improved SoTA Visual SLAM method, integrates
semantic information from deep learning models to
help robots better perceive their surrounding
environment. However, the accuracy of the
estimated position is largely dependent on feature
correspondences and can be adversely affected by
occlusion caused by dynamic objects, featureless
scenes, drastic viewpoint changes, and changes in
illumination, leading to incorrect estimations due to
false tracking correspondences, [2].
In this paper, Figure 16, Figure 18, and Figure
20 illustrate trajectories under different challenges.
These figures show that without the WiFi
submodule, [3], and algorithm added to
ORB-SLAM3, the system loses track and its
accuracy decreases.
Additionally, loop closure detection is a crucial
component of the SLAM system for the
relocalization of a robot in a map. Perceptual
aliasing, especially in symmetric and repetitive
environments such as indoor corridors with similar
patterns of doors and lights, can lead to false loops
and inaccurate map estimations. Our proposed
method can avoid false loop detection by rejecting
loop candidates whose WiFi RSSI values, [3], are
outliers, making the system more robust. Figure 22
shows that using ORB-SLAM3 without the WiFi
submodule may cause
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.37
Yi-Hsien Lu, Chia-Chihuang,
Chih-Chung Chou, Cheng-Fu Chou
E-ISSN: 2224-3402
Volume 21, 2024
false loop detection, but with our WiFi submodule,
false loop detection can be avoided, resulting in a
more accurate and robust system.
Moreover, wireless signal-based indoor
localization has become increasingly popular in
recent years as a reliable method for identifying the
locations of IoT (Internet of Things) devices in
indoor environments, where GNSS (Global
Navigation Satellite Systems) are typically
unavailable due to the lack of a direct line-of-sight.
This has motivated various research efforts to
develop effective techniques for this type of
localization.
Although wireless-signal-based indoor
localization approaches, [3], achieve acceptable
accuracy, they cannot meet the need for
centimeter-level precision. Additionally, such
strategies normally require a predefined WiFi radio
map, which must be maintained and updated
regularly, making them incompatible with the idea
of SLAM, where a robot can be placed in an
unknown environment without prior knowledge.
We present a robust life-long SLAM system
that utilizes ORB-SLAM3, [1] as its base Visual
SLAM module. This system consists of a Visual
SLAM and a WiFi SLAM module, allowing it to
address challenges with vision-based localization
and navigation. These two modules interactively
update both the vision map and WiFi map, with the
WiFi SLAM module, [3] consisting of a tracking
submodule to locate the robot when vision has
difficulty, as well as a mapping submodule that
autonomously updates with assistance from Visual
SLAM. The advantage of our system over offline,
database-dependent wireless localization methods
is its ability to adapt to changing environments.
Furthermore, WiFi
information is used in the loop detection
submodule of Visual SLAM to prevent false loop
detection, since WiFi signals are different in two
separate places with similar vision scenes. On top
of that, a real-time degeneracy detection module is
used to detect whenever the vision sensor is
degraded, which introduces a mechanism to decide
whether to compensate the degradation with WiFi
signal information. Our system enables the
combination of Visual SLAM and WiFi SLAM to
provide reliable and accurate robot localization in
dynamic indoor environments. With this, robots
can be deployed in unknown environments without
prior knowledge, and accurately localize and map
areas in real-time.
2 Background
2.1 Visual SLAM
Fig. 1: Typical visual SLAM framework, [1]
With the advantages of a simple sensor
configuration, light weight, and low cost,
vision-based SLAM algorithms have been widely
studied. A typical visual SLAM framework
consists of frontend visual odometry, backend
optimization, loop closure detection, and mapping
modules, as shown in Figure 1. The frontend visual
odometry
estimates the motion between input images from
sensors and constructs a local map using a
feature-based method or direct method. Backend
optimization then optimizes the results from visual
odometry. Simultaneously, the mapping module
constructs and maintains a global map based on the
measurements. To combat accumulated error, loop
closure detection recognizes previously visited
places, relocalizes, and improves mapping accuracy
by reducing accumulated drift caused by noise.
ORB-SLAM3, [1], is a well-known
keyframe-based real-time visual SLAM algorithm
that consists of three main threads: tracking, local
mapping, and loop closing. The system uses ORB
(Oriented FAST and Rotated BRIEF) features,
which are transformed into map points once the
corresponding frame is selected as a keyframe to
construct the map. The tracking thread looks for
unmapped regions using ORB features extracted
from images and matches ORB features to map
points; the local mapping thread then performs
local bundle adjustment. In our system, we use
ORB-SLAM3 with an RGBD camera as our visual
SLAM module to demonstrate the challenges of
visual SLAM and how WiFi signals, [3], can
mitigate them.
2.2 WiFi-based Indoor Localization
The lack of availability of GNSS in indoor
environments has led to an increase in demand for
indoor localization solutions. One popular solution
is based on WiFi fingerprinting, which utilizes the
existing infrastructure of WiFi networks. This
method has attracted attention from both academia
and industry as it is achievable and cost-effective.
Fig. 2: Typical WiFi-based indoor localization
pipeline
A typical WiFi-based indoor localization
pipeline is shown in Figure 2. It consists of an
offline stage and an online stage. Firstly, a radio
map of size $N \times M$ is constructed in the
offline stage, where $N$ is the number of
fingerprints, and $M$ represents the number of
access points plus 2 (X and Y to represent
locations). A fingerprint is a vector
$v \in \mathbb{R}^{M}$ of RSSI received in a place
$n$ with coordinates $(x_n, y_n)$. Secondly, in the
online
stage, the user's location is estimated by matching
the fingerprint of the current place to those on the
radio map. Traditional matching algorithms such as
K-Nearest Neighbors, Decision Tree, Random
Forest, [4], and Support Vector Machine classifiers,
[5], have been explored for years. Weighted
K-Nearest Neighbors (WKNN) can be
applied to WiFi localization by using the signal
strength (RSSI) values from nearby access points
as features. Given a set of RSSI measurements
from multiple access points, WKNN can determine
the k nearest neighbors (based on signal strength
similarity) to the query point (the device for which
localization is required). Random Forest can also
be utilized for WiFi localization. During inference,
the trained Random Forest model can predict the
location of a device based on its WiFi signal
strengths. Generally, machine learning-based
solutions achieve higher accuracy than traditional
methods, but they can be expensive because
training and tuning are required, and as the scale of
the model increases, more computational resources
are needed. Additionally, data-driven approaches
depend heavily on the distribution of training data,
so a natural trade-off between accuracy and
robustness needs to be considered. Both traditional
WiFi fingerprint-based indoor localization and
machine learning-based solutions require an offline
database, which does not align with the scenarios in
a SLAM system, where a robot explores and
locates itself without prior knowledge. Therefore,
our system proposes a WiFi SLAM solution that
can operate without an offline database.
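To make the offline/online pipeline concrete, the following is a minimal sketch of WKNN matching against a radio map laid out as described above (RSSI columns followed by X and Y). The function name `wknn_locate`, the inverse-distance weighting, and the `eps` guard are illustrative assumptions, not details taken from the cited works.

```python
import numpy as np


def wknn_locate(radio_map, query_rssi, k=3, eps=1e-6):
    """Estimate (x, y) for a query fingerprint with weighted k-NN.

    radio_map: array of shape (N, M); the first M-2 columns are RSSI
               values in dBm (one per access point) and the last two
               columns are the X and Y coordinates of each fingerprint.
    query_rssi: RSSI vector of length M-2 measured online.
    """
    radio_map = np.asarray(radio_map, dtype=float)
    fingerprints = radio_map[:, :-2]
    coords = radio_map[:, -2:]

    # Similarity is measured as Euclidean distance in signal space.
    dists = np.linalg.norm(fingerprints - np.asarray(query_rssi, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]

    # Inverse-distance weights (an assumed, commonly used choice).
    weights = 1.0 / (dists[nearest] + eps)
    return (weights[:, None] * coords[nearest]).sum(axis=0) / weights.sum()
```

Note that this sketch still needs a pre-built radio map, which is exactly the offline dependency our system avoids.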
2.3 Visual SLAM with WiFi
Because camera and WiFi sensors have
complementary advantages and disadvantages,
several methods, [3], have been proposed to
combine these two sensors to compensate for each
other’s weaknesses and construct a more robust
system. One proposed system utilizes WiFi-based
positioning methods, [4], for mobile robot-based
learning data collection, localization, and tracking
in indoor spaces. The
system combines the extended Viterbi algorithm,
tracking algorithm, odometer information, and a
new signal fluctuation matrix to improve the
accuracy of robot location tracking and the
effectiveness of building a high-quality WiFi Radio
Map.
Other systems use WiFi information to select
a subset of RGBD images that correspond to a
similar location range as the current frame for loop
closure detection, thus avoiding the perceptual
aliasing problem. In addition, computational
complexity can be reduced because of the low
computation overhead of determining WiFi
similarity, and the number of RGBD images in the
database that need to be searched is decreased by
filtering loop closure candidates via their WiFi
similarity. In our system, we also integrate WiFi
with visual SLAM to tackle the false loop closure
problem by associating a keyframe with
corresponding WiFi information. However, instead
of storing the WiFi fingerprint or signature, we
store a pose estimated by the WiFi SLAM module
in our system. Furthermore, our system not only
solves the perceptual aliasing problem but also
provides a coarse robot position supported by our
WiFi SLAM module to make our system more
robust when visual SLAM fails.
Both Extended Kalman Filter (EKF), [5], [6]
and Graph Optimization are popular techniques
used for Simultaneous Localization and Mapping
(SLAM) in robotics and computer vision. EKF
SLAM uses a state vector to represent the robot’s
pose and the map’s features and estimates the state
vector by incorporating sensor measurements such
as odometry and range measurements. EKF SLAM
is computationally efficient and is widely used in
mobile robotics applications. On the other hand,
Graph Optimization represents the SLAM problem
as a graph, where nodes represent robot poses and
landmarks and edges represent constraints between
them. Graph Optimization finds the optimal
estimate of the robot’s trajectory and the map by
minimizing the error between the constraints and
the estimated values. Although Graph Optimization
is computationally more expensive than EKF, it is a
global optimization technique that can improve the
accuracy of SLAM estimates. In recent years, with
the improvement of hardware, graph optimization
has become increasingly popular in modern SLAM
algorithms.
The closest work to ours is [6], where an
EKF-based SLAM using WiFi signal strength is
proposed to estimate the pose of the robot and the
locations of the access points (APs) in the
environment. The pose estimated by WiFi signal
can be further used to improve loop closure in
visual SLAM and provide a rough localization
result. This work estimates the robot pose from
WiFi signals and RGBD images with an Extended
Kalman Filter (EKF), [6], and conducts graph
optimization only when the last frame is detected,
to optimize the pose estimation. In contrast, our
system is fully based on graph optimization: both
our visual SLAM and WiFi SLAM modules are
implemented with it. Graph optimization takes the
whole state history into account and can handle
non-Gaussian errors, which makes it the more
accurate approach. EKF, by comparison, considers
only recent states, assumes that the system’s error is
Gaussian, and may become inconsistent in highly
non-linear systems.
2.4 Degeneracy Detection
Sensors inevitably suffer from degradation. For
example, a vision sensor may degrade in cases of
poor lighting, occlusion, and featureless scenes.
Similarly, a Lidar sensor may degrade in scenarios
with self-symmetry or few geometric constraints.
When faced with such degradation, a SLAM
system may lose track. To improve the robustness
of a SLAM system, a well-known work, [7], [8],
proposed a general mechanism to detect
degeneracy. This work defines an
optimization-based state estimation problem as
$\arg\min_{x} \lVert Ax - b \rVert^{2}$ and a
degeneracy factor $D = \delta d / \delta x_c$, where
$\delta x_c$ represents the maximum amount of
shift of an artifact constraint, and $\delta d$ is the
difference between the original estimation result
and the estimation result affected by the artifact
constraint. After a series of mathematical
deductions, the degeneracy factor becomes
$D = \lambda_{\min} + 1$, where $\lambda_{\min}$
is the smallest eigenvalue of $A^{\top}A$. With this
lemma, we can detect a degeneracy by setting a
threshold for the minimum eigenvalue and further
integrate additional sensor data to compensate for
the degradation.
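The eigenvalue test can be sketched in a few lines. The snippet below assumes the estimation problem has been linearized into a least-squares system with Jacobian `A`, and checks the smallest eigenvalue of `A^T A` against a threshold. The function names and the threshold value are hypothetical; a suitable threshold has to be tuned per sensor.

```python
import numpy as np


def min_eigenvalue(A):
    """Smallest eigenvalue of A^T A for the linearized problem min ||Ax - b||^2."""
    A = np.asarray(A, dtype=float)
    # A^T A is symmetric, so eigvalsh applies; it returns eigenvalues
    # in ascending order, and the first one is the minimum.
    return float(np.linalg.eigvalsh(A.T @ A)[0])


def is_degenerate(A, threshold):
    """Flag the problem as degenerate when the minimum eigenvalue is too small."""
    return min_eigenvalue(A) < threshold
```

A Jacobian whose rows constrain only one direction of the state yields a minimum eigenvalue near zero, which is the situation this check is meant to catch.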
3 Method
Fig. 3: The proposed system, [1], [3]
3.1 System Diagram
The input is a pair of RGBD images and WiFi
RSSI (Received Signal Strength Indication) values,
and the output is the robot pose. The Visual
SLAM module, based on ORB-SLAM3, uses
RGBD images as input and outputs an estimated
robot pose determined by visual information. On
the other hand, the WiFi SLAM module utilizes
WiFi RSSI values as input and outputs a robot pose
estimated using the WiFi signal. If a vision
degeneracy is detected, the pose estimated through
WiFi is utilized instead. The whole system is shown
in Figure 3.
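The switching rule described above can be illustrated as follows. This is a simplified sketch: the pose is reduced to an (x, y, yaw) tuple, and `select_pose`, `visual_min_eig`, and `threshold` are names introduced here for illustration rather than identifiers from our implementation.

```python
from typing import Optional, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw), a simplified stand-in for a full pose


def select_pose(visual_pose: Optional[Pose],
                wifi_pose: Pose,
                visual_min_eig: float,
                threshold: float) -> Pose:
    """Return the visual pose unless vision is lost or flagged as degenerate."""
    vision_lost = visual_pose is None
    vision_degenerate = visual_min_eig < threshold
    if vision_lost or vision_degenerate:
        # Fall back to the coarser pose estimated from WiFi RSSI.
        return wifi_pose
    return visual_pose
```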
3.2 Graph-based SLAM
A SLAM problem, [1], can be formulated as an
MLE (Maximum Likelihood Estimation) problem
with a probability model
$$X^{*} = \arg\max_{X} P(Z \mid X)$$
where X represents the state and Z represents the
observation.
There are two main approaches to solving the
state estimation problem. Traditional SLAM tends
to use filter-based approaches such as Kalman
filters and particle filters, [5], while modern
graph-based SLAM, [1], [6], turns the SLAM
problem into a least-squares problem and solves it
with an optimization algorithm.
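For reference, the least-squares form commonly used in graph-based SLAM can be written as below. This is the standard textbook formulation rather than an equation from this paper, and the symbols $e_k$ (residual of the $k$-th constraint) and $\Omega_k$ (its information matrix) are assumptions introduced here.

```latex
X^{*} = \arg\min_{X} \sum_{k} e_k(X)^{\top} \, \Omega_k \, e_k(X)
```

Under the assumption of Gaussian measurement noise, minimizing this sum is equivalent to maximizing the likelihood of the observations, which is how the maximum likelihood problem becomes a least-squares problem.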
In ORB-SLAM, [6], map points
$X_{w,j} \in \mathbb{R}^{3}$ and robot poses
$T_{iw} \in \mathrm{SE}(3)$, where $w$ stands for
the world reference, are optimized by minimizing
the reprojection error with respect to the matched
keypoints $x_{i,j} \in \mathbb{R}^{2}$. The error
function is:
$$e_{i,j} = x_{i,j} - \pi_{i}(T_{iw}, X_{w,j})$$
where π is the projection function.
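As an illustration of the reprojection error, the sketch below projects a world point into a keyframe with a pinhole camera model and subtracts the result from the matched keypoint. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the 4x4 matrix representation of the pose, and the function names are assumptions made for this example.

```python
import numpy as np


def project(T_iw, X_w, fx, fy, cx, cy):
    """Pinhole projection of a 3D world point into keyframe i.

    T_iw: 4x4 homogeneous transform from world to camera i.
    X_w:  3D point in world coordinates.
    """
    T_iw = np.asarray(T_iw, dtype=float)
    X_c = T_iw[:3, :3] @ np.asarray(X_w, dtype=float) + T_iw[:3, 3]
    return np.array([fx * X_c[0] / X_c[2] + cx,
                     fy * X_c[1] / X_c[2] + cy])


def reprojection_error(x_ij, T_iw, X_w, fx, fy, cx, cy):
    """Difference between the observed keypoint and the projected map point."""
    return np.asarray(x_ij, dtype=float) - project(T_iw, X_w, fx, fy, cx, cy)
```

In a full bundle adjustment, this residual is evaluated for every keypoint and map point match and minimized jointly over poses and points.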
In our system, as illustrated in Figure 4, WiFi
access points are utilized as landmarks along with
Visual SLAM map points. This joint use of access
points and map points as landmarks is the novelty
of our paper.
Fig. 4: Graph visualization of our system
3.3 WiFi SLAM Module
3.3.1 Propagation Model
The signal propagation model plays an important
role in indoor localization systems based on WiFi
received signal strength indication (RSSI). WiFi
RSSI attenuates with distance. The signal
propagation model, [8], is described by:
$$P(d) = P(d_0) - 10\,\eta \log_{10}\left(\frac{d}{d_0}\right)$$
where $d$ and $d_0$ are the distances from the
transmitter, $P(d)$ and $P(d_0)$ are the received
RSSI (dBm) at distances $d$ and $d_0$, and $\eta$
is the path loss exponent.
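A short sketch of this model, assuming the standard log-distance path loss form, shows how RSSI is predicted from distance and how the relation can be inverted. The function names and the default reference distance of 1 m are illustrative choices.

```python
import math


def predicted_rssi(d, p_d0, eta, d0=1.0):
    """RSSI (dBm) expected at distance d, given RSSI p_d0 at reference distance d0."""
    return p_d0 - 10.0 * eta * math.log10(d / d0)


def distance_from_rssi(rssi, p_d0, eta, d0=1.0):
    """Invert the model: distance at which the given RSSI would be received."""
    return d0 * 10.0 ** ((p_d0 - rssi) / (10.0 * eta))
```

For example, with a reference RSSI of -40 dBm at 1 m and a path loss exponent of 2, the model predicts a 20 dB drop for every tenfold increase in distance.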
3.3.2 Mapping Submodule
The mapping submodule of the WiFi module is
implemented by estimating access point (AP)
positions, [9], [10]. As illustrated in Figure 4, WiFi
access points serve as landmarks in our system.
The locations of these access points are continually
estimated and updated to maintain an up-to-date
WiFi map. The keyframe class in ORB-SLAM3
has been modified to include the observation of
access points $AP_m$ ($m \in \{1, \dots, M\}$, with
$M$ being the number of access points in the
environment) at the corresponding location $x_i$,
along with their
respective RSSI values. When an access point is
observed more than α times (α = 3) and has an
RSSI value greater than β (dBm), it is selected
as a candidate node in the graph. Before starting
optimization, the status of candidate nodes will be
further evaluated to ensure that they have been
properly initialized since a good initialization of
nodes is crucial for optimal results. To initialize,
the average of all locations where $AP_m$ was
observed is taken as the initialization value.
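The candidate selection and initialization steps can be sketched as follows. This is one possible reading of the rule above, in which only observations stronger than β count toward the α threshold; the function name and the tuple layout of `observations` are hypothetical.

```python
from collections import defaultdict

import numpy as np


def init_ap_candidates(observations, alpha, beta_dbm):
    """Select candidate access points and initialize their positions.

    observations: iterable of (ap_id, (x, y), rssi_dbm) tuples, one per
                  keyframe observation of an access point.
    Returns a dict mapping ap_id to the mean (x, y) of the locations
    where it was observed with RSSI above beta_dbm.
    """
    positions = defaultdict(list)
    for ap_id, xy, rssi in observations:
        if rssi > beta_dbm:  # keep only sufficiently strong observations
            positions[ap_id].append(xy)

    # An AP becomes a candidate once it has more than alpha valid observations;
    # its initial position is the centroid of those observation locations.
    return {ap_id: np.mean(pts, axis=0)
            for ap_id, pts in positions.items()
            if len(pts) > alpha}
```

The centroid is only a starting point; the subsequent graph optimization refines each access point position using the propagation model.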
3.3.3 Tracking Submodule
With the aid of a WiFi map that is kept up to date
by the WiFi mapping submodule, the tracking
submodule can determine the robot’s pose, denoted
as $x_i$, using a similar approach to the mapping
submodule. At an unknown robot pose $x_i$, we
receive RSSI values $r_{im}$ from each access
point $AP_m$. Following the same method used in
the mapping submodule, we construct a graph with
the $AP_m$ as fixed vertices, the differences
between the estimated and received RSSI values as
edges, and the robot’s pose as the vertex to be
estimated. After optimization, the robot’s pose
$x_i$ is obtained. The main difference between the
tracking and mapping submodules is that the
tracking submodule estimates the robot’s pose with
known, fixed AP positions, while the mapping
submodule estimates AP locations with fixed robot
poses.
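To illustrate the tracking step, the sketch below estimates a 2D robot position from RSSI readings with the access point positions held fixed. It uses a plain Gauss-Newton loop on the log-distance propagation model as a stand-in for the graph optimizer, so it should be read as a simplified illustration under those assumptions and not as our implementation. The function name and parameters are hypothetical.

```python
import numpy as np


def estimate_position(ap_xy, rssi, p_d0, eta, x0, iters=20):
    """Gauss-Newton estimate of a 2D position from RSSI with fixed AP positions.

    ap_xy: (M, 2) array of known access point positions.
    rssi:  length-M array of received RSSI values (dBm).
    p_d0:  RSSI at the 1 m reference distance; eta: path loss exponent.
    x0:    initial guess for the position.
    """
    ap_xy = np.asarray(ap_xy, dtype=float)
    rssi = np.asarray(rssi, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    k = 10.0 * eta / np.log(10.0)

    for _ in range(iters):
        diff = x - ap_xy                                    # (M, 2)
        d2 = np.maximum((diff ** 2).sum(axis=1), 1e-6)      # squared distances
        predicted = p_d0 - 0.5 * k * np.log(d2)             # p_d0 - 10*eta*log10(d)
        jacobian = -k * diff / d2[:, None]                  # d(predicted)/dx
        # Solve the linearized least-squares problem for the update step.
        step, *_ = np.linalg.lstsq(jacobian, rssi - predicted, rcond=None)
        x += step
        if np.linalg.norm(step) < 1e-9:
            break
    return x
```

The mapping submodule solves the mirror-image problem: the same residual is used, but the robot positions are fixed and the access point position is the unknown.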
Fig. 5: Estimation Access Point Algorithm
To explain the above method more clearly: we
employ techniques from Visual SLAM
(Simultaneous Localization and Mapping) to
initially estimate the positions of the WiFi APs;
our algorithm for estimating them is shown in
Figure 5 (Algorithm 1). Once these
positions are approximated, they are used in
conjunction with WiFi RSSI (Received Signal
Strength Indicator) values to enhance pose
estimation. The algorithm for pose estimation is
shown below in Figure 6 (Algorithm 2). This
hybrid approach leverages the strengths of both
Visual SLAM and WiFi signal analysis. By
estimating WiFi locations first, the system can use
these locations as additional data points for more
accurate pose estimation than would be possible by
relying solely on WiFi RSSI values for localization.
This method provides a more robust and precise
navigation framework by systematically refining
both the map of the environment and the robot's
understanding of its position within it.
Fig. 6: Pose Estimation Algorithm
3.4 Visual SLAM Module
3.4.1 Base Visual SLAM Algorithm
ORB-SLAM3, [1], is a well-known,
well-structured open-source visual SLAM
framework that many students and researchers use
as a research tool. We choose ORB-SLAM3 as the
base visual SLAM algorithm in our system, identify
its weaknesses, and address them by integrating
WiFi as an extra sensor, [9].
3.4.2 Loop Detection Submodule
To address the issue of false loop closures, we
enhance the loop detection mechanism in the visual
SLAM module by incorporating WiFi signals. We
assume that WiFi RSSI values received from the
same access points at different locations are
distinguishable. This helps to reject false loop
closures that arise from similar appearances in two
distinct locations, as shown in Figure 7.
Fig. 7: Loop detection, [1], [8]
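One way to picture the WiFi check on loop candidates is the sketch below. It assumes each keyframe carries a 2D position estimated by the WiFi SLAM module and rejects a visual loop candidate when the two WiFi positions are too far apart. The function names and the distance threshold `max_dist_m` are hypothetical.

```python
import math


def wifi_consistent(wifi_xy_query, wifi_xy_candidate, max_dist_m):
    """True when two keyframes are close enough according to their WiFi positions."""
    dx = wifi_xy_query[0] - wifi_xy_candidate[0]
    dy = wifi_xy_query[1] - wifi_xy_candidate[1]
    return math.hypot(dx, dy) <= max_dist_m


def filter_loop_candidates(query_xy, candidates, max_dist_m):
    """Keep only visual loop candidates whose WiFi position agrees with the query.

    candidates: list of (keyframe_id, (x, y)) pairs proposed by visual place recognition.
    """
    return [kf_id for kf_id, xy in candidates
            if wifi_consistent(query_xy, xy, max_dist_m)]
```

Two corridors that look alike but lie far apart would produce a visual match, yet the candidate is dropped because the WiFi positions disagree.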
Taken together, Sections 3.3 and 3.4 show how
our system combines the advantages of both WiFi
and Visual SLAM to achieve a more robust
navigation solution. While WiFi-based indoor
positioning techniques typically offer meter-level
accuracy, they cannot reach centimeter-level
precision.
On the other hand, relying solely on Visual SLAM
can lead to instability in environments with
insufficient features. By fusing these two
technologies in our system, we create a synergy
akin to an ensemble method in deep learning.
Incorporating multiple sensors and optimizing their
positioning allows us to achieve more robust and
precise navigation outcomes.
4 Experiment
4.1 Dataset Setup of Experiments
There are several well-established visual SLAM
benchmarks, such as the TUM dataset, [11], the
EuRoC dataset, [1], and the KITTI dataset, [7].
However, to the best of our knowledge, there is no
SLAM dataset consisting of both RGB-D images
and WiFi signals that can be used to evaluate our
system. As a result, we constructed our own dataset
on the fifth floor of the College of Electrical
Engineering and Computer Science building
(CSIE), National Taiwan University. We use an
Intel RealSense D435i RGB-D camera to collect
RGB-D images and an ASUS Zenbook Pro 15
laptop with a WiFi 6E (802.11ax) network card to
collect RSSI signals from access points in the
environment, as shown in Figure 8.
Fig. 8: Dataset Setup of Experiments
CSIE 5F is a typical corridor environment. The
total area of our experimental space is
approximately 870 square meters, and it contains
around 326 access points, including 2.4 GHz WiFi
(802.11b) and 5 GHz WiFi (802.11ac). For a visual
representation of the access points detected during
our data collection process, please refer to Figure 9.
Fig. 9: Access Points Distribution
4.2 Data Preprocessing
Fig. 10: Data preprocess
Due to the inherent characteristics of the
devices, the RGB-D image data is captured at a
frequency of 15 frames per second (fps), while the
WiFi RSSI data is collected at a considerably lower
rate of only one scan per second. To ensure accurate data
preprocessing, it is essential to conduct a data
association and synchronization process to align
and harmonize the two inputs. For this purpose, we
leverage the capabilities of ROS (Robot Operating
System), a versatile and open-source framework
widely adopted in robotics for developing and
programming robotic systems. ROS provides a rich
collection of tools and libraries that enable us to
associate the RGB-D images and WiFi data by
aligning their timestamps, ensuring synchronization,
and merging them into a unified data stream that
seamlessly integrates into our system.
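The timestamp alignment can be illustrated without any ROS dependency. The sketch below pairs each image timestamp with the nearest WiFi scan and leaves the frame unpaired when the gap is too large. It is a plain-Python stand-in for the synchronization we perform with ROS tooling, and the function name and the `max_dt` tolerance are assumptions.

```python
from bisect import bisect_left


def associate(image_stamps, wifi_stamps, max_dt=0.5):
    """For each image timestamp, return the index of the nearest WiFi scan.

    Both lists hold timestamps in seconds; wifi_stamps must be sorted.
    An entry is None when no scan lies within max_dt seconds.
    """
    matches = []
    for t in image_stamps:
        j = bisect_left(wifi_stamps, t)
        best = None
        # The nearest scan is either just before or just after t.
        for c in (j - 1, j):
            if 0 <= c < len(wifi_stamps):
                if best is None or abs(wifi_stamps[c] - t) < abs(wifi_stamps[best] - t):
                    best = c
        if best is not None and abs(wifi_stamps[best] - t) <= max_dt:
            matches.append(best)
        else:
            matches.append(None)
    return matches
```

With images at 15 fps and WiFi scans about once per second, many consecutive frames share the same scan, which is expected given the rate difference.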
4.3 Evaluation and Comparison
In the WiFi SLAM module, we use RSSI signal
strength. Only an RSSI value greater than β (dBm)
is considered valid data. To determine the optimal
threshold, we conducted tests using various RSSI
values ranging from -100 dBm to -40 dBm. After
careful evaluation, we have decided to set the
threshold at -60 dBm. This threshold demonstrated
lower error and maintained an adequate number of
valid access points, making it a suitable choice for
our system. The experimental result is depicted in
Figure 11.
Fig. 11: RSSI Threshold Table
As our system aims to create a robust and
sustainable solution that can continuously update
the vision mapping and WiFi mapping information,
we conducted a test to simulate the long-term
operation of a robot in the environment. As shown
in Figure 12 and Figure 13, our WiFi SLAM
module is able to improve its accuracy in real-time
without requiring manual intervention.
Fig. 12: Longterm Accuracy Table
Fig. 13: Longterm Accuracy Graph
Comparison with offline methods. WKNN
(Weighted k-Nearest Neighbors), [11], and Random
Forest, [4], are commonly employed techniques for
WiFi localization, while Wi-Fi DSAR, [12], [13],
is a machine learning-based approach that utilizes
an auto-encoder. Figure 14 depicts the error
comparison between our method and these
approaches. The results indicate that despite the
challenge of online updating without
prior knowledge, which is crucial for a SLAM
system, our method maintains an acceptable level
of performance when compared to these offline
database-dependent methods.
Fig. 14: Comparison with Offline Methods
Comparison with EKF. In theory, the
filter-based approach has lower accuracy than the
graph optimization approach. In practice, the error
comparison between our method and EKF
(Extended Kalman Filter) depicted in Figure 15
shows that our graph optimization-based method is
more accurate than the EKF approach.
Fig. 15: Comparison with EKF
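For clarity on the metric, a trajectory RMSE of the kind reported here can be computed as in the sketch below, assuming the estimated and ground-truth positions are already time-aligned and expressed in the same frame. The function name is illustrative.

```python
import numpy as np


def trajectory_rmse(estimated, ground_truth):
    """Root mean square of the per-pose Euclidean position errors (same units as input)."""
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    squared_errors = np.sum((estimated - ground_truth) ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_errors)))
```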
In the Visual SLAM module, to showcase the
robustness of our system against common
challenges such as lighting variations, occlusion,
and featureless environments, and to address the
issue of false loop detection caused by similar
visual scenes, we performed a series of experiments
specifically designed to simulate these scenarios.
As depicted in Figure 16, we intentionally
created an environment with insufficient lighting to
observe the behavior of the system. In the case of
pure visual SLAM, the system experienced track
loss, resulting in an incorrect trajectory. However,
when we integrated the WiFi SLAM module to
compensate for the challenging lighting conditions,
the trajectory remained correct. This demonstrates
the effectiveness of the WiFi SLAM module in
improving robustness and ensuring accurate
trajectory estimation even in challenging lighting
situations. Figure 17 provides additional insight
into the performance of the two modules. When the
visual SLAM module lost track, the minimum
eigenvalue associated with it approached 0,
indicating a degenerated state. In contrast, the
minimum eigenvalue of the WiFi SLAM module
remained higher than 300. This demonstrates that
even when the visual information degrades, the
WiFi signal can still provide reliable measurements
without suffering from degeneracy.
Fig. 16: Lighting Challenge Trajectories
Fig. 17: Minimum Eigenvalue Comparison
(Lighting)
Similarly, we designed a scenario where a
person continuously walked around the
environment, leading to tracking loss due to
occlusion. As illustrated in Figure 18 and Figure 19,
the integration of the WiFi SLAM module, [14],
proved beneficial as it helped overcome the
occlusion challenge, leading to accurate final
results. Furthermore, Figure 19 displays a
comparison of the minimum eigenvalues. It
demonstrates that the WiFi SLAM module provides
a non-degenerate constraint, improving the
system’s robustness when the visual SLAM module
experiences degradation.
Fig. 18: Occlusion Challenge Trajectories
Fig. 19: Minimum Eigenvalue Comparison
(Occlusion), [12]
Finally, as part of our evaluation, we
intentionally designed two featureless scenes
within the environment to assess the performance
of our system. The results, depicted in Figure 20
and Figure 21, clearly demonstrate the
significant contribution of WiFi integration in
enabling the system to effectively handle
featureless scenes. The WiFi integration proves to
be a valuable asset in overcoming the challenges
posed by the absence of distinct visual features,
ultimately enhancing the system’s performance and
reliability.
Fig. 20: Feature-less Scene Challenge Trajectories
Fig. 21: Minimum Eigenvalue Comparison
(Featureless), [12]
To assess the system’s capability to eliminate
false loop detection caused by two visually similar
scenes in different locations, [15], we deliberately
designed visually similar environments at two
distinct places.
The results depicted in Figure 22 demonstrate
that the original ORB-SLAM3 fails to differentiate
between these two locations, leading to a false loop
detection. As a consequence, the system corrects
the trajectory based on this false loop detection,
resulting in an incorrect trajectory.
In contrast, our system incorporates WiFi
information to filter the loop detection process.
Consequently, these two visually similar places are
not identified as a loop, preventing the system from
making incorrect trajectory corrections based on
false loop detections. If the visual features are
matched as the same place when they are not, our
WiFi module prevents the false loop detection
through RSSI value outlier removal.
Figure 22 shows that our system recognizes that
the two places are different, so no false loop is
detected, as displayed in the bottom right corner of
the image. With pure Visual SLAM (ORB-SLAM3),
a false loop is detected and the accuracy drops
sharply.
Fig. 22: False Loop Detection, [1], [4]
5 Conclusion
Our contribution is a novel framework that
combines Visual SLAM and WiFi positioning in a
real-time, interactive system in which each module
compensates for the other’s drawbacks. Our
research leverages data from both WiFi and visual
sensors, along with degeneracy detection
techniques, and is more robust than ORB-SLAM3,
[1]. This
framework effectively enhances the robustness of
visual SLAM by addressing challenges such as
lighting variations, occlusion, and featureless
scenes. Additionally, our proposed solution
successfully eliminates the issue of false loop
detection. By combining WiFi and visual
information and implementing advanced detection
mechanisms, our framework offers an innovative
approach to improving the performance and
reliability of SLAM systems.
However, SLAM and WiFi positioning still
have some limitations. Although they can achieve
centimeter-level accuracy, this framework alone
cannot be used to instruct a robot to perform
tasks. With the rise of multimodal research such as
VLMs (Visual Language Models) and LLMs (Large
Language Models), a system can not only localize a
robot but also navigate it or give it instructions.
Extending our system in this direction is our
future work.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.37
Yi-Hsien Lu, Chia-Chihuang,
Chih-Chung Chou, Cheng-Fu Chou
E-ISSN: 2224-3402
406
Volume 21, 2024
Declaration of Generative AI and AI-assisted
technologies in the writing process
During the preparation of this work the authors
used ChatGPT in order to improve the
readability. After using this tool/service, the
authors reviewed and edited the content as needed
and take full responsibility for the content of the
publication.
References:
[1] C. Campos, R. Elvira, J. J. Gomez, J. M. M.
Montiel, and J. D. Tardós. ORB-SLAM3: An
accurate open-source library for visual,
visual-inertial, and multi-map SLAM. IEEE
Transactions on Robotics, 37(6):1874–1890,
2021.
[2] C.-C. Chou. Enhance SLAM Performance
with Tightly-Coupled Camera and Lidar
Fusion. PhD thesis, National Taiwan
University, Taipei, 2021.
[3] M. Abbas, M. Elhamshary, H. Rizk, M. Torki,
and M. Youssef. Wideep: Wifi-based
accurate and robust indoor localization
system using deep learning. In 2019 IEEE
International Conference on Pervasive
Computing and Communications (PerCom),
pp.1–10. IEEE, 2019.
[4] G. Biau and E. Scornet. A random forest
guided tour. Test, 25:197–227, 2016.
[5] A. H. Salamah, M. Tamazin, M. A. Sharkas,
and M. Khedr. An enhanced wifi indoor
localization system based on machine
learning. In 2016 International Conference
on Indoor Positioning and Indoor Navigation
(IPIN), pp.1–8, 2016.
[6] G. Welch, G. Bishop, et al. An introduction
to the kalman filter. 1995.
[7] J. Zhang, M. Kaess, and S. Singh. On
degeneracy of optimization-based state
estimation problems. In 2016 IEEE
International Conference on Robotics and
Automation (ICRA), pp.809–816. IEEE, 2016.
[8] G. Lee, B.-C. Moon, S. Lee, and D. Han.
Fusion of the slam with wi-fi-based
positioning methods for mobile robot-based
learning data collection, localization, and
tracking in indoor spaces. Sensors,
20(18):5182, 2020.
[9] C.-Z. Sun, B. Zhang, J.-K. Wang, and C.-S.
Zhang. A review of visual slam based on
unmanned systems. In 2021 2nd
International Conference on Artificial
Intelligence and Education (ICAIE),
pp.226–234. IEEE, 2021.
[10] A. H. Salamah, M. Tamazin, M. A. Sharkas,
and M. Khedr. An enhanced wifi indoor
localization system based on machine
learning. In 2016 International Conference
on Indoor Positioning and Indoor Navigation
(IPIN), pp.1–8, 2016.
[11] J. Sturm, N. Engelhard, F. Endres, W.
Burgard, and D. Cremers. A benchmark for
the evaluation of rgb-d slam systems. In Proc.
of the International Conference on Intelligent
Robot Systems (IROS), Oct. 2012.
[12] Y.-H. Wang, T.-W. Yang, C.-F. Chou, and
I.-C. Chang. Wi-fi dsar: Wi-fi based indoor
localization using denoising supervised
autoencoder. In 2021 30th Wireless and
Optical Communications Conference
(WOCC), pp.188–192, 2021.
[13] Y. Chen, L. Zhao, and K. M. B. Lee.
Broadcast your weakness: Cooperative
active pose-graph SLAM for multiple
robots. IEEE Robotics and Automation
Letters, 2020.
[14] M. Quigley, K. Conley, and B. P. Gerkey.
ROS: an open-source Robot Operating
System. In ICRA Workshop on Open Source
Software, 2009.
[15] W. Xue, W. Qiu, X. Hua, and K. Yu.
Improved wi-fi rssi measurement for indoor
localization. IEEE Sensors Journal,
17(7):2224–2230, 2017.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
- Yi-Hsien Lu carried out the preprocessing and
data collection for the WiFi and visual fusion
experiments.
- Chia-Chi Huang implemented the WiFi and visual
fusion in C++ as ROS code.
- Chih-Chung Chou provided guidance and
suggestions.
- Cheng-Fu Chou provided guidance and suggestions.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
The research is based on work supported by the
National Science and Technology Council, Taiwan,
under Grant numbers NSTC 112-2221-E-008-059-MY2,
112-2221-E-002-118, and 113-2221-E-002-201.
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US