Related Works
The use of ML algorithms has grown substantially across research domains and applied fields, especially in cellular technologies, where these algorithms are increasingly applied to mobility management. In [5], the authors use RL algorithms to optimize handovers under user mobility in a dynamic small-cell network. In [6], the authors combine a fuzzy-logic function with Q-learning to control and optimize HOs and load balancing. By considering the velocity and location of a user, the authors of [7] use an RL-based optimal HO decision-making policy to maximize the throughput of terrestrial users. The authors of [8] use a hidden Markov process to reduce latency in mobile networks, learn the optimal HO control, and predict the next connected access point.
In [9], the authors propose a novel deep RL method to minimize the interference that drones cause to ground users in a cellular network. In [10], cell selection and handover measurements are discussed for drones connected to an LTE network in a suburban environment. Their simulations show that the number of HOs increases with flight altitude. As this prior work shows, mobility challenges in drone communications are widely discussed in the literature. However, efficient HO optimization for drones has received little attention. To this end, this work investigates a HO mechanism based on Q-learning in different topologies of cellular-connected drone networks, i.e., Rural, Semi-rural, and Urban. We also study the impact of the hyper-parameters on the average number of HOs.
System Model
In this work, we consider three cellular network topologies. Each consists of a different number of geo-spatially deployed ground base stations (BSs) that serve the UAVs. The UAVs are assumed to fly along a two-dimensional (2D) trajectory at a fixed height $h_{UAV}$. While flying, a UAV may perform several HOs, switching from one BS to another to maintain reliable connectivity. Several factors may trigger a HO, such as the BS distribution, the signal received at the UAV, and the UAV's speed, height, or trajectory.
Environment Generator
Let $K$ denote the number of base stations, separated by a distance $d_{BS}$, and let $C$ denote the number of cells per base station. Three types of cellular network are considered, i.e., Rural, Semi-rural, and Urban. Each covers an area of the same length $L = [-l/2, +l/2]$ and width $W = [-w/2, +w/2]$. However, $K$ and $d_{BS}$ differ to reflect the base station deployment in each environment. Estimating the propagation path loss (PL) is an important step in formulating and designing cellular networks. In general, PL is influenced by:
- terrain contours,
- the environment (Urban or Rural),
- the propagation medium (dry or moist air),
- the distance between the transmitter and the receiver,
- the height and location of the antennas.

We use two different definitions of PL, one for the Rural and one for the Urban environment, as introduced in the 3GPP reference [1]:
$$PL_{\mathrm{Rural}} = \max\left(23.9 - 1.8\log_{10}(h_{UAV}),\ 20\right)\log_{10}(d_{3D}) + 20\log_{10}\left(\frac{40\pi f_c}{3}\right) \tag{1}$$

$$PL_{\mathrm{Urban}} = 28 + 22\log_{10}(d_{3D}) + 20\log_{10}(f_c) \tag{2}$$
where $h_{UAV}$ denotes the height of the drone, $d_{3D}$ the 3D distance from the drone to the base station, and $f_c$ the carrier frequency in GHz, following [1]. For a more realistic model, we also account for shadowing. Its standard deviation $\sigma$ in each environment is defined in [1] as follows:

$$\sigma_{\mathrm{Rural}} = 4.2\, e^{-0.0046\, h_{UAV}} \tag{3}$$

$$\sigma_{\mathrm{Urban}} = 4.64\, e^{-0.0066\, h_{UAV}} \tag{4}$$
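Equations (1)-(4) can be evaluated directly. The Python sketch below is a minimal illustration. It assumes the 3GPP unit conventions of [1]: distances and heights in metres, and $f_c$ in GHz. The function names are hypothetical, and the validity ranges of the 3GPP models (e.g., on $h_{UAV}$) are not checked.

```python
import math


def path_loss_rural_db(d3d_m: float, h_uav_m: float, fc_ghz: float) -> float:
    """Eq. (1): rural path loss in dB."""
    # The slope applied to log10(d_3D) is clamped from below at 20.
    slope = max(23.9 - 1.8 * math.log10(h_uav_m), 20.0)
    return slope * math.log10(d3d_m) + 20.0 * math.log10(40.0 * math.pi * fc_ghz / 3.0)


def path_loss_urban_db(d3d_m: float, fc_ghz: float) -> float:
    """Eq. (2): urban path loss in dB."""
    return 28.0 + 22.0 * math.log10(d3d_m) + 20.0 * math.log10(fc_ghz)


def shadowing_std_db(h_uav_m: float, environment: str) -> float:
    """Eqs. (3)-(4): shadowing standard deviation sigma in dB."""
    if environment == "rural":
        return 4.2 * math.exp(-0.0046 * h_uav_m)
    if environment == "urban":
        return 4.64 * math.exp(-0.0066 * h_uav_m)
    raise ValueError(f"unknown environment: {environment!r}")


# Illustrative values: drone at 100 m, 1 km (3D) from a BS, fc = 2 GHz.
print(path_loss_rural_db(1000.0, 100.0, 2.0))
print(path_loss_urban_db(1000.0, 2.0))
print(shadowing_std_db(100.0, "urban"))
```

Both shadowing deviations shrink as the drone flies higher, which matches the exponential decay in Eqs. (3) and (4).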
To evaluate signal quality, we mainly focus on the Reference Signal Received Power (RSRP), as introduced in [11]:

$$RSRP = P_{tx} - 10\log_{10}(12 f_c) - PL - Sh + G_{UAV} + G_K$$

where $P_{tx}$ is the maximum transmit power of the base station and $Sh$ is the shadowing term, a random variable with standard deviation $\sigma$. $G_{UAV}$ and $G_K$ are the antenna gains of the UAV and of the BSs, respectively.
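The RSRP of a single cell at a waypoint can be sketched as follows. This is an illustration only, and it relies on several assumptions:
- The base-station height `h_bs` is a hypothetical parameter that the text does not specify.
- $Sh$ is drawn as a zero-mean Gaussian in dB (i.e., log-normal shadowing); this is our assumption.
- The path loss is passed in precomputed from Eq. (1) or (2).
- The multi-argument `math.hypot` requires Python 3.8 or later.

```python
import math
import random


def distance_3d(uav_xy, h_uav, bs_xy, h_bs):
    """3D distance d_3D between the drone and a base station (metres)."""
    # h_bs is a hypothetical base-station antenna height.
    return math.hypot(uav_xy[0] - bs_xy[0], uav_xy[1] - bs_xy[1], h_uav - h_bs)


def sample_shadowing_db(sigma_db, rng):
    """Assumption: Sh is a zero-mean Gaussian in dB (log-normal shadowing)."""
    return rng.gauss(0.0, sigma_db)


def rsrp_dbm(ptx_dbm, fc, pl_db, sh_db, g_uav_db, g_k_db):
    """RSRP = Ptx - 10*log10(12*fc) - PL - Sh + G_UAV + G_K, term by term as in the text."""
    return ptx_dbm - 10.0 * math.log10(12.0 * fc) - pl_db - sh_db + g_uav_db + g_k_db


# Hypothetical example values: h_bs = 25 m, Ptx = 46 dBm, PL = 100 dB, sigma = 4 dB.
rng = random.Random(42)
d = distance_3d((120.0, -40.0), 100.0, (0.0, 0.0), 25.0)
sh = sample_shadowing_db(4.0, rng)
print(d, rsrp_dbm(46.0, 2.0, 100.0, sh, 0.0, 8.0))
```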
Drone Trajectory Generator
First, $N$ drone trajectories are generated to train and test the RL algorithm: $2N/3$ are used to train the model and $N/3$ for testing.
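A minimal sketch of the $2N/3$ / $N/3$ split is given below. Two details are assumptions, not statements from the text: the trajectories are shuffled before splitting, and the training share is rounded down when $N$ is not a multiple of 3.

```python
import random


def split_trajectories(trajectories, seed=0):
    """Split N trajectories into 2N/3 for training and the remaining N/3 for testing."""
    items = list(trajectories)
    random.Random(seed).shuffle(items)  # assumption: shuffle before splitting
    n_train = (2 * len(items)) // 3     # assumption: round the training share down
    return items[:n_train], items[n_train:]


train, test = split_trajectories(range(300))
print(len(train), len(test))  # prints: 200 100
```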
Note that the initial location and the destination of each trajectory are generated within $[-l/4, l/4] \times [-w/4, w/4]$. This avoids border effects, dropped calls, access failures, and dead zones. Each trajectory is divided into several waypoints separated by a distance $d_{UAV}$. Since the initial and final locations are randomly generated, trajectories may have different lengths and different numbers of waypoints. Once the initial location of a trajectory has been generated, the drone follows the shortest path to the final location. In particular, the drone selects a movement direction $\theta_s \in \{r\pi/4,\ r = 0, 1, \dots, 7\}$ and moves a fixed distance $d_{UAV}$ to reach the next waypoint. This procedure is repeated until the drone reaches its final destination.
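The trajectory generation procedure above can be sketched as follows. This is a hypothetical implementation with two assumptions:
- At each waypoint, it picks the direction $\theta_s \in \{r\pi/4\}$ closest to the bearing of the destination, which approximates the shortest path with eight headings.
- Once the drone is within $d_{UAV}$ of the destination, it flies straight there, so the last hop may be shorter than $d_{UAV}$.

```python
import math
import random

# Allowed headings theta_s = r * pi / 4, r = 0, ..., 7.
HEADINGS = [r * math.pi / 4 for r in range(8)]


def random_endpoint(l, w, rng):
    """Draw a start or destination point in [-l/4, l/4] x [-w/4, w/4]."""
    return (rng.uniform(-l / 4, l / 4), rng.uniform(-w / 4, w / 4))


def generate_trajectory(start, goal, d_uav, max_steps=100_000):
    """Return the list of (x, y) waypoints from start to goal."""
    x, y = start
    waypoints = [(x, y)]
    for _ in range(max_steps):
        dx, dy = goal[0] - x, goal[1] - y
        dist = math.hypot(dx, dy)
        if dist <= d_uav:
            # Assumption: fly straight to the goal for the final (shorter) hop.
            if dist > 0:
                waypoints.append((goal[0], goal[1]))
            return waypoints
        # Pick the allowed heading closest to the bearing of the goal.
        r = round(math.atan2(dy, dx) / (math.pi / 4)) % 8
        theta = HEADINGS[r]
        x += d_uav * math.cos(theta)
        y += d_uav * math.sin(theta)
        waypoints.append((x, y))
    raise RuntimeError("destination not reached within max_steps")


rng = random.Random(7)
start, goal = random_endpoint(4000.0, 4000.0, rng), random_endpoint(4000.0, 4000.0, rng)
path = generate_trajectory(start, goal, d_uav=50.0)
print(len(path), path[0], path[-1])
```

Choosing the heading closest to the bearing keeps the heading error at most $\pi/8$. As a result, each step strictly shortens the remaining distance while that distance exceeds $d_{UAV}$, so the loop terminates.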
Let $x_s$ and $y_s$ denote the drone's 2D position and $c_s$ the currently connected cell. We then define $s = \{x_s, y_s, \theta_s, c_s\}$ as the state of the drone at each waypoint. Using the RSRP expression, we obtain the RSRP values of the cells at each waypoint. We then define $C_k^s$ as the set of the $k$ strongest cells at state $s$.
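The state $s$ and the candidate set $C_k^s$ can be represented as below. This sketch assumes that the RSRP of every cell at the current waypoint is already available as a mapping from a hypothetical integer cell identifier to RSRP in dBm. Counting a HO whenever the selected cell differs from $c_s$ is our reading of the HO definition, not a detail given in the text.

```python
import heapq
from typing import Dict, List, NamedTuple


class DroneState(NamedTuple):
    """State s = {x_s, y_s, theta_s, c_s} at one waypoint."""
    x: float
    y: float
    theta: float  # heading theta_s in radians
    cell: int     # c_s: currently connected cell (hypothetical integer id)


def k_strongest_cells(rsrp_by_cell: Dict[int, float], k: int) -> List[int]:
    """C_k^s: ids of the k cells with the highest RSRP, strongest first."""
    return heapq.nlargest(k, rsrp_by_cell, key=rsrp_by_cell.get)


def is_handover(state: DroneState, action: int) -> bool:
    """Assumption: selecting a serving cell different from c_s counts as one HO."""
    return action != state.cell


rsrp = {1: -80.0, 2: -95.0, 3: -70.0, 4: -88.0}  # dBm, illustrative values
s = DroneState(x=0.0, y=0.0, theta=0.0, cell=1)
candidates = k_strongest_cells(rsrp, k=2)
print(candidates, [is_handover(s, a) for a in candidates])  # prints: [3, 1] [True, False]
```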
At each waypoint, the drone has to take an action $A$ by selecting a serving cell among the $k$ strongest cells. We note that decision-making approaches perform better in the long run than the baseline approaches in which the
WSEAS TRANSACTIONS on COMPUTER RESEARCH
DOI: 10.37394/232018.2022.10.12
Mahmoud Almasri, Xavier Marjou, Fanny Parzysz