An explicit solution to a discrete-time stochastic optimal control problem
MARIO LEFEBVRE
Department of Mathematics and Industrial Engineering
Polytechnique Montréal
2500, chemin de Polytechnique, Montréal (Québec) H3T 1J4
CANADA
https://www.polymtl.ca/expertises/en/lefebvre-mario
Abstract: The problem of controlling a one-dimensional Markov chain until is leaves a given set Cis considered.
The optimizer tries to minimize the time spent by the Markov chain inside C. The control variable can take two
different values. An exact formula is obtained for the value function, from which the optimal control is deduced.
Key-Words: Markov chain, homing problem, value function, difference equations, absorption problems.
Received: July 12, 2022. Revised: March 7, 2023. Accepted: March 19, 2023. Published: April 26, 2023.
1 Introduction
Assume that the controlled discrete-time stochastic
process {Xn, n = 0,1, . . .}is such that X(0) = x
and
Xn+1 =Xn+un+ϵn,(1)
where unis the control variable and ϵnis a random
variable. We assume that both unand ϵncan take
a finite number of integer values. Thus, {Xn, n =
0,1, . . .}is a controlled one-dimensional Markov
chain.
In [4], the authors considered the following prob-
lem: find the value of the control variable that mini-
mizes the expected value of the cost function
J(x) =
T(x)1
n=0
(u2
n+λ),(2)
where
T (x) := inf{n > 0 : Xn D | X0 = x C}, (3
in which D is a subset of Z and λ is a non-zero con-
stant. The random variable T (x) is a called a first-
passage time in probability theory. The set D is the
stopping region, which is the complement of the con-
tinuation region C.
This type of problem, in which the optimizer tries
to either minimize (if λ > 0) or maximize (if λ < 0)
the time spent by a controlled stochastic process in a
given region, while taking the control costs into ac-
count, is known as a homing problem. [6],
considered this problem for n-dimensional diffusion
processes. In [7], the author treated the case when the
cost criterion is risk-sensitive; see also [1], [5]. Re-
cent papers on homing problems include the follow-
ing ones by the author: [2], [3].
In [4], authors found the exact solutions to two
particular problems. They also considered the
following case: assume that the set C is {−k +
1, . . . , k 1}, where k 2, un {−2, 1, 1, 2}
and ϵn = ±1 with probability 1/2. Moreover, the
parameter λ in the cost function is strictly positive
and
T (x) := min{n > 0 : |Xn| k | X0 = x C}.(4)
Next, we define the value function
F(x) = min
un, n=0,...,T (x)1E[J(x)].(5)
Using dynamic programming, we can state the fol-
lowing lemma (see, [4]).
Lemma 1.1. The function F (x) satisfies the dynamic
programming equation
F(x) = min
u0{u2
0+λ+1
2[F(x+u01)
+F(x+u0+ 1)]}.(6)
Moreover, we have the boundary condition
F(x) = 0 if |x| k.(7)
By symmetry, we only have to consider the case
when x {0,1, . . . , k 1}. Moreover, because λ
is positive, it is clear that the optimal control is equal
to either +1 or +2 for any x0. It follows that
Equation (6) becomes
F(x) = min {1 + λ+1
2[F(x) + F(x+ 2)] ,
4 + λ+1
2[F(x+ 1) + F(x+ 3)]}.(8)
WSEAS TRANSACTIONS on SYSTEMS
DOI: 10.37394/23202.2023.22.40
Mario Lefebvre
E-ISSN: 2224-2678
368
Volume 22, 2023
In [4], the authors showed how we can compute
the optimal control u
0(x)first for x=k1, then for
x=k2and x=k3. However, as they mentioned
in their paper, it is not easy to give a general formula
for any x {0,1, . . . , k1}in terms of the parameter
λ. In the current paper, we will show that we can give
a general expression for the value function F(x)for
any fixed value of λ. We can also then deduce the
value of the optimal control u
0(x).
2 An explicit expression for the value
function
Let us denote the function F(x)by Fi(x)if the opti-
mizer chooses u0=i, for i= 1,2. We deduce from
Eq. (8) that we have
F1(x) = 1 + λ+1
2[F1(x) + F1(x+ 2)] (9)
and
F2(x) = 4 + λ+1
2[F2(x+ 1) + F2(x+ 3)] .(10)
We can find the general solution of both difference
equations. First, Eq. (9) is a second-order linear dif-
ference equation with constant coefficients:
F1(x+ 2) F1(x) + 2(1 + λ) = 0.(11)
Its general solution can be written as follows:
F1(x) = c1(1)x+c2(1 + λ)x, (12)
where c1and c2are arbitrary constants. To deter-
mine the values of c1and c2, we impose the condi-
tions F1(k) = F1(k+ 1) = 0. Moreover, we set
F1(k+ 2) = 0.
Next, Eq. (10) is a third-order linear difference
equation with constant coefficients:
F2(x+3)+F2(x+1)2F2(x)+2(4+λ) = 0.(13)
We find that
F2(x) = d1+d2(1
2+7i
2)x
+d3(1
27i
2)x
4 + λ
2x, (14)
where d1,d2and d3are constants that are determined
from the boundary conditions F2(k) = F2(k+ 1) =
F2(k+ 2) = 0.
Remarks. (i) Even though the expression for the func-
tion F2(x)contains complex terms, it is actually real
for any integer x {0,1, . . . , k 1}.
(ii) The function Fi(x)corresponds to the expected
cost if we choose u0(x)i, for i= 1,2.
We can state the following proposition.
Proposition 2.1. The value function F(x)can be ex-
pressed as follows:
F(x) = min {1 + λ+1
2[min{F1(x), F2(x)}
+min{F1(x+ 2), F2(x+ 2)}],4 + λ
+1
2[min{F1(x+ 1), F2(x+ 1)}
+min{F1(x+ 3), F2(x+ 3)}]}(15)
for x= 0,1, . . . , k 1.
Let
G(x) := 1 + λ+1
2[min{F1(x), F2(x)}
+min{F1(x+ 2), F2(x+ 2)}](16)
and
H(x) := 4 + λ+1
2[min{F1(x+ 1), F2(x+ 1)}
+min{F1(x+ 3), F2(x+ 3)}],(17)
so that
F(x) = min{G(x), H(x)}.(18)
To determine the optimal control u
0(x) for any x in
{0, 1, . . . , k 1}, we can compare the value of G(x)
with that of H(x).
In [4], the authors proved that the value
function satisfies a non-linear difference equation.
Proposition 2.2. The value function F (x) satisfies
the non-linear third-order difference equation
0=2F2(x)F(x)[F(x+1)+2F(x+ 2)
+F(x+ 3) + 12 + 6λ]
+2(4 + λ)F(x+ 2) + 2(1 + λ)[F(x+ 1)
+F(x+ 3)] + F(x+ 1)F(x+ 2)
+F(x+ 2)F(x+ 3) + (5 + 2λ)29(19)
for x= 0,1, . . . , k 1. The boundary conditions are
F(x) = 0 if x=k, k + 1, k + 2.
Remark. There are a few misprints in [4].
2.1 A particular problem
Assume that k= 4. We find that
F1(x) = 1
2(1 + λ)(1)x+9
2(1 + λ)(1 + λ)x
(20)
and that the constants d1,d2and d3in Eq. (14) are
given by
d1=19
8(4 + λ), d2 = (4 + λ)(3i7)7
56i7 + 168
(21)
WSEAS TRANSACTIONS on SYSTEMS
DOI: 10.37394/23202.2023.22.40
Mario Lefebvre
E-ISSN: 2224-2678
369
Volume 22, 2023
Table 1: Functions F(x),F1(x),F2(x),G(x)and
H(x), and optimal control u
0(x)for x= 0,1,2,3
when λ= 1
x F (x)F1(x)F2(x)G(x)H(x)u
0(x)
0 8 8 11.875 8 11 1
1 7 8 8.75 8 7 2
2 4 4 7.5 4 7 1
3 4 4 5 4 5 1
Table 2: Functions F(x),F1(x),F2(x),G(x)and
H(x), and optimal control u
0(x)for x= 0,1,2,3
when λ= 2
x F (x)F1(x)F2(x)G(x)H(x)u
0(x)
0 12 12 14.25 12 14.25 1
1 9 12 10.5 11.25 9 2
2 6 6 9 6 9 1
3 6 6 6 6 6 1 or 2
and
d3=(4 + λ)i7
56 .(22)
Table 1, Table 2, Table 3, Table 4 give the value
function F(x),F1(x),F2(x),G(x),H(x)and the op-
timal control u
0(x)for x= 0,1,2,3for various val-
ues of the parameter λ. Notice that, as expected, when
λis large, the optimal control is most often u
0(x) = 2.
To conclude this section, we will check that the
values of the function F(x)given in Table 1 (and
using the fact that F(x)=0for x4) are such
that Eq. (19) with λ= 1 is indeed satisfied for x=
0,1,2,3. First, when x= 0, we have
0=2×828(7 + 2 ×4+4+12+6)+10×4
+4(7 + 4) + 7 ×4+4×4 + 40.(23)
Similarly, for x= 1,x= 2 and x= 3 we have
Table 3: Functions F(x),F1(x),F2(x),G(x)and
H(x), and optimal control u
0(x)for x= 0,1,2,3
when λ= 5
x F (x)F1(x)F2(x)G(x)H(x)u
0(x)
0 21.375 24 21.375 22.6875 21.375 2
1 15 24 15.75 18.375 15 2
2 12 12 13.5 12 13.5 1
3 9 12 9 10.5 9 2
Table 4: Functions F(x),F1(x),F2(x),G(x)and
H(x), and optimal control u
0(x)for x= 0,1,2,3
when λ= 10
x F (x)F1(x)F2(x)G(x)H(x)u
0(x)
0 33.25 44 33.25 38.125 33.25 2
1 24.5 44 24.5 30.25 24.5 2
2 21 22 21 21.5 21 2
3 14 22 14 18 14 2
respectively
0=2×727(4 + 2 ×4+0+12+6)+10×4
+4(4 + 0) + 4 ×4+4×0 + 40,(24)
0=2×424(4 + 2 ×0+0+12+6)+10×0
+4(4 + 0) + 4 ×0+0×0 + 40 (25)
and
0=2×424(0 + 2 ×0+0+12+6)+10×0
+4(0 + 0) + 0 ×0+0×0 + 40.(26)
3 Conclusion
In this note, we gave an explicit and exact expres-
sion for the value function F(x)in an optimal con-
trol problem for a discrete-time and discrete-state
Markov chain that was considered by Lefebvre and
Kounta [4]. From this expression, it is possible to de-
termine the optimal control u
0(x)for any value of x
in the set {0,1, . . . , k 1}. Moreover, by symmetry,
we can write that u
0(x) = u
0(x).
We saw that the function F(x)satisfies a non-
linear third-order difference equation. Solving such
equations directly is a very difficult task. However,
we checked in a particular case that the values ob-
tained for F(x)are indeed such that the difference
equation is satisfied.
The results presented in this note can be general-
ized to the case when the control variable can take
more than two values. We could also consider this
type of problem in two or more dimensions.
References:
[1] J. Kuhn, The risk-sensitive homing problem,
Journal of Applied Probability, Vol. 22, 1985,
pp. 796-803. https://doi.org/10.2307/3213947
[2] M. Lefebvre, Minimizing or maximizing the
first-passage time to a time-dependent bound-
ary, Optimization, Vol. 71, No. 2, 2022, pp. 387-
401. https://doi.org/10.1080/02331934.2021.
1914039
WSEAS TRANSACTIONS on SYSTEMS
DOI: 10.37394/23202.2023.22.40
Mario Lefebvre
E-ISSN: 2224-2678
370
Volume 22, 2023
[3] M. Lefebvre, The homing problem for
autoregressive processes, IMA Journal
of Mathematical Control and Informa-
tion, Vol. 39, No. 1, 2022, pp. 322-344.
https://doi.org/10.1093/imamci/dnab047
[4] M. Lefebvre and M. Kounta, Discrete
homing problems, Archives of Control
Sciences, Vol. 23, No. 1, 2013, pp. 5-18.
https://doi.org/10.2478/v10170-011-0039-6
[5] C. Makasu, Risk-sensitive control for
a class of homing problems, Automat-
ica, Vol. 45, No. 10, 2009, pp. 2454-
2455. https://doi.org/10.1016/j.automatica
.2009.06.015
[6] P. Whittle, Optimization over Time, Vol. I, Wi-
ley, Chichester, 1982.
[7] P. Whittle, Risk-sensitive Optimal Control, Wi-
ley, Chichester, 1990.
Contribution of individual authors to
the creation of a scientific article
(ghostwriting policy)
The author did all the research work of this study.
Sources of funding for research
presented in a scientific article or
scientific article itself
This research was supported by the Natural Sciences
and Engineering Research Council of Canada.
Conflict of Interest
The author has no conflict of interest to declare that is
relevant to the content of this article.
Creative Commons Attribution
License 4.0 (Attribution 4.0
International , CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US
WSEAS TRANSACTIONS on SYSTEMS
DOI: 10.37394/23202.2023.22.40
Mario Lefebvre
E-ISSN: 2224-2678
371
Volume 22, 2023