An explicit solution to a discrete-time stochastic optimal control problem

MARIO LEFEBVRE

Department of Mathematics and Industrial Engineering

Polytechnique Montréal

2500, chemin de Polytechnique, Montréal (Québec) H3T 1J4

CANADA

https://www.polymtl.ca/expertises/en/lefebvre-mario

Abstract: The problem of controlling a one-dimensional Markov chain until is leaves a given set Cis considered.

The optimizer tries to minimize the time spent by the Markov chain inside C. The control variable can take two

different values. An exact formula is obtained for the value function, from which the optimal control is deduced.

Key-Words: Markov chain, homing problem, value function, difference equations, absorption problems.

Received: July 12, 2022. Revised: March 7, 2023. Accepted: March 19, 2023. Published: April 26, 2023.

1 Introduction

Assume that the controlled discrete-time stochastic

process {Xn, n = 0,1, . . .}is such that X(0) = x

and

Xn+1 =Xn+un+ϵn,(1)

where unis the control variable and ϵnis a random

variable. We assume that both unand ϵncan take

a finite number of integer values. Thus, {Xn, n =

0,1, . . .}is a controlled one-dimensional Markov

chain.

In [4], the authors considered the following prob-

lem: find the value of the control variable that mini-

mizes the expected value of the cost function

J(x) =

T(x)−1

∑

n=0

(u2

n+λ),(2)

where

T (x) := inf{n > 0 : Xn ∈ D | X0 = x ∈ C}, (3

in which D is a subset of Z and λ is a non-zero con-

stant. The random variable T (x) is a called a first-

passage time in probability theory. The set D is the

stopping region, which is the complement of the con-

tinuation region C.

This type of problem, in which the optimizer tries

to either minimize (if λ > 0) or maximize (if λ < 0)

the time spent by a controlled stochastic process in a

given region, while taking the control costs into ac-

count, is known as a homing problem. [6],

considered this problem for n-dimensional diffusion

processes. In [7], the author treated the case when the

cost criterion is risk-sensitive; see also [1], [5]. Re-

cent papers on homing problems include the follow-

ing ones by the author: [2], [3].

In [4], authors found the exact solutions to two

particular problems. They also considered the

following case: assume that the set C is {−k +

1, . . . , k − 1}, where k ≥ 2, un ∈ {−2, −1, 1, 2}

and ϵn = ±1 with probability 1/2. Moreover, the

parameter λ in the cost function is strictly positive

and

T (x) := min{n > 0 : |Xn| ≥ k | X0 = x ∈ C}.(4)

Next, we define the value function

F(x) = min

un, n=0,...,T (x)−1E[J(x)].(5)

Using dynamic programming, we can state the fol-

lowing lemma (see, [4]).

Lemma 1.1. The function F (x) satisfies the dynamic

programming equation

F(x) = min

u0{u2

0+λ+1

2[F(x+u0−1)

+F(x+u0+ 1)]}.(6)

Moreover, we have the boundary condition

F(x) = 0 if |x| ≥ k.(7)

By symmetry, we only have to consider the case

when x∈ {0,1, . . . , k −1}. Moreover, because λ

is positive, it is clear that the optimal control is equal

to either +1 or +2 for any x≥0. It follows that

Equation (6) becomes

F(x) = min {1 + λ+1

2[F(x) + F(x+ 2)] ,

4 + λ+1

2[F(x+ 1) + F(x+ 3)]}.(8)

WSEAS TRANSACTIONS on SYSTEMS

DOI: 10.37394/23202.2023.22.40

Mario Lefebvre

E-ISSN: 2224-2678

368

Volume 22, 2023

In [4], the authors showed how we can compute

the optimal control u∗

0(x)first for x=k−1, then for

x=k−2and x=k−3. However, as they mentioned

in their paper, it is not easy to give a general formula

for any x∈ {0,1, . . . , k−1}in terms of the parameter

λ. In the current paper, we will show that we can give

a general expression for the value function F(x)for

any fixed value of λ. We can also then deduce the

value of the optimal control u∗

0(x).

2 An explicit expression for the value

function

Let us denote the function F(x)by Fi(x)if the opti-

mizer chooses u0=i, for i= 1,2. We deduce from

Eq. (8) that we have

F1(x) = 1 + λ+1

2[F1(x) + F1(x+ 2)] (9)

and

F2(x) = 4 + λ+1

2[F2(x+ 1) + F2(x+ 3)] .(10)

We can find the general solution of both difference

equations. First, Eq. (9) is a second-order linear dif-

ference equation with constant coefficients:

F1(x+ 2) −F1(x) + 2(1 + λ) = 0.(11)

Its general solution can be written as follows:

F1(x) = c1(−1)x+c2−(1 + λ)x, (12)

where c1and c2are arbitrary constants. To deter-

mine the values of c1and c2, we impose the condi-

tions F1(k) = F1(k+ 1) = 0. Moreover, we set

F1(k+ 2) = 0.

Next, Eq. (10) is a third-order linear difference

equation with constant coefficients:

F2(x+3)+F2(x+1)−2F2(x)+2(4+λ) = 0.(13)

We find that

F2(x) = d1+d2(−1

2+√7i

2)x

+d3(−1

2−√7i

2)x

−4 + λ

2x, (14)

where d1,d2and d3are constants that are determined

from the boundary conditions F2(k) = F2(k+ 1) =

F2(k+ 2) = 0.

Remarks. (i) Even though the expression for the func-

tion F2(x)contains complex terms, it is actually real

for any integer x∈ {0,1, . . . , k −1}.

(ii) The function Fi(x)corresponds to the expected

cost if we choose u0(x)≡i, for i= 1,2.

We can state the following proposition.

Proposition 2.1. The value function F(x)can be ex-

pressed as follows:

F(x) = min {1 + λ+1

2[min{F1(x), F2(x)}

+min{F1(x+ 2), F2(x+ 2)}],4 + λ

2[min{F1(x+ 1), F2(x+ 1)}

+min{F1(x+ 3), F2(x+ 3)}]}(15)

for x= 0,1, . . . , k −1.

Let

G(x) := 1 + λ+1

2[min{F1(x), F2(x)}

+min{F1(x+ 2), F2(x+ 2)}](16)

and

H(x) := 4 + λ+1

2[min{F1(x+ 1), F2(x+ 1)}

+min{F1(x+ 3), F2(x+ 3)}],(17)

so that

F(x) = min{G(x), H(x)}.(18)

To determine the optimal control u∗

0(x) for any x in

{0, 1, . . . , k − 1}, we can compare the value of G(x)

with that of H(x).

In [4], the authors proved that the value

function satisfies a non-linear difference equation.

Proposition 2.2. The value function F (x) satisfies

the non-linear third-order difference equation

0=2F2(x)−F(x)[F(x+1)+2F(x+ 2)

+F(x+ 3) + 12 + 6λ]

+2(4 + λ)F(x+ 2) + 2(1 + λ)[F(x+ 1)

+F(x+ 3)] + F(x+ 1)F(x+ 2)

+F(x+ 2)F(x+ 3) + (5 + 2λ)2−9(19)

for x= 0,1, . . . , k −1. The boundary conditions are

F(x) = 0 if x=k, k + 1, k + 2.

Remark. There are a few misprints in [4].

2.1 A particular problem

Assume that k= 4. We find that

F1(x) = −1

2(1 + λ)(−1)x+9

2(1 + λ)−(1 + λ)x

(20)

and that the constants d1,d2and d3in Eq. (14) are

given by

d1=19

8(4 + λ), d2 = (4 + λ)(3i−√7)√7

56i√7 + 168

(21)

WSEAS TRANSACTIONS on SYSTEMS

DOI: 10.37394/23202.2023.22.40

Mario Lefebvre

E-ISSN: 2224-2678

369

Volume 22, 2023

Table 1: Functions F(x),F1(x),F2(x),G(x)and

H(x), and optimal control u∗

0(x)for x= 0,1,2,3

when λ= 1

x F (x)F1(x)F2(x)G(x)H(x)u∗

0(x)

0 8 8 11.875 8 11 1

1 7 8 8.75 8 7 2

2 4 4 7.5 4 7 1

3 4 4 5 4 5 1

Table 2: Functions F(x),F1(x),F2(x),G(x)and

H(x), and optimal control u∗

0(x)for x= 0,1,2,3

when λ= 2

x F (x)F1(x)F2(x)G(x)H(x)u∗

0(x)

0 12 12 14.25 12 14.25 1

1 9 12 10.5 11.25 9 2

2 6 6 9 6 9 1

3 6 6 6 6 6 1 or 2

and

d3=−(4 + λ)i√7

56 .(22)

Table 1, Table 2, Table 3, Table 4 give the value

function F(x),F1(x),F2(x),G(x),H(x)and the op-

timal control u∗

0(x)for x= 0,1,2,3for various val-

ues of the parameter λ. Notice that, as expected, when

λis large, the optimal control is most often u∗

0(x) = 2.

To conclude this section, we will check that the

values of the function F(x)given in Table 1 (and

using the fact that F(x)=0for x≥4) are such

that Eq. (19) with λ= 1 is indeed satisfied for x=

0,1,2,3. First, when x= 0, we have

0=2×82−8(7 + 2 ×4+4+12+6)+10×4

+4(7 + 4) + 7 ×4+4×4 + 40.(23)

Similarly, for x= 1,x= 2 and x= 3 we have

Table 3: Functions F(x),F1(x),F2(x),G(x)and

H(x), and optimal control u∗

0(x)for x= 0,1,2,3

when λ= 5

x F (x)F1(x)F2(x)G(x)H(x)u∗

0(x)

0 21.375 24 21.375 22.6875 21.375 2

1 15 24 15.75 18.375 15 2

2 12 12 13.5 12 13.5 1

3 9 12 9 10.5 9 2

Table 4: Functions F(x),F1(x),F2(x),G(x)and

H(x), and optimal control u∗

0(x)for x= 0,1,2,3

when λ= 10

x F (x)F1(x)F2(x)G(x)H(x)u∗

0(x)

0 33.25 44 33.25 38.125 33.25 2

1 24.5 44 24.5 30.25 24.5 2

2 21 22 21 21.5 21 2

3 14 22 14 18 14 2

respectively

0=2×72−7(4 + 2 ×4+0+12+6)+10×4

+4(4 + 0) + 4 ×4+4×0 + 40,(24)

0=2×42−4(4 + 2 ×0+0+12+6)+10×0

+4(4 + 0) + 4 ×0+0×0 + 40 (25)

and

0=2×42−4(0 + 2 ×0+0+12+6)+10×0

+4(0 + 0) + 0 ×0+0×0 + 40.(26)

3 Conclusion

In this note, we gave an explicit and exact expres-

sion for the value function F(x)in an optimal con-

trol problem for a discrete-time and discrete-state

Markov chain that was considered by Lefebvre and

Kounta [4]. From this expression, it is possible to de-

termine the optimal control u∗

0(x)for any value of x

in the set {0,1, . . . , k −1}. Moreover, by symmetry,

we can write that u∗

0(−x) = −u∗

0(−x).

We saw that the function F(x)satisfies a non-

linear third-order difference equation. Solving such

equations directly is a very difficult task. However,

we checked in a particular case that the values ob-

tained for F(x)are indeed such that the difference

equation is satisfied.

The results presented in this note can be general-

ized to the case when the control variable can take

more than two values. We could also consider this

type of problem in two or more dimensions.

References:

[1] J. Kuhn, The risk-sensitive homing problem,

Journal of Applied Probability, Vol. 22, 1985,

pp. 796-803. https://doi.org/10.2307/3213947

[2] M. Lefebvre, Minimizing or maximizing the

first-passage time to a time-dependent bound-

ary, Optimization, Vol. 71, No. 2, 2022, pp. 387-

401. https://doi.org/10.1080/02331934.2021.

1914039

WSEAS TRANSACTIONS on SYSTEMS

DOI: 10.37394/23202.2023.22.40

Mario Lefebvre

E-ISSN: 2224-2678

370

Volume 22, 2023

[3] M. Lefebvre, The homing problem for

autoregressive processes, IMA Journal

of Mathematical Control and Informa-

tion, Vol. 39, No. 1, 2022, pp. 322-344.

https://doi.org/10.1093/imamci/dnab047

[4] M. Lefebvre and M. Kounta, Discrete

homing problems, Archives of Control

Sciences, Vol. 23, No. 1, 2013, pp. 5-18.

https://doi.org/10.2478/v10170-011-0039-6

[5] C. Makasu, Risk-sensitive control for

a class of homing problems, Automat-

ica, Vol. 45, No. 10, 2009, pp. 2454-

2455. https://doi.org/10.1016/j.automatica

.2009.06.015

[6] P. Whittle, Optimization over Time, Vol. I, Wi-

ley, Chichester, 1982.

[7] P. Whittle, Risk-sensitive Optimal Control, Wi-

ley, Chichester, 1990.

Contribution of individual authors to

the creation of a scientific article

(ghostwriting policy)

The author did all the research work of this study.

Sources of funding for research

presented in a scientific article or

scientific article itself

This research was supported by the Natural Sciences

and Engineering Research Council of Canada.

Conflict of Interest

The author has no conflict of interest to declare that is

relevant to the content of this article.

Creative Commons Attribution

License 4.0 (Attribution 4.0

International , CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US

WSEAS TRANSACTIONS on SYSTEMS

DOI: 10.37394/23202.2023.22.40

Mario Lefebvre

E-ISSN: 2224-2678

371

Volume 22, 2023