Two different solution techniques for an
optimal control problem with a stochastic switching time
ALESSANDRA BURATTO, LUCA GROSSET, MADDALENA MUTTONI
Dipartimento di Matematica "Tullio Levi-Civita"
Università degli Studi di Padova
Via Trieste, 63 - 35121 Padova
ITALY
Abstract: In optimal control theory, strategic decision making requires the consideration of unforeseen disruptions
that may arise within a predetermined time horizon. In this context, we introduce the concept of "stochastic
switching time" as a random moment in time at which a sudden, irreversible alteration takes place in the system's
dynamics or in the payoff function. To address optimal decision-making under such uncertain conditions, the
literature presents two prominent methodologies: the "backward" approach and the "heterogeneous" approach.
In this study, we offer an exposition and a comparative analysis of these two approaches. Finally, we present an
illustrative example to show, in a detailed context, the advantages and disadvantages associated with these two
solution strategies.
Key-Words: Optimal control, Regime shifts, Hamilton-Jacobi-Bellman equation, Pontryagin maximum
principle.
Received: May 23, 2023. Revised: August 23, 2023. Accepted: September 28, 2023. Published: October 20, 2023.
1 Introduction
In the context of dynamical systems, we call stochastic switching time an event which:
- occurs at a random time τ;
- changes abruptly the nature of the system;
- splits the time horizon into two stages: a Stage 1 before the occurrence of τ, and a Stage 2 afterwards.
This framework finds many applications in various areas, such as epidemiology, [1], rational risk, [2], renewable resources, [3], and open source software, [4], to name a few. In the context of optimal control, we are interested in seeing how an optimal strategy adjusts to the change of regime, i.e., how it changes in going from Stage 1 to Stage 2, upon the occurrence of τ.
An optimal control problem is a dynamic optimisation problem in which the agent sets the value of the control variable u(·), for every time in a given programming interval [0, T], choosing it from a given set U of feasible controls. The control enters the state dynamics, influencing the evolution of the state variable x(·), whose initial value is given. Assuming the existence and uniqueness of the solution of such dynamics, the pair (u(·), x(·)) of control and corresponding state trajectory is called a process, as in [5]. For simplicity, in this paper we assume that both the control variable and the state variable are one-dimensional.
The planner's objective is to maximise a certain payoff, which is the sum of an intertemporal term and a salvage value. The former is the integral of a profit flow over time, which depends on the strategy and the corresponding state trajectory; the latter is a lump sum which depends on the final state x(T):
\[
\max_{u(t)\in U} \int_0^T g(t, x(t), u(t))\,dt + S(x(T))
\]
subject to:
\[
\dot{x}(t) = f(t, x(t), u(t)) \quad \text{for } t\in[0,T], \qquad x(0) = x_0,
\]
where:
- g(t, x, u) is the running payoff;
- S(x) is the salvage value function;
- f(t, x, u) is the state dynamics.
Two interesting applications of optimal control theory to economic growth models can be found in [6] and in [7]. There are two solution approaches to this problem: dynamic programming and Pontryagin's maximum principle; see [5] for the standard details of these two different techniques.
We are interested in integrating a stochastic
switching time into the optimal control framework
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.80
Alessandra Buratto, Luca Grosset, Maddalena Muttoni
E-ISSN: 2224-2880
730
Volume 22, 2023
described in the previous paragraph. We model the stochastic switching time τ as an absolutely continuous random variable taking values in [0, +∞). Due to the finiteness of the time horizon, τ could occur during the programming interval [0, T] (splitting it into a Stage 1 and a Stage 2, as in Figure 1) or after T (leaving the whole interval in Stage 1, as depicted in Figure 2).
[Figure 1: timeline with 0, τ, T; Stage 1 on [0, τ), Stage 2 on [τ, T]. Caption: Case τ < T]

[Figure 2: timeline with 0, T, τ; Stage 1 covering the whole interval [0, T]. Caption: Case τ ≥ T]
We describe the probability distribution of τ through a quantity called the hazard rate of τ at time t:
\[
\lim_{h\to 0^+} \frac{P(\tau \le t+h \mid \tau > t)}{h} = \eta(t, x(t)), \tag{1}
\]
where η(t, x) is the hazard rate function. In general, the hazard rate may be exogenous or endogenous, and in the latter case it may depend both on the state variable and on the control variable. Nevertheless, in this paper, we assume it to be endogenous and dependent only on the state of the system.
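As a sanity check on definition (1): when the hazard rate is a constant η, the switching time is exponentially distributed, and the conditional-probability quotient converges to η as h → 0⁺. The following is a minimal sketch; the values η = 0.5 and t = 1.3 are arbitrary illustrative choices, not part of the model.

```python
# Sketch: with a constant exogenous hazard eta, tau ~ Exp(eta) and the
# quotient in (1) converges to eta as h -> 0+.
import math

eta = 0.5                                # arbitrary constant hazard rate
F = lambda t: 1.0 - math.exp(-eta * t)   # CDF of tau ~ Exp(eta)

t = 1.3
for h in (1e-2, 1e-4, 1e-6):
    cond_rate = (F(t + h) - F(t)) / ((1.0 - F(t)) * h)
    print(h, cond_rate)                  # tends to eta = 0.5
```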
The switch may have one or more simultaneous effects on the system, such as a change in the state dynamics, in the running payoff, in the salvage value function, and in the control set. Moreover, it may induce a jump discontinuity in the state trajectory, such that
\[
x(\tau^+) = \varphi(\tau, x(\tau)).
\]
See the following table for a schematic representation (and the respective notation) of the effects of the switch. Observe that if the state is assumed to be continuous at τ, then φ(τ, x) = x.
           Stage 1       Switch     Stage 2
Dynamics   f1(t, x, u)              f2(τ, t, x, u)
Payoff     g1(t, x, u)              g2(τ, t, x, u)
Salvage    S1(x)                    S2(τ, x)
Control    U1                       U2
Jump                     ϕ(τ, x)
Because the planner does not know when τ will occur, it is necessary to plan a Stage 1 strategy for the entire time horizon, with associated process
\[
(u_1(t), x_1(t)), \qquad t\in[0,T].
\]
If the switching time occurs during the programming interval, the planner realises that τ has occurred, and when. Therefore, on the Stage 2 interval [τ, T] they will be able to implement the optimal strategy for any specific occurrence of τ. The Stage 2 process hence depends on two variables: the realisation s ∈ [0, T] of the switching time τ, and the time t in the Stage 2 interval [s, T]:
\[
(u_2(s,t), x_2(s,t)), \qquad s\in[0,T],\ t\in[s,T].
\]
Due to the stochasticity of τ, the planner can only aim at maximising the expectation of the total payoff, which takes different forms depending on whether τ occurs during the programming interval or afterwards. The switching time optimal control problem is:
\[
\max_{\substack{u_1(t)\in U_1\\ u_2(s,t)\in U_2}} \mathbb{E}\Big[\, \mathbf{1}_{\{\tau<T\}}\Big\{ \int_0^{\tau} g_1(t, x_1(t), u_1(t))\,dt + \int_{\tau}^{T} g_2(\tau, t, x_2(\tau,t), u_2(\tau,t))\,dt + S_2(\tau, x_2(\tau,T)) \Big\} + \mathbf{1}_{\{\tau\ge T\}}\Big\{ \int_0^{T} g_1(t, x_1(t), u_1(t))\,dt + S_1(x_1(T)) \Big\} \Big] \tag{2}
\]
subject to:
\[
\dot{x}_1(t) = f_1(t, x_1(t), u_1(t)), \quad t\in[0,T], \qquad x_1(0) = x_0,
\]
\[
\dot{x}_2(s,t) = f_2(s, t, x_2(s,t), u_2(s,t)), \quad t\in[s,T], \qquad x_2(s,s) = \varphi(s, x_1(s)),
\]
with the hazard rate of τ at time t given by η(t, x_1(t)).
We observe that the dynamics is deterministic and is parametrically fixed for every possible realisation s of the random variable τ. Therefore, the decision maker fixes both the control u_1(·) and the controls u_2(s, ·), which are parametric in s. With a slight abuse of notation in the formulas above, we set \(\dot{x}_2(s,t) = \partial_t x_2(s,t)\); we will use the same notation in the rest of the paper.
It should be noted that the initial condition of the Stage 2 problem is a function of the Stage 1 variables at the switch: x_2(s,s) = \varphi(s, x_1(s)).
This implies that the Stage 2 problem cannot be solved independently, unless one applies dynamic programming, where an optimal control problem is solved for every possible initial value. We compute the expectation using the auxiliary Stage 1 state variable z_1(t) := P(τ > t), which is the probability of still being in Stage 1 at time t. To view it as a state variable, we write its dynamics and initial value:
\[
\dot{z}_1(t) = -\eta(t, x_1(t))\, z_1(t), \qquad z_1(0) = 1,
\]
where the dynamics is derived from the definition of the hazard rate (1). Then, the probability density of τ at time t is:
\[
f_\tau(t) = -\dot{z}_1(t) = \eta(t, x_1(t))\, z_1(t). \tag{3}
\]
Both expressions in (3) are used to compute the
expectation in (2). After basic integral manipulations,
the resulting objective is:
\[
\max_{\substack{u_1(t)\in U_1\\ u_2(s,t)\in U_2}} \int_0^T z_1(t)\Big\{ g_1(t, x_1(t), u_1(t)) + \eta(t, x_1(t))\Big[ S_2(t, x_2(t,T)) + \int_t^T g_2(t, \theta, x_2(t,\theta), u_2(t,\theta))\,d\theta \Big] \Big\}\,dt + z_1(T)\, S_1(x_1(T)) \tag{4}
\]
subject to:
\[
\dot{x}_1(t) = f_1(t, x_1(t), u_1(t)), \qquad x_1(0) = x_0,
\]
\[
\dot{z}_1(t) = -\eta(t, x_1(t))\, z_1(t), \qquad z_1(0) = 1,
\]
\[
\dot{x}_2(s,t) = f_2(s, t, x_2(s,t), u_2(s,t)), \qquad x_2(s,s) = \varphi(s, x_1(s)),
\]
where we updated the Stage 1 dynamics with the new variable z_1.
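In practice, the auxiliary state z_1 can be computed by integrating its ODE together with the Stage 1 state. The sketch below uses SciPy with purely illustrative specifications that are not part of the general model: hazard η(t, x) = x, dynamics f_1 = u with the fixed feasible control u_1 ≡ 1 and x_0 = 1, so that x_1(t) = 1 + t and z_1(t) = exp(−t − t²/2) in closed form.

```python
# Sketch: survival probability z1(t) = P(tau > t) by integrating its ODE
# alongside the Stage 1 state (illustrative data, see lead-in).
import numpy as np
from scipy.integrate import solve_ivp

def ode(t, y):
    x1, z1 = y
    u1 = 1.0                 # fixed feasible Stage 1 control (illustrative)
    eta = x1                 # endogenous hazard eta(t, x) = x
    return [u1, -eta * z1]   # x1' = u1,  z1' = -eta(t, x1) * z1

sol = solve_ivp(ode, (0.0, 1.0), [1.0, 1.0], rtol=1e-8, atol=1e-10)
x1_T, z1_T = sol.y[:, -1]
print(x1_T, z1_T)            # closed form here: x1(1) = 2, z1(1) = exp(-1.5)
```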
2 Solution approaches
There are two possible ways of solving the two-stage
optimal control problem defined above: the backward
approach and the heterogeneous one.
The backward approach is based on dynamic
programming: it involves solving the Stage 2 problem
for every possible occurrence sof the switch and
for every possible initial state, and then plugging
the Stage 2 value function into the Stage 1 problem,
which is solved as a simple optimal control problem,
assuming optimal behavior in Stage 2. The two stages
are solved separately (in reverse order) at the cost
of computing the Stage 2 value function for every
possible value of x_2 at the switch, instead of just the value it will take from the condition x_2(s,s) = \varphi(s, x_1(s)). For more details on this approach see, e.g., [5].
The heterogeneous approach is based on Pontryagin's maximum principle (see [5]): one
derives necessary conditions for the solution
by calculating the co-state functions for both
stages, as solutions of the corresponding adjoint
system, and letting the optimal strategies satisfy the
resulting maximality conditions. The two stages are
necessarily solved together, because x1and u1enter
the initial conditions of Stage 2, and (as we will
see) the Stage 2 co-states enter the Stage 1 adjoint
equations.
In what follows, we will determine the optimal
control of the switching time problem, applying the
two approaches and comparing their results.
2.1 Backward approach
First, let us solve the Stage 2 problem with dynamic programming. Since, in general, the Stage 2 data may depend on the realisation s of the switching time τ, the Stage 2 value function V_2 will depend on s as well:
\[
V_2(s,t,x) := \sup_{u(\theta)\in U_2} \int_t^T g_2(s, \theta, x(\theta), u(\theta))\,d\theta + S_2(s, x(T)) \tag{5}
\]
subject to:
\[
\dot{x}(\theta) = f_2(s, \theta, x(\theta), u(\theta)) \quad \text{for } \theta\in[t,T], \qquad x(t) = x.
\]
If V_2(s, ·, ·) is differentiable, then it is the solution of the corresponding system of HJB equation and terminal condition (parametrised by s):
\[
-\partial_t V_2(s,t,x) = \max_{u\in U_2}\big\{ g_2(s,t,x,u) + \partial_x V_2(s,t,x)\, f_2(s,t,x,u) \big\}, \qquad V_2(s,T,x) = S_2(s,x). \tag{6}
\]
The optimal feedback strategy \(\Phi_2(s,t,x)\) maximises the right-hand side of the HJB equation in (6):
\[
\Phi_2(s,t,x) \in \arg\max_{u\in U_2}\big\{ g_2(s,t,x,u) + \partial_x V_2(s,t,x)\, f_2(s,t,x,u) \big\}.
\]
Let (u_1, x_1) be a feasible Stage 1 process, and let the trajectory \(t\mapsto x_2(s,t)\) satisfy the Cauchy problem
\[
\dot{x}_2(s,t) = f_2\big(s, t, x_2(s,t), \Phi_2(s, t, x_2(s,t))\big), \qquad x_2(s,s) = \varphi(s, x_1(s)),
\]
defined for t ∈ [s, T]; then, the optimal control for Stage 2, given (u_1, x_1), is
\[
u_2(s,t) = \Phi_2(s, t, x_2(s,t)).
\]
By Bellman's principle of optimality, given (u_1, x_1) and assuming (u_2, x_2) as above, we can write:
\[
\int_t^T g_2(s, \theta, x_2(s,\theta), u_2(s,\theta))\,d\theta + S_2(s, x_2(s,T)) = V_2(s, t, x_2(s,t)).
\]
In particular, for s = t, we can substitute x_2(t,t) = \varphi(t, x_1(t)), yielding:
\[
V_2(t, t, x_2(t,t)) = V_2(t, t, \varphi(t, x_1(t))). \tag{7}
\]
Assuming optimal behavior in Stage 2, in (4) we can
substitute the Stage 2 payoff with (7), obtaining the
following objective for Stage 1:
\[
\max_{u_1(t)\in U_1} \int_0^T z_1(t)\Big\{ g_1(t, x_1(t), u_1(t)) + \eta(t, x_1(t))\, V_2(t, t, \varphi(t, x_1(t)))\Big\}\,dt + z_1(T)\, S_1(x_1(T)) \tag{8}
\]
subject to:
\[
\dot{x}_1(t) = f_1(t, x_1(t), u_1(t)), \qquad x_1(0) = x_0,
\]
\[
\dot{z}_1(t) = -\eta(t, x_1(t))\, z_1(t), \qquad z_1(0) = 1. \tag{9}
\]
This is a simple optimal control problem, which can be solved using either dynamic programming or Pontryagin's maximum principle. It is worth observing that the state variable z_1(t) plays the role of a discount factor.
2.2 Heterogeneous approach
By [8], if suitable regularity assumptions on the data hold and if (u_1, x_1, z_1, u_2, x_2, z_2) constitutes an optimal two-stage process, then there exist co-state functions λ_x(t), λ_z(t), and ξ_x(s,t), ξ_z(s,t) such that, once the "current" co-state functions
\[
\lambda^c_x(t) := \lambda_x(t)/z_1(t), \qquad \xi^c_x(s,t) := \xi_x(s,t)/z_2(s,t)
\]
are defined, the following conditions hold:
Maximality condition for Stage 1:
\[
u_1(t) \in \arg\max_{u\in U_1}\big\{ g_1(t, x_1(t), u) + \lambda^c_x(t)\, f_1(t, x_1(t), u) \big\}
\]
Maximality condition for Stage 2:
\[
u_2(s,t) \in \arg\max_{u\in U_2}\big\{ g_2(s, t, x_2(s,t), u) + \xi^c_x(s,t)\, f_2(s, t, x_2(s,t), u) \big\}
\]
The co-state functions are solutions of the following adjoint system:
\[
\begin{cases}
\dot{\lambda}^c_x(t) = -\partial_x g_1(t, x_1(t), u_1(t)) - \lambda^c_x(t)\,\partial_x f_1(t, x_1(t), u_1(t)) - \big[\xi^c_x(t,t)\,\partial_x\varphi(t, x_1(t)) - \lambda^c_x(t)\big]\,\eta(t, x_1(t)) - \big[\xi_z(t,t) - \lambda_z(t)\big]\,\partial_x\eta(t, x_1(t)),\\[2pt]
\lambda^c_x(T) = \partial_x S_1(x_1(T)),
\end{cases}
\]
\[
\begin{cases}
\dot{\xi}^c_x(s,t) = -\partial_x g_2(s, t, x_2(s,t), u_2(s,t)) - \xi^c_x(s,t)\,\partial_x f_2(s, t, x_2(s,t), u_2(s,t)),\\[2pt]
\xi^c_x(s,T) = \partial_x S_2(s, x_2(s,T)),
\end{cases}
\]
\[
\begin{cases}
\dot{\lambda}_z(t) = -g_1(t, x_1(t), u_1(t)) - \big[\xi_z(t,t) - \lambda_z(t)\big]\,\eta(t, x_1(t)),\\[2pt]
\lambda_z(T) = S_1(x_1(T)),
\end{cases}
\]
\[
\begin{cases}
\dot{\xi}_z(s,t) = -g_2(s, t, x_2(s,t), u_2(s,t)),\\[2pt]
\xi_z(s,T) = S_2(s, x_2(s,T)).
\end{cases}
\]
Observe that if η ≡ 0 (that is, without a switching time) the Stage 1 current co-state λ^c_x coincides with the co-state of a single-stage optimal control problem. On the other hand, when η does not vanish, the two additional terms
\[
-\big[\xi^c_x(t,t)\,\partial_x\varphi(t, x_1(t)) - \lambda^c_x(t)\big]\,\eta(t, x_1(t))
\]
and
\[
-\big[\xi_z(t,t) - \lambda_z(t)\big]\,\partial_x\eta(t, x_1(t))
\]
represent the anticipating effect of the switch on the Stage 1 shadow value of the state variable. Observe that, also in the heterogeneous approach, the variables z_1(t) and z_2(s,t) play the role of discount factors, necessary to obtain the "current" co-state functions from the regular ones.
The heterogeneous approach can also be useful
in addressing optimal control problems with age-
structured dynamics. These models are well
described in the literature, [5], and allow for the
study of interesting practical problems. A very recent
example of such an application can be found in [9].
Moreover, in [10], the heterogeneous approach is
used to characterise the necessary conditions for an
age-structured optimal control problem.
3 Numerical example
In the following numerical example, we compare the
two techniques presented in the previous sections
to solve a specific switching time optimal control
problem. Let us set the data of the problem:
T = 1,  x_0 = 1,
f_1(t, x, u) = u,  g_1(t, x, u) = -u,  S_1(x) = 0,  U_1 = [0, 1],
η(t, x) = x,  ϕ(s, x) = x,
f_2(s, t, x, u) = 0,  g_2(s, t, x, u) = α - u with α ∈ ℝ,  S_2(s, x) = 0,  U_2 = [0, +∞).
3.1 Backward approach
The maximisation in (6) gives the Stage 2 optimal control u_2^*(s,t) ≡ 0, therefore the HJB system becomes
\[
-\partial_t V_2(s,t,x) = \alpha, \qquad V_2(s,1,x) = 0,
\]
which, once solved, gives the value function V_2(s,t,x) = α(1 − t). Using this solution, we can write the optimal control problem defined in (8) and (9):
\[
\max_{u_1(t)\in[0,1]} \int_0^1 z_1(t)\big\{ -u_1(t) + \alpha\, x_1(t)(1-t) \big\}\,dt
\]
subject to:
\[
\dot{x}_1(t) = u_1(t), \quad x_1(0) = 1, \qquad \dot{z}_1(t) = -x_1(t)\, z_1(t), \quad z_1(0) = 1.
\]
We now apply Pontryagin's maximum principle, [5], to this standard optimal control problem, with associated Hamiltonian function
\[
H = z_1\{-u_1 + \alpha x_1(1-t)\} + p_x u_1 - p_z x_1 z_1.
\]
The optimal control in feedback form is
\[
u_1^* = \mathbf{1}_{\{p_x \ge z_1\}},
\]
while the state and co-state equations are
\[
\begin{cases}
\dot{x}_1(t) = \mathbf{1}_{\{p_x(t)\ge z_1(t)\}}(t), & x_1(0) = 1,\\
\dot{z}_1(t) = -x_1(t)\, z_1(t), & z_1(0) = 1,\\
\dot{p}_x(t) = \big(\alpha(t-1) + p_z(t)\big)\, z_1(t), & p_x(1) = 0,\\
\dot{p}_z(t) = \mathbf{1}_{\{p_x(t)\ge z_1(t)\}}(t) - \big(\alpha(1-t) - p_z(t)\big)\, x_1(t), & p_z(1) = 0.
\end{cases} \tag{10}
\]
This system of ODEs can only be solved by a numerical procedure.
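One such numerical procedure is to treat (10) as a two-point boundary value problem. The sketch below uses SciPy's solve_bvp; the value α = −1, the grid, and the initial guess are illustrative assumptions, not part of the original model. For this α the maximality condition settles on u_1 ≡ 0, so the indicator in the right-hand side causes no switching along the solution, and x_1 stays at 1 while z_1(t) = e^{−t}.

```python
# Sketch: solving system (10) as a BVP (states forward, co-states backward).
import numpy as np
from scipy.integrate import solve_bvp

alpha = -1.0                                  # illustrative parameter value

def rhs(t, y):
    x1, z1, px, pz = y
    u1 = (px >= z1).astype(float)             # maximality: u1* = 1{px >= z1}
    return np.vstack([
        u1,                                   # x1' = u1
        -x1 * z1,                             # z1' = -x1 z1
        (alpha * (t - 1.0) + pz) * z1,        # px' = (alpha(t-1) + pz) z1
        u1 - (alpha * (1.0 - t) - pz) * x1,   # pz' = u1 - (alpha(1-t) - pz) x1
    ])

def bc(ya, yb):
    # x1(0) = 1, z1(0) = 1 (initial conditions); px(1) = 0, pz(1) = 0 (transversality)
    return np.array([ya[0] - 1.0, ya[1] - 1.0, yb[2], yb[3]])

t = np.linspace(0.0, 1.0, 50)
guess = np.vstack([np.ones_like(t), np.exp(-t), np.zeros_like(t), np.zeros_like(t)])
sol = solve_bvp(rhs, bc, t, guess)
print(sol.status, sol.y[0, -1], sol.y[1, -1])  # convergence flag, x1(1), z1(1)
```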
3.2 Heterogeneous approach
Using the heterogeneous approach, we can write the two maximality conditions.
For Stage 1 we get
\[
u_1^*(t) \in \arg\max_{u\in[0,1]}\{-u + \lambda^c_x(t)\, u\},
\]
hence
\[
u_1^*(t) = \mathbf{1}_{\{\lambda^c_x(t)\ge 1\}}(t),
\]
while for Stage 2 we obtain
\[
u_2^*(s,t) \in \arg\max_{u\in[0,+\infty)}\{\alpha - u\},
\]
therefore u_2^*(s,t) ≡ 0.
The adjoint system is
\[
\begin{cases}
\dot{x}_1(t) = \mathbf{1}_{\{\lambda^c_x(t)\ge 1\}}(t), & x_1(0) = 1,\\
\dot{\lambda}^c_x(t) = \big(\alpha(t-1) + \lambda_z(t)\big) + \lambda^c_x(t)\, x_1(t), & \lambda^c_x(1) = 0,\\
\dot{\lambda}_z(t) = \mathbf{1}_{\{\lambda^c_x(t)\ge 1\}}(t) - \big(\alpha(1-t) - \lambda_z(t)\big)\, x_1(t), & \lambda_z(1) = 0,
\end{cases} \tag{11}
\]
and this problem, too, can only be solved by a numerical procedure.
Even if these two approaches seem to give a different solution, it is simple to go from (10) to (11) by observing that z_1(t) is always strictly positive and
\[
\frac{d}{dt}\,\frac{p_x(t)}{z_1(t)} = \big(\alpha(t-1) + p_z(t)\big) + \frac{p_x(t)}{z_1(t)}\, x_1(t),
\]
which is exactly the co-state equation for λ^c_x(t).
4 Conclusion
Comparing the two approaches to solve a stochastic
switching time optimal control problem, we can
underline the respective pros and cons.
The backward approach offers the advantage of
deriving the optimal strategy in feedback form, which
is very handy if the planner has access to the value
of the state variable at all times. However, this
comes at the cost of having to compute V2(s, t, x)for
every (s, t, x), which is generally not an easy task.
Moreover, the computation of a value function suffers
from the curse of dimensionality.
The heterogeneous approach, by contrast, yields the strategy only in open-loop form. On the other hand, it provides a compact and unified representation of the necessary conditions, in which the interaction between the two stages is made explicit.
Both approaches require a mathematical formulation that involves complex notation.
To facilitate comprehension, numerical examples
can certainly be helpful. In Section 3, we provide a
simple example that shows the equivalence of these
two approaches in terms of ODEs. In future research,
it would be useful to study additional examples that
elucidate the details of this intricate relationship.
Finally, we notice that, even starting from a very simple example, the complexity of the system of ODEs obtained from the necessary conditions immediately requires a numerical procedure to find an explicit solution.
References:
[1] A. Buratto, M. Muttoni, S. Wrzaczek &
M. Freiberger, Should the COVID-19
Lockdown be Relaxed or Intensified in Case
a Vaccine Becomes Available?, PLOS ONE,
Vol.17, 2022, pp. e0273557.
[2] M. Kuhn & S. Wrzaczek, Rationally Risking:
A Two-Stage Approach. In (Eds.) J.L.
Haunschmied, R.M. Kovacevic, W. Semmler
& V.M. Veliov, Dynamic Modeling and
Econometrics in Economics and Finance,
Springer, Cham, 2021, pp. 85–110.
[3] S. Polasky, A. de Zeeuw & F. Wagener,
Optimal Management with Potential Regime
Shifts, Journal of Environmental Economics and
Management, Vol. 62, 2011, pp. 229–240.
[4] A. Seidl & S. Wrzaczek, Opening the Source
Code: The Threat of Forking, Journal of
Dynamics and Games, Vol. 10, 2023, pp. 121–150.
[5] D. Grass, J.P. Caulkins, G. Feichtinger,
G. Tragler & D.A. Behrens, Optimal Control of
Nonlinear Processes, with Applications in Drugs,
Corruption, and Terror, Springer, Berlin, 2008.
[6] O. Bundău, Optimal Control Applied to a
Ramsey Model with Taxes and Exponential
Utility, WSEAS Transactions on Mathematics,
Vol.8, 2009, pp. 689-698.
[7] C. Udriste & M. Ferrara, Multitime Models
of Optimal Growth, WSEAS Transactions on
Mathematics, Vol.7, 2008, pp. 51-55.
[8] V.M. Veliov, Optimal Control of Heterogeneous
Systems: Basic theory, Journal of Mathematical
Analysis and Applications, Vol. 346, 2008, pp.
227–242.
[9] R.F. Hartl, P.M. Kort & S. Wrzaczek,
Reputation or Warranty, What is More Effective
Against Planned Obsolescence?, International
Journal of Production Research, Vol. 61, 2023,
pp. 939-954.
[10] S. Wrzaczek, M. Kuhn & I. Frankovic, Using
Age Structure for a Multi-stage Optimal Control
Model with Random Switching Time, Journal of
Optimization Theory and Applications, Vol. 184,
2022, pp. 1065–1082.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors contributed equally to the present research, at all stages from the formulation of the problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US