Two different solution techniques for an
optimal control problem with a stochastic switching time
ALESSANDRA BURATTO, LUCA GROSSET, MADDALENA MUTTONI
Dipartimento di Matematica "Tullio Levi-Civita"
Università degli Studi di Padova
Via Trieste, 63 - 35121 Padova
ITALY
Abstract: In optimal control theory, strategic decision making requires the consideration of unforeseen disruptions
that may arise within a predetermined time horizon. In this context, we introduce the concept of "stochastic
switching time" as a random moment in time at which a sudden, irreversible alteration takes place in the system's
dynamics or in the payoff function. To address optimal decision-making under such uncertain conditions, the
literature presents two prominent methodologies: the "backward" approach and the "heterogeneous" approach.
In this study, we offer an exposition and a comparative analysis of these two approaches. Finally, we present an
illustrative example to show, in a detailed context, the advantages and disadvantages associated with these two
solution strategies.
Key-Words: Optimal control, Regime shifts, Hamilton-Jacobi-Bellman equation, Pontryagin maximum
principle.
Received: May 23, 2023. Revised: August 23, 2023. Accepted: September 28, 2023. Published: October 20, 2023.
1 Introduction
In the context of dynamical systems, we call stochastic switching time an event which:
- occurs at a random time τ;
- changes abruptly the nature of the system;
- splits the time horizon into two stages: a Stage 1 before the occurrence of τ, and a Stage 2 afterwards.
This framework finds many applications in various areas, such as epidemiology, [1], rational risk, [2], renewable resources, [3], and open source software, [4], to name a few. In the context of optimal control, we are interested in seeing how an optimal strategy adjusts to the change of regime, i.e., how it changes in going from Stage 1 to Stage 2, upon the occurrence of τ.
An optimal control problem is a dynamic optimisation problem in which the agent sets the value of the control variable u(·), for every time in a given programming interval [0, T], choosing it from a given set U of feasible controls. The control enters the state dynamics, influencing the evolution of the state variable x(·), whose initial value is given. Assuming the existence and uniqueness of the solution of such dynamics, the pair (u(·), x(·)) of control and corresponding state trajectory is called a process, as in [5]. For simplicity, in this paper we assume that both the control variable and the state variable are one-dimensional.
The planner's objective is to maximise a certain payoff, which is the sum of an intertemporal term and a salvage value. The former is the integral of a profit flow over time, which depends on the strategy and the corresponding state trajectory; the latter is a lump sum which depends on the final state x(T):
\[
\max_{u(t)\in U} \int_0^T g(t, x(t), u(t))\,dt + S(x(T))
\]
subject to:
\[
\dot{x}(t) = f(t, x(t), u(t)) \quad \text{for } t\in[0,T], \qquad x(0) = x_0,
\]
where:
- g(t, x, u) is the running payoff;
- S(x) is the salvage value function;
- f(t, x, u) is the state dynamics.
Two interesting applications of optimal control theory to economic growth models can be found in [6] and in [7]. There are two solution approaches to this problem: dynamic programming and Pontryagin's maximum principle; see [5] for the standard details of these two different techniques.
We are interested in integrating a stochastic
switching time into the optimal control framework
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.80
Alessandra Buratto, Luca Grosset, Maddalena Muttoni
E-ISSN: 2224-2880
730
Volume 22, 2023
described in the previous paragraph. We model the stochastic switching time τ as an absolutely continuous random variable taking values in [0, +∞). Due to the finiteness of the time horizon, τ could occur during the programming interval [0, T] (splitting it into a Stage 1 and a Stage 2, as in Figure 1) or after T (leaving the whole interval in Stage 1, as depicted in Figure 2).
[Figure 1: timeline with 0, τ, T; Stage 1 on [0, τ), Stage 2 on [τ, T]. Caption: Case τ < T]

[Figure 2: timeline with 0, T, τ; Stage 1 covering the whole interval [0, T]. Caption: Case τ ≥ T]
We describe the probability distribution of τ through a quantity called the hazard rate of τ at time t:
\[
\lim_{h\to 0^+} \frac{P(\tau \le t+h \mid \tau > t)}{h} = \eta(t, x(t)), \tag{1}
\]
where η(t, x) is the hazard rate function. In general, the hazard rate may be exogenous or endogenous, and in the latter case it may depend both on the state variable and on the control variable. Nevertheless, in this paper, we assume it to be endogenous and dependent only on the state of the system.
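As a sanity check on definition (1): when the hazard rate is a constant η, the switching time is exponentially distributed, and the conditional-probability quotient converges to η as h → 0⁺. The following is a minimal sketch; the values η = 0.5 and t = 1.3 are arbitrary illustrative choices, not part of the model.

```python
# Sketch: with a constant exogenous hazard eta, tau ~ Exp(eta) and the
# quotient in (1) converges to eta as h -> 0+.
import math

eta = 0.5                                # arbitrary constant hazard rate
F = lambda t: 1.0 - math.exp(-eta * t)   # CDF of tau ~ Exp(eta)

t = 1.3
for h in (1e-2, 1e-4, 1e-6):
    cond_rate = (F(t + h) - F(t)) / ((1.0 - F(t)) * h)
    print(h, cond_rate)                  # tends to eta = 0.5
```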
The switch may have one or more simultaneous effects on the system, such as a change in the state dynamics, in the running payoff, in the salvage value function, and in the control set. Moreover, it may induce a jump discontinuity in the state trajectory, such that
\[
x(\tau^+) = \varphi(\tau, x(\tau)).
\]
See the following table for a schematic representation (and the respective notation) of the effects of the switch. Observe that if the state is assumed to be continuous at τ, then φ(τ, x) = x.
           Stage 1       Switch     Stage 2
Dynamics   f1(t, x, u)              f2(τ, t, x, u)
Payoff     g1(t, x, u)              g2(τ, t, x, u)
Salvage    S1(x)                    S2(τ, x)
Control    U1                       U2
Jump                     ϕ(τ, x)
Because the planner does not know when τ will occur, it is necessary to plan a Stage 1 strategy for the entire time horizon, with associated process
\[
(u_1(t), x_1(t)), \qquad t\in[0,T].
\]
If the switching time occurs during the programming interval, the planner realises that τ has occurred, and when. Therefore, on the Stage 2 interval [τ, T] they will be able to implement the optimal strategy for any specific occurrence of τ. The Stage 2 process hence depends on two variables: the realisation s ∈ [0, T] of the switching time τ, and the time t in the Stage 2 interval [s, T]:
\[
(u_2(s,t), x_2(s,t)), \qquad s\in[0,T],\ t\in[s,T].
\]
Due to the stochasticity of τ, the planner can only aim at maximising the expectation of the total payoff, which takes different forms depending on whether τ occurs during the programming interval or afterwards. The switching time optimal control problem is:
\[
\max_{\substack{u_1(t)\in U_1\\ u_2(s,t)\in U_2}} \mathbb{E}\Big[\, \mathbf{1}_{\{\tau<T\}}\Big\{ \int_0^{\tau} g_1(t, x_1(t), u_1(t))\,dt + \int_{\tau}^{T} g_2(\tau, t, x_2(\tau,t), u_2(\tau,t))\,dt + S_2(\tau, x_2(\tau,T)) \Big\} + \mathbf{1}_{\{\tau\ge T\}}\Big\{ \int_0^{T} g_1(t, x_1(t), u_1(t))\,dt + S_1(x_1(T)) \Big\} \Big] \tag{2}
\]
subject to:
\[
\dot{x}_1(t) = f_1(t, x_1(t), u_1(t)), \quad t\in[0,T], \qquad x_1(0) = x_0,
\]
\[
\dot{x}_2(s,t) = f_2(s, t, x_2(s,t), u_2(s,t)), \quad t\in[s,T], \qquad x_2(s,s) = \varphi(s, x_1(s)),
\]
with the hazard rate of τ at time t given by η(t, x_1(t)).
We observe that the dynamics is deterministic and is parametrically fixed for every possible realisation s of the random variable τ. Therefore, the decision maker fixes both the control u_1(·) and the controls u_2(s, ·), which are parametric in s. With a slight abuse of notation in the formulas above, we set \(\dot{x}_2(s,t) = \partial_t x_2(s,t)\); we will use the same notation in the rest of the paper.
It should be noted that the initial condition of the Stage 2 problem is a function of the Stage 1 variables at the switch: x_2(s,s) = \varphi(s, x_1(s)).
This implies that the Stage 2 problem cannot be solved independently, unless one applies dynamic programming, where an optimal control problem is solved for every possible initial value. We compute the expectation using the auxiliary Stage 1 state variable z_1(t) := P(τ > t), which is the probability of still being in Stage 1 at time t. To view it as a state variable, we write its dynamics and initial value:
\[
\dot{z}_1(t) = -\eta(t, x_1(t))\, z_1(t), \qquad z_1(0) = 1,
\]
where the dynamics is derived from the definition of the hazard rate (1). Then, the probability density of τ at time t is:
\[
f_\tau(t) = -\dot{z}_1(t) = \eta(t, x_1(t))\, z_1(t). \tag{3}
\]
Both expressions in (3) are used to compute the
expectation in (2). After basic integral manipulations,
the resulting objective is:
\[
\max_{\substack{u_1(t)\in U_1\\ u_2(s,t)\in U_2}} \int_0^T z_1(t)\Big\{ g_1(t, x_1(t), u_1(t)) + \eta(t, x_1(t))\Big[ S_2(t, x_2(t,T)) + \int_t^T g_2(t, \theta, x_2(t,\theta), u_2(t,\theta))\,d\theta \Big] \Big\}\,dt + z_1(T)\, S_1(x_1(T)) \tag{4}
\]
subject to:
\[
\dot{x}_1(t) = f_1(t, x_1(t), u_1(t)), \qquad x_1(0) = x_0,
\]
\[
\dot{z}_1(t) = -\eta(t, x_1(t))\, z_1(t), \qquad z_1(0) = 1,
\]
\[
\dot{x}_2(s,t) = f_2(s, t, x_2(s,t), u_2(s,t)), \qquad x_2(s,s) = \varphi(s, x_1(s)),
\]
where we updated the Stage 1 dynamics with the new variable z_1.
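In practice, the auxiliary state z_1 can be computed by integrating its ODE together with the Stage 1 state. The sketch below uses SciPy with purely illustrative specifications that are not part of the general model: hazard η(t, x) = x, dynamics f_1 = u with the fixed feasible control u_1 ≡ 1 and x_0 = 1, so that x_1(t) = 1 + t and z_1(t) = exp(−t − t²/2) in closed form.

```python
# Sketch: survival probability z1(t) = P(tau > t) by integrating its ODE
# alongside the Stage 1 state (illustrative data, see lead-in).
import numpy as np
from scipy.integrate import solve_ivp

def ode(t, y):
    x1, z1 = y
    u1 = 1.0                 # fixed feasible Stage 1 control (illustrative)
    eta = x1                 # endogenous hazard eta(t, x) = x
    return [u1, -eta * z1]   # x1' = u1,  z1' = -eta(t, x1) * z1

sol = solve_ivp(ode, (0.0, 1.0), [1.0, 1.0], rtol=1e-8, atol=1e-10)
x1_T, z1_T = sol.y[:, -1]
print(x1_T, z1_T)            # closed form here: x1(1) = 2, z1(1) = exp(-1.5)
```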
2 Solution approaches
There are two possible ways of solving the two-stage
optimal control problem defined above: the backward
approach and the heterogeneous one.
The backward approach is based on dynamic
programming: it involves solving the Stage 2 problem
for every possible occurrence sof the switch and
for every possible initial state, and then plugging
the Stage 2 value function into the Stage 1 problem,
which is solved as a simple optimal control problem,
assuming optimal behavior in Stage 2. The two stages
are solved separately (in reverse order) at the cost
of computing the Stage 2 value function for every
possible value of x_2 at the switch, instead of just the value it will take from the condition x_2(s,s) = \varphi(s, x_1(s)). For more details on this approach see, e.g., [5].
The heterogeneous approach is based on Pontryagin's maximum principle (see [5]): one
derives necessary conditions for the solution
by calculating the co-state functions for both
stages, as solutions of the corresponding adjoint
system, and letting the optimal strategies satisfy the
resulting maximality conditions. The two stages are
necessarily solved together, because x1and u1enter
the initial conditions of Stage 2, and (as we will
see) the Stage 2 co-states enter the Stage 1 adjoint
equations.
In what follows, we will determine the optimal
control of the switching time problem, applying the
two approaches and comparing their results.
2.1 Backward approach
First, let us solve the Stage 2 problem with dynamic programming. Since, in general, the Stage 2 data may depend on the realisation s of the switching time τ, the Stage 2 value function V_2 will depend on s as well:
\[
V_2(s,t,x) := \sup_{u(\theta)\in U_2} \int_t^T g_2(s, \theta, x(\theta), u(\theta))\,d\theta + S_2(s, x(T)) \tag{5}
\]
subject to:
\[
\dot{x}(\theta) = f_2(s, \theta, x(\theta), u(\theta)) \quad \text{for } \theta\in[t,T], \qquad x(t) = x.
\]
If V_2(s, ·, ·) is differentiable, then it is the solution of the corresponding system of HJB equation and terminal condition (parametrised by s):
\[
-\partial_t V_2(s,t,x) = \max_{u\in U_2}\big\{ g_2(s,t,x,u) + \partial_x V_2(s,t,x)\, f_2(s,t,x,u) \big\}, \qquad V_2(s,T,x) = S_2(s,x). \tag{6}
\]
The optimal feedback strategy \(\Phi_2(s,t,x)\) maximises the right-hand side of the HJB equation in (6):
\[
\Phi_2(s,t,x) \in \arg\max_{u\in U_2}\big\{ g_2(s,t,x,u) + \partial_x V_2(s,t,x)\, f_2(s,t,x,u) \big\}.
\]
Let (u_1, x_1) be a feasible Stage 1 process, and let the trajectory \(t\mapsto x_2(s,t)\) satisfy the Cauchy problem
\[
\dot{x}_2(s,t) = f_2\big(s, t, x_2(s,t), \Phi_2(s, t, x_2(s,t))\big), \qquad x_2(s,s) = \varphi(s, x_1(s)),
\]
defined for t ∈ [s, T]; then, the optimal control for Stage 2, given (u_1, x_1), is
\[
u_2(s,t) = \Phi_2(s, t, x_2(s,t)).
\]
By Bellman's principle of optimality, given (u_1, x_1) and assuming (u_2, x_2) as above, we can write:
\[
\int_t^T g_2(s, \theta, x_2(s,\theta), u_2(s,\theta))\,d\theta + S_2(s, x_2(s,T)) = V_2(s, t, x_2(s,t)).
\]
In particular, for s = t, we can substitute x_2(t,t) = \varphi(t, x_1(t)), yielding:
\[
V_2(t, t, x_2(t,t)) = V_2(t, t, \varphi(t, x_1(t))). \tag{7}
\]
Assuming optimal behavior in Stage 2, in (4) we can
substitute the Stage 2 payoff with (7), obtaining the
following objective for Stage 1:
\[
\max_{u_1(t)\in U_1} \int_0^T z_1(t)\Big\{ g_1(t, x_1(t), u_1(t)) + \eta(t, x_1(t))\, V_2(t, t, \varphi(t, x_1(t)))\Big\}\,dt + z_1(T)\, S_1(x_1(T)) \tag{8}
\]
subject to:
\[
\dot{x}_1(t) = f_1(t, x_1(t), u_1(t)), \qquad x_1(0) = x_0,
\]
\[
\dot{z}_1(t) = -\eta(t, x_1(t))\, z_1(t), \qquad z_1(0) = 1. \tag{9}
\]
This is a simple optimal control problem, which can be solved using either dynamic programming or Pontryagin's maximum principle. It is worth observing that the state variable z_1(t) plays the role of a discount factor.
2.2 Heterogeneous approach
By [8], if suitable regularity assumptions on the data hold and if (u_1, x_1, z_1, u_2, x_2, z_2) constitutes an optimal two-stage process, then there exist co-state functions λ_x(t), λ_z(t), and ξ_x(s,t), ξ_z(s,t) such that, once the "current" co-state functions
\[
\lambda^c_x(t) := \lambda_x(t)/z_1(t), \qquad \xi^c_x(s,t) := \xi_x(s,t)/z_2(s,t)
\]
are defined, the following conditions hold:
Maximality condition for Stage 1:
\[
u_1(t) \in \arg\max_{u\in U_1}\big\{ g_1(t, x_1(t), u) + \lambda^c_x(t)\, f_1(t, x_1(t), u) \big\}
\]
Maximality condition for Stage 2:
\[
u_2(s,t) \in \arg\max_{u\in U_2}\big\{ g_2(s, t, x_2(s,t), u) + \xi^c_x(s,t)\, f_2(s, t, x_2(s,t), u) \big\}
\]
The co-state functions are solutions of the following adjoint system:
\[
\begin{cases}
\dot{\lambda}^c_x(t) = -\partial_x g_1(t, x_1(t), u_1(t)) - \lambda^c_x(t)\,\partial_x f_1(t, x_1(t), u_1(t)) - \big[\xi^c_x(t,t)\,\partial_x\varphi(t, x_1(t)) - \lambda^c_x(t)\big]\,\eta(t, x_1(t)) - \big[\xi_z(t,t) - \lambda_z(t)\big]\,\partial_x\eta(t, x_1(t)),\\[2pt]
\lambda^c_x(T) = \partial_x S_1(x_1(T)),
\end{cases}
\]
\[
\begin{cases}
\dot{\xi}^c_x(s,t) = -\partial_x g_2(s, t, x_2(s,t), u_2(s,t)) - \xi^c_x(s,t)\,\partial_x f_2(s, t, x_2(s,t), u_2(s,t)),\\[2pt]
\xi^c_x(s,T) = \partial_x S_2(s, x_2(s,T)),
\end{cases}
\]
\[
\begin{cases}
\dot{\lambda}_z(t) = -g_1(t, x_1(t), u_1(t)) - \big[\xi_z(t,t) - \lambda_z(t)\big]\,\eta(t, x_1(t)),\\[2pt]
\lambda_z(T) = S_1(x_1(T)),
\end{cases}
\]
\[
\begin{cases}
\dot{\xi}_z(s,t) = -g_2(s, t, x_2(s,t), u_2(s,t)),\\[2pt]
\xi_z(s,T) = S_2(s, x_2(s,T)).
\end{cases}
\]
Observe that if η ≡ 0 (that is, without a switching time) the Stage 1 current co-state λ^c_x coincides with the co-state of a single-stage optimal control problem. On the other hand, when η does not vanish, the two additional terms
\[
-\big[\xi^c_x(t,t)\,\partial_x\varphi(t, x_1(t)) - \lambda^c_x(t)\big]\,\eta(t, x_1(t))
\]
and
\[
-\big[\xi_z(t,t) - \lambda_z(t)\big]\,\partial_x\eta(t, x_1(t))
\]
represent the anticipating effect of the switch on the Stage 1 shadow value of the state variable. Observe that, also in the heterogeneous approach, the variables z_1(t) and z_2(s,t) play the role of discount factors, necessary to obtain the "current" co-state functions from the regular ones.
The heterogeneous approach can also be useful
in addressing optimal control problems with age-
structured dynamics. These models are well
described in the literature, [5], and allow for the
study of interesting practical problems. A very recent
example of such an application can be found in [9].
Moreover, in [10], the heterogeneous approach is
used to characterise the necessary conditions for an
age-structured optimal control problem.
3 Numerical example
In the following numerical example, we compare the
two techniques presented in the previous sections
to solve a specific switching time optimal control
problem. Let us set the data of the problem:
T = 1,  x_0 = 1,
f_1(t, x, u) = u,  g_1(t, x, u) = -u,  S_1(x) = 0,  U_1 = [0, 1],
η(t, x) = x,  ϕ(s, x) = x,
f_2(s, t, x, u) = 0,  g_2(s, t, x, u) = α - u with α ∈ ℝ,  S_2(s, x) = 0,  U_2 = [0, +∞).
3.1 Backward approach
The maximisation in (6) gives the Stage 2 optimal control u_2^*(s,t) ≡ 0, therefore the HJB system becomes
\[
-\partial_t V_2(s,t,x) = \alpha, \qquad V_2(s,1,x) = 0,
\]
which, once solved, gives the value function V_2(s,t,x) = α(1 − t). Using this solution, we can write the optimal control problem defined in (8) and (9):
\[
\max_{u_1(t)\in[0,1]} \int_0^1 z_1(t)\big\{ -u_1(t) + \alpha\, x_1(t)(1-t) \big\}\,dt
\]
subject to:
\[
\dot{x}_1(t) = u_1(t), \quad x_1(0) = 1, \qquad \dot{z}_1(t) = -x_1(t)\, z_1(t), \quad z_1(0) = 1.
\]
We now apply Pontryagin's maximum principle, [5], to this standard optimal control problem, with associated Hamiltonian function
\[
H = z_1\{-u_1 + \alpha x_1(1-t)\} + p_x u_1 - p_z x_1 z_1.
\]
The optimal control in feedback form is
\[
u_1^* = \mathbf{1}_{\{p_x \ge z_1\}},
\]
while the state and co-state equations are
\[
\begin{cases}
\dot{x}_1(t) = \mathbf{1}_{\{p_x(t)\ge z_1(t)\}}(t), & x_1(0) = 1,\\
\dot{z}_1(t) = -x_1(t)\, z_1(t), & z_1(0) = 1,\\
\dot{p}_x(t) = \big(\alpha(t-1) + p_z(t)\big)\, z_1(t), & p_x(1) = 0,\\
\dot{p}_z(t) = \mathbf{1}_{\{p_x(t)\ge z_1(t)\}}(t) - \big(\alpha(1-t) - p_z(t)\big)\, x_1(t), & p_z(1) = 0.
\end{cases} \tag{10}
\]
This system of ODEs can only be solved by a numerical procedure.
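One such numerical procedure is to treat (10) as a two-point boundary value problem. The sketch below uses SciPy's solve_bvp; the value α = −1, the grid, and the initial guess are illustrative assumptions, not part of the original model. For this α the maximality condition settles on u_1 ≡ 0, so the indicator in the right-hand side causes no switching along the solution, and x_1 stays at 1 while z_1(t) = e^{−t}.

```python
# Sketch: solving system (10) as a BVP (states forward, co-states backward).
import numpy as np
from scipy.integrate import solve_bvp

alpha = -1.0                                  # illustrative parameter value

def rhs(t, y):
    x1, z1, px, pz = y
    u1 = (px >= z1).astype(float)             # maximality: u1* = 1{px >= z1}
    return np.vstack([
        u1,                                   # x1' = u1
        -x1 * z1,                             # z1' = -x1 z1
        (alpha * (t - 1.0) + pz) * z1,        # px' = (alpha(t-1) + pz) z1
        u1 - (alpha * (1.0 - t) - pz) * x1,   # pz' = u1 - (alpha(1-t) - pz) x1
    ])

def bc(ya, yb):
    # x1(0) = 1, z1(0) = 1 (initial conditions); px(1) = 0, pz(1) = 0 (transversality)
    return np.array([ya[0] - 1.0, ya[1] - 1.0, yb[2], yb[3]])

t = np.linspace(0.0, 1.0, 50)
guess = np.vstack([np.ones_like(t), np.exp(-t), np.zeros_like(t), np.zeros_like(t)])
sol = solve_bvp(rhs, bc, t, guess)
print(sol.status, sol.y[0, -1], sol.y[1, -1])  # convergence flag, x1(1), z1(1)
```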
3.2 Heterogeneous approach
Using the heterogeneous approach, we can write the two maximality conditions.
For Stage 1 we get
\[
u_1^*(t) \in \arg\max_{u\in[0,1]}\{-u + \lambda^c_x(t)\, u\},
\]
hence
\[
u_1^*(t) = \mathbf{1}_{\{\lambda^c_x(t)\ge 1\}}(t),
\]
while for Stage 2 we obtain
\[
u_2^*(s,t) \in \arg\max_{u\in[0,+\infty)}\{\alpha - u\},
\]
therefore u_2^*(s,t) ≡ 0.
The adjoint system is
\[
\begin{cases}
\dot{x}_1(t) = \mathbf{1}_{\{\lambda^c_x(t)\ge 1\}}(t), & x_1(0) = 1,\\
\dot{\lambda}^c_x(t) = \big(\alpha(t-1) + \lambda_z(t)\big) + \lambda^c_x(t)\, x_1(t), & \lambda^c_x(1) = 0,\\
\dot{\lambda}_z(t) = \mathbf{1}_{\{\lambda^c_x(t)\ge 1\}}(t) - \big(\alpha(1-t) - \lambda_z(t)\big)\, x_1(t), & \lambda_z(1) = 0,
\end{cases} \tag{11}
\]
and this problem, too, can only be solved by a numerical procedure.
Even if these two approaches seem to give a different solution, it is simple to go from (10) to (11) by observing that z_1(t) is always strictly positive and
\[
\frac{d}{dt}\,\frac{p_x(t)}{z_1(t)} = \big(\alpha(t-1) + p_z(t)\big) + \frac{p_x(t)}{z_1(t)}\, x_1(t),
\]
which is exactly the co-state equation for λ^c_x(t).
4 Conclusion
Comparing the two approaches to solve a stochastic
switching time optimal control problem, we can
underline the respective pros and cons.
The backward approach offers the advantage of
deriving the optimal strategy in feedback form, which
is very handy if the planner has access to the value
of the state variable at all times. However, this
comes at the cost of having to compute V2(s, t, x)for
every (s, t, x), which is generally not an easy task.
Moreover, the computation of a value function suffers
from the curse of dimensionality.
The heterogeneous approach, by contrast, yields the strategy only in open-loop form. On the other hand, it provides a compact and unified representation of the necessary conditions, in which the interaction between the two stages is made explicit.
Both approaches require a mathematical formulation that involves complex notation.
To facilitate comprehension, numerical examples
can certainly be helpful. In Section 3, we provide a
simple example that shows the equivalence of these
two approaches in terms of ODEs. In future research,
it would be useful to study additional examples that
elucidate the details of this intricate relationship.
Finally, we notice that, even starting from a very simple example, the complexity of the system of ODEs obtained from the necessary conditions immediately requires a numerical procedure to find an explicit solution.
References:
[1] A. Buratto, M. Muttoni, S. Wrzaczek &
M. Freiberger, Should the COVID-19
Lockdown be Relaxed or Intensified in Case
a Vaccine Becomes Available?, PLOS ONE,
Vol.17, 2022, pp. e0273557.
[2] M. Kuhn & S. Wrzaczek, Rationally Risking:
A Two-Stage Approach. In (Eds.) J.L.
Haunschmied, R.M. Kovacevic, W. Semmler
& V.M. Veliov, Dynamic Modeling and
Econometrics in Economics and Finance,
Springer, Cham, 2021, pp. 85–110.
[3] S. Polasky, A. de Zeeuw & F. Wagener,
Optimal Management with Potential Regime
Shifts, Journal of Environmental Economics and
Management, Vol. 62, 2011, pp. 229–240.
[4] A. Seidl & S. Wrzaczek, Opening the Source
Code: The Threat of Forking, Journal of
Dynamics and Games, Vol. 10, 2023, pp. 121–150.
[5] D. Grass, J.P. Caulkins, G. Feichtinger,
G. Tragler & D.A. Behrens, Optimal Control of
Nonlinear Processes, with Applications in Drugs,
Corruption, and Terror, Springer, Berlin, 2008.
[6] O. Bundău, Optimal Control Applied to a
Ramsey Model with Taxes and Exponential
Utility, WSEAS Transactions on Mathematics,
Vol.8, 2009, pp. 689-698.
[7] C. Udriste & M. Ferrara, Multitime Models
of Optimal Growth, WSEAS Transactions on
Mathematics, Vol.7, 2008, pp. 51-55.
[8] V.M. Veliov, Optimal Control of Heterogeneous
Systems: Basic theory, Journal of Mathematical
Analysis and Applications, Vol. 346, 2008, pp.
227–242.
[9] R.F. Hartl, P.M. Kort & S. Wrzaczek,
Reputation or Warranty, What is More Effective
Against Planned Obsolescence?, International
Journal of Production Research, Vol. 61, 2023,
pp. 939-954.
[10] S. Wrzaczek, M. Kuhn & I. Frankovic, Using
Age Structure for a Multi-stage Optimal Control
Model with Random Switching Time, Journal of
Optimization Theory and Applications, Vol. 184,
2022, pp. 1065–1082.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors contributed equally to the present research, at all stages from the formulation of the problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US