E. B. DYNKIN
Submitted 1963-01-01 | RussiaRxiv: ru-196301.02352 | Translated from Russian

OPTIMAL CHOICE OF THE STOPPING TIME OF A MARKOV PROCESS

(Presented by Academician A. N. Kolmogorov on 14 XII 1962)

1. Let a Markov process \((x_t,\xi,\mathcal M_t,\mathbf P_x)\)* be given in a phase space \(E\), and let \(g(x)\) be a nonnegative function, and let \(\varphi_t^s\) \((0 \leq s < t < \xi)\) be a nonnegative additive functional of the process. It is required to choose a random time \(\tau\) so that the expectation \(\mathbf M_x\{g(x_\tau)-\varphi_\tau^0\}\) be maximal. Here \(\tau\) must be a Markov time (m. m.) (i.e., for every \(t\), \(\{\tau \leq t\}\in\mathcal M_t\)). The value \(g(x)\) may be interpreted as the gain received if the process is stopped at the point \(x\), and \(\varphi_t^s\) as the cost of observations during the time \([s,t]\). We shall consider in detail the case of a discrete Markov chain, and then indicate the changes that must be made for processes with continuous time.

2. Let the set \(E\) be finite or countable, let \(x_t\) \((t=0,1,2,\ldots)\) be a homogeneous Markov chain with transition function \(p(x,y)\) \((x,y\in E)\), and let

\[ \varphi_t^s=\sum_{u=s}^{t-1} c(x_u,x_{u+1}) \]

(this is the general form of a homogeneous additive functional). If \(G(x)=\mathbf M_x\varphi_\xi^0<\infty\), then for every Markov time \(\tau\)**

\[ \mathbf M_x\varphi_\tau^0=G(x)-\mathbf M_xG(x_\tau). \]

Hence it follows that

\[ \mathbf M_x[g(x_\tau)-\varphi_\tau^0] =\mathbf M_x\widetilde g(x_\tau)-G(x), \]

where \(\widetilde g=g+G\), and the problem of finding the maximum of \(\mathbf M_x[g(x_\tau)-\varphi_\tau^0]\) reduces to the problem of finding the maximum of \(\mathbf M_x\widetilde g(x_\tau)\). Thus, the case of principal interest is that in which the cost of observations \(\varphi_t^s\) is equal to zero. We shall henceforth consider only this case.
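Written out, the substitution behind this reduction is a single chain of equalities. The display below only combines the two formulas above and introduces no new assumptions.

\[ \begin{aligned}
\mathbf M_x[g(x_\tau)-\varphi_\tau^0]
 &=\mathbf M_x g(x_\tau)-\bigl(G(x)-\mathbf M_xG(x_\tau)\bigr)\\
 &=\mathbf M_x\bigl[g(x_\tau)+G(x_\tau)\bigr]-G(x)\\
 &=\mathbf M_x\widetilde g(x_\tau)-G(x).
\end{aligned} \]

Since \(G(x)\) does not depend on \(\tau\), maximizing the left-hand side over \(\tau\) is the same as maximizing \(\mathbf M_x\widetilde g(x_\tau)\).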

3. Put \(\mathbf P_t f(x)=\mathbf M_x f(x_t)\); in particular,

\[ \mathbf P_1 f(x)=\mathbf M_x f(x_1)=\sum_{y\in E} p(x,y)f(y). \]

A function \(f\) with values in the interval \([0,\infty]\) is called an excessive function (e. f.) if \(\mathbf P_1 f(x)\leq f(x)\) for all \(x\in E\). We list a number of properties of e. f.
a) A constant is an e. f.
b) The exact lower bound of any set of e. f. is again an e. f.
c) If \(f\) is an e. f. and \(\tau\) is an m. m., then \(\mathbf M_x f(x_\tau)\leq f(x)\).
d) If \(f\) is an e. f. and \(\tau<\tau_1\) are two m. m., then \(\mathbf M_x f(x_{\tau_1})\leq \mathbf M_x f(x_\tau)\).
e) If \(f\) is an e. f. and \(\tau\) is the first hitting time of some set \(\Gamma\), then \(f_1(x)=\mathbf M_x f(x_\tau)\) is also an e. f.

Properties a) and b) are obvious. We prove c) and d). First suppose \(f\) is finite. For any \(0<a<1\) and any \(n\),

\[ f=h+a\mathbf P_1h+\cdots+a^{n-1}\mathbf P_{n-1}h+a^n\mathbf P_n f, \]

where \(h=f-a\mathbf P_1 f\geq 0\). Since \(a^n\mathbf P_n f\leq a^n f\to 0\) as \(n\to\infty\), we have

\[ f=\sum_{n=0}^\infty a^n\mathbf P_nh =\mathbf M_x\sum_{n=0}^\infty a^n h(x_n). \]

Hence it is clear that

\[ \mathbf M_x a^{\tau_1} f(x_{\tau_1}) \leq \mathbf M_x a^\tau f(x_\tau) \leq f(x). \]

Letting \(a\uparrow 1\), we arrive at c) and d). If \(f\) is any e. f., then, by virtue of a) and b), \(f_n=\min(f,n)\) is a finite e. f., and from the validity of c) and d) for \(f_n\), by passage to the limit, we obtain their validity for the function \(f\). To prove e), denote by \(\tau_1\) the first time after time 1 at which \(\Gamma\) is reached, and note that, by virtue of d),

\[ \mathbf{P}_1 f_1(x)=\mathbf{M}_x f_1(x_1)=\mathbf{M}_x\mathbf{M}_{x_1}f(x_\tau)=\mathbf{M}_x f(x_{\tau_1})\leq f_1(x). \]

* \(x_t\) is the trajectory of the process, \(\xi\) is the killing time, \(\mathcal M_t\) is the collection of events observable during the time \([0,t]\), \(\mathbf P_x\) is the probability distribution corresponding to the initial state \(x\). (Concerning the terminology and notation, see (¹, ²).)

** Even if \(\mathbf M_x\varphi_\xi^0=\infty\), such a representation can often be obtained by means of some other function \(G(x)\).

4. We shall call a function \(s(x)\) an excessive majorant (e.m.) of a function \(g(x)\) if \(s\) is an e.f., \(s\geq g\), and \(s\) is less than or equal to any e.f. \(f\geq g\). By virtue of 3, b), to obtain such a function it suffices to take the exact lower bound of all e.f. \(f\geq g\). Put \(Qf=\max(f,\mathbf{P}_1f)\). It is not hard to verify that \(Q^n g\uparrow s\) as \(n\to\infty\), and this makes it possible to compute the e.m. by the method of successive approximations. On the other hand, if the set \(E\) is finite, then the e.m. can be computed by methods of linear programming; indeed, \(s(x)\) is equal to the minimum of \(f(x)\) on the polyhedron

\[ \{f:\mathbf{P}_1f\leq f,\ f\geq g\}. \]
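The successive approximations \(Q^n g\) are easy to carry out numerically for a finite chain. The following is a minimal sketch in Python with NumPy and is not part of the original note: the function name `excessive_majorant`, the stopping tolerance, and the five-state random walk with its gain vector are all illustrative assumptions.

```python
import numpy as np

def excessive_majorant(P, g, tol=1e-12, max_iter=100_000):
    """Successive approximations s <- max(s, P s), started from s = g.

    P is a (sub)stochastic matrix p(x, y); g is a nonnegative gain vector.
    The iterates are nondecreasing, so we stop once they no longer move.
    """
    s = np.asarray(g, dtype=float).copy()
    for _ in range(max_iter):
        nxt = np.maximum(s, P @ s)      # one application of Q
        if np.max(nxt - s) < tol:
            return nxt
        s = nxt
    return s

# Hypothetical example: symmetric random walk on {0,...,4}, absorbed at 0 and 4.
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5

g = np.array([0.0, 1.0, 0.0, 3.0, 0.0])  # illustrative gain function
s = excessive_majorant(P, g)
```

For this particular walk a function is excessive when it is concave at the interior states, so the iteration should return the smallest concave majorant of `g`. For chains on which convergence is slow, the linear-programming formulation mentioned above may be preferable.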

Theorem 1. Let \(s\) be the e.m. of the function \(g\geq0\), and let \(\tau_\varepsilon\) be the moment of first hitting of the set \(\Gamma_\varepsilon=\{x:g(x)\geq s(x)-\varepsilon\}\). Then the exact upper bound of \(\mathbf{M}_x g(x_\tau)\) over all m.m. \(\tau\) is equal to \(s(x)\). If the function \(g\) is bounded, then for any \(\varepsilon>0\)

\[ s(x)-\varepsilon\leq \mathbf{M}_x g(x_{\tau_\varepsilon})\leq s(x). \]

If the set \(E\) is finite, then \(s(x)=\mathbf{M}_x g(x_{\tau_0})\).

Theorem 1 shows that, in the case of a finite set \(E\), the optimal strategy consists in continuing observations until \(g(x)\) first becomes equal to \(s(x)\). In the countable case an optimal strategy, generally speaking, does not exist; however, for any \(\varepsilon>0\) one can obtain a strategy optimal to within \(\varepsilon\). For this it is necessary to continue observations until the inequality

\[ g(x)\geq s(x)-\varepsilon \]

is first satisfied.
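For a finite chain the last statement of Theorem 1 can be checked numerically: compute \(s\), form \(\Gamma_0=\{x: g(x)=s(x)\}\), and compare \(\mathbf M_x g(x_{\tau_0})\) with \(s(x)\). The sketch below does this in Python with NumPy under stated assumptions: the helper names `excessive_majorant` and `hitting_value`, the seven-state random walk, and the gain vector are hypothetical, and the linear solve assumes that the chain leaves the complement of \(\Gamma_0\) with probability one.

```python
import numpy as np

def excessive_majorant(P, g, tol=1e-12, max_iter=100_000):
    """Successive approximations s <- max(s, P s), started from g."""
    s = np.asarray(g, dtype=float).copy()
    for _ in range(max_iter):
        nxt = np.maximum(s, P @ s)
        if np.max(nxt - s) < tol:
            return nxt
        s = nxt
    return s

def hitting_value(P, g, stop):
    """u(x) = M_x g(x_tau), where tau is the first hitting time of `stop`.

    Solves u = g on `stop` and u = P u elsewhere (assumes I - P restricted
    to the complement of `stop` is nonsingular).
    """
    g = np.asarray(g, dtype=float)
    cont = ~stop
    u = np.where(stop, g, 0.0)
    if cont.any():
        A = np.eye(int(cont.sum())) - P[np.ix_(cont, cont)]
        b = P[np.ix_(cont, stop)] @ g[stop]
        u[cont] = np.linalg.solve(A, b)
    return u

# Hypothetical example: symmetric walk on {0,...,6}, absorbed at 0 and 6.
n = 7
P = np.zeros((n, n))
P[0, 0] = P[n - 1, n - 1] = 1.0
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
g = np.array([0.0, 2.0, 1.0, 0.0, 4.0, 1.0, 0.0])

s = excessive_majorant(P, g)
stop = g >= s - 1e-9            # Gamma_0, up to rounding error
u = hitting_value(P, g, stop)   # expected gain when stopping at tau_0
```

In this example `u` and `s` should agree up to rounding, which is what the theorem asserts for finite \(E\).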

5. Proof of Theorem 1. Suppose that the function \(g\) is bounded. Put \(s_\varepsilon(x)=\mathbf{M}_x s(x_{\tau_\varepsilon})\). We shall show that \(s_\varepsilon(x)\geq g(x)\). Let

\[ c=\sup [g(x)-s_\varepsilon(x)]\geq0. \]

Then the e.f. \(s_\varepsilon+c\) (see 3, a) and 3, e)) majorizes \(g\). Therefore \(s_\varepsilon+c\geq s\). Let \(0<\alpha<\varepsilon\). There exists \(a\in E\) such that

\[ g(a)-s_\varepsilon(a)>c-\alpha. \]

Obviously,

\[ 0\leq s(a)-g(a)\leq s_\varepsilon(a)+c-g(a)<\alpha. \]

Hence \(a\in\Gamma_\alpha\subseteq\Gamma_\varepsilon\), and \(s_\varepsilon(a)=\mathbf{M}_a s(x_{\tau_\varepsilon})=s(a)\). Thus,

\[ c-\alpha<g(a)-s_\varepsilon(a)\leq s(a)-s_\varepsilon(a)=0. \]

Letting \(\alpha\downarrow0\), we observe that \(c\leq0\), and hence \(g\leq s_\varepsilon\).

Note that \(s_\varepsilon=s\). Indeed, according to 3, c), \(s_\varepsilon\leq s\), while from the inequality \(g\leq s_\varepsilon\) it follows that \(s\leq s_\varepsilon\).

Let \(\tau\) be an m.m. Since \(g\leq s\), we have

\[ \mathbf{M}_x g(x_\tau)\leq \mathbf{M}_x s(x_\tau)\leq s(x) \]

(see 3, c)), and consequently

\[ s(x)-\varepsilon=\mathbf{M}_x s(x_{\tau_\varepsilon})-\varepsilon\leq \mathbf{M}_x g(x_{\tau_\varepsilon})\leq s(x). \]

If the set \(E\) is finite, then for all sufficiently small \(\varepsilon>0\) the set \(\Gamma_\varepsilon\) coincides with \(\Gamma_0\), and, consequently,

\[ s(x)-\varepsilon\leq \mathbf{M}_x g(x_{\tau_0})\leq s(x). \]

It follows that

\[ \mathbf{M}_x g(x_{\tau_0})=s(x). \]

Now let \(g\) be unbounded. Denote by \(s_n\) the e.m. of the bounded function

\[ g_n=\min(g,n). \]

By what has been proved, for any \(n\),

\[ \sup_\tau \mathbf{M}_x g(x_\tau)\geq \sup_\tau \mathbf{M}_x g_n(x_\tau)\geq s_n(x). \]

By virtue of 3, c) the left-hand side does not exceed \(s(x)\). Since \(s_n\uparrow s\), it is equal to \(s(x)\).

6. In solving concrete problems the following observation is often useful. In order that \(x\in\Gamma_0\), it is sufficient (and necessary) that there be found a “barrier” at the point \(x\), i.e., an e.f. majorizing \(g\) and coinciding with \(g\) at the point \(x\). On the other hand, in order that \(x\notin\Gamma_0\), it is sufficient that

\[ \mathbf{P}_1 g(x)>g(x). \]
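The second criterion, read as a condition on \(g\) (which is how it is applied in the example that follows), rests on one chain of inequalities. It uses only that \(s\) is an e.f., that \(s\geq g\), and that \(\mathbf P_1\) preserves inequalities between nonnegative functions:

\[ s(x)\geq \mathbf P_1 s(x)\geq \mathbf P_1 g(x)>g(x). \]

Hence \(s(x)>g(x)\), that is, \(x\notin\Gamma_0\).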

As an example let us consider the following problem on the optimal choice of one of \(n\) objects. It is assumed that after becoming acquainted with any two objects it becomes clear which of them is better. The objects are considered in random order, and each object considered is either rejected (then one can no longer return to it) or accepted; and then the choice terminates at this point. It is required to choose a strategy that would lead with the greatest probability to the choice of the best object. Number all the objects in the order in which we become acquainted with them.

Put \(x_0=1\) and denote by \(x_t\) (\(t\geqslant 1\)) the number of the first object better than the object with number \(x_{t-1}\). It is not hard to verify that \(x_0,x_1,\ldots,x_t,\ldots\) is a Markov chain in the phase space \(E=\{1,2,\ldots,n\}\) with transition function \(p(k,m)=k/[m(m-1)]\) for \(k<m\), \(p(k,m)=0\) for \(k\geqslant m\). The probability that we make the best choice, stopping at the object \(x_t=k\), is equal to \(k/n\). Therefore the problem of optimal choice reduces to the problem of finding the maximum of \(\mathbf M_x g(x_\tau)\) for \(g(x)=x/n\). Let \(m_n\) denote the largest integer for which \(1/m_n+1/(m_n+1)+\cdots+1/(n-1)>1\). It is not hard to verify that, for \(m>m_n\), the function \(f(x)=\max(m/n,x/n)\) is a “barrier” at the point \(m\). On the other hand, for \(m\leqslant m_n\), \(\mathbf P_1 g(m)>g(m)\). Therefore \(\Gamma_0=\{m_n+1,m_n+2,\ldots,n\}\), and the best strategy is to stop at the first of the objects which turns out to be better than the first \(m_n\) objects examined. This prescription was proposed earlier by L. Moser and J. R. Pounder (see (³)), but its optimality apparently had not been proved. We note that \(m_n/n\to 1/e\) as \(n\to\infty\).
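The threshold \(m_n\) and the set \(\Gamma_0\) can be checked directly for small \(n\). The Python sketch below is an illustration under stated assumptions, not the author's computation: the function names `threshold` and `solve_chain` are hypothetical, and the backward recursion relies on the fact that \(s=\max(g,\mathbf P_1 s)\) and that the chain only moves to larger states, so \(s\) can be filled in from \(k=n\) downward. Exact rational arithmetic is used so that the comparison \(g(k)=s(k)\) is not affected by rounding.

```python
from fractions import Fraction
import math

def threshold(n):
    """Largest m with 1/m + 1/(m+1) + ... + 1/(n-1) > 1 (0 if there is none)."""
    tail = Fraction(0)
    for m in range(n - 1, 0, -1):
        tail += Fraction(1, m)
        if tail > 1:
            return m
    return 0

def solve_chain(n):
    """Backward recursion s(k) = max(g(k), sum_{m>k} p(k,m) s(m)),
    with g(k) = k/n and p(k,m) = k / (m (m-1)) for k < m."""
    g = {k: Fraction(k, n) for k in range(1, n + 1)}
    s = {}
    for k in range(n, 0, -1):
        cont = sum(
            (Fraction(k, m * (m - 1)) * s[m] for m in range(k + 1, n + 1)),
            Fraction(0),
        )
        s[k] = max(g[k], cont)
    stop = [k for k in range(1, n + 1) if s[k] == g[k]]  # Gamma_0
    return s, stop

n = 10
m_n = threshold(n)
s, stop = solve_chain(n)
ratio = threshold(1000) / 1000   # to be compared with 1/e
```

For \(n=10\) this gives \(m_n=3\) and \(\Gamma_0=\{4,\ldots,10\}\), in agreement with the description above, and `s[1]` is the optimal probability of choosing the best object.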

7. Suppose now that \((x_t,\xi,\mathcal M_t,\mathbf P_x)\) is a standard process on the semicompact \((E,\mathcal C)\). The arguments of § 2 are applicable to this case; therefore we shall assume that the cost of observations \(\varphi_t^s\) is equal to zero.

An almost Borel \(C_0\)-lower semicontinuous function \(f(x)\) with values in \([0,+\infty]\) is called an excessive function if \(\mathbf P_t f(x)\leqslant f(x)\) for all \(t\) and \(x\). It is known (see (²) or (⁴)) that excessive functions have properties 3, a) and 3, c)—3, e). Property 3, b) does not always hold; however, the following theorem is valid.

Theorem 2. Put
\[ Q_m f=\max(f,\mathbf P_{2^{-m}}f),\quad Q_m^\infty f=\lim_{n\to\infty} Q_m^n f \]
and
\[ Sf=\lim_{m\to\infty} Q_m^\infty\cdots Q_2^\infty Q_1^\infty f. \]
If \(g\) is an almost Borel \(C_0\)-lower semicontinuous function, then \(s=Sg\) is the excessive majorant of the function \(g\).

Proof. The sequence \(Q_m f,Q_m^2 f,\ldots\) does not decrease and therefore converges to some limit \(Q_m^\infty f\). The sequence \(Q_m^\infty\cdots Q_2^\infty Q_1^\infty f\) also does not decrease in \(m\). Therefore it has a limit \(Sf\). All the operators considered map the set of all almost Borel functions into itself. According to (²) (see § 3, Ch. 4), they preserve \(C_0\)-lower semicontinuity. Thus the function \(s=Sg\) is almost Borel and \(C_0\)-lower semicontinuous. For any \(m\),
\[ \mathbf P_{2^{-m}}Q_m^\infty f\leqslant Q_m^\infty f. \]
Therefore \(\mathbf P_{2^{-m}}s\leqslant s\). In view of the \(C_0\)-lower semicontinuity of the function \(s\), it follows from this that \(\mathbf P_t s\leqslant s\) for all \(t\). Thus \(s\) is excessive. It is not hard to see that if an excessive \(f\geqslant g\), then \(f\geqslant s\). Hence \(s\) is the excessive majorant.

8. Theorem 1 and its proof remain valid for any standard process under the assumption that the function \(g\) is \(C_0\)-lower semicontinuous and almost Borel. Only the concluding part of the theorem concerning the conditions under which

\[ \mathbf M_x g(x_{\tau_0})=s(x) \]

(and, consequently, \(\tau_0\) specifies an optimal strategy) needs modification. It is sufficient, for example, to require that: a) the functions \(g(x_t)\) and \(s(x_t)\) be almost surely continuous in \(t\); b)

\[ \mathbf P_x\{\tau_\varepsilon<\tau_0=\xi\}\to 0 \]

as \(\varepsilon\downarrow 0\); c) \(g(x)\) be bounded. Under these conditions the formula

\[ \mathbf M_x g(x_{\tau_0})=\mathbf M_x s(x_{\tau_0})=s(x) \]

can be obtained from the formula

\[ \mathbf M_x s(x_{\tau_\varepsilon})=s(x) \]

by passage to the limit as \(\varepsilon\downarrow 0\).

9. The problem of optimal stopping of a random sequence \(\{x_n\}\) was considered by Snell (⁵). Instead of an excessive majorant Snell constructs a minimal supermartingale majorizing \(\{x_n\}\). In our case this supermartingale is equal to \(s(x_n)\). (This does not follow from Snell’s results.)

Moscow State University
named after M. V. Lomonosov

Received
12 XII 1962

CITED LITERATURE

¹ E. B. Dynkin, Foundations of the Theory of Markov Processes, Moscow–Leningrad, 1959.
² E. B. Dynkin, Markov Processes, Moscow–Leningrad, 1963.
³ M. Gardner, Sci. Am., 202, 2, 150 (1960); 202, 3, 172 (1960).
⁴ G. A. Hunt, Markov Processes and Potentials, Moscow, 1962.
⁵ J. L. Snell, Trans. Am. Math. Soc., 73, 293 (1952).
