Abstract
We prove that, given a Markov Decision Process (MDP) and a fixed subset F of its states, there is a Markov policy that maximizes, from every state, the probability of reaching F infinitely often. Moreover, such a maximal policy is computable in polynomial time in the size of the MDP. This result can be applied to control a system with randomized or uncertain behavior with respect to a given property to be optimized.
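The paper's polynomial-time algorithm is not reproduced here, but algorithms for such Büchi-style objectives ("reach F infinitely often") typically reduce to a maximal-reachability subproblem over the MDP. The sketch below illustrates only that standard subroutine via value iteration on a hypothetical toy MDP (all state names, the `transitions` encoding, and the target set are illustrative assumptions, not from the paper; value iteration only approximates the fixpoint, whereas a polynomial-time bound would rely on, e.g., linear programming):

```python
# Hypothetical toy MDP: transitions[state][action] = list of (successor, probability).
# This encoding is an illustrative assumption, not the paper's formalism.
transitions = {
    0: {"a": [(1, 0.5), (2, 0.5)], "b": [(3, 1.0)]},
    1: {"a": [(1, 1.0)]},  # absorbing target state
    2: {"a": [(0, 1.0)]},
    3: {"a": [(3, 1.0)]},  # absorbing sink state
}
target = {1}

def max_reach_prob(transitions, target, iters=1000):
    """Approximate, by value iteration, the maximal probability of
    eventually reaching `target` from each state of the MDP."""
    v = {s: (1.0 if s in target else 0.0) for s in transitions}
    for _ in range(iters):
        new = {}
        for s in transitions:
            if s in target:
                new[s] = 1.0
            else:
                # Best action: maximize the expected value over successors.
                new[s] = max(
                    sum(p * v[t] for t, p in succ)
                    for succ in transitions[s].values()
                )
        v = new
    return v

values = max_reach_prob(transitions, target)
```

Here state 0 can always retry via state 2, so its maximal reachability probability converges to 1 under action "a", while the sink state 3 keeps probability 0; an exact polynomial-time computation would solve the corresponding linear program instead of iterating.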