
Abstract

Continuous Partially Observable Markov Decision Processes (POMDPs) with general belief-dependent rewards are notoriously difficult to solve online. In this paper, we present a complete, provable theory of adaptive multilevel simplification for two settings: a given, externally constructed belief tree, and Monte Carlo Tree Search (MCTS), which constructs the belief tree on the fly using an exploration technique. Our theory makes it possible to accelerate POMDP planning with belief-dependent rewards without any sacrifice in the quality of the obtained solution. We rigorously prove each theoretical claim in the proposed unified theory. Using the general theoretical results, we present three algorithms that accelerate online planning in continuous POMDPs with belief-dependent rewards. Two of them, SITH-BSP and LAZY-SITH-BSP, can be used on top of any method that constructs a belief tree externally. The third, SITH-PFT, is an anytime MCTS method into which any exploration technique can be plugged. All our methods are guaranteed to return exactly the same optimal action as their unsimplified equivalents. We replace the costly computation of information-theoretic rewards with novel adaptive upper and lower bounds, which we derive in this paper and which are of independent interest. We show that these bounds are easy to calculate and can be tightened on demand by our algorithms. Our approach is general: any bounds that converge monotonically to the reward can be used to achieve a significant speedup without any loss in performance. Our theory and algorithms support the challenging setting of continuous states, actions, and observations. The beliefs can be parametric, or general and represented by weighted particles. We demonstrate in simulation a significant speedup in planning compared to baseline approaches, with guaranteed identical performance.
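The core idea of the abstract, choosing an action from reward bounds that are tightened only when needed, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy "reward" is the mean of terms known to lie in [0, 1], and the names `bounds` and `select_action` are hypothetical. It is not SITH-BSP, LAZY-SITH-BSP, or SITH-PFT, and the bounds are not the information-theoretic bounds derived by the authors. It only shows why bounds that converge monotonically to the exact value can yield the same choice as exact evaluation at lower cost.

```python
from typing import Dict, List, Tuple


def bounds(terms: List[float], level: int) -> Tuple[float, float]:
    """Toy bounds on mean(terms), assuming every term lies in [0, 1].

    Only the first `level` terms are read. Each unread term is replaced by
    0 for the lower bound and by 1 for the upper bound, so the interval
    shrinks monotonically and collapses to the exact mean once
    level == len(terms).
    """
    n = len(terms)
    partial = sum(terms[:level])
    return partial / n, (partial + (n - level)) / n


def select_action(candidates: Dict[str, List[float]]) -> Tuple[str, int]:
    """Return the action with the highest mean and the number of terms read.

    Hypothetical helper: intervals are refined only while they overlap.
    """
    level = {a: 0 for a in candidates}  # simplification level per action
    while True:
        intervals = {a: bounds(t, level[a]) for a, t in candidates.items()}
        # Current favourite: the action with the highest lower bound.
        best = max(intervals, key=lambda a: intervals[a][0])
        best_lo = intervals[best][0]
        # Rivals are actions whose upper bound still exceeds that lower bound.
        rivals = [a for a in intervals
                  if a != best and intervals[a][1] > best_lo]
        if not rivals:
            # No interval overlaps the favourite: exact evaluation would
            # pick the same action (up to ties), so stop refining.
            return best, sum(level.values())
        # Tighten only the overlapping intervals by one level each.
        for a in rivals + [best]:
            level[a] = min(level[a] + 1, len(candidates[a]))
```

Calling `select_action({"a": [0.9, 0.8, 0.9, 0.7], "b": [0.1, 0.2, 0.1, 0.0], "c": [0.5, 0.4, 0.6, 0.5]})` selects `"a"`, the action with the highest exact mean, while reading fewer than the 12 available terms. The loop always terminates because a fully refined interval is a single point and therefore cannot remain a rival of the favourite.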

Keywords

Decision-making under uncertainty; belief space planning; partially observable Markov decision processes; belief-dependent rewards; planning with imperfect information




No compromise in solution quality: Speeding up belief-dependent continuous partially observable Markov decision processes via adaptive multilevel simplification
