Markov Decision Processes (MDPs) are a classical framework for stochastic sequential decision problems, based on an enumerated state space representation. More compact and structured representations have been proposed: factorization techniques use state variables representations, while decomposition techniques are based on a partition of the state space into sub-regions and take advantage of the resulting structure of the state transition graph. We use a family of probabilistic exploration-like planning problems in order to study the influence of the modeling structure on the MDP solution. We first discuss the advantages and drawbacks of a graph based representation of the state space, then present our comparisons of two decomposition techniques, and propose to use a global approach combining both state space factorization and decomposition techniques. On the exploration problem instance, it is proposed to take advantage of the natural topological structure of the navigation space, which is partitioned into regions. A number of local policies are optimized within each region, that become the macro-actions of the global abstract MDP resulting from the decomposition. The regions are the corresponding macro-states in the abstract MDP. The global abstract MDP is obtained in a factored form, combining all the initial MDP state variables and one macro-state “region” variable standing for the different possible macro-states corresponding to the regions. Further research is presently conducted on efficient solution algorithms implementing the same hybrid approach for tackling large size MDPs.
Mots-clés : probabilistic planning, dynamic programming, Markov decision processes, application to autonomous decision making
@article{RO_2006__40_2_195_0, author = {Teichteil-K\"onigsbuch, Florent and Fabiani, Patrick}, title = {Influence of modeling structure in probabilistic sequential decision problems}, journal = {RAIRO - Operations Research - Recherche Op\'erationnelle}, pages = {195--234}, publisher = {EDP-Sciences}, volume = {40}, number = {2}, year = {2006}, doi = {10.1051/ro:2006019}, zbl = {1112.90092}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ro:2006019/} }
TY - JOUR AU - Teichteil-Königsbuch, Florent AU - Fabiani, Patrick TI - Influence of modeling structure in probabilistic sequential decision problems JO - RAIRO - Operations Research - Recherche Opérationnelle PY - 2006 SP - 195 EP - 234 VL - 40 IS - 2 PB - EDP-Sciences UR - http://archive.numdam.org/articles/10.1051/ro:2006019/ DO - 10.1051/ro:2006019 LA - en ID - RO_2006__40_2_195_0 ER -
%0 Journal Article %A Teichteil-Königsbuch, Florent %A Fabiani, Patrick %T Influence of modeling structure in probabilistic sequential decision problems %J RAIRO - Operations Research - Recherche Opérationnelle %D 2006 %P 195-234 %V 40 %N 2 %I EDP-Sciences %U http://archive.numdam.org/articles/10.1051/ro:2006019/ %R 10.1051/ro:2006019 %G en %F RO_2006__40_2_195_0
Teichteil-Königsbuch, Florent; Fabiani, Patrick. Influence of modeling structure in probabilistic sequential decision problems. RAIRO - Operations Research - Recherche Opérationnelle, Tome 40 (2006) no. 2, pp. 195-234. doi : 10.1051/ro:2006019. http://archive.numdam.org/articles/10.1051/ro:2006019/
[1] Decision-theoretic military operations planning, in Proceedings of 14th Conf. ICAPS 2004, Whistler, Canada (2004) 402-412.
, and ,[2] Dynamic Programming. Princeton University Press, Princeton, NJ (1957). | MR | Zbl
,[3] Neuro-dynamic programming: an overview (1995).
and ,[4] Labeled rtdp: Improving the convergence of real-time dynamic programming, in Proceedings of 13th Conf. ICAPS 2003, Trento, Italy (2003) 12-21.
and ,[5] Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, USA, AAAI Press / The MIT Press (1996) 1168-1175.
and ,[6] Correlated action effects in decision theoretic regression, in Uncertainty in Artificial Intelligence (1997) 30-37.
,[7] Structured reachability analysis for Markov decision processes, in Uncertainty in Artificial Intelligence (1998) 24-32.
, and ,[8] Decision-theoretic planning: Structural assumptions and computational leverage. J. Artificial Intell. Res. 11 (1999) 1-94. | Zbl
, and ,[9] Stochastic dynamic programming with factored representations. Artificial Intell. 121 (2000) 49-107. | Zbl
, and ,[10] Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. Computer science, U. of Illinois, Providence R.I. (1998).
,[11] A model for reasoning about persistence and causation. Computational Intelligence 5 (1989) 142-150.
and ,[12] Decomposition techniques for planning in stochastic domains, in Proceedings of the 14th IJCAI 1995, San Francisco, CA (1995) 1121-1129.
and ,[13] Abstraction and approximate decision-theoretic planning. Artificial Intell. 89 (1997) 219-283. | Zbl
and ,[14] Hierarchical reinforcement learning using the maxq value function decomposition. J. Artificial Intell. Res. 13 (2000) 227-303. | Zbl
,[15] A survey of sparse matrix research, in Proceedings of the IEEE, Prentice Hall, New York 65 (1977) 500-535.
,[16] Direct Methods for Sparse Matrices. Clarendon Press, Oxford (1986). | MR | Zbl
, and ,[17] Solving pomdp's using selected past events, in Proceedings of the 14th ECAI 2000, Berlin, Germany (July 2000) 281-285.
,[18] Symbolic heuristic policy iteration algorithms for structured decision-theoretic exploration problems, in Workshop on Reasoning under Uncertainty in Robotics - RUR'2005, Edinburgh, Scotland (2005).
and ,[19] Symbolic heuristic search for factored markov decision processes, in Proceedings of 18th Conf. AAAI 2002, Edmonton, Alberta, Canada (2002) 455-460.
and ,[20] Symbolic generalization for on-line planning, in Proceedings of 19th Conf. UAI 2003, Acapulco, Mexico (2003) 209-216.
, and ,[21] Solving factored mdps with continuous and discrete variables, in Proceedings of 20th Conf. UAI 2004, Banff, Canada (2004).
, and ,[22] Efficient solution algorithms for factored mdps. J. Artificial Intell. Res. 19 (2003) 399-468. | Zbl
, , and ,[23] Lao*: A heuristic search algorithm that finds solutions with loops. Artificial Intell. 129 (2001) 35-62. | Zbl
and ,[24] Hierarchical solution of markov decision processes using macro-actions, in Proceedings of 14th Conf. UAI 1998, San Francisco, CA (1998) 220-229.
, , , and ,[25] Spudd: Stochastic planning using decision diagrams, in Proceedings of 15th Conf. UAI 1999, San Francisco, CA (1999) 279-288.
, , and ,[26] Optimal and approximate stochastic planning using decision diagrams. Technical Report TR-2000-05, University of British Columbia, 10 (2000).
, , and ,[27] Solving factored mdps using non-homogeneous partitions. Artificial Intell. 147 (2003) 225-251. | Zbl
and ,[28] Heuristic refinements of approximate linear programming for factored continuous-state markov decision processes, in Proceedings of 14th Conf. ICAPS 2004, Whistler, Canada (2004) 306-314.
and ,[29] A Game-Theoretic Framework for Robot Motion Planning. Electrical engineering, University of Illinois, Urbana-Champaign (1995).
,[30] A survey of algorithmic methods for partially observed markov decision processes. Technical Report 28, Annals of Operation Research (1991). | MR | Zbl
,[31] Flexible decomposition algorithms for weakly coupled markov decision problems, in Proceedings of 14th Conf. UAI 1998, San Francisco, CA (1998) 422-430.
,[32] Policy-contingent abstraction for robust robot control, in Conference on Uncertainty in Articifical Intelligence (UAI) (2003) 477-484.
, and ,[33] Markov Decision Processes. John Wiley & Sons, INC (1994). | MR | Zbl
,[34] Algebraic Decision Diagrams and Their Applications, in IEEE /ACM International Conference on CAD, Santa Clara, California, 1993. IEEE Computer Society Press 188-191.
, , , , , and ,[35] Iterative Methods for Sparse Linear Systems. Society of Industrial and Applied Mathematics, second edition (2003). | MR | Zbl
,[36] Graph partitioning techniques for markov decision processes decomposition, in Proceedings of the 15th ECAI 2002, Lyon, France (July 2002) 670-674.
,[37] APRICODD: Approximate policy construction using decision diagrams, in NIPS (2000) 1089-1095.
, and ,[38] Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998).
and .[39] Processus Décisionnel de Markov décomposé et factorisé pour l'optimisation d'une stratégie d'exploration. Revue d'Intelligence Artificielle 20 (2006) 133-179.
and ,[40] Symbolic heuristic policy iteration algorithms for structured decision-theoretic exploration problems, in Workshop on Planning under Uncertainty for Autonomous Systems ICAPS'2005, Monterey, CA, USA (2005).
and ,[41] Deployment and Maintenance of a Constellation of Satellites: a Benchmark, in Proceedings of ICAPS'03 Workshop on Planning under Uncertainty and Incomplete Information, Trento, Italy (June 2003).
, and ,Cité par Sources :