We study a time homogeneous discrete composite-action Markov decision process (CMDP) which needs to make multiple decisions at each state. In this particular Markov decision process, the state variables are divided into two separable sets and a two-dimensional composite action is chosen at each decision epoch. To solve a composite-action Markov decision process, we propose a novel linear programming model (Contracted Linear Programming Model, CLPM). We show that the CLPM model obtains the optimal state values of a CMDP process. We analyze and compare the number of variables and constraints of the CLPM model and the Traditional Linear Programming Model (TLPM). Computational experiments compare running times and memory usage of the two models. The CLPM model outperforms the TLPM model in both time complexity and space complexity by theoretical analysis and computational experiments.
Accepté le :
DOI : 10.1051/ro/2018081
Mots-clés : Markov decision process, linear programming, optimal state value, action
@article{RO_2019__53_5_1749_0, author = {Zhang, Zhicong and Li, Shuai and Yan, Xiaohui and Zhang, Liangwei}, title = {A linear programming based approach for composite-action {Markov} decision processes}, journal = {RAIRO - Operations Research - Recherche Op\'erationnelle}, pages = {1749--1761}, publisher = {EDP-Sciences}, volume = {53}, number = {5}, year = {2019}, doi = {10.1051/ro/2018081}, mrnumber = {4016528}, zbl = {1431.90173}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ro/2018081/} }
TY - JOUR AU - Zhang, Zhicong AU - Li, Shuai AU - Yan, Xiaohui AU - Zhang, Liangwei TI - A linear programming based approach for composite-action Markov decision processes JO - RAIRO - Operations Research - Recherche Opérationnelle PY - 2019 SP - 1749 EP - 1761 VL - 53 IS - 5 PB - EDP-Sciences UR - http://archive.numdam.org/articles/10.1051/ro/2018081/ DO - 10.1051/ro/2018081 LA - en ID - RO_2019__53_5_1749_0 ER -
%0 Journal Article %A Zhang, Zhicong %A Li, Shuai %A Yan, Xiaohui %A Zhang, Liangwei %T A linear programming based approach for composite-action Markov decision processes %J RAIRO - Operations Research - Recherche Opérationnelle %D 2019 %P 1749-1761 %V 53 %N 5 %I EDP-Sciences %U http://archive.numdam.org/articles/10.1051/ro/2018081/ %R 10.1051/ro/2018081 %G en %F RO_2019__53_5_1749_0
Zhang, Zhicong; Li, Shuai; Yan, Xiaohui; Zhang, Liangwei. A linear programming based approach for composite-action Markov decision processes. RAIRO - Operations Research - Recherche Opérationnelle, Tome 53 (2019) no. 5, pp. 1749-1761. doi : 10.1051/ro/2018081. http://archive.numdam.org/articles/10.1051/ro/2018081/
The optimal control of a two-stage tandem queueing system with flexible servers. Prob. Eng. Inf Sci. 16 (2002) 453–469. | DOI | MR | Zbl
, , ,Multi-actor Markov decision processes. J. Appl. Prob. 42 (2005) 15–26. | DOI | MR | Zbl
and ,Throughput maximization for tandem lines with two stations and flexible servers. Oper. Res. 53 (2005) 516–531. | DOI | Zbl
and ,Throughput maximization for two station tandem systems: a proof of the Andradóttir-Ayhan conjecture. Queue. Syst. 67 (2011) 365–386. | DOI | MR | Zbl
and ,On the introduction of an agile, temporary workforce into a tandem queueing system. Queue. Syst. 51 (2005) 135–171. | DOI | MR | Zbl
, and ,Priority tandem queueing model with admission control. Comput. Ind. Eng. 61 (2011) 131–140. | DOI
and ,Robustness of efficient server assignment policies to service time distributions in finite-buffered lines. Nav. Res. Logist. 57 (2010) 563–582. | DOI | MR | Zbl
, and ,Flexible servers in understaffed tandem lines, Prod. Oper. Manage. 21 (2012) 761–777. | DOI
, and .Optimal control of flexible servers in two tandem queues with operating costs. Prob. Eng. Inf. Sci. 22 (2008) 107–131. | DOI | MR | Zbl
,Optimal stochastic scheduling of two interconnected queues with varying service rates. Oper. Res. Lett. 36 (2008) 492–495. | DOI | MR | Zbl
,Markov decision processes with multidimensional action spaces. Eur. J. Oper. Res. 200 (2010) 625–628. | DOI | MR | Zbl
,Markov Decision Processes: Discrete Dynamic Stochastic Programming. John Wiley & Sons Inc, New York, NY (1994). | DOI | MR | Zbl
,Heuristics for allocation of reconfigurable resources in a serial line with reliability considerations. IIE Trans. 40 (2008) 595–611. | DOI
, and ,Nested Markov decision framework for coordinating pavement improvement with capacity expansion. J. Transp. Eng. 138 (2012) 387–394. | DOI
,Cité par Sources :