A linear programming based approach for composite-action Markov decision processes
RAIRO - Operations Research - Recherche Opérationnelle, Tome 53 (2019) no. 5, pp. 1749-1761.

We study a time homogeneous discrete composite-action Markov decision process (CMDP) which needs to make multiple decisions at each state. In this particular Markov decision process, the state variables are divided into two separable sets and a two-dimensional composite action is chosen at each decision epoch. To solve a composite-action Markov decision process, we propose a novel linear programming model (Contracted Linear Programming Model, CLPM). We show that the CLPM model obtains the optimal state values of a CMDP process. We analyze and compare the number of variables and constraints of the CLPM model and the Traditional Linear Programming Model (TLPM). Computational experiments compare running times and memory usage of the two models. The CLPM model outperforms the TLPM model in both time complexity and space complexity by theoretical analysis and computational experiments.

Reçu le :
Accepté le :
DOI : 10.1051/ro/2018081
Classification : 90C40, 90C05
Mots-clés : Markov decision process, linear programming, optimal state value, action
Zhang, Zhicong 1 ; Li, Shuai 1 ; Yan, Xiaohui 1 ; Zhang, Liangwei 1

1
@article{RO_2019__53_5_1749_0,
     author = {Zhang, Zhicong and Li, Shuai and Yan, Xiaohui and Zhang, Liangwei},
     title = {A linear programming based approach for composite-action {Markov} decision processes},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {1749--1761},
     publisher = {EDP-Sciences},
     volume = {53},
     number = {5},
     year = {2019},
     doi = {10.1051/ro/2018081},
     mrnumber = {4016528},
     zbl = {1431.90173},
     language = {en},
     url = {http://archive.numdam.org/articles/10.1051/ro/2018081/}
}
TY  - JOUR
AU  - Zhang, Zhicong
AU  - Li, Shuai
AU  - Yan, Xiaohui
AU  - Zhang, Liangwei
TI  - A linear programming based approach for composite-action Markov decision processes
JO  - RAIRO - Operations Research - Recherche Opérationnelle
PY  - 2019
SP  - 1749
EP  - 1761
VL  - 53
IS  - 5
PB  - EDP-Sciences
UR  - http://archive.numdam.org/articles/10.1051/ro/2018081/
DO  - 10.1051/ro/2018081
LA  - en
ID  - RO_2019__53_5_1749_0
ER  - 
%0 Journal Article
%A Zhang, Zhicong
%A Li, Shuai
%A Yan, Xiaohui
%A Zhang, Liangwei
%T A linear programming based approach for composite-action Markov decision processes
%J RAIRO - Operations Research - Recherche Opérationnelle
%D 2019
%P 1749-1761
%V 53
%N 5
%I EDP-Sciences
%U http://archive.numdam.org/articles/10.1051/ro/2018081/
%R 10.1051/ro/2018081
%G en
%F RO_2019__53_5_1749_0
Zhang, Zhicong; Li, Shuai; Yan, Xiaohui; Zhang, Liangwei. A linear programming based approach for composite-action Markov decision processes. RAIRO - Operations Research - Recherche Opérationnelle, Tome 53 (2019) no. 5, pp. 1749-1761. doi : 10.1051/ro/2018081. http://archive.numdam.org/articles/10.1051/ro/2018081/

H.S. Ahn, I. Duenyas, M.E. Lewis, The optimal control of a two-stage tandem queueing system with flexible servers. Prob. Eng. Inf Sci. 16 (2002) 453–469. | DOI | MR | Zbl

H.S. Ahn and R. Righter, Multi-actor Markov decision processes. J. Appl. Prob. 42 (2005) 15–26. | DOI | MR | Zbl

S. Andradóttir and H. Ayhan, Throughput maximization for tandem lines with two stations and flexible servers. Oper. Res. 53 (2005) 516–531. | DOI | Zbl

J.J. Hasenbein and B. Kim, Throughput maximization for two station tandem systems: a proof of the Andradóttir-Ayhan conjecture. Queue. Syst. 67 (2011) 365–386. | DOI | MR | Zbl

D.L. Kaufman, H.S. Ahn and M.E. Lewis, On the introduction of an agile, temporary workforce into a tandem queueing system. Queue. Syst. 51 (2005) 135–171. | DOI | MR | Zbl

C. Kim and S. Dudin, Priority tandem queueing model with admission control. Comput. Ind. Eng. 61 (2011) 131–140. | DOI

E. Krkzlar, S. Andradóttir and H. Ayhan, Robustness of efficient server assignment policies to service time distributions in finite-buffered lines. Nav. Res. Logist. 57 (2010) 563–582. | DOI | MR | Zbl

E. Krkzlar, S. Andradóttir and H. Ayhan. Flexible servers in understaffed tandem lines, Prod. Oper. Manage. 21 (2012) 761–777. | DOI

D.G. Pandelis, Optimal control of flexible servers in two tandem queues with operating costs. Prob. Eng. Inf. Sci. 22 (2008) 107–131. | DOI | MR | Zbl

D.G. Pandelis, Optimal stochastic scheduling of two interconnected queues with varying service rates. Oper. Res. Lett. 36 (2008) 492–495. | DOI | MR | Zbl

D.G. Pandelis, Markov decision processes with multidimensional action spaces. Eur. J. Oper. Res. 200 (2010) 625–628. | DOI | MR | Zbl

M.L. Puterman, Markov Decision Processes: Discrete Dynamic Stochastic Programming. John Wiley & Sons Inc, New York, NY (1994). | DOI | MR | Zbl

C.H. Wu, D.G. Down and M.E. Lewis, Heuristics for allocation of reconfigurable resources in a serial line with reliability considerations. IIE Trans. 40 (2008) 595–611. | DOI

J. Yang, Nested Markov decision framework for coordinating pavement improvement with capacity expansion. J. Transp. Eng. 138 (2012) 387–394. | DOI

Cité par Sources :