Scheduling Multilayer Divisible Computations
RAIRO - Operations Research - Recherche Opérationnelle, Tome 49 (2015) no. 2, pp. 339-368.

We analyze scheduling multilayer divisible computations. Multilayer computations consist of a chain of parallel applications, such that one application produces input for the next one. A simple form of multilayer computations are MapReduce parallel applications. The operations of mapping and reducing are two divisible applications with precedence constraints. We propose a divisible load model and give an algorithm for scheduling multilayer divisible computations. The algorithm is tested in a series of computational experiments. We draw conclusions on schedule patterns and determinants of the performance.

Reçu le :
Accepté le :
DOI : 10.1051/ro/2014050
Classification : 90B35, 68M20, 68M14
Mots clés : Scheduling, divisible loads, parallel processing, multilayer computations, MapReduce
Berlińska, Joanna 1 ; Drozdowski, Maciej 2

1 Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Umultowska 87, 61-614 Poznań, Poland.
2 Institute of Computing Science, Poznań University of Technology, Piotrowo 2, 60-965 Poznań, Poland.
@article{RO_2015__49_2_339_0,
     author = {Berli\'nska, Joanna and Drozdowski, Maciej},
     editor = {Blazewicz, Jacek and Pesch, Erwin and Philipps, Cynthia and Trystram, Denis and Zhang, Guochuan},
     title = {Scheduling {Multilayer} {Divisible} {Computations}},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {339--368},
     publisher = {EDP-Sciences},
     volume = {49},
     number = {2},
     year = {2015},
     doi = {10.1051/ro/2014050},
     zbl = {1310.90041},
     language = {en},
     url = {http://archive.numdam.org/articles/10.1051/ro/2014050/}
}
TY  - JOUR
AU  - Berlińska, Joanna
AU  - Drozdowski, Maciej
ED  - Blazewicz, Jacek
ED  - Pesch, Erwin
ED  - Philipps, Cynthia
ED  - Trystram, Denis
ED  - Zhang, Guochuan
TI  - Scheduling Multilayer Divisible Computations
JO  - RAIRO - Operations Research - Recherche Opérationnelle
PY  - 2015
SP  - 339
EP  - 368
VL  - 49
IS  - 2
PB  - EDP-Sciences
UR  - http://archive.numdam.org/articles/10.1051/ro/2014050/
DO  - 10.1051/ro/2014050
LA  - en
ID  - RO_2015__49_2_339_0
ER  - 
%0 Journal Article
%A Berlińska, Joanna
%A Drozdowski, Maciej
%E Blazewicz, Jacek
%E Pesch, Erwin
%E Philipps, Cynthia
%E Trystram, Denis
%E Zhang, Guochuan
%T Scheduling Multilayer Divisible Computations
%J RAIRO - Operations Research - Recherche Opérationnelle
%D 2015
%P 339-368
%V 49
%N 2
%I EDP-Sciences
%U http://archive.numdam.org/articles/10.1051/ro/2014050/
%R 10.1051/ro/2014050
%G en
%F RO_2015__49_2_339_0
Berlińska, Joanna; Drozdowski, Maciej. Scheduling Multilayer Divisible Computations. RAIRO - Operations Research - Recherche Opérationnelle, Tome 49 (2015) no. 2, pp. 339-368. doi : 10.1051/ro/2014050. http://archive.numdam.org/articles/10.1051/ro/2014050/

R. Agrawal and H.V. Jagadish, Partitioning techniques for large-grained parallelism. IEEE Trans. Comput. 37 (1988) 1627–1634. | DOI

M. Al-Fares, A. Loukissas, A. Vahdat and A. Scalable, Commodity data center network architecture, SIGCOMM ’08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication.

B. Allcock, J. Bester, J. Bresnahan, A.L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke, Data management and transfer in high-performance computational grid environments. Parallel Computing 28 (2002) 749–771. | DOI

O. Beaumont, H. Casanova, A. Legrand, Y. Robert and Y. Yang, Scheduling divisible loads on star and tree networks: results and open problems. IEEE Trans. Parallel Distrib. Syst. 16 (2005) 207–218. | DOI

J. Berlińska, M. Drozdowski and M. Lawenda, Experimental study of scheduling with memory constraints using hybrid methods. J. Comput. Appl. Math. 232 (2009) 638–654. | DOI | Zbl

J. Berlińska and M. Drozdowski, Dominance properties for divisible MapReduce computations, Institute of Computing Science, Poznań University of Technology, Tech. Rep. RA-09/09, 2009, http://www.cs.put.poznan.pl/mdrozdowski/rapIIn/ra0909.pdf.

J. Berlińska and M. Drozdowski, Heuristics for multi-round divisible loads scheduling with limited memory. Parallel Computing 36 (2010) 199–211. | DOI | Zbl

J. Berlińska and M. Drozdowski, Scheduling divisible MapReduce computations. J. Parallel Distrib. Comput. 71 (2011) 450–459. | DOI

V. Bharadwaj, D. Ghose, V. Mani and T. Robertazzi, Scheduling divisible loads in parallel and distributed systems. IEEE Computer Society Press, Los Alamitos, CA (1996).

J. Błażewicz, W. Cellary, R. Słowiński and J. Wȩglarz, Scheduling under resource constraints – deterministic models. Ann. Oper. Res. 7 (1986). | Zbl

J. Błażewicz and M. Drozdowski, Scheduling divisible jobs on hypercubes. Parallel Computing 21 (1995) 1945–1956. | DOI

J. Błażewicz, K. Ecker, E. Pesch, G. Schmidt and J. Wȩglarz, Scheduling computer and manufacturing processes. Springer-Verlag, Heidelberg (1996). | Zbl

Y. Bu, B. Howe, M. Balazinska and M.D. Ernst, HaLoop: Efficient iterative data processing on large clusters. Proc. VLDB Endowment 3 (2010) 285–296. | DOI

H. Casanova, Network modeling issues for GRID application scheduling. Int. J. Found. Comput. Sci. 16 (2005) 145–162. | DOI

Y.-C. Cheng and T.G. Robertazzi, Distributed computation with communication delay. IEEE Trans. Aerospace Electron. Syst. 24 (1988) 700–712. | DOI

Y.-C. Cheng and T.G. Robertazzi, Distributed computation for a tree network with communication delays. IEEE Trans. Aerospace Electron. Syst. 26 (1990) 511–516. | DOI

N. Comino and V.L. Narasimhan, A novel data distribution technique for host-client type parallel applications. IEEE Trans. Parallel Distrib. Syst. 13 (2002) 97–110. | DOI

J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, OSDI’04: Sixth Symposium on Operating system design and implementation, San Francisco, CA, December, 2004, http://labs.google.com/papers/mapreduce.html.

M. Drozdowski, Scheduling for Parallel Processing. Springer (2009). | Zbl

M. Drozdowski and W. Głazek, Scheduling divisible loads in a three-dimensional mesh of processors. Parallel Computing 25 (1999) 381–404. | DOI | Zbl

M. Drozdowski and P. Wolniewicz, Experiments with scheduling divisible tasks in clusters of workstations, in: A. Bode, T. Ludwig, W. Karl and R. Wismuller (Eds.), Proceedings of 6th Euro-Par Conference. Lecture Notes Comput. Sci. 1900 (2000) 311–319.

J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.H. Bae, J. Qiu and G. Fox, Twister: A runtime for iterative MapReduce, in Proc. of the 19th ACM International symposium on High Performance Distributed Computing (2010) 810–818.

N. Farrington, E. Rubow and A. Vahdat, Data center switch architecture in the age of merchant silicon, in Proc. of 17th IEEE symposium on High Performance Interconnects, 2009. HOTI (2009), 93–102.

D. Ghose and H.J. Kim, Load partitioning and trade-off study for large matrix-vector computations in multicast bus networks with communication delays. J. Parallel Distrib. Comput. 55 (1998) 32–59. | DOI | Zbl

M.D. Grammatikakis, D.F. Hsu and M. Kraetzel, Parallel system interconnections and communications, CRC Press, Boca Raton (2001).

T. Gunarathne, B. Zhang, T.L. Wu and J. Qiu, Portable parallel programming on cloud and HPC: Scientific applications of Twister4Azure, in Proc. of the 4th International conference on utility and cloud computing (2011) 97–104.

E.L. Lawler and J. Labetoulle, On preemptive scheduling of unrelated parallel processors by linear programming. J. ACM 25 (1978) 612–619. | DOI | Zbl

X. Li, V. Bharadwaj and C.C. Ko, Processing divisible loads on single-level tree networks with buffer constraints. IEEE Trans. Aerospace Electron. Syst. 36 (2000) 1298–1308. | DOI

X. Li, V. Bharadwaj and C.C. Ko, Distributed image processing on a network of workstations. Int. J. Comput. Appl. 25 (2003) 1–10.

T. Lim and T.G. Robertazzi, Efficient parallel video processing through concurrent communication on a multi-port star network, in Proc. of the 40th conference on Information Sciences and Systems, 2006.

J. Lin and C. Dyer, Data-intensive text processing with MapReduce. Morgan & Claypool, 2010.

Lpsolve reference guide, 2010, http://lpsolve.sourceforge.net/5.5/.

R. Pike, S. Dorward, R. Griesemer and S. Quinlan, Interpreting the data: Parallel analysis with Sawzall. Sci. Program. 13 (2005) 277–298.

K. van der Raadt, Y. Yang and H. Casanova, Practical divisible load scheduling on grid platforms with APST-DV. Proceedings of the 19th IPDPS’05 (2005) 29b.

C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski and C. Kozyrakis, Evaluating MapReduce for Multi-core and Multiprocessor Systems, in Proc. of International Symposium on High Performance Computer Architecture (HPCA), 2007, pp. 13–24.

T. Robertazzi, Ten reasons to use divisible load theory. IEEE Computer 36 (2003) 63–68. | DOI

J. Sohn, T.G. Robertazzi and S. Luryi, Optimizing Computing Costs Using Divisible Load Analysis. IEEE Trans. Parallel Distrib. Syst. 9 (1998) 225–234. | DOI

A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency vol. A, vol. 24 of Algorithms and Combinatorics. Springer (2003). | Zbl

T. White, Hadoop: The Definitive Guide, O’Reilly Media, 2012.

H. Yang, A. Dasdan, R.-L. Hsiao and D.S. Parker, Map-Reduce-Merge: Simplified relational data processing on large clusters, in Proc. of the 2007 ACM SIGMOD international conference on Management of Data (2007) 1029–1040.

Y. Yang, H. Casanova, M. Drozdowski, M. Lawenda and A. Legrand, On the complexity of multi-round divisible load scheduling, INRIA Rhône-Alpes, Research Report 6096, 2007, http://hal.inria.fr/inria-00123711/en/.

Cité par Sources :