We analyze scheduling multilayer divisible computations. Multilayer computations consist of a chain of parallel applications, such that one application produces input for the next one. A simple form of multilayer computations are MapReduce parallel applications. The operations of mapping and reducing are two divisible applications with precedence constraints. We propose a divisible load model and give an algorithm for scheduling multilayer divisible computations. The algorithm is tested in a series of computational experiments. We draw conclusions on schedule patterns and determinants of the performance.
Accepté le :
DOI : 10.1051/ro/2014050
Mots-clés : Scheduling, divisible loads, parallel processing, multilayer computations, MapReduce
@article{RO_2015__49_2_339_0, author = {Berli\'nska, Joanna and Drozdowski, Maciej}, editor = {Blazewicz, Jacek and Pesch, Erwin and Philipps, Cynthia and Trystram, Denis and Zhang, Guochuan}, title = {Scheduling {Multilayer} {Divisible} {Computations}}, journal = {RAIRO - Operations Research - Recherche Op\'erationnelle}, pages = {339--368}, publisher = {EDP-Sciences}, volume = {49}, number = {2}, year = {2015}, doi = {10.1051/ro/2014050}, zbl = {1310.90041}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ro/2014050/} }
TY - JOUR AU - Berlińska, Joanna AU - Drozdowski, Maciej ED - Blazewicz, Jacek ED - Pesch, Erwin ED - Philipps, Cynthia ED - Trystram, Denis ED - Zhang, Guochuan TI - Scheduling Multilayer Divisible Computations JO - RAIRO - Operations Research - Recherche Opérationnelle PY - 2015 SP - 339 EP - 368 VL - 49 IS - 2 PB - EDP-Sciences UR - http://archive.numdam.org/articles/10.1051/ro/2014050/ DO - 10.1051/ro/2014050 LA - en ID - RO_2015__49_2_339_0 ER -
%0 Journal Article %A Berlińska, Joanna %A Drozdowski, Maciej %E Blazewicz, Jacek %E Pesch, Erwin %E Philipps, Cynthia %E Trystram, Denis %E Zhang, Guochuan %T Scheduling Multilayer Divisible Computations %J RAIRO - Operations Research - Recherche Opérationnelle %D 2015 %P 339-368 %V 49 %N 2 %I EDP-Sciences %U http://archive.numdam.org/articles/10.1051/ro/2014050/ %R 10.1051/ro/2014050 %G en %F RO_2015__49_2_339_0
Berlińska, Joanna; Drozdowski, Maciej. Scheduling Multilayer Divisible Computations. RAIRO - Operations Research - Recherche Opérationnelle, New challenges in scheduling theory, Tome 49 (2015) no. 2, pp. 339-368. doi : 10.1051/ro/2014050. http://archive.numdam.org/articles/10.1051/ro/2014050/
Partitioning techniques for large-grained parallelism. IEEE Trans. Comput. 37 (1988) 1627–1634. | DOI
and ,M. Al-Fares, A. Loukissas, A. Vahdat and A. Scalable, Commodity data center network architecture, SIGCOMM ’08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication.
Data management and transfer in high-performance computational grid environments. Parallel Computing 28 (2002) 749–771. | DOI
, , , , , , , , and ,Scheduling divisible loads on star and tree networks: results and open problems. IEEE Trans. Parallel Distrib. Syst. 16 (2005) 207–218. | DOI
, , , and ,Experimental study of scheduling with memory constraints using hybrid methods. J. Comput. Appl. Math. 232 (2009) 638–654. | DOI | Zbl
, and ,J. Berlińska and M. Drozdowski, Dominance properties for divisible MapReduce computations, Institute of Computing Science, Poznań University of Technology, Tech. Rep. RA-09/09, 2009, http://www.cs.put.poznan.pl/mdrozdowski/rapIIn/ra0909.pdf.
Heuristics for multi-round divisible loads scheduling with limited memory. Parallel Computing 36 (2010) 199–211. | DOI | Zbl
and ,Scheduling divisible MapReduce computations. J. Parallel Distrib. Comput. 71 (2011) 450–459. | DOI
and ,V. Bharadwaj, D. Ghose, V. Mani and T. Robertazzi, Scheduling divisible loads in parallel and distributed systems. IEEE Computer Society Press, Los Alamitos, CA (1996).
Scheduling under resource constraints – deterministic models. Ann. Oper. Res. 7 (1986). | Zbl
, , and ,Scheduling divisible jobs on hypercubes. Parallel Computing 21 (1995) 1945–1956. | DOI
and ,J. Błażewicz, K. Ecker, E. Pesch, G. Schmidt and J. Wȩglarz, Scheduling computer and manufacturing processes. Springer-Verlag, Heidelberg (1996). | Zbl
HaLoop: Efficient iterative data processing on large clusters. Proc. VLDB Endowment 3 (2010) 285–296. | DOI
, , and ,Network modeling issues for GRID application scheduling. Int. J. Found. Comput. Sci. 16 (2005) 145–162. | DOI
,Distributed computation with communication delay. IEEE Trans. Aerospace Electron. Syst. 24 (1988) 700–712. | DOI
and ,Distributed computation for a tree network with communication delays. IEEE Trans. Aerospace Electron. Syst. 26 (1990) 511–516. | DOI
and ,A novel data distribution technique for host-client type parallel applications. IEEE Trans. Parallel Distrib. Syst. 13 (2002) 97–110. | DOI
and ,J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, OSDI’04: Sixth Symposium on Operating system design and implementation, San Francisco, CA, December, 2004, http://labs.google.com/papers/mapreduce.html.
M. Drozdowski, Scheduling for Parallel Processing. Springer (2009). | Zbl
Scheduling divisible loads in a three-dimensional mesh of processors. Parallel Computing 25 (1999) 381–404. | DOI | Zbl
and ,M. Drozdowski and P. Wolniewicz, Experiments with scheduling divisible tasks in clusters of workstations, in: A. Bode, T. Ludwig, W. Karl and R. Wismuller (Eds.), Proceedings of 6th Euro-Par Conference. Lecture Notes Comput. Sci. 1900 (2000) 311–319.
J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.H. Bae, J. Qiu and G. Fox, Twister: A runtime for iterative MapReduce, in Proc. of the 19th ACM International symposium on High Performance Distributed Computing (2010) 810–818.
N. Farrington, E. Rubow and A. Vahdat, Data center switch architecture in the age of merchant silicon, in Proc. of 17th IEEE symposium on High Performance Interconnects, 2009. HOTI (2009), 93–102.
Load partitioning and trade-off study for large matrix-vector computations in multicast bus networks with communication delays. J. Parallel Distrib. Comput. 55 (1998) 32–59. | DOI | Zbl
and ,M.D. Grammatikakis, D.F. Hsu and M. Kraetzel, Parallel system interconnections and communications, CRC Press, Boca Raton (2001).
T. Gunarathne, B. Zhang, T.L. Wu and J. Qiu, Portable parallel programming on cloud and HPC: Scientific applications of Twister4Azure, in Proc. of the 4th International conference on utility and cloud computing (2011) 97–104.
On preemptive scheduling of unrelated parallel processors by linear programming. J. ACM 25 (1978) 612–619. | DOI | Zbl
and ,Processing divisible loads on single-level tree networks with buffer constraints. IEEE Trans. Aerospace Electron. Syst. 36 (2000) 1298–1308. | DOI
, and ,Distributed image processing on a network of workstations. Int. J. Comput. Appl. 25 (2003) 1–10.
, and ,T. Lim and T.G. Robertazzi, Efficient parallel video processing through concurrent communication on a multi-port star network, in Proc. of the 40th conference on Information Sciences and Systems, 2006.
J. Lin and C. Dyer, Data-intensive text processing with MapReduce. Morgan & Claypool, 2010.
Lpsolve reference guide, 2010, http://lpsolve.sourceforge.net/5.5/.
Interpreting the data: Parallel analysis with Sawzall. Sci. Program. 13 (2005) 277–298.
, , and ,K. van der Raadt, Y. Yang and H. Casanova, Practical divisible load scheduling on grid platforms with APST-DV. Proceedings of the 19th IPDPS’05 (2005) 29b.
C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski and C. Kozyrakis, Evaluating MapReduce for Multi-core and Multiprocessor Systems, in Proc. of International Symposium on High Performance Computer Architecture (HPCA), 2007, pp. 13–24.
Ten reasons to use divisible load theory. IEEE Computer 36 (2003) 63–68. | DOI
,Optimizing Computing Costs Using Divisible Load Analysis. IEEE Trans. Parallel Distrib. Syst. 9 (1998) 225–234. | DOI
, and ,A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency vol. A, vol. 24 of Algorithms and Combinatorics. Springer (2003). | Zbl
T. White, Hadoop: The Definitive Guide, O’Reilly Media, 2012.
H. Yang, A. Dasdan, R.-L. Hsiao and D.S. Parker, Map-Reduce-Merge: Simplified relational data processing on large clusters, in Proc. of the 2007 ACM SIGMOD international conference on Management of Data (2007) 1029–1040.
Y. Yang, H. Casanova, M. Drozdowski, M. Lawenda and A. Legrand, On the complexity of multi-round divisible load scheduling, INRIA Rhône-Alpes, Research Report 6096, 2007, http://hal.inria.fr/inria-00123711/en/.
Cité par Sources :