𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Performance Modeling in MapReduce Environments. In Proceedings of the Second Joint WOSP/SIPEW International Conference on Performance Engineering (ICPE '11), Karlsruhe, Germany, March 14–16, 2011. ACM Press.

✍ Scribed by Ludmila Cherkasova


Book ID
121404160
Publisher
ACM Press
Year
2011
Weight
286 KB
Category
Article
ISBN
1450305199

No coin nor oath required. For personal study only.

✦ Synopsis


Unstructured data is the largest and fastest-growing portion of most enterprises' assets, often representing 70% to 80% of online data. This steep increase in the volume of information being produced often exceeds the capabilities of existing commercial databases. MapReduce and its open-source implementation Hadoop offer an economically compelling alternative: an efficient distributed computing platform for handling large volumes of data and mining petabytes of unstructured information. It is increasingly used across the enterprise for advanced data analytics, business intelligence, and new applications associated with data retention, regulatory compliance, e-discovery, and litigation.

However, setting up a dedicated Hadoop cluster requires a significant capital expenditure that can be difficult to justify. Cloud computing offers a compelling alternative, allowing users to rent resources in a "pay-as-you-go" fashion. For example, the list of offered Amazon Web Services includes a MapReduce environment for rent. This is an attractive and cost-efficient option for many users, because acquiring and maintaining a complex, large-scale infrastructure is a difficult and expensive undertaking.

One of the open questions in such environments is how many resources a user should lease from the service provider. Currently, there is no available methodology for easily answering this question, and the task of estimating the resources required to meet application performance goals is solely the user's responsibility. Users need to perform adequate application testing, performance evaluation, and capacity-planning estimation, and then request the appropriate amount of resources from the service provider.

To address these problems we need to understand: "What do we need to know about a MapReduce job in order to build an efficient and accurate modeling framework? Can we extract a representative job profile that reflects a set of critical performance characteristics of the underlying application during all job execution phases, i.e., the map, shuffle, sort, and reduce phases? What metrics should be included in the job profile?"

We discuss a profiling technique for MapReduce applications that aims to construct a compact job profile comprised of performance invariants that are independent of the amount of resources assigned to the job (i.e., the size of the Hadoop cluster) and the size of the input dataset. The challenge is how to accurately predict application performance in a large production environment and when processing large datasets.
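To make the idea of a "compact job profile of performance invariants" concrete, here is a minimal sketch of what such a profile and a prediction step might look like. All names (`PhaseProfile`, `job_bounds`) and the metric choices are hypothetical illustrations, not the paper's actual model; the phase names follow the abstract, and the completion-time estimate uses the classic greedy-scheduling makespan bounds (for n tasks on k slots: lower bound n·avg/k, upper bound (n−1)·avg/k + max), a standard result that a bounds-based framework could plug such invariants into.

```python
# Hypothetical sketch of a compact MapReduce job profile built from
# per-phase invariants (average and maximum task durations), plus a
# bounds-based completion-time estimate for a cluster of a given size.
# Names and metrics are illustrative assumptions, not the paper's model.
from dataclasses import dataclass


@dataclass
class PhaseProfile:
    """Performance invariants for one execution phase (map/shuffle/sort/reduce)."""
    avg_dur: float  # average task duration in this phase (seconds)
    max_dur: float  # maximum task duration in this phase (seconds)


def phase_bounds(profile: PhaseProfile, n_tasks: int, n_slots: int) -> tuple[float, float]:
    """Greedy-scheduling bounds on phase completion time.

    lower = n * avg / k          (perfectly balanced schedule)
    upper = (n - 1) * avg / k + max   (worst-case greedy assignment)
    """
    lower = n_tasks * profile.avg_dur / n_slots
    upper = (n_tasks - 1) * profile.avg_dur / n_slots + profile.max_dur
    return lower, upper


def job_bounds(profile: dict[str, PhaseProfile],
               tasks: dict[str, int],
               slots: dict[str, int]) -> tuple[float, float]:
    """Sum per-phase bounds to bound total job completion time."""
    lo = hi = 0.0
    for phase, p in profile.items():
        l, u = phase_bounds(p, tasks[phase], slots[phase])
        lo += l
        hi += u
    return lo, hi
```

Because the invariants are (by assumption) independent of cluster and input size, the same profile can be re-evaluated with different `tasks` and `slots` counts to explore how many resources to lease, e.g. `job_bounds(profile, tasks, slots)` for several candidate cluster sizes.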


📜 SIMILAR VOLUMES