
Query optimization for massively parallel data processing
Proceedings of the 2nd ACM Symposium on Cloud Computing (SOCC '11), Cascais, Portugal, 2011.10.26-2011.10.28. ACM Press.

โœ Scribed by Wu, Sai; Li, Feng; Mehrotra, Sharad; Ooi, Beng Chin


Book ID: 121763362
Publisher: ACM Press
Year: 2011
Weight: 497 KB
Category: Article
ISBN: 1450309763


Synopsis


MapReduce has been widely recognized as an efficient tool for large-scale data analysis. It achieves high performance by exploiting parallelism among processing nodes while providing a simple interface for upper-layer applications. Some vendors have enhanced their data warehouse systems by integrating MapReduce into them. However, existing MapReduce-based query processing systems, such as Hive, fall short of conventional database systems in query optimization and overall competency. Given an SQL query, Hive translates the query into a set of MapReduce jobs sentence by sentence. This design assumes that users can optimize their queries before submitting them to the system. Unfortunately, manual query optimization is time-consuming and difficult, even for an experienced database user or administrator. In this paper, we propose a query optimization scheme for MapReduce-based processing systems. Specifically, we embed into Hive a query optimizer that is designed to generate an efficient query plan based on our proposed cost model. Experiments carried out on our in-house cluster confirm the effectiveness of our query optimizer.


SIMILAR VOLUMES