JOIN ORDER QUERY OPTIMIZATION USING HIVE IN THE BIG DATA ENVIRONMENT

In the Big Data world, billions of internet users on social platforms generate large volumes of digital data from many sources every second. The main drawback of conventional database systems is their inability to manage datasets this large and complex, with sizes measured in petabytes, zettabytes, and beyond. New mechanisms are needed to store and process such data. The Apache Hadoop framework uses the MapReduce programming model to handle big data and achieves high performance through parallelism across processing nodes; it is used by companies such as Yahoo and Facebook. MapReduce on its own has limitations for data analysis, so a high-level layer such as Hive is needed to process large datasets. Apache Hive is an open-source engine built on top of Hadoop that processes queries written in HiveQL, a query language similar to relational SQL. Join is the most frequent and most expensive operation in Hive query processing, and it increases both processing cost and time. Optimization therefore plays an important role in improving query processing. Although join order optimization can greatly improve the speed and cost of Hive query processing, most researchers have avoided it. This paper examines different join-order techniques for finding the optimal Query Execution Plan (QEP), the plan with minimum cost in a large search space, to increase query performance on Hadoop-Hive. We also discuss the feasibility and limitations of join order strategies for Hive queries, which may point researchers toward promising directions for highly efficient join order processing systems in the future.
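To make the idea of a minimum-cost plan in a search space of join orders concrete, the following is a minimal sketch in Python. It is not the method of this paper and not Hive's optimizer. The table names, row counts, selectivities, and the cost model (sum of intermediate result sizes over left-deep plans) are all assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical table cardinalities (row counts), for illustration only.
CARD = {"orders": 1_000_000, "customers": 50_000, "nations": 25}

# Hypothetical join selectivities. A pair with no entry has no join
# predicate, so joining it is a cross product (selectivity 1).
SEL = {
    frozenset(("orders", "customers")): Fraction(1, 50_000),
    frozenset(("customers", "nations")): Fraction(1, 25),
}

def plan_cost(order):
    """Assumed cost model: sum of intermediate result sizes of a left-deep plan."""
    joined = [order[0]]
    rows = Fraction(CARD[order[0]])
    cost = Fraction(0)
    for table in order[1:]:
        # Combine the selectivities of every predicate linking the new
        # table to a table that is already part of the plan.
        sel = Fraction(1)
        for prev in joined:
            sel *= SEL.get(frozenset((prev, table)), Fraction(1))
        rows = rows * CARD[table] * sel  # estimated size of this join's output
        cost += rows
        joined.append(table)
    return cost

def best_left_deep_order(tables):
    """Exhaustively enumerate all left-deep orders and keep the cheapest."""
    return min(permutations(tables), key=plan_cost)

if __name__ == "__main__":
    for order in permutations(CARD):
        print(" JOIN ".join(order), "cost =", int(plan_cost(order)))
    print("cheapest:", best_left_deep_order(list(CARD)))
```

Under these assumed numbers, orders that begin with a cross product between orders and nations are far more expensive than orders that join customers with nations first. Exhaustive enumeration grows factorially with the number of tables, which is why a large search space calls for smarter join-order strategies.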

___________________________________________________________________________________

 

Keywords: Big Data, Hadoop, Hive, Join Order Query Optimization Technique.



