A Comparative Study of Parallel and Distributed Big Data programming models: Methodologies, Challenges and Future Directions

Fiza Gulzar Hussain; Muhammad Wasim; Ayesha Nasir

doi:10.54692/lgurjcsit.2023.073365

Fiza Gulzar Hussain, University of Management and Technology
Muhammad Wasim
Ayesha Nasir


Keywords: Programming Models, Parallel Computing, Distributed Computing, Big Data, MapReduce, SQL, Bulk Synchronous Parallel, Directed Acyclic Graph, Message Passing Interface

Abstract

According to a survey conducted in 2021, users share about 4 petabytes of data on Facebook daily. This exponential growth in data, known as big data, plays a vital role in machine learning, Internet of Things (IoT), and business intelligence applications. As a result, research on big data programming models has attracted considerable interest over the past decade. Many programming paradigms now exist for handling big data, and selecting an appropriate model is critical to a project's success. This study provides an in-depth analysis of big data programming models such as MapReduce, Directed Acyclic Graph (DAG), Bulk Synchronous Parallel (BSP), and SQL. We conduct a comparative study of distributed and parallel big data programming models and categorize them into three classes: traditional data processing, graph-based processing, and query-based processing models. Furthermore, we evaluate these models on parameters such as performance, data processing, storage, fault tolerance, suitable language, and machine learning support. Finally, we highlight the benchmark datasets used for big data programming models and discuss the challenges these models face, along with future directions for the research community.

