Modern Enterprise Data Analysis: From Legacy MapReduce to Cloud-Native Architectures

Sohil Sri Mani Yeshwanth Grandhi

doi:10.21275/SR22222084707

Abstract: This paper presents a comprehensive analysis of enterprise data processing methodologies, tracing the evolution from traditional MapReduce-based approaches to contemporary cloud-native architectures. We demonstrate a modernized implementation of large-scale data analysis that was originally conceived using Hadoop ecosystem technologies (Hive, Pig, HBase) and has been re-engineered with current technologies, including Apache Spark, Delta Lake, and cloud-native services. Our work includes data processing pipelines, machine learning implementations using PySpark and MLlib, and real-time analytics capabilities. We provide a comparative analysis showing significant performance improvements over legacy approaches and discuss emerging alternatives, including serverless architectures, data lakehouses, and AI-enhanced analytics platforms. The research demonstrates a 3.8x performance improvement in processing throughput and 60.

Keywords: Big Data, Apache Spark, Cloud Computing, Data Analytics, Machine Learning, Enterprise Systems, Data Lake, Real-time Processing

How to Cite?: Sohil Sri Mani Yeshwanth Grandhi, "Modern Enterprise Data Analysis: From Legacy MapReduce to Cloud-Native Architectures", Volume 11 Issue 2, February 2022, International Journal of Science and Research (IJSR), Pages: 1384-1389, https://www.ijsr.net/getabstract.php?paperid=SR22222084707, DOI: https://dx.doi.org/10.21275/SR22222084707
