International Journal of Science and Research (IJSR)

ISSN: 2319-7064

Informative Article | Information Technology | India | Volume 9 Issue 5, May 2020

Ensuring Data Integrity in Big Data Ingestion: Techniques and Best Practices for Data Quality Assurance

Sree Sandhya Kona

Abstract: In the era of big data, the quality of data ingested into analytical systems strongly affects the accuracy of insights and the effectiveness of decision-making. Ensuring high-quality data during the ingestion phase is crucial, but it is difficult because incoming data is often inaccurate, inconsistent, or incomplete. This article covers the fundamental techniques and best practices for data quality assurance in big data ingestion, organized into three areas: data validation, data cleansing, and data enrichment. The data validation techniques discussed include both pre-ingestion and post-ingestion checks, such as schema validation and anomaly detection. For data cleansing, we address methods for identifying and correcting errors, including data imputation and systematic error correction. The article also presents data enrichment strategies, such as data merging and augmentation, that add context to the ingested data and increase its usefulness. Finally, we examine the role of automated tools in integrating these practices into data pipelines, and the importance of continuous monitoring and feedback mechanisms for sustaining data integrity. Through a combination of theoretical frameworks and real-world case studies, this article aims to provide a comprehensive guide to improving data quality in big data projects, thereby supporting more reliable and insightful business analytics.
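The abstract treats validation, cleansing, and enrichment as distinct stages of ingestion. The following is a minimal sketch, using only the Python standard library, of how those stages might be chained in one ingestion step; it is an illustration under stated assumptions, not the article's own method. The schema, the field names (`order_id`, `region_code`, `amount`), the region lookup table, and every function name are hypothetical. Median imputation and the modified z-score (median absolute deviation, threshold 3.5) are swapped in as one common choice each for imputation and anomaly detection.

```python
from statistics import median
from typing import Any

Record = dict[str, Any]

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA: dict[str, type] = {"order_id": str, "region_code": str, "amount": float}


def validate_schema(record: Record, schema: dict[str, type]) -> list[str]:
    """Validation: report missing fields and type mismatches.

    A value of None is treated as a missing value and left for cleansing.
    """
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif record[field] is not None and not isinstance(record[field], expected):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    return errors


def impute_median(records: list[Record], field: str) -> list[Record]:
    """Cleansing: replace None in `field` with the median of observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    if not observed:
        return records  # nothing to impute from; leave records unchanged
    fill = median(observed)
    return [{**r, field: fill} if r[field] is None else r for r in records]


def flag_anomalies(values: list[float], threshold: float = 3.5) -> list[bool]:
    """Anomaly detection with the modified z-score (median and MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median; this sketch flags nothing.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def enrich(records: list[Record], lookup: dict[str, str],
           key: str, new_field: str) -> list[Record]:
    """Enrichment: merge reference data by key; unmatched keys get None."""
    return [{**r, new_field: lookup.get(r[key])} for r in records]


def ingest(raw: list[Record], region_names: dict[str, str]):
    """Run validate -> cleanse -> detect -> enrich; return (accepted, rejected)."""
    accepted: list[Record] = []
    rejected: list[tuple[Record, list[str]]] = []
    for r in raw:
        errors = validate_schema(r, SCHEMA)
        if errors:
            rejected.append((r, errors))  # kept for monitoring and feedback
        else:
            accepted.append(r)
    cleaned = impute_median(accepted, "amount")
    flags = flag_anomalies([r["amount"] for r in cleaned]) if cleaned else []
    enriched = enrich(cleaned, region_names, "region_code", "region_name")
    return [{**r, "is_anomaly": f} for r, f in zip(enriched, flags)], rejected
```

In this sketch, a record whose `amount` is `None` passes validation and is filled during cleansing, while a record with a string `amount` or no `region_code` is rejected and returned with its error messages, which a monitoring or feedback step could consume. The median-based score is used because a single extreme value inflates the mean and standard deviation, so an ordinary z-score can fail to flag the very outlier it should catch in a small batch.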

Keywords: Data Quality Assurance, Data Validation, Data Cleansing, Data Enrichment, Schema Validation, Anomaly Detection, Data Imputation, Data Merging, Data Augmentation, Automated Data Tools, Business Intelligence

Edition: Volume 9 Issue 5, May 2020

Pages: 1866-1869