Research Paper | Computer Science and Information Technology | Volume 15 Issue 5, May 2026 | Pages: 144 - 151 | India
Automated Web Data Extraction Using OCR and RPA
Abstract: This research paper presents an automated system for web data extraction that integrates Optical Character Recognition (OCR) and Robotic Process Automation (RPA). The system addresses the growing need to extract unstructured data efficiently from web sources, particularly from documents, images, and screenshots embedded in web interfaces. It uses a hybrid approach: RPA bots automate navigation, data capture, and workflow steps, while OCR engines recognize text in non-selectable sources, so that data can be extracted, validated, and consolidated into structured formats. Experimental results show that the integrated system reduces manual processing time by approximately 65% for batch operations and achieves extraction accuracy above 95% for standardized documents. Its modular architecture supports deployment across diverse domains, including financial document processing, invoice management, and form data extraction, improving both operational efficiency and data accuracy.
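The hybrid flow described in the abstract (RPA captures content, selectable text is read directly, non-selectable content is sent to OCR, and results are validated and consolidated into a structured format) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the OCR engine is replaced by a hypothetical stub (`stub_ocr`), and the `CapturedItem` and `ExtractedRecord` schemas, the 0.9 confidence threshold, the validation rule, the CSV output, and the example URLs are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical OCR interface: takes image bytes, returns (recognised_text, confidence in [0, 1]).
OcrEngine = Callable[[bytes], Tuple[str, float]]


@dataclass
class CapturedItem:
    """One unit of content captured by the RPA navigation step (hypothetical schema)."""
    source_url: str
    selectable_text: Optional[str]  # None when the content exists only as an image
    image_bytes: bytes = b""


@dataclass
class ExtractedRecord:
    source_url: str
    text: str
    method: str        # "direct" (selectable text) or "ocr"
    confidence: float
    valid: bool


def extract(item: CapturedItem, ocr: OcrEngine, min_confidence: float = 0.9) -> ExtractedRecord:
    """Route an item to direct text capture or OCR, then apply a simple validation rule."""
    if item.selectable_text is not None:
        # Selectable text can be read directly, so OCR is skipped.
        text, confidence, method = item.selectable_text, 1.0, "direct"
    else:
        # Non-selectable content (image, screenshot, scanned page) goes to OCR.
        text, confidence = ocr(item.image_bytes)
        method = "ocr"
    text = text.strip()
    # Assumed validation rule: non-empty text and confidence at or above the threshold.
    valid = bool(text) and confidence >= min_confidence
    return ExtractedRecord(item.source_url, text, method, confidence, valid)


def consolidate(records: List[ExtractedRecord]) -> str:
    """Consolidate extracted records into one structured CSV document."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source_url", "text", "method", "confidence", "valid"])
    for r in records:
        writer.writerow([r.source_url, r.text, r.method, f"{r.confidence:.2f}", r.valid])
    return buf.getvalue()


def stub_ocr(image: bytes) -> Tuple[str, float]:
    """Stand-in for a real OCR engine: decodes the bytes and reports a fixed confidence."""
    return image.decode("utf-8"), 0.93


items = [
    CapturedItem("https://example.com/invoice/1", "Invoice #1001 Total: 250.00"),
    CapturedItem("https://example.com/invoice/2", None, b"Invoice #1002 Total: 99.50"),
    CapturedItem("https://example.com/invoice/3", None, b"   "),  # whitespace-only OCR result fails validation
]
records = [extract(item, stub_ocr) for item in items]
print(consolidate(records))
```

Replacing `stub_ocr` with a real OCR engine only requires a callable with the same `(bytes) -> (text, confidence)` shape. Records with `valid == False` are natural candidates for manual review, and raising or lowering the threshold trades the share of fully automated records against the risk of accepting misrecognised text.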
Keywords: Optical Character Recognition, Robotic Process Automation, Web Data Extraction, Intelligent Document Processing, Workflow Automation
How to Cite: Abhishek Tiwari, Amandeep Ahlawat, Mohit Poriya, Satyam, Himani Chaudhary, "Automated Web Data Extraction Using OCR and RPA", Volume 15 Issue 5, May 2026, International Journal of Science and Research (IJSR), Pages: 144-151, https://www.ijsr.net/getabstract.php?paperid=SR26430222214, DOI: https://dx.doi.org/10.21275/SR26430222214