Research Paper | Computer Science & Engineering | India | Volume 2 Issue 3, March 2013
Web Data Extraction and Alignment
M. Jude Victor, D. John Aravindhar, V. Dheepa
Web databases generate query result pages in response to a user’s query. Automatically extracting the data from these pages is important for applications such as data integration, which must work with multiple web databases. We present a novel data extraction and alignment method, CTVS, that combines tag and value similarity. CTVS extracts data automatically in two steps: it first identifies and segments the query result records (QRRs) in a query result page, and then aligns the segmented QRRs into a table in which data values belonging to the same attribute are placed in the same column. We also design a new record alignment algorithm that aligns the attributes in a record, first pairwise and then holistically, by combining tag and data value similarity information. Experimental results show that CTVS achieves high precision and outperforms existing state-of-the-art data extraction methods.
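To make the pairwise alignment idea concrete, the following is a minimal sketch and not the CTVS algorithm itself. The abstract does not specify the similarity measures or the holistic step, so everything here is an assumption: each record is modelled as a list of hypothetical `(tag_path, value)` fields, tag similarity is an exact match on the tag path, value similarity is a crude numeric-or-string comparison, and the alignment uses a standard order-preserving dynamic program. The function names, the `tag_weight` parameter, and the `threshold` parameter are all illustrative.

```python
from difflib import SequenceMatcher


def is_number(text):
    """Return True if text parses as a number after stripping '$' and ','."""
    try:
        float(text.replace("$", "").replace(",", ""))
        return True
    except ValueError:
        return False


def value_similarity(a, b):
    """Assumed value similarity: same data type first, then string overlap."""
    if is_number(a) and is_number(b):
        return 1.0
    if is_number(a) != is_number(b):
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def field_similarity(f1, f2, tag_weight=0.5):
    """Weighted mix of tag similarity and value similarity (assumed weighting)."""
    tag_sim = 1.0 if f1[0] == f2[0] else 0.0
    return tag_weight * tag_sim + (1 - tag_weight) * value_similarity(f1[1], f2[1])


def align_pair(r1, r2, threshold=0.5):
    """Order-preserving alignment of two records; returns matched index pairs."""
    n, m = len(r1), len(r2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = field_similarity(r1[i - 1], r2[j - 1])
            match = score[i - 1][j - 1] + (s if s >= threshold else 0.0)
            score[i][j] = max(match, score[i - 1][j], score[i][j - 1])
    # Trace back through the table to recover which fields were matched.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = field_similarity(r1[i - 1], r2[j - 1])
        if s >= threshold and score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def to_table(r1, r2, pairs):
    """Lay both records out in shared columns, using None for missing values."""
    rows, i, j = [[], []], 0, 0
    # The final sentinel pair flushes any unmatched trailing fields.
    for pi, pj in pairs + [(len(r1), len(r2))]:
        while i < pi:
            rows[0].append(r1[i][1]); rows[1].append(None); i += 1
        while j < pj:
            rows[0].append(None); rows[1].append(r2[j][1]); j += 1
        if pi < len(r1):
            rows[0].append(r1[pi][1]); rows[1].append(r2[pj][1])
            i, j = i + 1, j + 1
    return rows


# Two made-up records; the second has no price field.
record_a = [("td/a", "Data Mining Concepts"), ("td/span", "$45.00"), ("td/i", "In stock")]
record_b = [("td/a", "Web Data Mining"), ("td/i", "Out of stock")]
pairs = align_pair(record_a, record_b)  # [(0, 0), (2, 1)]
table = to_table(record_a, record_b, pairs)
```

In this made-up example the title and availability fields share a column, while the price column holds `None` for the second record. A holistic step, as mentioned in the abstract, would then reconcile such pairwise results across all records; that step is not sketched here.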
Keywords: Data Extraction, Automatic Wrapper Generation, Data Record Alignment, Information Integration
Edition: Volume 2 Issue 3, March 2013
Pages: 129 - 132
How to Cite this Article?
M. Jude Victor, D. John Aravindhar, V. Dheepa, "Web Data Extraction and Alignment", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=IJSROFF2013098, Volume 2 Issue 3, March 2013, 129 - 132