Survey Paper | Computer Science & Engineering | India | Volume 3 Issue 12, December 2014
A Survey on Duplicate Detection in Hierarchical Data
Nikhil Gawande, S. R. Todamal
Although a lot of work has been done on identifying duplicates in relational data, only a few solutions focus on identifying duplicates in more complex hierarchical structures, such as XML data. In this paper, we present a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability that two XML elements are duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Through experiments and comparisons, we show that the algorithm achieves high precision and recall scores on several datasets. XMLDup thus improves both the efficiency and the effectiveness of duplicate detection.
Keywords: duplicate detection, record linkage, entity resolution, XML, Bayesian networks, data cleaning, optimization
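The idea outlined in the abstract, scoring two XML elements on both their values and their structure and abandoning a comparison early once it cannot succeed, can be illustrated with a much-simplified sketch. This is not the XMLDup algorithm itself: the functions `text_similarity` and `duplicate_score`, the `missing` default of 0.5, and the use of a plain product in place of a real Bayesian network are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a per-value probability."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(x, y, threshold=0.0, missing=0.5):
    """Hypothetical score in [0, 1] that elements x and y are duplicates.

    Leaves are compared by text; inner elements multiply the scores of
    their children, grouped by tag. Every factor is at most 1, so the
    running product is an upper bound on the final score. Once it drops
    below `threshold` the comparison is pruned and 0.0 is returned.
    """
    if x.tag != y.tag:
        return 0.0
    if len(x) == 0 and len(y) == 0:
        return text_similarity(x.text, y.text)

    score = 1.0
    for tag in sorted({c.tag for c in x} | {c.tag for c in y}):
        xs, ys = x.findall(tag), y.findall(tag)
        if not xs or not ys:
            # Assumed default when one side lacks this child entirely.
            factor = missing
        else:
            # Best match in ys for each child in xs, averaged over xs.
            factor = sum(max(duplicate_score(a, b) for b in ys) for a in xs) / len(xs)
        score *= factor
        if score < threshold:
            return 0.0  # pruned: the final score cannot reach the threshold
    return score


a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
b = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
c = ET.fromstring("<movie><title>Toy Story</title><year>1995</year></movie>")

print(duplicate_score(a, b))                  # identical elements
print(duplicate_score(a, c))                  # different movies, lower score
print(duplicate_score(a, c, threshold=0.9))   # pruned early
```

Because the partial product can only shrink as more children are compared, stopping as soon as it falls below the threshold never discards a pair that would have passed. That is the property a pruning strategy of this kind relies on.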
Edition: Volume 3 Issue 12, December 2014
Pages: 751 - 754
How to Cite this Article?
Nikhil Gawande, S. R. Todamal, "A Survey on Duplicate Detection in Hierarchical Data", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=SUB14438, Volume 3 Issue 12, December 2014, 751 - 754