A Survey on Duplicate Detection in Hierarchical Data
International Journal of Science and Research (IJSR)

www.ijsr.net | Open Access | Fully Refereed | Peer Reviewed International Journal

ISSN: 2319-7064

Survey Paper | Computer Science & Engineering | India | Volume 3 Issue 12, December 2014

Nikhil Gawande, S. R. Todamal

Although much work has been done on identifying duplicates in relational data, only a few solutions focus on identifying duplicates in more complex hierarchical structures, such as XML data. In this paper, we describe a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML nodes being duplicates, considering not only the information within the nodes but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy, capable of significant gains over the unoptimized version of the algorithm, is presented. Through experiments and comparisons, we show that the algorithm achieves high precision and recall scores on several datasets. XMLDup thus improves both the efficiency and the effectiveness of duplicate detection.
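The idea of scoring two XML nodes by both their values and their structure can be illustrated with a small sketch. This is not the XMLDup implementation: the function names (`text_similarity`, `duplicate_score`) are hypothetical, a plain string-similarity ratio stands in for the prior probabilities of the Bayesian network, and the combination rules (multiplying across child tags, a noisy-or within repeated tags) are simplifying assumptions. The early return shows one possible form of pruning: because every factor lies in [0, 1], the running product is an upper bound on the final score.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for a prior probability."""
    left = (a or "").strip().lower()
    right = (b or "").strip().lower()
    return SequenceMatcher(None, left, right).ratio()


def duplicate_score(n1, n2, threshold=0.0):
    """Score in [0, 1] that n1 and n2 describe the same object.

    Returns 0.0 early once the running upper bound drops below `threshold`.
    """
    # Two leaf nodes: compare their text values directly.
    if len(n1) == 0 and len(n2) == 0:
        return text_similarity(n1.text, n2.text)

    shared_tags = sorted({c.tag for c in n1} & {c.tag for c in n2})
    if not shared_tags:
        return 0.0  # no comparable structure

    score = 1.0
    for tag in shared_tags:
        # Noisy-or: at least one pair of same-tag children is a duplicate.
        none_match = 1.0
        for c1 in n1.findall(tag):
            for c2 in n2.findall(tag):
                none_match *= 1.0 - duplicate_score(c1, c2)
        score *= 1.0 - none_match
        # Remaining factors are at most 1, so the score cannot recover.
        if score < threshold:
            return 0.0
    return score


a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
b = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
c = ET.fromstring("<movie><title>Casablanca</title><year>1942</year></movie>")

same = duplicate_score(a, b)
different = duplicate_score(a, c)
pruned = duplicate_score(a, c, threshold=0.9)
```

In this sketch, identical records score highest, and a pair whose partial score already falls below the threshold is abandoned without evaluating the remaining children.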

Keywords: duplicate detection, record linkage, entity resolution, XML, Bayesian networks, data cleaning, optimization

Edition: Volume 3 Issue 12, December 2014

Pages: 751 - 754

How to Cite this Article?

Nikhil Gawande, S. R. Todamal, "A Survey on Duplicate Detection in Hierarchical Data", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=SUB14438, Volume 3 Issue 12, December 2014, 751 - 754


