International Journal of Science and Research (IJSR)

www.ijsr.net | Most Trusted Research Journal Since Year 2012

ISSN: 2319-7064



Survey Paper | Computer Science & Engineering | India | Volume 3 Issue 12, December 2014

A Survey on Duplicate Detection in Hierarchical Data

Nikhil Gawande, S. R. Todamal

Although much work has been done on identifying duplicates in relational data, only a few solutions focus on identifying duplicates in more complex hierarchical structures such as XML data. In this paper, we present a novel method for XML duplicate detection called XMLDup. XMLDup uses a Bayesian network to determine the probability that two XML nodes are duplicates, considering not only the information within the nodes but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Through experiments and comparisons, we show that the algorithm achieves high precision and recall scores on several datasets. XMLDup thus improves both the efficiency and the effectiveness of duplicate detection.
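
The idea in the abstract can be illustrated with a small sketch. This is not the XMLDup algorithm itself, whose network construction and probability formulas are not given on this page. It is a simplified stand-in that assumes a node's score is the product of the scores of its shared child tags, and that leaf values are compared with a generic string similarity. The function names, the `threshold` parameter, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
import xml.etree.ElementTree as ET


def text_similarity(a, b):
    """Field-level similarity in [0, 1] (difflib ratio, chosen for illustration)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def duplicate_score(x, y, threshold=0.0):
    """Score two XML elements in [0, 1] from their values and their structure.

    Assumption: child scores are combined by a product. Returns 0.0 as soon
    as the running product falls below `threshold` (pruning).
    """
    if len(x) == 0 and len(y) == 0:
        # Leaf elements: compare the text values directly.
        return text_similarity(x.text or "", y.text or "")

    shared_tags = sorted({c.tag for c in x} & {c.tag for c in y})
    if not shared_tags:
        # No comparable structure, so no evidence of duplication.
        return 0.0

    score = 1.0
    for tag in shared_tags:
        xs, ys = x.findall(tag), y.findall(tag)
        # Match each child of x to its most similar same-tag child of y.
        best = [max(duplicate_score(cx, cy) for cy in ys) for cx in xs]
        score *= sum(best) / len(best)
        # Every remaining factor is at most 1, so the running product is an
        # upper bound on the final score: stop once it drops below threshold.
        if score < threshold:
            return 0.0
    return score


a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
b = ET.fromstring("<movie><title>The Matrx</title><year>1999</year></movie>")
c = ET.fromstring("<movie><title>Zzzz</title><year>1999</year></movie>")

print(duplicate_score(a, b, threshold=0.5))  # high score: likely duplicates
print(duplicate_score(a, c, threshold=0.5))  # pruned after the title mismatch
```

The pruning step is safe in this sketch because each factor lies in [0, 1], so skipping the remaining comparisons cannot hide a pair that would have reached the threshold.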

Keywords: duplicate detection, record linkage, entity resolution, XML, Bayesian networks, data cleaning, optimization

Edition: Volume 3 Issue 12, December 2014

Pages: 751 - 754


How to Cite this Article?

Nikhil Gawande, S. R. Todamal, "A Survey on Duplicate Detection in Hierarchical Data", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=SUB14438, Volume 3 Issue 12, December 2014, 751 - 754



Similar Articles with Keyword 'duplicate detection'

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015

Pages: 2676 - 2680

Effective and Efficient XML Duplicate Detection Using Levenshtein Distance Algorithm

Shital Gaikwad, Nagaraju Bogiri

Survey Paper, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015

Pages: 1775 - 1778

Efficient and Robust Detection of Duplicate Videos in a Large Database: A Survey

Soni R. Ragho, C. S. Biradar

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 4 Issue 7, July 2015

Pages: 721 - 723

Techniques for Duplicate Detection in Hierarchical Data

Suvarna Kale, Basha Vankudothu

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 11, November 2015

Pages: 1217 - 1219

Removing Dedepulication Using Pattern Serach Suffix Arrays

Pratiksha Dhande, Supriya Kumari, Sushmita Tupe, Laukik Shah



Similar Articles with Keyword 'record linkage'

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 4 Issue 3, March 2015

Pages: 2296 - 2300

Clustering Tree based Implementation of Record Linkage on Many-to-Many Relation

V. Balvannanathan, R. Siva

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 11, November 2015

Pages: 1217 - 1219

Removing Dedepulication Using Pattern Serach Suffix Arrays

Pratiksha Dhande, Supriya Kumari, Sushmita Tupe, Laukik Shah

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 3, March 2015

Pages: 757 - 760

Record Deduplication Approaches and Algorithm for Removing Duplicate Data

Nikita A. Pande, Namrata D. Ghuse



Similar Articles with Keyword 'entity resolution'

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 4 Issue 3, March 2015

Pages: 2296 - 2300

Clustering Tree based Implementation of Record Linkage on Many-to-Many Relation

V. Balvannanathan, R. Siva

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 3 Issue 3, March 2014

Pages: 286 - 291

Fast and Accurate Incremental Entity Relationships

Rajeshkumar S, Geofrin Shirly S

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 6 Issue 11, November 2017

Pages: 330 - 333

Rule-Based Method for Entity Resolution Using Optimized Root Discovery (ORD)

Liji S, Nithya M



Similar Articles with Keyword 'XML'

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015

Pages: 2676 - 2680

Effective and Efficient XML Duplicate Detection Using Levenshtein Distance Algorithm

Shital Gaikwad, Nagaraju Bogiri

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 4 Issue 12, December 2015

Pages: 1846 - 1849

An Innovative Technique to Detect Malicious Applications in Android

Sharvari Prakash Chorghe, Dr. Narendra Shekokar

Research Paper, Computer Science & Engineering, India, Volume 3 Issue 6, June 2014

Pages: 2094 - 2098

An Improved Web Mining Technique to Fetch Web Data Using Apriori and Decision Tree

Rupinder Kaur, Kamaljit Kaur

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 5, May 2015

Pages: 3021 - 3028

Smart Type-Ahead Search in XML

Supriya. N. Chaudhari, Vaishali M. Deshmukh

Comparative Studies, Computer Science & Engineering, India, Volume 3 Issue 6, June 2014

Pages: 2099 - 2104

Comparative Study of Web Content Mining Techniques for HTML and XML Contents

Rupinder Kaur, Kamaljit Kaur



Similar Articles with Keyword 'Bayesian networks'

Research Paper, Computer Science & Engineering, China, Volume 9 Issue 4, April 2020

Pages: 1544 - 1554

Recent Developments on Probabilistic Graphical Model Applied in Data Analysis

Kan'Sam Nadjak, Guisheng Yin

Review Papers, Computer Science & Engineering, India, Volume 4 Issue 5, May 2015

Pages: 2058 - 2061

Medical Diagnosis for Liver Cancer using Classification Techniques

Reetu, Narender Kumar

Research Paper, Computer Science & Engineering, India, Volume 3 Issue 2, February 2014

Pages: 83 - 87

Exploring Mutational Pathways of HIV Using Genetic Algorithm

K. M. Monica



Similar Articles with Keyword 'data cleaning'

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 2525 - 2528

Efficient Technique for Network Lifetime Enhancement by Cleaning Dirty Data

Komal V. Shiyale, Pranay D. Saraf

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 3 Issue 3, March 2014

Pages: 286 - 291

Fast and Accurate Incremental Entity Relationships

Rajeshkumar S, Geofrin Shirly S

Research Paper, Computer Science & Engineering, India, Volume 3 Issue 7, July 2014

Pages: 1797 - 1802

Tracing Visitors Online Behaviors by Using VOB Algorithm for Effective Web Usage Mining

Jyoti Ashokkumar Aidale, Sonali Rangdale

Survey Paper, Computer Science & Engineering, India, Volume 3 Issue 11, November 2014

Pages: 1850 - 1856

A Review on Detection of Outliers Over High Dimensional Streaming Data Using Cluster Based Hybrid Approach

Abhishek B. Mankar, Namrata Ghuse

Review Papers, Computer Science & Engineering, India, Volume 4 Issue 1, January 2015

Pages: 2180 - 2182

Review of Improved Cross Redundant Data Cleaning Algorithm for RFID and WSN Integration

Jayashri M. Dupare, N. U. Sambhe



Similar Articles with Keyword 'optimization'

Research Paper, Computer Science & Engineering, India, Volume 9 Issue 6, June 2020

Pages: 1151 - 1153

Search Engine Optimization (SEO) Techniques

Kanika Arora

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 4, April 2016

Pages: 2038 - 2044

Fragmentation and Duplication of Data for the Best Cloud Performance and Security (FDBPS)

Prof. Jaya Kumar B L, Saranya C

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 5, May 2015

Pages: 3174 - 3177

Optimizing Dynamic Dependence Graph

Toshi Sharma, Madhuri Sharma

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015

Pages: 926 - 930

Memetic Algorithm: Hybridization of Hill Climbing with Replacement Operator

Gagandeep Sharma, Naveen Kumar, Ashu Khokhar

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015

Pages: 1422 - 1426

Incremental CFP-Tree Optimization For Efficient Representative Pattern Set Mining

Vivek Satpute, Prof. Digambar Padulkar
