Survey Paper | Computer Science & Engineering | India | Volume 5 Issue 1, January 2016
A Methodological Survey on MapReduce for Identification of Duplicate Images
Amol S. Deshmukh | Prof. P. D. Lambhate
Abstract: Duplicate image identification supports deduplication, a specialized data compression technique for eliminating duplicate copies of repeating data in storage. The technique improves storage utilization by avoiding duplicate data. With the explosive growth of digital data, deduplication schemes are widely employed in data backup to minimize network and storage overhead by detecting and eliminating redundancy among data. In this paper we propose duplicate image identification using the MapReduce technique, which improves the efficiency and reliability of the system. MapReduce is a simple, parallel computing technique normally used for analyzing huge volumes of data. Traditional deduplication schemes work if and only if the second image has the same underlying bits as the first. This restricts many applications, because they succeed only when exact copies of an image are present. In many practical applications where storage is restricted, users upload modified images that vary in quality or resolution. Experimental results on a real dataset demonstrate that the proposed approach not only saves storage space effectively but also significantly improves the retrieval precision of duplicate images. In addition, the selection of images can meet the requirements of people's perception.
Keywords: Duplicate image identification, Deduplication, MapReduce technique, big data, data partitioning, Pearson Correlation
Edition: Volume 5 Issue 1, January 2016
Pages: 206 - 210
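Illustrative sketch: the abstract contrasts bit-exact deduplication with the identification of modified copies (changed quality or resolution), and the keywords name MapReduce, data partitioning, and Pearson Correlation. The Python sketch below shows one way these pieces could fit together; it is not the authors' implementation. Everything in it is an assumption made for illustration: images are flat lists of grayscale values whose length is a multiple of the signature size, the partition key is a simple above/below-mean bit pattern, the threshold is 0.95, and the names `signature`, `map_phase`, `reduce_phase`, and `find_duplicates` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signal has no defined correlation
    return cov / sqrt(vx * vy)


def signature(pixels, size=4):
    """Reduce an image to `size` block averages, so that copies at
    different resolutions become comparable (assumes len % size == 0)."""
    step = len(pixels) // size
    return [sum(pixels[i * step:(i + 1) * step]) / step for i in range(size)]


def map_phase(name, pixels):
    """Map: emit (partition_key, (name, signature)).
    The key marks which blocks are brighter than the image mean."""
    sig = signature(pixels)
    mean = sum(sig) / len(sig)
    key = "".join("1" if v > mean else "0" for v in sig)
    yield key, (name, sig)


def reduce_phase(items, threshold=0.95):
    """Reduce: compare images within one partition and emit the pairs
    whose signatures correlate at or above the threshold."""
    for (n1, s1), (n2, s2) in combinations(items, 2):
        if pearson(s1, s2) >= threshold:
            yield tuple(sorted((n1, n2)))


def find_duplicates(images):
    """Run map, shuffle (group by key), and reduce in a single process."""
    partitions = defaultdict(list)
    for name, pixels in images.items():
        for key, value in map_phase(name, pixels):
            partitions[key].append(value)
    pairs = []
    for items in partitions.values():
        pairs.extend(reduce_phase(items))
    return sorted(pairs)
```

In this sketch a half-resolution copy and a uniformly brightened copy of an image land in the same partition as the original and are reported as duplicate pairs, whereas a byte-level hash comparison would treat all three as distinct. In a real MapReduce deployment the shuffle step would be performed by the framework rather than by a local dictionary.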