Survey Paper | Information Technology | India | Volume 3 Issue 11, November 2014
A Survey on an Efficient Data Caching Mechanism for Big Data Application
Shakil B. Tamboli | Smita Shukla Patel
Abstract: Big data has drawn the attention of academia and the IT industry because information is now generated and collected at an unprecedented rate, in volumes that rapidly exceed even very large scales. The easy availability of big data creates a need to manage it for useful purposes such as business needs, scientific research, predictions that serve community welfare, and lifestyle enhancements. Researchers have observed that Google processes 50 TB and Twitter 20 TB of data every day, which is huge in volume, velocity and variety. To address big data issues, several solutions have been developed on high-performance computing machines and large-scale clusters of nodes, using distributed processing technologies and software tools such as MapReduce from Google and Hadoop from the Apache Software Foundation, together with its ecosystem. However, these technologies and tools need modification because of existing problems: intermediate data is not put to use, output is produced inefficiently, processor overhead increases, storage technologies are inefficient, and security is poor. The purpose of this survey is to assess these technologies and identify enhancements that future requirements are likely to call for. Among the issues open to enhancement, this paper concentrates on two: the large amount of intermediate data generated by map and reduce operations is discarded when a task finishes instead of being reused, and incremental computations are not handled well by the existing cache mechanism. Hence the research aims to use the cache mechanism efficiently to optimize computation time and reduce storage overhead for real-time data over the distributed file system (DFS).
The survey covers the big data domain, the technologies and ecosystem used to execute big data applications, and a review of existing practices for optimizing computation time, reducing storage space, and improving performance, efficiency, scalability and architecture. It also proposes a new system architecture to achieve these goals.
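The idea of reusing intermediate map output for incremental computation can be illustrated with a minimal sketch. It is not the architecture proposed in the paper: the word-count job, the in-memory dictionary cache, and the names `MapCache`, `fingerprint`, `map_split` and `word_count` are all assumptions made for illustration, and a real deployment would keep the cache on the distributed file system rather than in process memory.

```python
import hashlib
from collections import Counter


def fingerprint(split: str) -> str:
    """Identify an input split by the hash of its content (illustrative choice)."""
    return hashlib.sha256(split.encode("utf-8")).hexdigest()


def map_split(split: str) -> Counter:
    """Map phase for a word-count job: per-split word frequencies."""
    return Counter(split.split())


class MapCache:
    """Hypothetical in-memory cache of intermediate map outputs."""

    def __init__(self):
        self.store = {}   # fingerprint -> intermediate Counter
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, split: str) -> Counter:
        key = fingerprint(split)
        if key in self.store:
            self.hits += 1            # intermediate result is reused
        else:
            self.misses += 1          # map runs only for unseen splits
            self.store[key] = map_split(split)
        return self.store[key]


def word_count(splits, cache: MapCache) -> Counter:
    """Reduce phase: merge the (possibly cached) per-split counts."""
    total = Counter()
    for split in splits:
        total.update(cache.get_or_compute(split))
    return total


cache = MapCache()
first = word_count(["a b a", "b c"], cache)            # both splits are mapped
second = word_count(["a b a", "b c", "c d"], cache)    # only "c d" is mapped
```

On the second run, only the newly appended split goes through the map phase, which is the kind of saving in computation time the abstract refers to; the cost is the extra storage held by the cache.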
Keywords: Google, MapReduce, Hadoop, Cache Mechanism, Distributed File System
Edition: Volume 3 Issue 11, November 2014
Pages: 1242 - 1247