Satwik Kumar Shiri, Satyam Thusu
Abstract: Machine learning methods often improve their accuracy by using models with more parameters trained on large data sets. Building such models on a single machine is often impractical because of the large amount of computation required. In this paper, we focus on developing a general technique for parallel programming of a class of machine learning algorithms. Our work is in contrast to the tradition in machine learning of designing ways to speed up a single algorithm at a time. We show that algorithms that fit the Statistical Query model can be written in a certain summation form, which allows them to be effectively parallelized. The central aim of this approach is to give programmers and users a general way to accelerate machine learning applications.
Keywords: MapReduce, Machine Learning, Large Data Sets, Algorithms
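As an informal illustration of the summation form mentioned in the abstract, the following minimal sketch fits a one-dimensional least-squares line by computing partial sums on separate data shards (the map step) and adding them together (the reduce step). The function names map_shard, combine, and fit_line, the toy data, and the choice of Python are assumptions made for this example only; it is not the implementation described in the paper.

```python
from functools import reduce

def map_shard(shard):
    """Map step (hypothetical helper): partial sums over one shard of (x, y) pairs."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Reduce step: partial sums from two shards are added element-wise."""
    return tuple(p + q for p, q in zip(a, b))

def fit_line(shards):
    """Fit y = slope * x + intercept using only the aggregated sums."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_shard, shards))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Toy data lying exactly on y = 2x + 1, split into three shards.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
shards = [data[0:3], data[3:7], data[7:10]]
slope, intercept = fit_line(shards)
```

Because each shard contributes only a small tuple of sums, the map calls are independent of one another and could in principle run on different cores or machines; the built-in map here runs them sequentially and only stands in for that parallel step.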