If you have gone through part 1 of this series, you have used Cloudera's Hadoop Quickstart VM to set up a working instance of Hadoop and have running instances of various Hadoop services. Now is a good time to go back to a little theory and see how the different pieces fit together.

HortonWorks, another Hadoop distributor, has an excellent tutorial on Hadoop and each of its accompanying services, which you can see in the image below (taken from that tutorial). You can go to this page and read about Hadoop in detail. However, I am going to summarize and simplify some of the content and definitions so that a beginner can quickly understand them and proceed.

OK, so this is the definition of Hadoop on the HortonWorks site: "Apache Hadoop® is an open source framework for distributed storage and processing of large sets of data on commodity hardware."

You will agree that the biggest challenges in any computing are very basic: storage and processing. Processing could be any o...