Introduction to Hadoop Development

You will learn how to use Apache Hadoop and write MapReduce programs. You will begin with a quick overview of installing Hadoop and setting it up in a cluster, and then proceed to writing data analytics programs. The course presents the basic concepts of MapReduce applications developed using Hadoop, including a close look at the framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.

The course also examines related technologies such as Hive, Pig, and Apache Accumulo. Apache Accumulo is a highly scalable structured store based on Google's BigTable; it is written in Java and operates over the Hadoop Distributed File System (HDFS). Hive is data warehouse software for querying and managing large datasets. Pig is a platform for running data analysis programs in parallel. Finally, you will see how Hadoop works in and supports cloud computing, and explore examples with Amazon Web Services and case studies.

This class is focused on the Hadoop 2.0 (pre-)release. The course is approximately 40% lecture and 60% hands-on labs.
  • Introduction to Java - Experience developing Java with Eclipse
  • Introduction to Unix - Exposure to bash or tcsh shell use
  • Data Persistence with JPA 2 - Experience using JPA and data access
5 Days/Lecture & Lab
  • What is Hadoop?
  • Starting Hadoop
  • Components of Hadoop
  • Writing basic MapReduce programs
  • Advanced MapReduce
  • Programming Practices
  • Cookbook
  • Managing Hadoop
  • Running Hadoop in the cloud
  • Programming with Pig
  • Overview of Hadoop-Related Technologies
  • Case studies
