This advanced course gives Java programmers a deep dive into Hadoop application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, custom input and output formats, and joins of large datasets; how to unit test MapReduce jobs; and how to develop UDFs for Pig and Hive. Labs run on a 7-node cluster hosted in a virtual machine that students can keep for use after the training.
Before taking this course, students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. No prior Hadoop knowledge is required.
4 Days/Lecture & Lab
This course is designed for experienced Java software engineers who need to develop Java MapReduce applications for Hadoop.
- Configure a Hadoop development environment
- Put data into HDFS using Java
- Write a distributed grep MapReduce application
- Write an inverted index MapReduce application
- Configure and use a combiner
- Write custom combiners and partitioners
- Globally sort output using the TotalOrderPartitioner
- Write a MapReduce job that sorts data using a composite key
- Write a custom InputFormat class
- Write a custom OutputFormat class
- Compute a simple moving average of stock price data
- Use data compression
- Define a RawComparator
- Perform a map-side join
- Use a Bloom filter
- Unit test a MapReduce job
- Import data into HBase
- Write an HBase MapReduce job
- Write user-defined Pig and Hive functions
- Define an Oozie workflow
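To give a flavor of one lab topic, the moving-average computation that a reducer would apply to time-ordered stock prices can be sketched in plain Java. The class and method names here are illustrative, not the course's actual lab code; in the lab, a composite key and secondary sort would deliver the prices to the reducer already ordered by date.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MovingAverage {
    // Compute a simple moving average over a fixed-size sliding window.
    // Input is assumed to be an ordered series of closing prices, as a
    // secondary-sorted reducer would receive them.
    public static List<Double> compute(double[] prices, int window) {
        List<Double> averages = new ArrayList<>();
        Deque<Double> buffer = new ArrayDeque<>();
        double sum = 0.0;
        for (double price : prices) {
            buffer.addLast(price);
            sum += price;
            if (buffer.size() > window) {
                sum -= buffer.removeFirst();  // slide the window forward
            }
            if (buffer.size() == window) {
                averages.add(sum / window);
            }
        }
        return averages;
    }

    public static void main(String[] args) {
        double[] closes = {10.0, 11.0, 12.0, 13.0, 14.0};
        System.out.println(compute(closes, 3));  // prints [11.0, 12.0, 13.0]
    }
}
```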
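The map-side join lab pairs naturally with the Bloom-filter lab: a filter built from the smaller dataset's keys lets mappers skip records that cannot possibly join. Hadoop ships its own implementation (org.apache.hadoop.util.bloom.BloomFilter); the standalone sketch below only illustrates the underlying idea, and its sizing and hashing scheme are simplified assumptions.

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    // A tiny Bloom filter: k indexes derived from two base hashes of the key.
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public SimpleBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = h1 >>> 16;                  // second hash from the high bits
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false means definitely absent; true means probably present
    public boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In the join scenario, the filter is built once from the small dataset, shipped to every mapper (for example via the distributed cache), and consulted before emitting each record from the large dataset.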