Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem; how to plan cluster deployment and growth; and how to install, maintain, monitor, troubleshoot, and optimize Hadoop. They will also practice bulk-loading data into a cluster, become familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course concludes with a discussion of securing a cluster with Kerberos.
Format: Lectures and hands-on labs (50% lecture, 50% labs). The pace of the class is determined by the students.
Before taking this course, students should have the following skills:
- Comfort with basic Linux system administration
- Basic scripting skills
Prior knowledge of Hadoop and distributed computing is not required; both are introduced and explained in the course.
3-4 Days / Lecture & Lab
This course is designed for Hadoop administrators.
- Planning and Installation
- HDFS Operations
- Data Ingestion
- MapReduce Operations and Administration
- YARN: New Architecture and New Capabilities
- Advanced Topics
- Optional Tracks