PT27490
Training Summary
Hadoop is a mature Big Data environment, with Hive as the de facto standard for its SQL interface. Today, computations in Hadoop are usually done with Spark, which offers an optimized compute engine that supports batch processing, real-time streaming, and machine learning.
Prerequisites
Before attending this course, students should understand the basics of SQL and Python and have prior exposure to software design.
Duration
5 Days/Lecture & Lab
Audience
This class is intended for Business Analysts, Software Developers, and Managers.
Course Topics
- Why Hadoop?
- The Hadoop Platform
- Hive Basics
- New in Hive 3
- HBase
- Sqoop
- The Big Picture
- Spark Introduction
- First Look at Spark
- Spark Data Structures
- Caching
- DataFrames and Datasets
- Spark SQL
- Spark and Hadoop
- Spark API
- Spark ML Overview
- GraphX
- Spark Streaming