Hadoop+Spark

PT27490
Training Summary
Hadoop is a mature Big Data environment, with Hive as the de facto standard for the SQL interface. Today, computations in Hadoop are usually done with Spark. Spark offers an optimized compute engine that supports batch processing, real-time streaming, and machine learning.
Prerequisites
Before attending this course, students should understand the basics of SQL and Python and have prior exposure to software design.
Duration
5 Days/Lecture & Lab
Audience
The audience for this class includes Business Analysts, Software Developers, and Managers.
Course Topics
  • Why Hadoop?
  • The Hadoop Platform
  • Hive Basics
  • New in Hive 3
  • HBase
  • Sqoop
  • The Big Picture
  • Spark Introduction
  • First Look at Spark
  • Spark Data Structures
  • Caching
  • DataFrames and Datasets
  • Spark SQL
  • Spark and Hadoop
  • Spark API
  • Spark ML Overview
  • GraphX
  • Spark Streaming