Apache Spark

PT10273
Summary
This class provides a solid foundation in Apache Spark. Spark is a next-generation processing framework that can deliver a 10x to 100x performance increase over traditional MapReduce processing. In this course you will write both traditional batch-processing and streaming applications.
Prerequisites
Scala or Python experience is recommended.
Duration
4 Days/Lecture & Lab
Audience
This course is designed for developers tasked with writing Spark applications.
Topics
  • Spark Basics
  • The Hadoop Distributed File System
  • Spark and Hadoop
  • RDDs
  • Running Spark on a Cluster
  • Parallel Programming with Spark
  • Caching and Persistence
  • Writing Spark Applications
  • Spark Streaming
  • Common Spark Algorithms
  • Improving Spark Performance
