This three-day training course teaches you how to use Apache Spark 2.3 for large-scale data analysis and for building big data applications and data processing pipelines. By targeting the latest version of the platform, Spark 2.3, you will learn to program Spark efficiently and to write code that fully leverages the internal changes that make this release faster and more effective. The course also prepares you for the future of the platform by teaching the modern approach to Spark programming that future releases will require.
The entire course is taught hands-on, using real code and interactive examples. In addition, longer labs allow attendees to work together and apply their growing Spark knowledge to common challenges faced by organizations running complex Big Data applications in production.
While we’re enthusiastic about many of the products in the Big Data ecosystem, the focus of this training course is to make you as proficient and effective as possible with open source Apache Spark, so that you can apply the fundamental skills you gain to whichever products and tools work best for you.
Before taking this course, attendees should have some knowledge of SQL and some programming experience in Python, Java, Scala, or R.
3 Days/Lecture & Lab
Data analysts, engineers, and scientists who want to conduct analytics with Big Data or build end-to-end applications and data processing pipelines.
- DataFrame/Dataset and SQL Analytics
- Machine Learning Overview
- Streaming Overview
- RDDs and Deep Dive Part 1
- Catalyst/Tungsten and Deep Dive Part 2
- Deployment Overview
- Apache Spark Streaming in Depth
- Machine Learning