Comprehensive Programming for Apache Spark 2.3 Instructor-Led Training Course

PT15260

Training Summary

This three-day training course teaches you how to use Apache Spark 2.3 for large-scale data analysis and for building big data applications and data processing pipelines. You will learn to program Spark efficiently by targeting the latest version of the platform (Spark 2.3) and by learning the modern approach needed to take full advantage of it, including how to optimize your code for the internal changes that make Spark 2.3 faster. The course also prepares you for future releases of the platform, which require this same modern approach to Spark programming.

The entire course is taught hands-on, using real code and interactive examples. Longer labs let attendees work together, applying their growing Spark knowledge to common challenges faced by organizations that run complex Big Data applications in production.

While we’re enthusiastic about many of the products in the Big Data ecosystem, this course focuses on making you as proficient as possible with open source Apache Spark, so that you can apply the fundamental skills you gain to whichever products and tools work best for you.

Prerequisites

Before taking this course, attendees should have some knowledge of SQL and some programming background in Python, Java, Scala, or R.

Duration

3 Days/Lecture & Lab

Audience

Data analysts, engineers, and scientists who want to conduct analytics with Big Data or build end-to-end applications and data processing pipelines.

Course Topics

Introduction

DataFrame/Dataset and SQL Analytics

Machine Learning Overview

Streaming Overview

RDDs and Deep Dive Part 1

Catalyst/Tungsten and Deep Dive Part 2

Deployment Overview

Apache Spark Streaming in Depth

Machine Learning

GSA #GS-35F-0486W