This three-day training course for data scientists and analysts will teach you how to harness Apache Spark 2.3 for large-scale data analysis, predictive modeling, and machine learning tasks. You will learn to program Spark efficiently and effectively by targeting the latest version of the platform and learning the modern approach needed to take full advantage of it. The entire course is taught hands-on, using real code and interactive examples. In addition, longer labs allow attendees to work together, applying their growing Spark knowledge to common challenges faced by organizations running complex Big Data applications in production. Both lectures and lab activities use real-world datasets, so that you can practice getting Apache Spark to work well in spite of real-world challenges. You'll also gain hands-on experience with performance tuning and troubleshooting.
Apache Spark 2 brings a suite of new features and speed improvements, but it also works differently under the hood and requires a slightly different approach to programming in order to get the most out of it. This course focuses entirely on Spark 2 and will teach you how to program for the latest version of Spark (currently Spark 2.3) in the most performant, effective, and straightforward way possible.
There are no prerequisites for this course.
3 Days/Lecture & Lab
This course is designed for data scientists and analysts involved in predictive modeling who want to explore machine learning on data that is too large for single-machine tools.
- DataFrame/Dataset and SQL Analytics
- Machine Learning Overview
- Streaming Overview
- Using Apache Spark with the ML / Predictive Analytics Process
- Understanding Apache Spark Job Performance
- Additional Spark ML Algorithms and Features
- Integrating Apache Spark with Other Machine Learning Systems
- Extending Spark ML
- Apache Spark Model Deployment Patterns
- Apache Spark Cluster Deployment Overview (Optional)