Scaling Out: Effective Cluster Computing with Distributed Dask

PT26966
Summary
This class addresses the move from working on a single server, or experimenting with a minimal cluster, to reliable, repeatable use of larger Dask compute clusters. It takes a deep dive into the critical components of a distributed Dask cluster, how they work together, and how to configure them to maximize throughput and minimize costs.
Prerequisites
Students should have basic experience with Python and with Pandas and/or SQL.
Duration
1 Day/Lecture & Lab
Audience
This course is intended for engineers or data scientists who typically work with large data clusters.
Topics
  • Introduction
  • Distributed Dask: Cast of Characters
  • Basic Operation of Dask Clusters
  • Tasks
  • Distributed Data
  • Resource Usage and Resilience
  • Best Practices, Debugging
  • Use Case Example: Orchestrating Batch ML Scoring
  • Q & A Discussion