This class addresses the transition from working on a single server or experimenting with a minimal cluster to reliable, repeatable use of larger Dask compute clusters. We take a deep dive into the critical components of a distributed Dask cluster, how they work together, and how you can configure them to maximize throughput and minimize costs.
Students should have basic experience with Python programming and with Pandas and/or SQL.
1 Day/Lecture & Lab
This course is intended for engineers or data scientists who typically work with large data clusters.
- Distributed Dask: Cast of Characters
- Basic Operation of Dask Clusters
- Distributed Data
- Resource Usage and Resilience
- Best Practices and Debugging
- Use Case Example: Orchestrating Batch ML Scoring
- Q & A Discussion