Scaling Python with Dask

PT26956
Summary
If you’ve taken your data skills from zero to one with PyData (Pandas, scikit-learn, and friends), this class will help you work with data sets too large to fit in memory and distribute your workloads with Dask to accelerate your code. During the first two sessions, you’ll use the Python skills you already have to query and transform data, build models, and scale your custom code. During the last two sessions, we’ll peek under the hood to learn how Dask works and “look inside” with real-time animated dashboards. We’ll cover options for deploying clusters, troubleshooting, sample use cases, and best practices. The entire class is delivered through interactive, web-based JupyterLab notebooks that you can keep and refer to whenever you need. You’ll also receive 10,000 Coiled Cloud credits per month for 3 months so you can continue your learning journey without limitations.
Prerequisites
  • Python, basic level
  • PyData stack (Pandas, NumPy, scikit-learn), basic level
Duration
4 Half Days/Lecture & Lab
Audience
Python users who have taken their data skills from zero to one with PyData (Pandas, scikit-learn, and friends) and now need to work with data sets that won’t fit in memory or to distribute their workloads with Dask.
Topics
  • Introduction
  • Parallelize Python Code
  • Dask DataFrame
  • Dask Array
  • Scaling Your Own Code
  • Graphical User Interfaces
  • Machine Learning
  • Thinking About Distributed Deployment
  • Distributed Dask: Cast of Characters
  • Basic Operation of Dask Clusters
  • Tasks
  • Distributed Data
  • Resource Usage and Resilience
  • Debugging
  • Use Case Example: Orchestrating Batch ML Scoring [optional, time permitting]
  • Best Practices
