Architecting Large-Scale Data Systems with Dask

PT26958
Training Summary
This class explores how best to use Dask within enterprise data architectures. Most enterprises rely heavily on activities core to Dask (e.g., data manipulation and machine learning), activities external to Dask (e.g., using SQL for reporting and data extraction), and activities orthogonal to Dask but still critical to the success of the overall system (e.g., data storage). Moreover, the staff and skill sets involved often differ across these areas. We explore options and patterns for getting the best out of both the Dask and non-Dask elements of the system.
Prerequisites
The following prerequisites are required for this course:
  • Python, basic level
  • JVM/Hadoop/Spark/Kafka ecosystem, basic level
  • Large-scale data storage patterns, basic level
  • Understanding of ML concepts and workflow, basic level
Duration
1 Day/Lecture & Lab
Audience
This course is intended for those who manage data storage.
Course Topics
  • Introduction
  • Integrating Data
  • Data Processing, ETL, and Feature Engineering
  • Data Output
  • Additional Goals, Challenges, and Opportunities
