Processing Unstructured Data and Dask Bag

Training Summary
This module covers Dask Bag, a functional-programming approach to distributed computation over unstructured or heterogeneous data. Dask Bag is useful for the initial processing of unstructured text, large collections of heterogeneous business records that require special handling, and images or diagrams. The class covers functional style, the Bag API, and best practices.
Students should have basic to intermediate Python experience. Some knowledge of functional programming is helpful but not required.
1 Day/Lecture & Lab
This course is intended for engineers or data scientists who typically work with large data collections.
Course Topics
  • Introduction
  • Core Bag APIs and Operations
  • Best Practices
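
As a rough illustration of the kind of Bag operations named in the topics above, the following sketch parses a few heterogeneous text records and aggregates them in a functional style. It is not taken from the course materials: the sample records, the parse helper, and the field names are all hypothetical, and the synchronous scheduler is chosen here only to keep the example small and deterministic.

```python
import json

import dask.bag as db

# Hypothetical sample input: raw text lines, one of which is malformed.
raw_lines = [
    '{"type": "order", "customer": "acme", "total": 120.0}',
    '{"type": "refund", "customer": "acme", "total": 20.0}',
    'not valid json',
    '{"type": "order", "customer": "globex", "total": 75.5}',
]


def parse(line):
    """Return the decoded record, or None if the line is not valid JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


# Build a bag from an in-memory sequence, split across two partitions.
bag = db.from_sequence(raw_lines, npartitions=2)

# Bag operations are lazy: these lines only describe the computation.
records = bag.map(parse).filter(lambda r: r is not None)
orders = records.filter(lambda r: r["type"] == "order")

# compute() triggers execution; the synchronous scheduler runs in-process.
order_total = orders.pluck("total").sum().compute(scheduler="synchronous")
type_counts = dict(
    records.pluck("type").frequencies().compute(scheduler="synchronous")
)

print(order_total)
print(type_counts)
```

In practice the input would more likely come from files (for example via db.read_text) than from an in-memory list, and a parallel scheduler would be used for larger collections.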
