Processing Unstructured Data and Dask Bag

PT26965
Training Summary
This module covers Dask Bag, which applies a functional-programming pattern to distributed computation over unstructured or heterogeneous data. Dask Bag is well suited to the initial processing of unstructured text, large collections of heterogeneous business records that require special handling, and images or diagrams. The class covers functional style, the Bag API, and best practices.
Prerequisites
Students should have basic to intermediate Python experience. Some knowledge of functional programming is helpful but not required.
Duration
1 Day/Lecture & Lab
Audience
This course is intended for engineers and data scientists who work with large data collections.
Course Topics
  • Introduction
  • Core Bag APIs and Operations
  • Best Practices
