Apache Spark 2.0: Analyzing the City of San Francisco's Open Data



In this hands-on tutorial, presented at Code for San Francisco, we'll look at how to use Apache Spark to analyze datasets that the City of San Francisco publishes through SF Open Data.

The workshop will focus on how to use Spark SQL and DataFrames to draw insights and build visualizations from fire service calls made to the San Francisco Fire Department on July 4th of this year. The demos and labs are aimed at an audience with some general programming or SQL query experience, but little to no experience with Spark.

We'll begin with a brief lecture on Spark theory before diving into several demos where we'll visualize and analyze the data.

Try the labs yourself:
  1. Sign up for Databricks Community Edition (free)
  2. Download the labs
  3. Log in to Databricks CE and import the labs

To follow along, view the static HTML version of the material covered.


To dive deeper into Apache Spark, check out ProTech's upcoming public classes and private training courses, delivered on-site for your team.

Published July 13, 2016