In this presentation from QCon SF, I'll give you a tour of some of the engines in Spark. We'll start by discussing some of the internals that can make Spark 10 to 100 times faster than Hadoop MapReduce and Hive, and then I'll jump into an example project that demonstrates Spark's ability to rapidly process Big Data.
We'll also cover:
- Extracting information with RDDs
- Querying data using DataFrames
- Visualizing and plotting data
- Creating a machine-learning pipeline with Spark ML and MLlib
The presentation will also give you a feel for how you can leverage Databricks, the cloud-based workspace from the original Spark team.
To learn more about Spark training for you or your team, check out our Spark Developer Bootcamp.