Machine Learning with Apache Spark

Apache Spark is a very popular Big Data processing engine, and Spark MLlib is a de facto standard for machine learning on Big Data. This course is intended for data scientists and software engineers, and it balances theory with practice. For each machine learning concept, we first discuss its foundations, applicability, and limitations. We then explain its implementation and use, along with specific use cases. The format is about 50% lecture and 50% lab work.

4 Days/Lecture & Lab

This course is designed for data scientists and software engineers.

  • Introductions and overviews
  • SVM (Support Vector Machines)
  • Logistic Regression
  • Linear Regression
  • Naive Bayes
  • Decision Trees
  • Clustering (K-Means)
  • LDA (Latent Dirichlet Allocation)
  • Principal Component Analysis (PCA)
  • Recommendation (Collaborative filtering)
  • Graphs – graph operations
  • Graphs – optimizations with Pregel

Before taking this course, students should be familiar with programming in at least one language and be able to navigate the Linux command line. Students should also have a basic knowledge of command-line Linux editors (vi / nano).

Copyright © 2017 ProTech. All Rights Reserved.
