HDP Analyst Data Science

This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers the following tools and programming languages: Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn, the Natural Language Toolkit (NLTK), and Spark MLlib.

Before taking this course, students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

3 Days/Lecture & Lab

This course is designed for architects, software developers, analysts, and data scientists who need to apply data science and machine learning on Hadoop.

Setting Up a Development Environment

  • Block Storage
  • Using HDFS Commands
  • MapReduce
  • Using Apache Mahout for Machine Learning
  • Apache Pig
  • Getting Started with Apache Pig
  • Exploring Data with Pig
  • Using the IPython Notebook
  • The NumPy Package
  • The pandas Library
  • Data Analysis with Python
  • Interpolating Data Points
  • Defining a Pig UDF in Python
  • Streaming Python with Pig
  • Classification with scikit-learn
  • Computing K-Nearest Neighbor
  • Generating a K-Means Clustering
  • POS Tagging Using a Decision Tree
  • Using NLTK for Natural Language Processing
  • Classifying Text using Naive Bayes
  • Using Spark Transformations and Actions
  • Using Spark MLlib
  • Creating a Spam Classifier with MLlib

Copyright © 2020 ProTech. All Rights Reserved.
