Hortonworks HDP Developer: Apache Pig and Hive


This course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, data analytics on Big Data with Pig and Hive, and an introduction to Spark Core and Spark SQL.

Before taking this course, students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

4 Days/Lecture & Lab

This course is designed for software developers who need to understand and develop applications for Hadoop.

  • Use HDFS commands to add/remove files and folders
  • Use Sqoop to transfer data between HDFS and an RDBMS
  • Run MapReduce and YARN application jobs
  • Explore, transform, split and join datasets using Pig
  • Use Pig to transform and export a dataset for use with Hive
  • Use HCatLoader and HCatStorer
  • Use Hive to discover useful information in a dataset
  • Describe how Hive queries get executed as MapReduce jobs
  • Perform a join of two datasets with Hive
  • Use advanced Hive features: windowing, views, ORC files
  • Use Hive analytics functions
  • Write a custom reducer in Python
  • Analyze clickstream data and compute quantiles with DataFu
  • Use Hive to compute ngrams on Avro-formatted files
  • Define an Oozie workflow
  • Use Spark Core to read files and perform data analysis
  • Create and join DataFrames with Spark SQL





Copyright © 2018 ProTech. All Rights Reserved.
