Apache Hadoop Big Data Training

by Tertiary Infotech Pte. Ltd.

These are the exercise files used for the Apache Hadoop Big Data Training course.

The course outline can be found at:

https://www.tertiarycourses.com.sg/apache-hadoop-big-data-training.html

https://www.tertiarycourses.com.my/apache-hadoop-big-data-training-malaysia.html

Day 1

Module 1: Get Started on Apache Hadoop

Why Hadoop?
Difference between HBase and Hadoop

Module 2: Hadoop Core Components

Java Virtual Machine (JVM)
HDFS
Hadoop Cluster Components
Exploring Hadoop Platforms

Module 3: Setup Hadoop Development Environment

Setup Cloudera Hadoop VM
Adding Hadoop Libraries
Programming Languages

Module 4: MapReduce 2.0/YARN

What is MapReduce?
MapReduce Components
MapReduce on HDFS

Module 5: Hive

What is Hive?
Hive Queries
Analyzing data with Hive

Day 2

Module 6: Pig

What is Pig?
Pig Data Types
Pig Commands

Module 7: Connectors and Workflows

Introducing Sqoop
Importing Data with Sqoop
Introducing Flume
Importing Data with Flume
Introducing Zookeeper
Using Zookeeper to coordinate workflows
Introducing Oozie
Scheduling jobs using Oozie

Module 8: Exploring Other Hadoop Libraries

Introducing Impala
Introducing Mahout
Introducing Storm

Module 9: Apache Spark Basics

Why Apache Spark?
Apache Spark Components
Apache Spark Commands