These are the exercise files used for the Apache Hadoop Big Data Training course.
The course outline can be found at:
https://www.tertiarycourses.com.sg/apache-hadoop-big-data-training.html
https://www.tertiarycourses.com.my/apache-hadoop-big-data-training-malaysia.html
Module 1: Get Started on Apache Hadoop
- Why Hadoop?
- Difference between HBase and Hadoop
Module 2: Hadoop Core Components
- Java Virtual Machine (JVM)
- HDFS
- Hadoop Cluster Components
- Exploring Hadoop Platforms
Module 3: Setup Hadoop Development Environment
- Setup Cloudera Hadoop VM
- Adding Hadoop Libraries
- Programming Languages
Module 4: MapReduce 2.0/YARN
- What is MapReduce?
- MapReduce Components
- MapReduce on HDFS
Module 5: Hive
- What is Hive?
- Hive Queries
- Analyzing Data with Hive
Module 6: Pig
- What is Pig?
- Pig Data Types
- Pig Commands
Module 7: Connectors and Workflows
- Introducing Sqoop
- Importing Data with Sqoop
- Introducing Flume
- Importing Data with Flume
- Introducing Zookeeper
- Using Zookeeper to coordinate workflows
- Introducing Oozie
- Scheduling jobs using Oozie
Module 8: Exploring Other Hadoop Libraries
- Introducing Impala
- Introducing Mahout
- Introducing Storm
Module 9: Apache Spark Basics
- Why Apache Spark?
- Apache Spark Components
- Apache Spark Commands