Big Data and Hadoop

Duration (in days):

3

Description:

Big Data means different things to different people. In this course, we'll ensure that you have a solid definition based on fundamental computational theory. We will also walk through Hadoop and its infrastructure.

Objectives:

Understand Big Data
Understand distributed architectures
Understand Hadoop
Understand and master how to write a MapReduce algorithm
Understand how to use Apache Pig
Understand how to use Apache Hive

Prerequisites:

Audience:

Programmers
Architects
Data Engineers
Data Scientists
Managers who want to understand the value of Hadoop and Big Data

Outline:

Big Data

What is Big Data?
Why horizontal scaling?
The fundamental problems, theories and solutions in distributed computing
What is the CAP theorem and why is it important?
Principles of distributed computing

MapReduce

How does MapReduce fit into the Big Data picture?
MapReduce in Java
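
As a rough illustration of the programming model covered in this module, the sketch below runs the map, shuffle, and reduce phases of a word count in plain Java, inside a single process. It does not use the Hadoop API, and the class name WordCountSketch and its method names are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the MapReduce model (no Hadoop involved).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values for one key into a single count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle the pairs, then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data", "big hadoop")));
    }
}
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data across the network; here all three phases are ordinary method calls so the flow of keys and values is easy to follow.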

Hadoop

What is Hadoop?
Why Hadoop?
Building MapReduce in Hadoop

Apache Hive

What is Hive?
Why Hive?
Hive SQL
Hive Joins
HCatalog
Performance tips in Hive

Apache Pig

What is Pig?
Why Pig?
Pig Latin
Advanced Pig
Pig joins
Pig performance

Related tools and a comparison

Apache Spark
HBase
Kafka
Flume