Big Data and Hadoop
Duration (in days):
3
Description:
Big Data means different things to different people. In this course, we'll ensure that you have a solid definition based on fundamental computational theory. We will also go through Hadoop and show the Hadoop infrastructure.
Objectives:
- Understand Big Data
- Understand distributed architectures
- Understand Hadoop
- Understand and master how to write a Map Reduce algorithm
- Understand how to use Apache Pig
- Understand how to use Apache Hive
Prerequisites:
Audience:
- Programmers
- Architects
- Data Engineers
- Data Scientists
- Managers who want to understand the value of Hadoop and Big Data
Outline:
Big Data
What is Big Data?
Why horizontal scaling?
The fundamental problems, theories and solutions in distributed computing
What is the CAP theorem and why is it important?
Principles of distributed computing
Map Reduce
How does Map Reduce fit into the Big Data picture?
Map Reduce in Java
Hadoop
What is Hadoop?
Why Hadoop?
Building Map Reduce in Hadoop
Apache Hive
What is Hive?
Why Hive?
Hive SQL
Hive Joins
HCatalog
Performance tips in Hive
Apache Pig
What is Pig?
Why Pig?
Pig Latin
Advanced Pig
Pig joins
Pig performance
Related tools and a comparison
Apache Spark
HBase
Kafka
Flume