Big Data and Hadoop

Duration (in days): 3

Description:

Big Data means different things to different people. In this course, we'll make sure you have a solid definition grounded in fundamental computational theory. We will also walk through Hadoop and its infrastructure.

Objectives:

  • Understand Big Data
  • Understand distributed architectures
  • Understand Hadoop
  • Understand and master how to write a MapReduce algorithm
  • Understand how to use Apache Pig
  • Understand how to use Apache Hive

Prerequisites:

Audience

  • Programmers
  • Architects
  • Data Engineers
  • Data Scientists
  • Managers who want to understand the value of Hadoop and Big Data

Outline

Big Data

  • What is Big Data? 

  • Why horizontal scaling?

  • The fundamental problems, theories and solutions in distributed computing

  • What is the CAP theorem and why is it important?

  • Principles of distributed computing

MapReduce

  • How does MapReduce fit into the Big Data picture?

  • MapReduce in Java
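
As a taste of this module, the sketch below shows the map, shuffle, and reduce phases of a word count in plain Java. It is only an illustration: it runs in a single process and does not use the Hadoop API, and the class and method names (WordCountSketch, map, shuffle, reduce, wordCount) are hypothetical names chosen for this example rather than course material.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative, single-process word count; all names here are hypothetical.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values for one key into a single count.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle, then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, hadoop=1}
        System.out.println(wordCount(Arrays.asList("big data", "Big Hadoop data")));
    }
}
```

In a real cluster the map and reduce calls would run on different machines and the framework would handle the shuffle; here everything runs in one loop so the data flow is easy to follow.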

Hadoop

  • What is Hadoop?

  • Why Hadoop?

  • Building MapReduce in Hadoop

Apache Hive

  • What is Hive?

  • Why Hive?

  • Hive SQL

  • Hive Joins

  • HCatalog

  • Performance tips in Hive

Apache Pig

  • What is Pig?

  • Why Pig?

  • Pig Latin

  • Advanced Pig

  • Pig joins

  • Pig performance

Related tools and a comparison

  • Apache Spark

  • HBase

  • Kafka

  • Flume

© 2020 Northscaler