Apache Spark

Duration (in days): 3

Description:

Apache Spark has become a very popular computational framework for processing big data and streaming information. With its rich set of libraries and highly optimized computational model, companies can now process massive amounts of information and assemble insights in record time using the Apache Spark machine learning and graph libraries.

In this course, we'll bring you up to speed on Apache Spark and its libraries, such as MLlib and GraphX.

Objectives:

  • Learn how Apache Spark achieves near-linear horizontal scaling

  • Learn the fundamental principles of assembling a distributed algorithm

  • Learn about Spark's RDDs, DataFrames, and Datasets

  • Learn Spark Streaming (as well as Structured Streaming)

  • Learn how to use Spark MLlib to build machine learning algorithms

  • Learn how to use GraphX to build graph algorithms

Audience

  • Aspiring Spark Programmers
  • Information Architects
  • Data Analysts
  • Data Engineers
  • Data Scientists

Outline

The fundamentals

  • What is Big Data?

  • Why horizontal scaling?

  • The fundamental problems, theories, and solutions in distributed computing

  • What is Spark?

  • Why Spark?

Resilient Distributed Datasets (RDDs)

  • Functional programming in a nutshell

  • Programming with RDDs

  • Building distributed algorithms using RDDs

  • How and why RDDs work

DataFrames and Datasets

  • What are DataFrames?

  • What are Datasets?

  • Spark SQL

  • Building distributed algorithms with DataFrames and Datasets

Streaming in Spark

  • What is streaming?

  • How does Spark handle streaming?

  • Structured Streaming vs Spark Streaming

  • Streaming from Kafka

  • Other streaming platforms

  • Distributed algorithms and Spark Streaming

MLlib

  • An introduction to Machine Learning

  • What is MLlib?

  • Machine learning use cases

  • Building machine learning pipelines

GraphX

  • Graph Theory and Algorithms

  • What is GraphX?

  • Some common graph problems

  • Examples of GraphX solutions
