Apache Spark
Duration (in days):
3
Description:
Apache Spark has become a very popular computational framework for processing big data and streaming information. With its rich set of libraries and highly optimized computational model, companies can now process massive amounts of information and assemble insights in record time using the Apache Spark machine learning and graph libraries.
In this course, we'll bring you up to speed on Apache Spark and its libraries (such as MLlib and GraphX).
Objectives:
Learn how Apache Spark achieves near-linear horizontal scaling
Learn the fundamental principles of assembling a distributed algorithm
Learn about Spark's RDDs, DataFrames, and Datasets
Learn Spark Streaming (as well as Structured Streaming)
Learn how to use Spark MLlib to build machine learning algorithms
Learn how to use GraphX to build graph algorithms
Prerequisites:
Audience
- Aspiring Spark Programmers
- Information Architects
- Data Analysts
- Data Engineers
- Data Scientists
Outline
The fundamentals
What is Big Data?
Why horizontal scaling?
The fundamental problems, theories and solutions in distributed computing
What is Spark?
Why Spark?
Resilient Distributed Dataset
Functional programming in a nutshell
Programming with RDD
Building distributed algorithms using RDD
How and why does it work?
DataFrames and Datasets
What are DataFrames?
What are Datasets?
Spark SQL
Building distributed algorithms with DataFrames and Datasets
Streaming in Spark
What is streaming?
How does Spark solve Streaming?
Structured Streaming vs Spark Streaming
Streaming from Kafka
Other streaming platforms
Distributed algorithms and Spark Streaming
MLlib
An introduction to Machine Learning
MLlib
Machine learning use cases
Building machine learning pipelines
GraphX
Graph Theory and Algorithms
What is GraphX?
Some common graph problems
Examples of GraphX solutions