
Apache Spark, a significant component of the Hadoop ecosystem, is a cluster computing engine for big data. It can run on top of Hadoop YARN and HDFS, and for many in-memory computing tasks it offers order-of-magnitude faster processing than MapReduce.

PRIVATE
Course Access: Unlimited Duration

Last Updated: March 5, 2021

This “skills-centric” course is about 50% hands-on lab and 50% lecture. It is designed to train attendees in core big data and Spark development and usage skills, coupling the most current, effective techniques with the soundest industry practices. In this course you will learn about:

· Spark Ecosystem

· Spark Shell

· Spark Data structures (RDD, DataFrame, Dataset)

· Spark SQL

· Modern data formats and Spark

· Spark API

· Spark & Hadoop & Hive

· Spark ML overview

· GraphX

· Time-permitting: Spark Streaming

· Time-permitting: Optional Capstone Workshop

Course Curriculum

    • Big data, Hadoop, Spark
    • Spark concepts and architecture
    • Spark components overview
    • Labs: installing and running Spark
    • Spark shell
    • Analyzing dataset – part 1
    • Labs: Spark shell exploration
    • Partitions
    • Distributed execution
    • Operations: transformations and actions
    • Labs: Unstructured data analytics using RDDs
    • Caching overview
    • Various caching mechanisms available in Spark
    • In-memory file systems
    • Caching use cases and best practices
    • Labs: Benchmark of caching performance
    • DataFrames intro
    • Loading structured data (JSON, CSV) using DataFrames
    • Using schema
    • Specifying schema for DataFrames
    • Labs: DataFrames, Datasets, Schema
    • Spark SQL concepts and overview
    • Defining tables and importing datasets
    • Querying data using SQL
    • Handling various storage formats: JSON, Parquet, ORC
    • Labs: querying structured data using SQL; evaluating data formats
    • Hadoop primer: HDFS, YARN
    • Hadoop + Spark architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS files using Spark
    • Spark & Hive
    • Overview of Spark APIs in Scala / Python
    • The lifecycle of a Spark application
    • Spark APIs
    • Deploying Spark applications on YARN
    • Labs: Developing and deploying a Spark application
    • Machine Learning primer
    • Machine Learning in Spark: MLlib / ML
    • Spark ML overview (newer Spark 2 version)
    • Algorithms overview: Clustering, Classification, Recommendations
    • Labs: Writing ML applications in Spark
    • GraphX library overview
    • GraphX APIs
    • Creating a graph and navigating it
    • Shortest distance
    • Pregel API
    • Labs: Processing graph data using Spark
    • Spark Streaming
    • Workshop

About Instructor

ashar hafeez
Pak
62 Students


© 2021 Ernesto.  All rights reserved.  