
Apache Spark is a cluster computing engine for Big Data and a significant component of the Hadoop ecosystem. Running on top of Hadoop YARN and HDFS, it processes many in-memory computing tasks an order of magnitude faster than Map/Reduce.

PRIVATE
Course Access: Unlimited Duration

Last Updated: March 4, 2021


This course offers a practical introduction to the family of technologies at the leading edge of data science development, focusing on Spark and related tools. In this course you will learn:

· The essentials of Spark architecture and applications

· How to execute Spark Programs

· How to create and manipulate both RDDs (Resilient Distributed Datasets) and the unified DataFrames of Spark 2

· How to persist and restore data frames

· Essential NoSQL access

· How to integrate machine learning into Spark applications

· How to use Spark Streaming and Kafka to create streaming applications

Course Curriculum

    • Hadoop Ecosystem
    • Hadoop YARN vs. Mesos
    • Spark vs. Map/Reduce
    • Spark with Map/Reduce: Lambda Architecture
    • Spark in the Enterprise Data Science Architecture
    • Spark Shell
    • RDDs: Resilient Distributed Datasets
    • DataFrames
    • Spark 2 Unified DataFrames
    • Spark Sessions
    • Functional Programming
    • Spark SQL
    • MLlib
    • Structured Streaming
    • SparkR
    • Spark and Python
    • Coding with RDDs
    • Transformations
    • Actions
    • Lazy Evaluation and Optimization
    • RDDs in Map/Reduce
    • RDDs vs. DataFrames
    • Unified DataFrames in Spark 2.0
    • Partitioning
    • Spark Sessions
    • Running Applications
    • Logging
    • RDD Persistence
    • DataFrame and Unified DataFrame Persistence
    • Spark Streaming
    • Ingesting Data
    • Parquet Files
    • Relational Databases
    • Graph Databases (Neo4j, GraphX)
    • Interacting with Hive
    • Accessing Cassandra Data
    • Document Databases (MongoDB, CouchDB)
    • Map/Reduce and Lambda Integration
    • Camel Integration
    • Drools and Spark
    • MLlib and Mahout
    • Classification
    • Clustering
    • Decision Trees
    • Decompositions
    • Pipelines
    • Spark Packages
    • Spark SQL
    • SQL and DataFrames
    • Spark SQL and Hive
    • Spark SQL and JDBC
    • Graph APIs
    • GraphX
    • ETL in GraphX
    • Exploratory Analysis
    • Graph Computation
    • Pregel API Overview
    • GraphX Algorithms
    • Using Web Notebooks (Zeppelin, Jupyter)
    • R on Spark
    • Python on Spark
    • Scala on Spark
    • Parallelizing Spark Applications
    • Clustering Concerns for Developers
    • Monitoring Spark Performance
    • Tuning Memory
    • Tuning CPU
    • Tuning Data Locality Troubleshooting

ashar hafeez
62 Students

About Instructor

Pak


© 2021 Ernesto.  All rights reserved.  