Hadoop Developer Foundation | Working with Hadoop, HDFS, Hive, YARN, Spark and More is a lab-intensive, hands-on Hadoop course that explores processing large data streams in the Hadoop ecosystem.

Course Access: Unlimited Duration

Last Updated: March 11, 2021

This “skills-centric” course is about 50% hands-on lab and 50% lecture, designed to train you in core big data/Spark development and usage skills, coupling the most current, effective techniques with the soundest industry practices. In this course you will learn about:

· Introduction to Hadoop

· HDFS

· YARN

· Data Ingestion

· HBase

· Oozie

· Working with Hive

· Hive (Advanced)

· Hive in Cloudera

· Working with Spark

· Spark Basics

· Spark Shell

· RDDs (Condensed coverage)

· Spark Dataframes & Datasets

· Spark SQL

· Spark API programming

· Spark and Hadoop

· Machine Learning (ML / MLlib)

· GraphX

· Spark Streaming

Course Curriculum

    • Hadoop history, concepts
    • Ecosystem
    • Distributions
    • High-level architecture
    • Hadoop myths
    • Hadoop challenges
    • Hardware and software
    • Lab: first look at Hadoop
    • Design and architecture
    • Concepts (horizontal scaling, replication, data locality, rack awareness)
    • Daemons: NameNode, Secondary NameNode, DataNode
    • Communications and heartbeats
    • Data integrity
    • Read and write path
    • NameNode High Availability (HA), Federation
    • Labs: Interacting with HDFS
    • YARN concepts and architecture
    • Evolution from MapReduce to YARN
    • Labs: Running a sample YARN program
    • Flume for logs and other data ingestion into HDFS
    • Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
    • Copying data between clusters (distcp)
    • Using S3 as complementary to HDFS
    • Data ingestion best practices and architectures
    • Oozie for scheduling events on Hadoop
    • Labs: setting up and using Flume and Sqoop
    • HBase (covered in brief)
    • Concepts and architecture
    • HBase vs RDBMS vs Cassandra
    • HBase Java API
    • Time series data on HBase
    • Labs: Interacting with HBase using shell; programming in HBase Java API; schema design exercise
    • Introduction to Oozie
    • Features of Oozie
    • Oozie Workflow
    • Creating a MapReduce Workflow
    • Start, End, and Error Nodes
    • Parallel Fork and Join Nodes
    • Workflow Jobs Lifecycle
    • Workflow Notifications
    • Workflow Manager
    • Creating and Running a Workflow
    • Exercise: Create an Oozie Workflow from Terminal
    • Exercise: Create an Oozie Workflow Using Java API
    • Oozie Coordinator Sub-groups
    • Oozie Coordinator Components, Variables, and Parameters
    • Exercise: Create an Oozie Workflow from HUE
    • Architecture and design
    • Data types
    • SQL support in Hive
    • Creating Hive tables and querying
    • Partitions
    • Joins
    • Text processing
    • Labs: various labs on processing data with Hive
    • Transformation, Aggregation
    • Working with Dates, Timestamps, and Arrays
    • Converting Strings to Date, Time, and Numbers
    • Create new Attributes, Mathematical Calculations, Windowing Functions
    • Use Character and String Functions
    • Binning and Smoothing
    • Processing JSON Data
    • Execution Engines (Tez, MR, Spark)
    • Many labs
      • Big Data, Hadoop, Spark
      • What’s new in Spark v2
      • Spark concepts and architecture
      • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
      • Labs: Installing and running Spark
      • Spark web UIs
      • Analyzing dataset – part 1
      • Labs: Spark shell exploration
      • RDD concepts
      • RDD operations / transformations
      • Labs: Unstructured data analytics using RDDs
      • Data model concepts
      • Partitions
      • Distributed processing
      • Failure handling
      • Caching and persistence
      • Lab on the above
      • Intro to DataFrame / Dataset
      • Programming in DataFrame / Dataset API
      • Loading structured data using DataFrames
      • Labs: DataFrames, Datasets, Caching
      • Spark SQL concepts and overview
      • Defining tables and importing datasets
      • Querying data using SQL
      • Handling various storage formats: JSON / Parquet / ORC
      • Labs: querying structured data using SQL; evaluating data formats
      • Introduction to Spark API
      • Submitting the first program to Spark
      • Debugging / logging
      • Configuration properties
      • Labs: Programming in Spark API, Submitting jobs
      • Hadoop Primer: HDFS / YARN
      • Hadoop + Spark architecture
      • Running Spark on YARN
      • Processing HDFS files using Spark
      • Spark & Hive
      • Lab
      • Team design workshop
      • The class will be broken into teams
      • The teams will get a name and a task
      • They will architect a complete solution to a specific useful problem, present it, and defend the architecture based on the best practices they have learned in class
      • Machine Learning primer
      • Machine Learning in Spark: MLlib / ML
      • Spark ML overview (newer Spark 2 version)
      • Algorithms: Clustering, Classification, Recommendations
      • Labs: Writing ML applications in Spark
      • GraphX library overview
      • GraphX APIs
      • Labs: Processing graph data using Spark
      • Streaming concepts
      • Evaluating Streaming platforms
      • Spark Streaming library overview
      • Streaming operations
      • Sliding window operations
      • Structured Streaming
      • Continuous streaming
      • Spark & Kafka streaming
      • Labs: Writing Spark Streaming applications

About Instructor

ashar hafeez
    © 2021 Ernesto.  All rights reserved.  