
Hadoop Developer Foundation | Working with Hadoop, HDFS, Hive, Yarn, Spark and More is a lab-intensive hands-on Hadoop course that explores processing large data streams in the Hadoop Ecosystem.
Unlimited Duration
March 11, 2021
This “skills-centric” course is about 50% hands-on lab and 50% lecture. It is designed to train you in core big data/Spark development and usage skills, coupling current, effective techniques with sound industry practices. In this course you will learn about:
· Introduction to Hadoop
· HDFS
· YARN
· Data Ingestion
· HBase
· Oozie
· Working with Hive
· Hive (Advanced)
· Hive in Cloudera
· Working with Spark
· Spark Basics
· Spark Shell
· RDDs (Condensed coverage)
· Spark Dataframes & Datasets
· Spark SQL
· Spark API programming
· Spark and Hadoop
· Machine Learning (ML / MLlib)
· GraphX
· Spark Streaming
Course Curriculum
- Hadoop history, concepts
- Ecosystem
- Distributions
- High-level architecture
- Hadoop myths
- Hadoop challenges
- Hardware and software
- Lab: first look at Hadoop
- Design and architecture
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: NameNode, Secondary NameNode, DataNode
- Communications and heartbeats
- Data integrity
- Read and write path
- NameNode High Availability (HA), Federation
- Labs: Interacting with HDFS
- YARN concepts and architecture
- Evolution from MapReduce to YARN
- Labs: Running a sample YARN program
- (Covered in brief)
- Concepts and architecture
- HBase vs RDBMS vs Cassandra
- HBase Java API
- Time series data on HBase
- Labs: Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
- Architecture and design
- Data types
- SQL support in Hive
- Creating Hive tables and querying
- Partitions
- Joins
- Text processing
- Labs: various labs on processing data with Hive
- Big Data, Hadoop, Spark
- What’s new in Spark v2
- Spark concepts and architecture
- Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
- Labs: Installing and running Spark
- RDD concepts
- RDD operations / transformations
- Labs: Unstructured data analytics using RDDs
- Data model concepts
- Partitions
- Distributed processing
- Failure handling
- Caching and persistence
- Lab on the above
- Spark SQL concepts and overview
- Defining tables and importing datasets
- Querying data using SQL
- Handling various storage formats: JSON / Parquet / ORC
- Labs: querying structured data using SQL; evaluating data formats
- Hadoop primer: HDFS / YARN
- Hadoop + Spark architecture
- Running Spark on YARN
- Processing HDFS files using Spark
- Spark & Hive
- Lab
- Machine Learning primer
- Machine Learning in Spark: MLlib / ML
- Spark ML overview (newer Spark 2 version)
- Algorithms: Clustering, Classification, Recommendations
- Labs: Writing ML applications in Spark
- Streaming concepts
- Evaluating streaming platforms
- Spark Streaming library overview
- Streaming operations
- Sliding window operations
- Structured Streaming
- Continuous streaming
- Spark & Kafka streaming
- Labs: Writing Spark Streaming applications