Working in a hands-on learning environment led by our expert Hadoop instructor, students will learn how to organize a successful Hadoop rollout; load, unload, and manage data in Hadoop; and integrate Hadoop with the existing information infrastructure.

Course Access: Unlimited Duration

Last Updated: January 6, 2021

Course Description

Today, valuable data is spread across numerous databases, often in multiple companies, and the challenge is bringing it together. Integrating Hadoop shows how Hadoop is used to collect and load that data, both on physical hardware and in the cloud. It begins with an introduction to Hadoop and the types of data suited to it. Next, it covers assembling the integration team and gives an overview of Hadoop workloads in the organization. You will also identify data sources for Hadoop, such as NoSQL databases and legacy/relational databases, distinguish between ETL and ELT, and learn how to load data into and unload data from Hadoop. You will practice managing big data using techniques such as upserts and HBase, and discover the advantages of real-time computing and the basic structure of a streaming data architecture. Finally, you will work with an organization's master data and learn the top 10 mistakes people make when integrating data with Hadoop, and how to avoid them.

Course Curriculum

    • Introducing Hadoop
    • Hadoop Distributions
    • Assembling the Integration Team
    • Overview of Workloads for Hadoop in the Organization
    • Identifying Data Sources for Hadoop
    • Data Profiling
    • Analyzing and Profiling Source Systems and Data
    • Continued Need for More Speed
    • Preference with Hadoop
    • Is ETL Dead?
    • Advantages of Data Integration Tools
    • Methods of Data Loading
    • Path to Production
    • How-To with Talend Big Data
    • Big Data ELT
    • Importance of Data Quality in Hadoop
    • Stewardship of Big Data
    • Hadoop Extracts
    • Hadoop and SOA
    • Advantages of Real-Time Computing
    • How and Where to Use Spark
    • Streaming Data
    • Streaming Data Technology Distinctions
    • Hadoop and Master Data Management
    • Integrating with Master Data
    • Data Virtualization
    • MDM and Hadoop Disconnects
    • 1. Integrating Data Without a Business Purpose
    • 2. Integrating Data into Hadoop for an Enterprise Data Repository
    • 3. Overemphasis on Data Integration Performance to the Detriment of Query Performance for Data Usage
    • 4. Not Refining Data to the Point of Usefulness
    • 5. Improper Node Specification
    • 6. Over-Reliance on Open Source Hadoop
    • 7. ETL instead of ELT
    • 8. Using MapReduce to Load Hadoop
    • 9. Using Spark through Hive to Load Hadoop
    • 10. Ignoring the Quality of the Data Being Loaded
    • Case Studies in Big Data Integration
    • Trends in Hadoop and Summary of Ideas

© 2021 Ernesto.  All rights reserved.