
Working in a hands-on learning environment led by our Data Science with Python and Jupyter expert instructor, students will learn to solve the day-to-day problems of data science with Spark. This cookbook consists of intuitive numerical recipes that help you optimize your work by acquiring, cleaning, analyzing, predicting, and visualizing your data.
Course Access: Unlimited Duration
Last Updated: July 29, 2021
Students Enrolled: 20
Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to cutting-edge applications such as self-driving cars and personalized medicine. You will gain hands-on experience applying these principles with Apache Spark, a resilient cluster computing system well suited to large-scale machine learning tasks.
This book begins with a quick overview of setting up the IDEs needed to run the code examples covered in the chapters. It also highlights some key issues developers face when working with machine learning algorithms on the Spark platform. We then explore the various Spark APIs and implement ML algorithms to develop classification systems, recommendation engines, text analytics, clustering, and learning systems. In the final chapters, we focus on building high-end applications and explain various unsupervised methodologies and the challenges to tackle when implementing big data ML systems.
Course Curriculum
- Practical Machine Learning with Spark Using Scala
- Introduction
- Downloading and installing the JDK
- Downloading and installing IntelliJ
- Downloading and installing Spark
- Configuring IntelliJ to work with Spark and run Spark ML sample code
- Running sample ML code from Spark
- Identifying data sources for practical machine learning
- Running your first program using Apache Spark 2.0 with the IntelliJ IDE
- How to add graphics to your Spark program
- Just Enough Linear Algebra for Machine Learning with Spark
- Introduction
- Package imports and initial setup for vectors and matrices
- Creating DenseVector and setup with Spark 2.0
- Creating SparseVector and setup with Spark
- Creating dense matrix and setup with Spark 2.0
- Using sparse local matrices with Spark 2.0
- Performing vector arithmetic using Spark 2.0
- Performing matrix arithmetic using Spark 2.0
- Exploring RowMatrix in Spark 2.0
- Exploring distributed IndexedRowMatrix in Spark 2.0
- Exploring distributed CoordinateMatrix in Spark 2.0
- Exploring distributed BlockMatrix in Spark 2.0
- Spark’s Three Data Musketeers for Machine Learning – Perfect Together
- Introduction
- Creating RDDs with Spark 2.0 using internal data sources
- Creating RDDs with Spark 2.0 using external data sources
- Transforming RDDs with Spark 2.0 using the filter() API
- Transforming RDDs with the super useful flatMap() API
- Transforming RDDs with set operation APIs
- RDD transformation/aggregation with groupBy() and reduceByKey()
- Transforming RDDs with the zip() API
- Join transformation with paired key-value RDDs
- Reduce and grouping transformation with paired key-value RDDs
- Creating DataFrames from Scala data structures
- Operating on DataFrames programmatically without SQL
- Loading DataFrames and setup from an external source
- Using DataFrames with standard SQL language – SparkSQL
- Working with the Dataset API using a Scala Sequence
- Creating and using Datasets from RDDs and back again
- Working with JSON using the Dataset API and SQL together
- Functional programming with the Dataset API using domain objects
- Practical Machine Learning with Regression and Classification in Spark 2.0 – Part I
- Introduction
- Fitting a linear regression line to data the old-fashioned way
- Generalized linear regression in Spark 2.0
- Linear regression API with Lasso and L-BFGS in Spark 2.0
- Linear regression API with Lasso and ‘auto’ optimization selection in Spark 2.0
- Linear regression API with ridge regression and ‘auto’ optimization selection in Spark 2.0
- Isotonic regression in Apache Spark 2.0
- Multilayer perceptron classifier in Apache Spark 2.0
- One-vs-Rest classifier (One-vs-All) in Apache Spark 2.0
- Survival regression – parametric AFT model in Apache Spark 2.0
- Recommendation Engine that Scales with Spark
- Introduction
- Setting up the required data for a scalable recommendation engine in Spark 2.0
- Exploring the movies data details for the recommendation system in Spark 2.0
- Exploring the ratings data details for the recommendation system in Spark 2.0
- Building a scalable recommendation engine using collaborative filtering in Spark 2.0
- Optimization – Going Down the Hill with Gradient Descent
- Introduction
- Optimizing a quadratic cost function and finding the minima using just math to gain insight
- Coding a quadratic cost function optimization using Gradient Descent (GD) from scratch
- Coding Gradient Descent optimization to solve Linear Regression from scratch
- Normal equations as an alternative for solving Linear Regression in Spark 2.0
- Curse of High-Dimensionality in Big Data
- Introduction
- Two methods of ingesting and preparing a CSV file for processing in Spark
- Singular Value Decomposition (SVD) to reduce high-dimensionality in Spark
- Principal Component Analysis (PCA) to pick the most effective latent factor for machine learning in Spark
- Spark Streaming and Machine Learning Library
- Introduction
- Structured streaming for near real-time machine learning
- Streaming DataFrames for real-time machine learning
- Streaming Datasets for real-time machine learning
- Streaming data and debugging with queueStream
- Downloading and understanding the famous Iris data for unsupervised classification
- Streaming KMeans for a real-time online classifier
- Downloading wine quality data for streaming regression
- Streaming linear regression for real-time regression
- Downloading Pima Diabetes data for supervised classification
- Streaming logistic regression for an online classifier
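As a taste of the optimization chapter, the curriculum item on coding gradient descent for a quadratic cost function from scratch can be illustrated with a minimal sketch. It is written in plain Python rather than the course's Scala and Spark so that it runs without a cluster. The cost function f(x) = (x - 3)^2 + 2, the learning rate, the stopping tolerance, and the names `cost`, `gradient`, and `gradient_descent` are all assumptions chosen for illustration; none of them come from the course material.

```python
from typing import Tuple


def cost(x: float) -> float:
    """Hypothetical quadratic cost with its minimum at x = 3."""
    return (x - 3.0) ** 2 + 2.0


def gradient(x: float) -> float:
    """Analytic derivative of the cost above: 2 * (x - 3)."""
    return 2.0 * (x - 3.0)


def gradient_descent(
    start: float,
    learning_rate: float = 0.1,    # assumed step size
    tolerance: float = 1e-8,       # stop when the update is this small
    max_iterations: int = 10_000,  # safety cap on the loop
) -> Tuple[float, int]:
    """Step against the gradient until the update falls below tolerance.

    Returns the final estimate of the minimizer and the number of steps taken.
    """
    x = start
    for step in range(1, max_iterations + 1):
        new_x = x - learning_rate * gradient(x)
        if abs(new_x - x) < tolerance:
            return new_x, step
        x = new_x
    return x, max_iterations


# Start far from the minimum and walk downhill.
minimum, steps = gradient_descent(start=-5.0)
print(f"estimated minimizer: {minimum:.6f}")
print(f"cost at estimate:    {cost(minimum):.6f}")
print(f"steps taken:         {steps}")
```

For this particular function each update shrinks the distance to the minimum by a factor of 1 - 2 * learning_rate, so a smaller learning rate converges more slowly and a learning rate above 1.0 makes the iterates diverge.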