Tutorialspoint

50 Hours of Big Data, PySpark, AWS, Scala and Scraping

Big Data with Scala and Spark,PySpark and AWS,Data Scraping & Data Mining With Python, Mastering MongoDB for Beginners

Course Description

The course is designed to be simple to follow, expressive, exhaustive, and practical, with live coding, plenty of quizzes, and up-to-date coverage of the field.

I. Scala

It’s true that Scala is not among the most-loved coding languages but don’t let this minor discomfort bother you. Scala is doubtless one of the most in-demand skills for data scientists and data engineers. And the reason for this is not far to seek: The supply of professionals with Scala skills is a long way from catching up with the demand.

The well-thought-out quizzes and mini-projects in this course cover all the important aspects and will make your Scala learning journey that much easier. The course includes an overview of Hadoop and Spark, with a hands-on Scala Spark project. Throughout the course, every theoretical explanation is followed by practical implementation.

This course is designed to reflect the most in-demand Scala skills, which you can start using right away at the workplace. The six mini-projects and one Scala Spark project are vital components of the course. These projects give you a hands-on opportunity to experiment for yourself through trial and error and to learn from the mistakes you make. Importantly, they make it easier to understand the gaps that can exist between theory and practice.

Scala is a powerful, high-level language that can handle many of the tasks Python is used for, such as building machine learning models. You can use it for an assortment of applications, from web apps to machine learning.

II. PySpark and AWS

Python and Apache Spark are two of the hottest buzzwords in the Big Data analytics industry, and PySpark, the Python API for Apache Spark, brings them together. In this course, you’ll start right from the basics and proceed to the advanced levels of data analysis. From cleaning data to building features and implementing machine learning (ML) models, you’ll learn how to execute end-to-end workflows using PySpark.

Throughout the course, you’ll use PySpark to perform data analysis. You’ll explore Spark RDDs, DataFrames, and a bit of Spark SQL, along with the transformations and actions that can be performed on data using RDDs and DataFrames. You’ll also explore the Spark and Hadoop ecosystem and its underlying architecture. You’ll run your Spark scripts in the Databricks environment and explore that environment as well.

Finally, you’ll get a taste of Spark on the AWS cloud. You’ll see how to leverage AWS storage, database, and compute services, and how Spark can communicate with different AWS services to get the data it requires.

As this course is a detailed compilation of all the basics, it will motivate you to make quick progress and to go beyond what you have learned. At the end of each concept, you will be assigned homework, tasks, activities, and quizzes, along with solutions. These evaluate and reinforce your learning of the preceding concepts and methods. Most of these activities are coding-based, as the aim is to get you up and running with implementations.

III. Data Scraping and Data Mining from Beginner to Professional

Data scraping is the technique of extracting data from the internet. It is used to collect the data available on different websites and APIs, and it also involves automating web flows to extract data from different web pages.
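To make the idea concrete, here is a minimal sketch of extracting structured data from a page using only Python's standard library. The HTML fragment, its class names, and the `QuoteParser` and `extract_quotes` names are assumptions made up for illustration; the course itself works with Requests, BS4, Scrapy, and Selenium, and a real scraper would first download the page.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this HTML first.
SAMPLE_HTML = """
<div class="quote"><span class="text">Stay curious.</span><small class="author">A. Writer</small></div>
<div class="quote"><span class="text">Ship small changes.</span><small class="author">B. Coder</small></div>
"""


class QuoteParser(HTMLParser):
    """Collects (text, author) pairs from span.text and small.author elements."""

    def __init__(self):
        super().__init__()
        self.quotes = []     # finished (text, author) pairs
        self._field = None   # field the parser is currently inside, if any
        self._current = {}   # fields gathered for the quote in progress

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class == "text":
            self._field = "text"
        elif tag == "small" and css_class == "author":
            self._field = "author"

    def handle_data(self, data):
        # Only keep text that sits inside one of the two fields of interest.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        # The author element closes last, so the quote is complete here.
        if tag == "small" and self._field == "author":
            self.quotes.append((self._current["text"], self._current["author"]))
            self._current = {}
        self._field = None


def extract_quotes(html):
    parser = QuoteParser()
    parser.feed(html)
    return parser.quotes


quotes = extract_quotes(SAMPLE_HTML)
for text, author in quotes:
    print(f"{author}: {text}")
```

Libraries such as BS4 and Scrapy wrap this kind of tag walking behind selectors, which is why they are usually preferred over hand-written parsers.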

This course is designed for beginners. We’ll spend sufficient time laying a solid groundwork for newcomers. Then we’ll gradually go deeper, with many practical implementations in which every step is explained in detail.

As this course is essentially a compilation of all the basics, you will move ahead at a steady rate and go beyond what you have learned. At the end of every concept, we will assign homework, assignments, activities, and quizzes, along with solutions. They will assess and further build your learning of the preceding data scraping and data mining concepts and methods. Most of these activities are designed to get you up and running with implementations.

The 4 hands-on projects included in this course are the most important part of this course. These projects allow you to experiment for yourself with trial and error. You will learn from your mistakes. Importantly, you will understand the potential gaps that may exist between theory and practice.

Data scraping is a rewarding career that allows you to solve some of the most interesting real-world problems, and it can be well paid, too. With a core understanding of data scraping, you can fine-tune your workplace skills and support your career growth.

IV. MongoDB

In this course, we’ll go through the basics of MongoDB and use it to develop an understanding of NoSQL databases. We’ll explore the basic Create, Read, Update, and Delete (CRUD) operations in MongoDB, then look in detail at the MongoDB query operators and projection operators. Following that, we’ll learn about the MongoDB update operators. Toward the end, we’ll explore MongoDB with Node and Python. We’ll wind up the course with two projects: first, MongoDB with Django, in which we’ll develop a CRUD-based application; then an ETL pipeline using PySpark that loads data into MongoDB.
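As a rough illustration of what query and projection operators do, the sketch below mimics a small subset of MongoDB-style filters (`$eq`, `$gt`, `$lt`, `$in`) over plain Python dictionaries. It is a toy, in-memory model and not the MongoDB driver or server: the `matches` and `find` helpers and the sample `products` data are assumptions for illustration, and details such as the `_id` field that MongoDB returns by default are left out.

```python
# Toy, in-memory illustration of MongoDB-style filters; NOT the real driver.
OPERATORS = {
    "$eq": lambda value, arg: value == arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"price": {"$gt": 50}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Shorthand equality form, e.g. {"category": "hardware"}
            return False
    return True


def find(collection, query, projection=None):
    """Filter documents, then keep only the fields marked 1 in the projection."""
    results = [doc for doc in collection if matches(doc, query)]
    if projection:
        results = [
            {key: doc[key] for key, keep in projection.items() if keep and key in doc}
            for doc in results
        ]
    return results


# Hypothetical sample data standing in for a collection.
products = [
    {"name": "notebook", "category": "stationery", "price": 4},
    {"name": "keyboard", "category": "hardware", "price": 45},
    {"name": "monitor", "category": "hardware", "price": 180},
]

expensive_hardware = find(
    products,
    {"category": "hardware", "price": {"$gt": 50}},
    projection={"name": 1},
)
print(expensive_hardware)
```

The filter and projection arguments are written in the same dictionary shape that MongoDB queries use, so the example reads much like a real query even though everything here runs locally.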

This course is designed for beginners. We’ll spend enough time laying a solid foundation for newcomers and then gradually go deeper, with many practical implementations in which every step is explained in detail.

As this course is a compilation of all the basics, it will encourage you to move ahead and go beyond what you have learned. At the end of every concept, we will assign homework, tasks, activities, and quizzes, along with solutions, to evaluate and further build your learning of the preceding concepts and methods. Several of these activities are coding-based, to get you up and running with implementations.

As data grows, there is a need not only to manage it but also to extract useful insights from it for business analytics and sound decision making, which is why companies are actively looking for big data engineers. The major issue with big data is that it is too large to analyze with regular data analysis techniques. In addition, with continuously growing data sources such as IoT, SQL databases, NoSQL databases, social media platforms, point-of-sale systems, and streaming data, it is hard even to manage all this data through conventional methods, let alone perform analytics on it. So we need new techniques and platforms for both managing and analyzing this data, and MongoDB supports all of this. We’ll learn to use MongoDB, which, in a nutshell, is a NoSQL database. All these skills are in high demand.

So, without any further delay, let’s get started with the course and embrace the knowledge that awaits you.

  1. Scope of Scala:

    1. Understanding the variables and data types in Scala.

    2. Understanding the flow controls in Scala and different ways for controlling the flow.

    3. Understanding the functions and their usage in Scala.

    4. Understanding the classes and their usage in Scala.

    5. Understanding the data structures, namely: Lists, ListBuffers, Maps, Sets, and Stacks.

    6. Understanding Hadoop.

    7. Understanding the working of Spark.

    8. Understanding the difference between Spark RDDs and Spark DataFrames.

    9. Understanding MapReduce.

    10. ETL pipeline from AWS S3 to AWS RDS using Spark.
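The Map Reduce item in the list above can be sketched in a few lines. This is a single-process illustration of the map, shuffle, and reduce phases using the classic word-count example; it is written in Python rather than Scala, does not use Hadoop or Spark, and the function names and input lines are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["spark and hadoop", "spark and scala"]  # hypothetical input
print(word_count(lines))
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data across the network; the structure of the computation stays the same.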

  2. Scope of PySpark:

    1. Spark / Hadoop applications, EcoSystem and Architecture

    2. PySpark RDDs

    3. PySpark RDD transformations

    4. PySpark RDD actions

    5. PySpark DataFrames

    6. PySpark DataFrames transformations

    7. PySpark DataFrames actions

    8. Collaborative filtering in PySpark

    9. Spark Streaming

    10. ETL Pipeline

    11. CDC and Ongoing Replication

  3. Scope of Data Scraping, Data Mining:

    1. Internet Browser execution and communication with the server.

    2. Request / Response to and from the server. Synchronous and Asynchronous

    3. Parsing data in response from the server.

    4. Difference between Synchronous and Asynchronous requests.

    5. Introductions to Tools for data scraping: Requests, BS4, Scrapy & Selenium.

    6. Explanation of different concepts like Python Requests Module, BS4 parsers functions, Scrapy for writing the spiders for crawling websites and extracting data, Selenium for understanding the automation and control of the web flows, etc.

  4. Scope of MongoDB:

    1. Understanding MongoDB CRUD, Query Operators, Projection Operators, Update Operators

    2. Creating MongoDB cluster on Atlas

    3. Understanding MongoDB with Node

    4. Performing CRUD operation with Node in MongoDB Atlas

    5. Understanding MongoDB with Python

    6. Performing CRUD operations with Python in MongoDB Atlas

    7. Understanding MongoDB with Django

    8. Performing CRUD operation with Django in MongoDB Atlas

    9. Building APIs for CRUD operations in MongoDB through Django

    10. Understanding MongoDB with PySpark

After completing this information-packed course successfully, you will be able to:

  • Implement any project from scratch that requires Data Scraping, Data Mining, Scala, PySpark, AWS, and MongoDB knowledge.
  • Relate the concepts and practical aspects of learned technologies to real-world problems.
  • Gather data from websites in the smartest way.

Who this course is for:

  • People who are absolute beginners.
  • People who want to make smart solutions.
  • People who want to learn with real data.
  • People who love to learn theory and then implement it practically.
  • Data Scientists, Machine learning experts, and Drop Shippers.

Goals

  • Introduction and importance of this course in this day and age

  • Approach all essential concepts from the beginning

  • The clear unfolding of concepts with examples in Python, Scrapy, Scala, PySpark, and MongoDB

  • All theoretical explanations followed by practical implementations

  • Data Scraping & Data Mining for Beginners to Pro with Python

  • Master Big Data with Scala and Spark

  • Master Big Data With PySpark and AWS

  • Mastering MongoDB for Beginners

  • Building your own AI applications

Prerequisites

  • Basic understanding of HTML tags, Python, SQL, and Node.js

  • No prior knowledge of data scraping and Scala is needed. You start right from the basics and then gradually build your knowledge of the subject.

  • Basic understanding of programming.

  • A willingness to learn and practice.

  • Since we teach through practical implementations, regular practice is a must

Curriculum

  • Promo
    02:10
    Preview
  • Introduction: Why Data Scraping
    02:42
    Preview
  • Introduction: Applications of Data Scraping
    07:09
  • Introduction: Introduction of Instructor
    00:40
    Preview
  • Introduction: Introduction to Course, Scraping, Tools
    01:39
    Preview
  • Introduction: Projects Overview
    03:42
  • Introduction: Request for Your Honest Review
    01:18
  • Requests: Introduction to Python Requests
    03:57
    Preview
  • Requests: Hands on with Requests
    08:28
  • Requests: Extracting Quotes Manually
    10:05
  • Requests: Quiz(Extracting Authors)
    00:40
    Preview
  • Requests: Solution(Extracting Authors)
    06:11
  • Requests: Pagination
    09:46
  • Requests: Quiz(Extracting Author and Quotes)
    00:58
    Preview
  • Requests: Solution 01(Extracting Author and Quotes)
    06:27
  • Requests: Solution 02(Extracting Author and Quotes)
    05:52
  • Requests: Ajax Requests
    06:36
  • Requests: Ajax Requests for Cricinfo
    08:25
  • Requests: Ajax Requests Pagination
    03:53
  • Requests: Quiz(Extracting Top Stats from Cricinfo)
    01:22
  • Requests: Solution 01(Extracting Top Stats from Cricinfo)
    07:16
  • Requests: Solution 02(Extracting Top Stats from Cricinfo)
    09:17
  • Beautiful Soup 4(BS4): Introduction to BS4
    03:02
  • Beautiful Soup 4(BS4): Quiz(Difference between Requests and BS4)
    00:25
  • Beautiful Soup 4(BS4): Solution(Difference between Requests and BS4)
    01:04
  • Beautiful Soup 4(BS4): Hands on with BS4
    05:54
  • Beautiful Soup 4(BS4): Extracting Data from Tree
    08:50
  • Beautiful Soup 4(BS4): Extracting Quotes from the Website
    07:33
  • Beautiful Soup 4(BS4): Quiz(Extracting Author Names)
    00:38
  • Beautiful Soup 4(BS4): Solution(Extracting Author Names)
    05:28
  • Beautiful Soup 4(BS4): Attributes of Tags in BS4
    09:10
  • Beautiful Soup 4(BS4): Multi Valued Attributes of Tags in BS4
    03:55
  • Beautiful Soup 4(BS4): Scraping Movie Names from IMDB
    19:31
  • Beautiful Soup 4(BS4): Quiz(Getting the Ratings, Year, Name of the Movie)
    00:55
  • Beautiful Soup 4(BS4): Solution 01(Getting the Ratings, Year, Name of the Movie)
    07:00
  • Beautiful Soup 4(BS4): Solution 02(Getting the Ratings, Year, Name of the Movie)
    07:08
  • Beautiful Soup 4(BS4): Scraping Time, Genre and Releasing Date from IMDB 01
    05:09
  • Beautiful Soup 4(BS4): Scraping Time, Genre and Releasing Date from IMDB 02
    08:25
  • Beautiful Soup 4(BS4): Combining Two Requests Data for IMDB
    04:35
  • Beautiful Soup 4(BS4): Movies Recommender System (Creating Movie Url)
    09:02
  • Beautiful Soup 4(BS4): Movies Recommender System (Creating Director Url)
    05:41
  • Beautiful Soup 4(BS4): Movies Recommender System using BS4(Getting Top 4 Movies)
    08:01
  • Beautiful Soup 4(BS4): Movies Recommender System using BS4(Merge All Requests Together)
    04:31
  • CSS Selectors: Introduction to CSS Selectors
    02:49
  • CSS Selectors: CSS Selectors Handson(Tags)
    05:17
  • CSS Selectors: Quiz(Tags)
    01:08
  • CSS Selectors: Solution(Tags)
    02:15
  • CSS Selectors: CSS Selectors Handson(Descendants, Id, Class)
    07:04
  • CSS Selectors: Quiz(Descendants)
    00:49
  • CSS Selectors: Solution(Descendants)
    01:50
  • CSS Selectors: Quiz(ID)
    00:44
  • CSS Selectors: Solution(ID)
    01:46
  • CSS Selectors: Quiz(Class)
    01:00
  • CSS Selectors: Solution(Class)
    03:16
  • CSS Selectors: CSS Selectors Handson(Nested Tags, ID Tags, Class Tags)
    04:32
  • CSS Selectors: Quiz(Class with Tag)
    00:40
  • CSS Selectors: Solution(Class with Tag)
    02:26
  • CSS Selectors: CSS Selectors Handson(Comma Separator, Universal Selectors)
    06:31
  • CSS Selectors: Quiz(Combining Two Selectors)
    00:46
  • CSS Selectors: Solution(Combining Two Selectors)
    02:48
  • CSS Selectors: CSS Selectors Handson(Sibling Notations and Direct Child)
    07:24
  • CSS Selectors: Quiz(Adjacent Sibling)
    00:45
  • CSS Selectors: Solution(Adjacent Sibling)
    02:38
  • CSS Selectors: Quiz(General Sibling)
    00:57
  • CSS Selectors: Solution(General Sibling)
    02:59
  • CSS Selectors: CSS Selectors Handson(Child Selectors)
    07:19
  • CSS Selectors: Quiz(First Child)
    00:40
  • CSS Selectors: Solution(First Child)
    03:49
  • CSS Selectors: Quiz(Only Child)
    00:40
  • CSS Selectors: Solution(Only Child)
    02:58
  • CSS Selectors: Quiz(Last Child)
    00:44
  • CSS Selectors: Solution(Last Child)
    03:10
  • CSS Selectors: CSS Selectors Handson (Negations, Attributes)
    06:36
  • CSS Selectors: Quiz(Negation)
    00:41
  • CSS Selectors: Solution(Negation)
    02:06
  • CSS Selectors: CSS Selectors Handson (Attributes, Attributes Values)
    03:51
  • CSS Selectors: Quiz(Attributes Values)
    00:39
  • CSS Selectors: Solution(Attributes Values)
    03:26
  • CSS Selectors: CSS Selectors Handson (Attributes Wild Cards Values)
    06:25
  • CSS Selectors: Quiz(Attributes Wild Card)
    00:50
  • CSS Selectors: Solution(Attributes Wild Card)
    02:49
  • Scrapy: Introduction to Scrapy
    04:10
  • Scrapy: Comparison of Scrapy and Requests
    03:40
  • Scrapy: Scrapy at a Glance Documentation
    08:31
  • Scrapy: Getting Started with Scrapy
    11:04
  • Scrapy: Running Documentation Spider 1
    03:25
  • Scrapy: Running Documentation Spider 2
    12:00
  • Scrapy: Writing Spider from the Scratch
    07:23
  • Scrapy: Understanding the Response(url, Status)
    07:09
  • Scrapy: Understanding the Response(headers)
    04:12
  • Scrapy: Understanding the Response(values in headers)
    06:51
  • Scrapy: Understanding the Response(body)
    06:04
  • Scrapy: Understanding the Response(request)
    04:41
  • Scrapy: Understanding the Response(meta)
    08:29
  • Scrapy: Understanding the Response(flags, certificate, ip_address, copy)
    05:16
  • Scrapy: Understanding the Response(replace, urljoin, follow, follow_all)
    08:07
  • Scrapy: Response CSS and Scrapy Shell
    09:26
  • Scrapy: Extracting quotes
    05:47
  • Scrapy: Understanding Nested selectors
    10:02
  • Scrapy: Extracting the Author and Quotes
    10:05
  • Scrapy: Checking for Next Page
    07:36
  • Scrapy: Checking for Next Page in Spider
    05:36
  • Scrapy: Checking for Next Page URL
    08:16
  • Scrapy: Scraping Quotes from Next Pages
    11:07
  • Scrapy: Exporting Extracted Data
    03:26
  • Scrapy: Quiz(Get The Tags)
    00:58
  • Scrapy: Solution(Get The Tags)
    07:30
  • Scrapy: Next Website
    01:28
  • Scrapy: CSS Selectors for Movie Names and URLs
    12:29
  • Scrapy: Combined CSS Selectors for Movie Names and URLs
    09:40
  • Scrapy: Sent request to the film info page
    08:16
  • Scrapy: Merge Data from Two Callbacks
    10:27
  • Scrapy: Extracting Movie Duration and Genres
    11:19
  • Scrapy: Exporting the Extracted Data
    08:27
  • Scrapy: Quiz(Extracting the Year)
    00:57
  • Scrapy: Solution(Extracting the Year)
    14:25
  • Scrapy: Getting Director Name and Url
    07:14
  • Scrapy: Getting Top Four Movies of Directors
    05:12
  • Scrapy: Extracting Data Anomaly (dont_filter Flag)
    07:53
  • Scrapy Project: Hugoboss website for scraping
    02:30
  • Scrapy Project: Understanding Site Structure
    07:11
  • Scrapy Project: Writing CSS Selectors for Listings
    07:43
  • Scrapy Project: Listings in Scrapy Shell
    04:20
  • Scrapy Project: Sending Request to Listings Urls
    07:23
  • Scrapy Project: Extracting Products Url from the Listings
    11:02
  • Scrapy Project: Sending Requests to Products of the Listings
    05:02
  • Scrapy Project: Writing CSS for getting the Product Info
    16:55
  • Scrapy Project: Getting the bigger Images of the Product
    07:54
  • Scrapy Project: Checking Next Page Url
    13:57
  • Scrapy Project: Adding Pagination to Spider and Running it
    09:40
  • Scrapy Project: Output of the Spider
    03:20
  • Selenium: Introduction To Selenium
    02:11
  • Selenium: Getting Started with Selenium
    03:36
  • Selenium: Configuring the Webdriver
    03:40
  • Selenium: Extracting Quotes
    10:16
  • Selenium: Extracting Quotes and Author Names
    07:17
  • Selenium: Quiz(Extracting Quotes)
    00:41
  • Selenium: Solution(Extracting Quotes)
    07:22
  • Selenium: Clicking on Button
    05:01
  • Selenium: Pagination and Extracting Data
    08:06
  • Selenium: Exception Handling for Unavailable Element
    05:41
  • Selenium: Navigating the Website for Login
    09:37
  • Selenium: Quiz(Log in and Extract Quote)
    00:43
  • Selenium: Solution(Log in and Extract Quote)
    07:03
  • Project Selenium: Overview of Project
    01:28
  • Project Selenium: Closing the Cookie Button
    03:26
  • Project Selenium: Setting the Language for Translation
    05:50
  • Project Selenium: Sending the Text for Translation
    03:46
  • Project Selenium: Downloading the Translation
    03:55
  • Project Selenium: Reading Data from File for Translation
    03:44
  • Project Selenium: THANK YOU Bonus Video
    01:20
This Course Includes
  • 54.5 hours
  • 622 Lectures
  • 6 Resources
  • Completion Certificate
  • Lifetime Access: Yes
  • Language: English
  • 30-Days Money Back Guarantee

Use your certification to make a career change or to advance in your current career. Salaries are among the highest in the world.

We have 30 Million registered users and counting who have advanced their careers with us.
