Tutorialspoint

50 Hours of Big Data, PySpark, AWS, Scala and Scraping

Instructor: AISciences

Rating: 4.4

Big Data with Scala and Spark, PySpark and AWS, Data Scraping & Data Mining With Python, Mastering MongoDB for Beginners

Updated on Mar 2024

Language: English

Category: Development, Data Science, PySpark

Lectures: 622

Resources: 6

Duration: 54.5 hours

Course Description

The course content is designed to be simple to follow and understand, expressive, exhaustive, and practical, with live coding, plenty of quizzes, and state-of-the-art, up-to-date knowledge of the field.

I. Scala

Scala is not among the most-loved programming languages, but don’t let that put you off. Scala is one of the most in-demand skills for data scientists and data engineers, and the reason is easy to see: the supply of professionals with Scala skills is still a long way from catching up with the demand.

The carefully designed quizzes and mini-projects in this course cover all the important aspects of Scala and make your learning journey that much easier. The course also includes an overview of Hadoop and Spark, with a hands-on Scala Spark project. Throughout the course, every theoretical explanation is followed by a practical implementation.

This course is designed to reflect the most in-demand Scala skills, which you can start using at the workplace right away. The six mini-projects and the Scala Spark project are vital components of the course. They give you a hands-on opportunity to experiment for yourself through trial and error and to learn from your mistakes. Importantly, they make it easier to see the gaps that can exist between theory and practice.

Scala is a powerful, high-level language that can do much of what Python is used for, such as building machine learning models, and it suits a wide range of applications, from web apps to machine learning.

II. PySpark and AWS

Python and Apache Spark are among the hottest buzzwords in the Big Data analytics industry, and PySpark, the Python API for Apache Spark, brings the two together. In this course, you’ll start right from the basics and proceed to advanced data analysis. From cleaning data to building features and implementing machine learning (ML) models, you’ll learn how to execute end-to-end workflows using PySpark.

Throughout the course, you’ll use PySpark to perform data analysis. You’ll explore Spark RDDs, DataFrames, and some Spark SQL queries, along with the transformations and actions that can be performed on data using RDDs and DataFrames. You’ll also explore the Spark and Hadoop ecosystem and its underlying architecture. You’ll run your Spark scripts in the Databricks environment and explore it as well.
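To make the distinction between transformations and actions more concrete, here is a minimal plain-Python sketch of the classic word count. It assumes no Spark installation: lazy generators stand in for transformations, and the final `reduce` stands in for an action. In PySpark itself, the same pipeline would typically use the RDD methods `flatMap`, `map`, and `reduceByKey`, followed by an action such as `collect`; the function name `word_count` and the sample input here are illustrative only.

```python
from collections import Counter
from functools import reduce


def word_count(lines):
    """Plain-Python analogy of a Spark word count; this is not PySpark.

    The generator expressions are lazy, much like Spark transformations:
    no work happens until reduce() consumes them, which plays the role
    of an action.
    """
    # Analogy of flatMap: split every line into lowercase words (lazy).
    words = (word.lower() for line in lines for word in line.split())
    # Analogy of map: pair each word with a count of 1 (still lazy).
    pairs = ((word, 1) for word in words)

    # Analogy of reduceByKey plus an action: summing per key forces evaluation.
    def add_pair(totals, pair):
        word, n = pair
        totals[word] += n
        return totals

    return dict(reduce(add_pair, pairs, Counter()))


print(word_count(["to be or", "not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Keeping the transformations lazy mirrors why Spark can plan a whole chain of steps before running any of them.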

Finally, you’ll get a taste of Spark on the AWS cloud. You’ll see how to leverage AWS storage, database, and compute services, and how Spark can communicate with different AWS services to get the data it needs.

Because this course is a detailed compilation of all the basics, it will motivate you to make quick progress and go beyond what you have learned. At the end of each concept, you will be assigned homework, tasks, activities, and quizzes, along with solutions, to evaluate and reinforce your learning of the preceding concepts and methods. Most of these activities are coding-based, as the aim is to get you up and running with implementations.

III. Data Scraping and Data Mining from Beginner to Professional

Data scraping is the technique of extracting data from the internet, such as the data available on websites and through APIs. It also involves automating web flows to extract data from different web pages.
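As a minimal illustration of the extraction step, the sketch below pulls quotes and authors out of an HTML snippet. It deliberately uses Python’s standard-library `html.parser` instead of the Requests and BS4 tools taught in the course, so it runs without extra packages. The markup, the `quote`/`text`/`author` class names, and the `QuoteParser` class are hypothetical; a real scraper would first download the page.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real scraper would download the page first.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">Example quote one.</span>
  <small class="author">Author A</small>
</div>
<div class="quote">
  <span class="text">Example quote two.</span>
  <small class="author">Author B</small>
</div>
"""


class QuoteParser(HTMLParser):
    """Collects (text, author) pairs from the assumed quote markup."""

    def __init__(self):
        super().__init__()
        self.quotes = []     # extracted (text, author) tuples
        self._field = None   # field the current text node belongs to, if any
        self._current = {}   # fields collected for the quote being parsed

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "text" in classes:
            self._field = "text"
        elif tag == "small" and "author" in classes:
            self._field = "author"

    def handle_data(self, data):
        # Only keep text that appears inside a tag we care about.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag in ("span", "small"):
            self._field = None
        elif tag == "div" and self._current:
            # A closing </div> ends one quote block.
            self.quotes.append(
                (self._current.get("text", "").strip(), self._current.get("author", "").strip())
            )
            self._current = {}


parser = QuoteParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.quotes)
# [('Example quote one.', 'Author A'), ('Example quote two.', 'Author B')]
```

BS4 would let you express the same selection more compactly with CSS selectors; the event-driven parser simply makes each extraction step explicit.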

This course is designed for beginners. We’ll spend enough time laying a solid foundation for newcomers. Then we’ll gradually go deeper, with many practical implementations in which every step is explained in detail.

Because this course is essentially a compilation of all the basics, you will move ahead at a steady pace and go beyond what you have learned. At the end of every concept, we will assign homework, assignments, activities, and quizzes, along with solutions, to assess and further build your learning of the preceding data scraping and data mining concepts and methods. Most of these activities are designed to get you up and running with implementations.

The four hands-on projects are the most important part of this course. They allow you to experiment for yourself through trial and error and to learn from your mistakes. Importantly, they help you understand the gaps that may exist between theory and practice.

Data scraping can be a rewarding career that lets you solve some of the most interesting real-world problems, and it can pay well, too. With a core understanding of data scraping, you can sharpen your workplace skills and support your career growth.

IV. MongoDB

In this course, we’ll go through the basics of MongoDB and use it to develop an understanding of NoSQL databases. We’ll explore the basic Create, Read, Update, and Delete (CRUD) operations in MongoDB, followed by a detailed look at MongoDB query operators and projection operators. After that, we’ll learn about MongoDB update operators. Then we’ll explore MongoDB with Node and Python. We’ll wind up the course with two projects: a CRUD-based application built with Django and MongoDB, and an ETL pipeline that uses PySpark to load data into MongoDB.
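To show what query, projection, and update operators do, here is a toy in-memory collection in plain Python. It is hypothetical and is not MongoDB or PyMongo: the operator names `$gt`, `$in`, `$set`, and `$inc` and the method names `insert_one`, `find`, `update_one`, and `delete_one` are borrowed from MongoDB and PyMongo, but only a tiny subset is supported, with simplified semantics (for example, there is no `_id` field, and `find` returns a plain list rather than a cursor).

```python
def matches(doc, query):
    """Return True if doc satisfies every condition in query (toy semantics)."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            # Operator conditions such as {"$gt": 80} or {"$in": [...]}.
            for op, arg in cond.items():
                if op == "$gt":
                    if value is None or not value > arg:
                        return False
                elif op == "$in":
                    if value not in arg:
                        return False
                else:
                    raise ValueError(f"unsupported query operator: {op}")
        elif value != cond:  # A plain value means an equality match.
            return False
    return True


class ToyCollection:
    """Hypothetical in-memory stand-in for a collection; not a real driver."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):  # Create
        self.docs.append(dict(doc))

    def find(self, query, projection=None):  # Read
        results = []
        for doc in self.docs:
            if matches(doc, query):
                if projection:
                    # Inclusion projection: keep only fields marked with 1.
                    results.append({k: doc[k] for k, keep in projection.items() if keep and k in doc})
                else:
                    results.append(dict(doc))
        return results

    def update_one(self, query, update):  # Update the first matching document
        for doc in self.docs:
            if matches(doc, query):
                doc.update(update.get("$set", {}))
                for field, amount in update.get("$inc", {}).items():
                    doc[field] = doc.get(field, 0) + amount
                return 1
        return 0

    def delete_one(self, query):  # Delete the first matching document
        for i, doc in enumerate(self.docs):
            if matches(doc, query):
                del self.docs[i]
                return 1
        return 0


students = ToyCollection()
students.insert_one({"name": "Ana", "course": "PySpark", "score": 72})
students.insert_one({"name": "Bo", "course": "Scala", "score": 91})
students.update_one({"name": "Ana"}, {"$inc": {"score": 10}})
print(students.find({"score": {"$gt": 80}}, {"name": 1}))
# [{'name': 'Ana'}, {'name': 'Bo'}]
students.delete_one({"course": {"$in": ["Scala"]}})
print(len(students.docs))
# 1
```

With a real MongoDB server, the same query and update documents would be passed to the driver instead of this toy class.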

This course is designed for beginners. We’ll spend enough time laying a solid foundation for newcomers, then gradually go deeper, with many practical implementations in which every step is explained in detail.

Because this course is a compilation of all the basics, it will encourage you to move ahead and go beyond what you have learned. At the end of every concept, we will assign homework, tasks, activities, and quizzes, along with solutions, to evaluate and further build your learning of the preceding concepts and methods. Several of these activities are coding-based, to get you up and running with implementations.

As data grows, it needs to be managed, and beyond managing it, businesses need to extract useful data and insights from it for analytics and sound decision-making. For that, companies are actively looking for big data engineers. The major issue with big data is that it is so huge that regular data analysis techniques cannot handle it. On top of that, continuously growing data sources, such as IoT devices, SQL databases, NoSQL databases, social media platforms, point-of-sale systems, and streaming data, make it hard even to manage all this data through conventional methods, let alone analyze it. So we need new techniques and platforms not only for managing this data but also for analyzing it, and MongoDB supports all of this. We’ll understand and learn to use MongoDB, which, in a nutshell, is a NoSQL database. All these skills are in high demand.

So, without further delay, let’s get started with the course and embrace the knowledge that awaits you.

  1. Scope of Scala:

    1. Understanding variables and data types in Scala.

    2. Understanding the flow controls in Scala and different ways for controlling the flow.

    3. Understanding the functions and their usage in Scala.

    4. Understanding the classes and their usage in Scala.

    5. Understanding the data structures, namely: Lists, ListBuffers, Maps, Sets, and Stacks.

    6. Understanding Hadoop.

    7. Understanding the working of Spark.

    8. Understanding the difference between Spark RDDs and Spark DataFrames.

    9. Understanding MapReduce.

    10. ETL pipeline from AWS S3 to AWS RDS using Spark.

  2. Scope of PySpark:

    1. Spark / Hadoop applications, ecosystem, and architecture

    2. PySpark RDDs

    3. PySpark RDD transformations

    4. PySpark RDD actions

    5. PySpark DataFrames

    6. PySpark DataFrame transformations

    7. PySpark DataFrame actions

    8. Collaborative filtering in PySpark

    9. Spark Streaming

    10. ETL Pipeline

    11. CDC and Ongoing Replication

  3. Scope of Data Scraping, Data Mining:

    1. How web browsers execute pages and communicate with the server.

    2. Requests to and responses from the server, both synchronous and asynchronous.

    3. Parsing data in response from the server.

    4. Difference between Synchronous and Asynchronous requests.

    5. Introduction to tools for data scraping: Requests, BS4, Scrapy & Selenium.

    6. Explanation of concepts such as the Python Requests module, BS4 parser functions, Scrapy for writing spiders that crawl websites and extract data, and Selenium for automating and controlling web flows.

  4. Scope of MongoDB:

    1. Understanding MongoDB CRUD, Query Operators, Projection Operators, and Update Operators

    2. Creating MongoDB cluster on Atlas

    3. Understanding MongoDB with Node

    4. Performing CRUD operations with Node in MongoDB Atlas

    5. Understanding MongoDB with Python

    6. Performing CRUD operations with Python in MongoDB Atlas

    7. Understanding MongoDB with Django

    8. Performing CRUD operations with Django in MongoDB Atlas

    9. Building APIs for CRUD operations in MongoDB through Django

    10. Understanding MongoDB with PySpark
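The ETL pipeline items above (AWS S3 to AWS RDS with Spark, and PySpark into MongoDB) follow the same extract, transform, load pattern. A minimal sketch of that pattern using only the Python standard library is below; it is not the course’s Spark pipeline. An in-memory CSV string stands in for a file in AWS S3 and an in-memory SQLite database stands in for AWS RDS; the `orders` table, its columns, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a raw file that, in a real pipeline, might live in AWS S3.
RAW_CSV = """order_id,amount,country
1,19.99,us
2,,de
3,5.50,US
"""


def extract(text):
    """Extract: parse CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, cast types, normalize country codes."""
    return [
        (int(r["order_id"]), float(r["amount"]), r["country"].upper())
        for r in rows
        if r["amount"]
    ]


def load(records, conn):
    """Load: write the cleaned records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a database such as AWS RDS
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT order_id, amount, country FROM orders ORDER BY order_id").fetchall())
# [(1, 19.99, 'US'), (3, 5.5, 'US')]
```

Keeping the three stages as separate functions makes it easy to swap the source or the sink (for example, a Spark job reading from S3 or a MongoDB collection as the target) without touching the transformation logic.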

After completing this information-packed course successfully, you will be able to:

  • Implement any project from scratch that requires Data Scraping, Data Mining, Scala, PySpark, AWS, and MongoDB knowledge.
  • Relate the concepts and practical aspects of learned technologies to real-world problems.
  • Gather data from websites efficiently.

Who this course is for:

  • People who are absolute beginners.
  • People who want to build smart solutions.
  • People who want to learn with real data.
  • People who love to learn theory and then implement it practically.
  • Data Scientists, Machine learning experts, and Drop Shippers.

Goals

What will you learn in this course:

  • An introduction to the course and why it matters today

  • Approach all essential concepts from the beginning

  • The clear unfolding of concepts with examples in Python, Scrapy, Scala, PySpark, and MongoDB

  • All theoretical explanations followed by practical implementations

  • Data Scraping & Data Mining for Beginners to Pro with Python

  • Master Big Data with Scala and Spark

  • Master Big Data With PySpark and AWS

  • Mastering MongoDB for Beginners

  • Building your own AI applications

Prerequisites

What are the prerequisites for this course?

  • Basic understanding of HTML tags, Python, SQL, and Node.js

  • No prior knowledge of data scraping and Scala is needed. You start right from the basics and then gradually build your knowledge of the subject.

  • Basic understanding of programming.

  • A willingness to learn and practice.

  • Since we teach through practical implementations, practice is a must

Curriculum

Check out the detailed breakdown of what’s inside the course

Data Scraping & Data Mining for Beginners to Pro with Python
151 Lectures
  • Promo 02:10
  • Introduction: Why Data Scraping 02:42
  • Introduction: Applications of Data Scraping 07:09
  • Introduction: Introduction of the Instructor 00:40
  • Introduction: Introduction to the Course, Scraping, and Tools 01:39
  • Introduction: Projects Overview 03:42
  • Introduction: Request for Your Honest Review 01:18
  • Requests: Introduction to Python Requests 03:57
  • Requests: Hands-on with Requests 08:28
  • Requests: Extracting Quotes Manually 10:05
  • Requests: Quiz (Extracting Authors) 00:40
  • Requests: Solution (Extracting Authors) 06:11
  • Requests: Pagination 09:46
  • Requests: Quiz (Extracting Author and Quotes) 00:58
  • Requests: Solution 01 (Extracting Author and Quotes) 06:27
  • Requests: Solution 02 (Extracting Author and Quotes) 05:52
  • Requests: Ajax Requests 06:36
  • Requests: Ajax Requests for Cricinfo 08:25
  • Requests: Ajax Requests Pagination 03:53
  • Requests: Quiz (Extracting Top Stats from Cricinfo) 01:22
  • Requests: Solution 01 (Extracting Top Stats from Cricinfo) 07:16
  • Requests: Solution 02 (Extracting Top Stats from Cricinfo) 09:17
  • Beautiful Soup 4 (BS4): Introduction to BS4 03:02
  • Beautiful Soup 4 (BS4): Quiz (Difference between Requests and BS4) 00:25
  • Beautiful Soup 4 (BS4): Solution (Difference between Requests and BS4) 01:04
  • Beautiful Soup 4 (BS4): Hands-on with BS4 05:54
  • Beautiful Soup 4 (BS4): Extracting Data from the Tree 08:50
  • Beautiful Soup 4 (BS4): Extracting Quotes from the Website 07:33
  • Beautiful Soup 4 (BS4): Quiz (Extracting Author Names) 00:38
  • Beautiful Soup 4 (BS4): Solution (Extracting Author Names) 05:28
  • Beautiful Soup 4 (BS4): Attributes of Tags in BS4 09:10
  • Beautiful Soup 4 (BS4): Multi-Valued Attributes of Tags in BS4 03:55
  • Beautiful Soup 4 (BS4): Scraping Movie Names from IMDB 19:31
  • Beautiful Soup 4 (BS4): Quiz (Getting the Ratings, Year, and Name of the Movie) 00:55
  • Beautiful Soup 4 (BS4): Solution 01 (Getting the Ratings, Year, and Name of the Movie) 07:00
  • Beautiful Soup 4 (BS4): Solution 02 (Getting the Ratings, Year, and Name of the Movie) 07:08
  • Beautiful Soup 4 (BS4): Scraping Time, Genre, and Release Date from IMDB 01 05:09
  • Beautiful Soup 4 (BS4): Scraping Time, Genre, and Release Date from IMDB 02 08:25
  • Beautiful Soup 4 (BS4): Combining Data from Two Requests for IMDB 04:35
  • Beautiful Soup 4 (BS4): Movies Recommender System (Creating Movie URL) 09:02
  • Beautiful Soup 4 (BS4): Movies Recommender System (Creating Director URL) 05:41
  • Beautiful Soup 4 (BS4): Movies Recommender System using BS4 (Getting Top 4 Movies) 08:01
  • Beautiful Soup 4 (BS4): Movies Recommender System using BS4 (Merge All Requests Together) 04:31
  • CSS Selectors: Introduction to CSS Selectors 02:49
  • CSS Selectors: CSS Selectors Hands-on (Tags) 05:17
  • CSS Selectors: Quiz (Tags) 01:08
  • CSS Selectors: Solution (Tags) 02:15
  • CSS Selectors: CSS Selectors Hands-on (Descendants, ID, Class) 07:04
  • CSS Selectors: Quiz (Descendants) 00:49
  • CSS Selectors: Solution (Descendants) 01:50
  • CSS Selectors: Quiz (ID) 00:44
  • CSS Selectors: Solution (ID) 01:46
  • CSS Selectors: Solution (Class) 01:00
  • CSS Selectors: Solution (Class) 03:16
  • CSS Selectors: CSS Selectors Hands-on (Nested Tags, ID Tags, Class Tags) 04:32
  • CSS Selectors: Quiz (Class with Tag) 00:40
  • CSS Selectors: Solution (Class with Tag) 02:26
  • CSS Selectors: CSS Selectors Hands-on (Comma Separator, Universal Selectors) 06:31
  • CSS Selectors: Quiz (Combining Two Selectors) 00:46
  • CSS Selectors: Solution (Combining Two Selectors) 02:48
  • CSS Selectors: CSS Selectors Hands-on (Sibling Notations and Direct Child) 07:24
  • CSS Selectors: Quiz (Adjacent Sibling) 00:45
  • CSS Selectors: Solution (Adjacent Sibling) 02:38
  • CSS Selectors: Quiz (General Sibling) 00:57
  • CSS Selectors: Solution (General Sibling) 02:59
  • CSS Selectors: CSS Selectors Hands-on (Child Selectors) 07:19
  • CSS Selectors: Quiz (First Child) 00:40
  • CSS Selectors: Solution (First Child) 03:49
  • CSS Selectors: Quiz (Only Child) 00:40
  • CSS Selectors: Solution (Only Child) 02:58
  • CSS Selectors: Quiz (Last Child) 00:44
  • CSS Selectors: Solution (Last Child) 03:10
  • CSS Selectors: CSS Selectors Hands-on (Negations, Attributes) 06:36
  • CSS Selectors: Quiz (Negation) 00:41
  • CSS Selectors: Solution (Negation) 02:06
  • CSS Selectors: CSS Selectors Hands-on (Attributes, Attribute Values) 03:51
  • CSS Selectors: Quiz (Attribute Values) 00:39
  • CSS Selectors: Solution (Attribute Values) 03:26
  • CSS Selectors: CSS Selectors Hands-on (Attribute Wildcard Values) 06:25
  • CSS Selectors: Quiz (Attribute Wildcard) 00:50
  • CSS Selectors: Solution (Attribute Wildcard) 02:49
  • Scrapy: Introduction to Scrapy 04:10
  • Scrapy: Comparison of Scrapy and Requests 03:40
  • Scrapy: Scrapy at a Glance (Documentation) 08:31
  • Scrapy: Getting Started with Scrapy 11:04
  • Scrapy: Running Documentation Spider 1 03:25
  • Scrapy: Running Documentation Spider 2 12:00
  • Scrapy: Writing a Spider from Scratch 07:23
  • Scrapy: Understanding the Response (url, status) 07:09
  • Scrapy: Understanding the Response (headers) 04:12
  • Scrapy: Understanding the Response (values in headers) 06:51
  • Scrapy: Understanding the Response (body) 06:04
  • Scrapy: Understanding the Response (request) 04:41
  • Scrapy: Understanding the Response (meta) 08:29
  • Scrapy: Understanding the Response (flags, certificate, ip_address, copy) 05:16
  • Scrapy: Understanding the Response (replace, urljoin, follow, follow_all) 08:07
  • Scrapy: Response CSS and Scrapy Shell 09:26
  • Scrapy: Extracting Quotes 05:47
  • Scrapy: Understanding Nested Selectors 10:02
  • Scrapy: Extracting the Author and Quotes 10:05
  • Scrapy: Checking for Next Page 07:36
  • Scrapy: Checking for Next Page in Spider 05:36
  • Scrapy: Checking for Next Page URL 08:16
  • Scrapy: Scraping Quotes from Next Pages 11:07
  • Scrapy: Exporting Extracted Data 03:26
  • Scrapy: Quiz (Get the Tags) 00:58
  • Scrapy: Solution (Get the Tags) 07:30
  • Scrapy: Next Website 01:28
  • Scrapy: CSS Selectors for Movie Names and URLs 12:29
  • Scrapy: Combined CSS Selectors for Movie Names and URLs 09:40
  • Scrapy: Sending a Request to the Film Info Page 08:16
  • Scrapy: Merging Data from Two Callbacks 10:27
  • Scrapy: Extracting Movie Duration and Genres 11:19
  • Scrapy: Exporting the Extracted Data 08:27
  • Scrapy: Quiz (Extracting the Year) 00:57
  • Scrapy: Solution (Extracting the Year) 14:25
  • Scrapy: Getting Director Name and URL 07:14
  • Scrapy: Getting Top Four Movies of Directors 05:12
  • Scrapy: Extracting Data Anomaly (dont_filter Flag) 07:53
  • Scrapy Project: Hugo Boss Website for Scraping 02:30
  • Scrapy Project: Understanding Site Structure 07:11
  • Scrapy Project: Writing CSS Selectors for Listings 07:43
  • Scrapy Project: Listings in Scrapy Shell 04:20
  • Scrapy Project: Sending Requests to Listing URLs 07:23
  • Scrapy Project: Extracting Product URLs from the Listings 11:02
  • Scrapy Project: Sending Requests to Products of the Listings 05:02
  • Scrapy Project: Writing CSS for Getting the Product Info 16:55
  • Scrapy Project: Getting the Bigger Images of the Product 07:54
  • Scrapy Project: Checking Next Page URL 13:57
  • Scrapy Project: Adding Pagination to Spider and Running It 09:40
  • Scrapy Project: Output of the Spider 03:20
  • Selenium: Introduction to Selenium 02:11
  • Selenium: Getting Started with Selenium 03:36
  • Selenium: Configuring the WebDriver 03:40
  • Selenium: Extracting Quotes 10:16
  • Selenium: Extracting Quotes and Author Names 07:17
  • Selenium: Quiz (Extracting Quotes) 00:41
  • Selenium: Solution (Extracting Quotes) 07:22
  • Selenium: Clicking on a Button 05:01
  • Selenium: Pagination and Extracting Data 08:06
  • Selenium: Exception Handling for Unavailable Elements 05:41
  • Selenium: Navigating the Website for Login 09:37
  • Selenium: Quiz (Log in and Extract Quote) 00:43
  • Selenium: Solution (Log in and Extract Quote) 07:03
  • Project Selenium: Overview of Project 01:28
  • Project Selenium: Closing the Cookie Button 03:26
  • Project Selenium: Setting the Language for Translation 05:50
  • Project Selenium: Sending the Text for Translation 03:46
  • Project Selenium: Downloading the Translation 03:55
  • Project Selenium: Reading Data from File for Translation 03:44
  • Project Selenium: THANK YOU Bonus Video 01:20
Scala & Spark: Master Big Data with Scala and Spark
144 Lectures
PySpark & AWS: Master Big Data With PySpark and AWS
157 Lectures
MongoDB: Mastering MongoDB for Beginners (Theory & Projects)
170 Lectures

Instructor Details

AISciences

We are a group of experts, PhDs, and Practitioners of Artificial Intelligence, Computer Science, Machine Learning, and Statistics. Some of us work in big companies like Amazon, Google, Facebook, Microsoft, KPMG, BCG, and IBM.

We decided to produce a series of courses mainly dedicated to beginners and newcomers on the techniques and methods of Machine Learning, Statistics, Artificial Intelligence, and Data Science. 

Initially, our objective was to help only those who wish to understand these techniques more easily and to start without too much theory or lengthy reading. Today we also publish more complete courses on some topics for a wider audience.

Our courses have helped more than 100,000 students master AI and Data Science.


 ✅  Stay Connected to Us. 

👉 Twitter: https://twitter.com/AISciencesLearn 

👉 Facebook: https://www.facebook.com/AISciencesLearn   

👉 LinkedIn: https://www.linkedin.com/company/ai-sciences/

👉 Website: http://www.aisciences.io   


✅ For Business Inquiries: contact@aisciences.io

Course Certificate

Use your certification to make a career change or to advance in your current career. Salaries in these fields tend to be high.
