Creating a Web Crawler in Python using Socket Programming
How to search through different web pages to get email addresses, images, and other useful information using Python
Course Description
In this course we will teach you how to write a program that acts as a robot, searching through different websites to collect the information you need. Unlike most web crawling and web scraping courses, which rely on high-level modules and leave you copying and pasting code, this course first teaches the networking concepts that the code depends on.
Networking section:
Before we write any code, we explain the networking details behind it and then explain why the code does what it does. After covering the theory, we explore all of these concepts in action in Wireshark, so you will become familiar with concepts such as TCP/IP, network address translation (NAT), and sockets. After the networking part, we start the programming section.
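As a small illustration of the addressing ideas in this section, the sketch below checks whether an IPv4 address falls in one of the private ranges that typically sit behind NAT. It is not taken from the course material: the helper name `is_rfc1918` and the sample addresses are assumptions made for this example, and it uses only the standard `ipaddress` module.

```python
import ipaddress

# The three private IPv4 ranges reserved by RFC 1918. Hosts using these
# addresses usually reach the internet through a NAT device.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip):
    """Return True if the IPv4 address string lies in an RFC 1918 range.

    Hypothetical helper for illustration only.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


if __name__ == "__main__":
    for sample in ("192.168.1.10", "172.20.5.1", "8.8.8.8"):
        label = "private" if is_rfc1918(sample) else "public"
        print(f"{sample} is {label}")
```

A socket endpoint is an IP address paired with a port, so a laptop at a private address such as 192.168.1.10 would normally appear to a remote web server under the router's public address instead.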
Programming section:
In this section we first use a high-level module, requests, and learn how to send HTTP requests and receive the corresponding HTTP responses. We then go deeper by introducing Python's socket module, the core low-level networking module in Python. We create a socket and learn the methods that let us send data to web servers and receive data from them, and we learn how to search through that data for the information we want. Finally, we build a website by setting up our Kali Linux machine as a web server that serves web pages, and we learn how to search through those pages for email addresses, links, and so on.
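The socket-based workflow described in this section might look roughly like the following. This is a minimal sketch, not the course's own code: the function names, the regular expressions, and the host in the usage note are assumptions chosen for illustration. It speaks plain HTTP only (HTTPS would need the `ssl` module) and does not decode chunked transfer encoding.

```python
import re
import socket

# Deliberately simple patterns for illustration; robust email and URL
# matching needs more care than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def build_get_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # server closes the connection when done
        "",
        "",  # the blank line that ends the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=80, timeout=10.0):
    """Send the request over a TCP socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty bytes means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def split_response(raw):
    """Split a raw HTTP response into (header text, body text)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body.decode("utf-8", errors="replace")


def extract_emails(html):
    """Return the unique email-like strings found in the text, sorted."""
    return sorted(set(EMAIL_RE.findall(html)))


def extract_links(html):
    """Return every href value found in the text, in document order."""
    return LINK_RE.findall(html)
```

A typical call, assuming a reachable plain-HTTP server, would be `headers, body = split_response(fetch("example.com"))` followed by `extract_links(body)` and `extract_emails(body)`.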
Goals
What will you learn in this course:
- Python programming
- Socket programming
- Socket programming in Python
- Networking basics
- HTTP and HTTPS protocols
- Creating a web crawler
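To make the last goal concrete, here is one possible shape for a small crawler that follows links within a single site and collects email addresses. It is a hedged sketch under stated assumptions rather than the course's implementation: `crawl_for_emails`, the `fetch_page` callback, the `max_pages` limit, and the regular expressions are all hypothetical names and choices made for this example.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

# Simplified patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def crawl_for_emails(start_url, fetch_page, max_pages=20):
    """Breadth-first crawl of same-host pages, collecting email addresses.

    fetch_page is supplied by the caller: it takes a URL and returns the
    page's HTML as a string, or None if the page could not be fetched.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    emails = set()
    pages_fetched = 0

    while queue and pages_fetched < max_pages:
        url = queue.popleft()
        html = fetch_page(url)
        pages_fetched += 1
        if html is None:
            continue
        emails.update(EMAIL_RE.findall(html))
        for href in LINK_RE.findall(html):
            target = urljoin(url, href)  # resolve relative links
            # Stay on the starting host and never queue a page twice.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)

    return sorted(emails)
```

Passing the fetching step in as a function keeps the crawl logic independent of how pages are retrieved, so `fetch_page` could wrap a raw socket, the requests module, or a plain dictionary of test pages.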

Curriculum
Check out the detailed breakdown of what’s inside the course
Networking basics and web crawler
16 Lectures
1- Introduction - what you will learn in this course 02:20
2- Introduction - what is web crawling 11:54
3- TCP/IP packet format 09:36
4- What is a socket 03:27
5- IP addresses 03:55
6- NAT (network address translation) 05:59
7- NAT in Wireshark 04:36
8- Sending HTTP GET requests using the Python requests module 09:01
9- Sending HTTP POST requests using the Python requests module 07:45
10- How to create a socket to send data in Python 09:55
11- How to send an HTTP request using the socket module in Python 05:30
12- How to receive HTTP responses using a socket 07:11
13- re module (regular expressions) 07:29
14- Printing all the links and emails in a web page 16:20
15- Making our Linux machine a web server using Apache and Python 12:31
16- Searching for email addresses in different web pages 10:19
Instructor Details

mgh gh
Course Certificate
Use your certification to make a career change or to advance in your current career. Salaries are among the highest in the world.