Deploying Scrapy spider on ScrapingHub
A Scrapy spider is a class that follows the links of a website and extracts information from its webpages. It is the base class from which all other spiders must inherit.
Scrapinghub is a cloud platform for running Scrapy spiders. It turns web content into useful data, and it can extract data even from complex webpages.
We are going to use Scrapinghub to deploy Scrapy spiders to the cloud and execute them there.
Steps to deploy spiders on Scrapinghub −
Step 1 −
Create a Scrapy project −
After installing Scrapy, just run the following command in your terminal −
$scrapy startproject <project_name>
Change your directory to your new project (project_name).
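For orientation, the startproject command generates Scrapy's standard project template, which looks roughly like this (minor details may vary between Scrapy versions) −

```
project_name/
    scrapy.cfg            # deploy configuration (used later by shub)
    project_name/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/          # your spider modules go here
            __init__.py
```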
Step 2 −
Write a Scrapy spider for your target website; the spider below crawls "www.tutorialspoint.com".
Below is a very simple Scrapy spider −
#import scrapy library
import scrapy

class AllSpider(scrapy.Spider):
    #Spider name
    name = 'all'
    #starting url
    start_urls = ['http://www.tutorialspoint.com/']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.links = []

    def parse(self, response):
        #record the page that was just crawled
        self.links.append(response.url)
        #yield each URL as an item so it can be exported to links.json
        yield {'link': response.url}
        #follow every link found on the page
        for href in response.css('a::attr(href)'):
            yield response.follow(href, self.parse)
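To see what the `a::attr(href)` selector in the spider extracts, here is a standard-library analogue that needs no Scrapy at all. It is only an illustration of the same idea — collecting the href attribute of every anchor tag — not part of the spider itself:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href attributes of <a> tags,
    mimicking the CSS selector 'a::attr(href)'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

# A tiny sample page to demonstrate the extraction
html = '<html><body><a href="/about">About</a><a href="/contact">Contact</a></body></html>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/about', '/contact']
```

In the real spider, Scrapy does this work for you and additionally resolves relative URLs when you call response.follow.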
Step 3 −
Run your spider and save the output to a links.json file −
$scrapy crawl all -o links.json
After executing the above command, every link the spider finds is scraped and saved in the links.json file. A single run is not lengthy, but to keep the spider running round the clock (24/7) we need to deploy it on Scrapinghub.
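Once a crawl finishes, the saved file can be inspected with the standard json module. The sample below assumes the feed contains one {'link': url} dict per crawled page (a hypothetical format — adjust the key to match whatever your spider actually yields):

```python
import json

# Hypothetical sample of what links.json might contain if the
# spider yields one {'link': url} dict per crawled page.
sample = (
    '[{"link": "http://www.tutorialspoint.com/"},'
    ' {"link": "http://www.tutorialspoint.com/python/"}]'
)

records = json.loads(sample)          # parse the JSON feed
links = [r['link'] for r in records]  # pull out just the URLs
print(len(links))  # 2
print(links[0])    # http://www.tutorialspoint.com/
```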
Step 4 −
Create an account on Scrapinghub
To do that, just log in to the ScrapingHub login page using your Google or GitHub account; it will redirect you to the dashboard.
Now click Create project and enter a name for the project. The project can be added to the cloud either through the command line (CLI) or through GitHub. Here we deploy the code through the shub CLI, so first install shub −
$pip install shub
After installing shub, log in to your shub account using the API key generated when you created the account (enter your API key from https://app.scrapinghub.com/account/apikey) −
$ shub login
If your API key is valid, you are now logged in. Next, deploy the project using the deploy ID, the six-digit number shown in the command-line part of the "deploy your code" section −
$ shub deploy deploy_id
That's it from the command line. Now move back to the Spiders section of the dashboard, where the deployed spider is listed. Click the spider name and then the Run button.
The dashboard shows the running progress with a single click, and you no longer need to keep your local machine running 24/7.