Deploying Scrapy spider on ScrapingHub
A Scrapy spider is a class that follows the links of a website and extracts information from its web pages. It is the base class from which all other spiders must inherit.
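To make the idea concrete, here is a minimal, framework-free sketch of the core thing a spider does on each page: find the links so it can follow them. This uses only the Python standard library; the `LinkCollector` class and `extract_links` helper are illustrative names, not part of Scrapy.

```python
from html.parser import HTMLParser

#collects every href attribute from anchor tags in an HTML document,
#which is what a spider does for each page it visits
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

page = '<a href="/about">About</a> <a href="/contact">Contact</a>'
print(extract_links(page))   # → ['/about', '/contact']
```

A real spider does this for every page it downloads, queuing each discovered link for its own visit; Scrapy handles the downloading, scheduling, and de-duplication for us.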
Scrapinghub is a cloud-based platform for running Scrapy spiders. It turns web content into useful data, and it allows us to extract data even from complex web pages.
We are going to use Scrapinghub to deploy Scrapy spiders to the cloud and execute them.
Steps to deploy spiders on Scrapinghub −
Step 1 −
Create a Scrapy project. After installing Scrapy, just run the following command in your terminal −
$scrapy startproject <project_name>
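The startproject command generates a project skeleton roughly like the following (with project_name replaced by the name you chose; the exact file list may vary slightly between Scrapy versions) −

```
project_name/
    scrapy.cfg            # deploy configuration
    project_name/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
```

Spiders go inside the spiders/ directory, and scrapy.cfg is what the deploy tooling reads later.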
Change your directory to your new project (project_name).
Step 2 −
Write a Scrapy spider for your target website. Here we will use a simple website, tutorialspoint.com, as the target. Below is my very simple Scrapy spider −
#import scrapy library
import scrapy

class AllSpider(scrapy.Spider):
   #Spider name
   name = 'all'
   #starting url
   start_urls = ['http://www.tutorialspoint.com/']
   #set for tracking crawled urls (not used in this minimal version)
   crawled = set()

   def __init__(self, *args, **kwargs):
      super().__init__(*args, **kwargs)
      self.links = []

   def parse(self, response):
      self.links.append(response.url)
      #yield one item per visited page so the output can be exported to a file
      yield {'url': response.url}
      #follow every link found on the page
      for href in response.css('a::attr(href)'):
         yield response.follow(href, self.parse)
Step 3 −
Run your spider and save the output to a links.json file −
$scrapy crawl all -o links.json
After executing the above command, you will have scraped all the links and saved them in the links.json file. This may not be a lengthy process, but to run it continuously, round the clock (24/7), we need to deploy this spider on Scrapinghub.
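Scrapy's -o exporter writes a JSON array with one object per item the spider yielded, so links.json can be consumed by any JSON-aware tool. A quick sketch, assuming the spider yields one {'url': ...} dict per page (the sample data below is illustrative, not real crawl output):

```python
import json

#illustrative sample of what the JSON exporter writes: one object
#per item the spider yielded
sample = '''[
    {"url": "http://www.tutorialspoint.com/"},
    {"url": "http://www.tutorialspoint.com/about"}
]'''

links = [item["url"] for item in json.loads(sample)]
print(len(links))   # → 2
```

In a real script you would replace `sample` with `open("links.json").read()`.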
Step 4 −
Create an account on Scrapinghub. For that, just log in to the ScrapingHub login page using either your Gmail account or GitHub; it will redirect you to the dashboard.
Now click on Create project and enter the name of the project. We can add our project to the cloud either through the command line (CLI) or through GitHub. Next we are going to deploy our code through the shub CLI; first, install shub −
$pip install shub
After installing shub, log in to your shub account using the API key generated when you created the account (enter your API key from https://app.scrapinghub.com/account/apikey) −
$shub login
If your API key is valid, you are now logged in. Next, deploy the project using the deploy ID, the six-digit number shown in the command line section of the "deploy your code" page −
$ shub deploy deploy_id
That's it from the command line. Now move back to the Spiders section of the dashboard, where you can see the spider ready to run. Just click on the spider name and then on the Run button. That's it − you can now see your spider in your dashboard, something like this −
It shows the running progress at a click, and you don't need to keep your local machine running 24/7.