urllib.robotparser - Parser for robots.txt in Python
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol. The file is a simple text-based access control system for programs that access web resources automatically, such as spiders and crawlers. It specifies a user agent identifier followed by a list of URL paths that agent may not access.
    # robots.txt
    Sitemap: https://example.com/sitemap.xml
    User-agent: *
    Disallow: /admin/
    Disallow: /downloads/
    Disallow: /media/
    Disallow: /static/
This file is usually put in the top-level directory of your web server.
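Because the file sits at the top level of the site, its URL can be derived from any page URL on the same host. The following is a minimal sketch of that idea; the helper name `robots_url_for` and the sample page URL are illustrative assumptions, not part of the module.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the robots.txt URL for the site that serves page_url.

    Hypothetical helper: keeps the scheme and host of page_url and
    replaces the path, query and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/downloads/file.zip?x=1"))
# prints https://example.com/robots.txt
```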
Python's urllib.robotparser module provides the RobotFileParser class. It answers questions about whether or not a particular user agent can fetch a URL on the web site that published the robots.txt file.
The following methods are defined in the RobotFileParser class:
set_url(url): Sets the URL referring to a robots.txt file.
read(): Reads the robots.txt URL and feeds its contents to the parser.
parse(lines): Parses the lines argument, a list of lines from a robots.txt file.
can_fetch(useragent, url): Returns True if the useragent is allowed to fetch the url according to the rules contained in the parsed robots.txt file.
mtime(): Returns the time the robots.txt file was last fetched.
modified(): Sets the time the robots.txt file was last fetched to the current time.
crawl_delay(useragent): Returns the value of the Crawl-delay parameter from robots.txt for the useragent in question, or None if there is no such parameter.
request_rate(useragent): Returns the contents of the Request-rate parameter as a named tuple RequestRate(requests, seconds), or None if there is no such parameter.
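The methods above can be tried without any network access by handing rules straight to parse(). The sketch below assumes a shortened rule set based on the sample file shown earlier; the Crawl-delay and Request-rate lines and the URLs being tested are added here purely for illustration.

```python
from urllib import robotparser

# Illustrative rules; the Crawl-delay and Request-rate values are assumptions.
RULES = """\
User-agent: *
Crawl-delay: 5
Request-rate: 3/20
Disallow: /admin/
Disallow: /downloads/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(RULES)  # feed the lines directly instead of calling read()

AGENT = "PyMOTW"
print(parser.can_fetch(AGENT, "https://example.com/"))         # True
print(parser.can_fetch(AGENT, "https://example.com/admin/x"))  # False
print(parser.crawl_delay(AGENT))                               # 5

rate = parser.request_rate(AGENT)
print(rate.requests, rate.seconds)                             # 3 20
```

Only the path portion of the URL is matched against the Disallow rules, so the host name used in the calls to can_fetch() does not affect the result.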
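    from urllib import parse
    from urllib import robotparser

    AGENT_NAME = 'PyMOTW'
    URL_BASE = 'https://example.com/'

    # Point the parser at the site's robots.txt, then download and parse it.
    parser = robotparser.RobotFileParser()
    parser.set_url(parse.urljoin(URL_BASE, 'robots.txt'))
    parser.read()

    # Ask whether this agent may fetch a given URL on the site.
    print(parser.can_fetch(AGENT_NAME, parse.urljoin(URL_BASE, 'admin/')))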
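
A crawler that runs for a long time may want to reload robots.txt once its copy is old. One possible approach, sketched below, compares mtime() with the current time; the `is_stale` helper and the one-hour `MAX_AGE` are assumptions chosen for illustration, and parse() is used in place of read() so the sketch runs offline.

```python
import time
from urllib import robotparser

MAX_AGE = 3600  # seconds; hypothetical refresh interval


def is_stale(parser, max_age=MAX_AGE, now=None):
    """Return True if no rules were parsed yet or they are older than max_age."""
    now = time.time() if now is None else now
    last = parser.mtime()  # 0 until rules have been parsed
    return last == 0 or now - last > max_age


parser = robotparser.RobotFileParser()
print(is_stale(parser))  # True: nothing has been loaded yet

parser.parse(["User-agent: *", "Disallow: /admin/"])
print(is_stale(parser))  # False: the rules were parsed just now
```

In a real crawl loop, a check such as `if is_stale(parser): parser.read()` before each batch of requests would refresh the rules when needed.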