Downloading PDFs with Python using Requests and BeautifulSoup
Requests and BeautifulSoup are Python libraries that can be used together to download files, such as PDFs, from the web. The Requests library sends HTTP requests and receives the responses. The BeautifulSoup library parses the HTML received in the response so that the link to the downloadable PDF can be extracted. In this article, we will see how to download PDFs using Requests and BeautifulSoup in Python.
Installing Dependencies
Before using the Requests and BeautifulSoup libraries in Python, we need to install them with the pip command. To install both libraries, run the following commands in your terminal.
pip install requests
pip install beautifulsoup4
Downloading PDFs using Requests and BeautifulSoup
To download a PDF from the internet, first use the Requests library to fetch the web page that links to the PDF file. We can then use BeautifulSoup to parse the HTML response and extract the link to the PDF file. The link is often relative, so the page URL and the extracted link are combined to get the full URL of the PDF file. Finally, we send a GET request to that URL with requests.get() and write the response content to a file.
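Combining the page URL with the extracted link is the step that most often goes wrong, because simple string concatenation only works for some link formats. A minimal sketch of one way to do it is shown below, using urljoin from Python's standard library. The helper name resolve_pdf_url and all URLs are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin


def resolve_pdf_url(page_url, href):
    """Return the absolute URL for a link found on the page at page_url."""
    # urljoin handles relative, root-relative, and already-absolute links.
    return urljoin(page_url, href)


# Hypothetical page URL, assumed only for this example.
page_url = "https://example.com/reports/index.html"

print(resolve_pdf_url(page_url, "files/document.pdf"))
print(resolve_pdf_url(page_url, "/downloads/document.pdf"))
print(resolve_pdf_url(page_url, "https://cdn.example.com/document.pdf"))
```

A relative link is resolved against the directory of the page, a link starting with a slash is resolved against the site root, and an absolute link is returned unchanged.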
Example
In the code below, replace 'https://example.com/document.pdf' with the valid URL of the page that contains the link to the PDF file.
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Step 1: Fetch the page that links to the PDF
# Replace this placeholder with the URL of the page that contains the PDF link
url = 'https://example.com/document.pdf'
response = requests.get(url)
if response.status_code == 200:
    # Step 2: Parse the HTML and take the first <a> tag that has an href
    soup = BeautifulSoup(response.text, 'html.parser')
    anchor = soup.find('a', href=True)
    if anchor is None:
        print('Error: no link found on the page.')
    else:
        # Step 3: Resolve the link against the page URL and download the PDF
        pdf_url = urljoin(url, anchor['href'])
        pdf_response = requests.get(pdf_url)
        if pdf_response.status_code == 200:
            with open('document.pdf', 'wb') as f:
                f.write(pdf_response.content)
            print('PDF downloaded successfully.')
        else:
            print('Error:', pdf_response.status_code)
else:
    print('Error:', response.status_code)
Output
PDF downloaded successfully.
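The example above takes the first link on the page, which is not necessarily a PDF. As a hedged sketch, the links could instead be filtered by their file extension. The function name find_pdf_links, the sample HTML, and the URLs below are assumptions made for illustration. The check only looks at the end of the href, so links with query strings or PDFs served from URLs without a .pdf suffix would be missed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_pdf_links(html, page_url):
    """Return absolute URLs of all links in html whose href ends in .pdf."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips <a> tags that have no href attribute.
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(".pdf"):
            links.append(urljoin(page_url, href))
    return links


# Hypothetical page content, standing in for response.text.
sample_html = """
<a href="/about">About</a>
<a href="files/report.pdf">Report</a>
<a>No href</a>
<a href="https://example.com/docs/Guide.PDF">Guide</a>
"""

for pdf_url in find_pdf_links(sample_html, "https://example.com/reports/"):
    print(pdf_url)
```

Each URL returned this way can then be passed to requests.get() and saved to disk as in the example above.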
Conclusion
In this article, we discussed how to download PDF files from the internet using the Requests and BeautifulSoup libraries in Python. With Requests, we send an HTTP GET request to fetch the page that contains the PDF link and check the status code of the response. We then use BeautifulSoup to parse the page and extract the link to the PDF, and a second GET request downloads the file, which is written to disk.