Accessing the internet using the urllib.request module in Python


Introduction

We use the urllib.request module in Python to access and open URLs, which most often use the HTTP protocol.

The interface is simple enough for beginners to learn: its urlopen function can fetch URLs over several different protocols.

You will get a better understanding of the module once we start using its features. So, let us get started.

Getting Started

The urllib package is part of Python's standard library, so it is available as soon as Python itself is installed. There is no need to install it separately with the pip package manager, and running pip install urllib is unnecessary.

You can simply import the right modules and start writing your script.

Checking out urllib.request

We most often use urllib.request to open and read data, or the source code of the page. This becomes especially useful if you are trying to retrieve data from an API. For example,

import urllib.request
request_url = urllib.request.urlopen('https://official-joke-api.appspot.com/random_ten')

The above lines of code open the joke API and return a response object from which its data can be read.

Suppose you want to print out its content, you can use −

print(request_url.read())

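Note − This prints the value as bytes. If you want plain text, call the decode function on the result. A response can only be read once, so a second call to read() on the same object returns empty bytes; store the result of the first read or open the URL again.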

print(request_url.read().decode())
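
Calling decode() with no arguments assumes the body is UTF-8. As a minimal sketch, the helper below reads the character set from the response headers instead and falls back to UTF-8 when none is declared. The name fetch_text is hypothetical, and the data: URL is an assumption used only so that the example runs without a network connection; any URL that urlopen accepts could be passed instead.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Open a URL, read the body once, and decode it to text."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes; a second read() would return b""
        # Use the charset from the Content-Type header when one is declared
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# A data: URL carries its own content, so no network access is needed here
text = fetch_text("data:text/plain;charset=utf-8,caf%C3%A9")
print(text)
```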

You can even save the data from the API and then parse it later, for example with RegEx, to obtain only the essential data.

Example

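import urllib.request

# Fetch the response body and decode the bytes into text
with urllib.request.urlopen('https://official-joke-api.appspot.com/random_ten') as response:
    data = response.read().decode()
print(data)

# Save the text so that it can be parsed later
with open("content.txt", "w", encoding="utf-8") as file:
    file.write(data)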
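
Regular expressions suit loosely structured text. When the saved text is JSON, the json module from the standard library is usually less brittle, so the sketch below uses it in place of RegEx. The sample string and the field names setup and punchline are assumptions made for illustration, not a guaranteed description of what the API returns, and extract_jokes is a hypothetical helper.

```python
import json

# Hypothetical sample shaped like a list of joke records (assumed field names)
sample = (
    '[{"id": 1, "setup": "Why did the chicken cross the road?", '
    '"punchline": "To get to the other side."}]'
)


def extract_jokes(text):
    """Return (setup, punchline) pairs from a JSON array of joke records."""
    records = json.loads(text)
    return [(record["setup"], record["punchline"]) for record in records]


for setup, punchline in extract_jokes(sample):
    print(setup, "-", punchline)
```

In a real script, the text read back from content.txt would be passed to the helper in place of sample.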

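Note − You can access URLs with various schemes, including HTTP, HTTPS, FTP and local file URLs. The urlopen function is called the same way for each of them, although details of the returned response, such as the available headers, depend on the protocol.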
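
To illustrate that urlopen is not limited to HTTP, the sketch below reads a local file through a file: URL. The temporary file, its contents and the helper name read_local_file_via_url are assumptions made for the example.

```python
import pathlib
import tempfile
import urllib.request


def read_local_file_via_url(path):
    """Read a local text file by opening its file: URL with urlopen."""
    url = pathlib.Path(path).resolve().as_uri()  # e.g. file:///tmp/.../hello.txt
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Create a throwaway file so that the example does not depend on existing paths
with tempfile.TemporaryDirectory() as tmp:
    sample_path = pathlib.Path(tmp) / "hello.txt"
    sample_path.write_text("hello from a file URL", encoding="utf-8")
    local_text = read_local_file_via_url(sample_path)

print(local_text)
```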

Sending data to a URL

If you are working with the Common Gateway Interface (CGI), you might want to send data to a URL. With HTTP, this is usually done by sending a POST request.

You can achieve this using the urllib.request module along with the urllib.parse module.

Let us import the modules first, then build and send the request.

Example

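import urllib.parse
import urllib.request

url = 'http://www.google.com/cgi-bin/register.cgi'
values = {'name': 'S Vijay Balaji', 'language': 'Python'}

# Encode the form fields, then convert the string to bytes for the request body
data = urllib.parse.urlencode(values)
data = data.encode('ascii')

# Supplying data makes urlopen send a POST request instead of a GET
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
print(the_page)

The URL above is only a placeholder, so substitute a real endpoint that accepts POST data. The script then prints the response returned by that URL.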
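
Before sending anything, it can help to inspect the request that would go out. The sketch below builds a Request object the same way and checks its method and body without contacting a server. The helper name build_post_request and the example.com address are assumptions for illustration.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Build a Request whose body is the URL-encoded form fields."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body)


req = build_post_request(
    "http://example.com/cgi-bin/register.cgi",  # placeholder address
    {"name": "S Vijay Balaji", "language": "Python"},
)

# A Request that carries data defaults to POST; without data it would be GET
print(req.get_method())
print(req.data)
```

Passing the resulting object to urllib.request.urlopen would then perform the actual POST.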

Conclusion

The urllib.request module is very useful considering that we can retrieve internet resources and obtain data from them.

It comes in handy when parsing data from an API or reading through source code of a web page to scrape its content.

For a project in which urllib.request was used to extract data from various APIs, see https://github.com/SVijayB/Steam_WebScraper.

The urllib.request module offers various other functions. If you are curious and want to learn more, you can go through the official documentation at https://docs.python.org/3/library/urllib.request.html.

raja
Published on 11-Feb-2021 10:12:50