How to clone a webpage using pywebcopy in Python?


pywebcopy is a third-party Python module that lets us download a webpage or an entire website, including its images, HTML pages and other files, and store it on our machine. The module's save_webpage() function clones a single webpage.

Installing the pywebcopy module

First, we have to install the pywebcopy module in the Python environment using the following command.

pip install pywebcopy

On successful installation, we will see output similar to the following (the version number and index URLs depend on the environment) –

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pywebcopy
  Downloading pywebcopy-7.0.2-py2.py3-none-any.whl (46 kB)
     . . . . . . . . . . . . . . . . . . . . . . . . . . 

Installing collected packages: pywebcopy
Successfully installed pywebcopy-7.0.2
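
To confirm from within Python that the package is available, a minimal check using only the standard library could look like the sketch below. The helper name installed_version is hypothetical and is not part of pywebcopy.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pywebcopy")
    if version is None:
        print("pywebcopy is not installed")
    else:
        print("pywebcopy version:", version)
```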

Syntax

Following is the syntax of the save_webpage() function of the pywebcopy module.

from pywebcopy import save_webpage
kwargs = {'bypass_robots': True, 'project_name': 'example'}
save_webpage(url, folder, **kwargs)

Where,

  • kwargs are the optional keyword arguments that we can use while downloading the webpage

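  • bypass_robots is a flag that controls whether the site's robots.txt rules are ignored; when it is True, pages are downloaded even if robots.txt disallows them, and when it is False the rules are respected

  • project_name is the name of the project under which the downloaded files are stored

  • save_webpage is the function that downloads the webpage together with the files it links to

  • url is the link of the webpage.

  • folder is the location where we save the downloaded files.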
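
To make the effect of bypass_robots more concrete, the sketch below uses the standard library's urllib.robotparser to show the kind of rule a robots.txt file expresses. It only illustrates robots.txt itself and is not pywebcopy's internal implementation; the sample rules, the example.com URLs and the helper name is_allowed are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A path outside /private/ is permitted by the sample rules.
    print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/index.html"))
    # A path under /private/ is disallowed by the sample rules.
    print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/data.html"))
```

A downloader that respects robots.txt would skip the second URL, while one that bypasses it would fetch both.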

Example

In the following example, we pass the webpage URL, the folder for storing the files and the additional keyword arguments to the save_webpage() function of the pywebcopy module. The webpage is then saved in the given folder under the specified project name.

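from pywebcopy import save_webpage

# Page to clone and the local folder that will receive the files
url = 'https://www.tutorialspoint.com/'
folder = 'Desktop/March 2023'

# Optional settings: ignore robots.txt rules and name the project
kwargs = {'bypass_robots': True, 'project_name': 'sample_webpage'}
save_webpage(url, folder, **kwargs)
print("webpage saved in the location:", folder)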

Output

When we run the above code, the following output will be generated –

webpage saved in the location: Desktop/March 2023
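
The print() call only confirms that save_webpage() returned. To inspect what was actually written, a small helper like the following sketch can list the files under the target folder. The helper name list_saved_files is hypothetical, and no particular pywebcopy folder layout is assumed; the helper simply walks whatever is in the folder.

```python
from pathlib import Path


def list_saved_files(folder):
    """Return the sorted relative paths of all files found under folder."""
    root = Path(folder)
    if not root.is_dir():
        # Nothing was saved, or the folder path is wrong.
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Same folder as in the example above.
    for name in list_saved_files("Desktop/March 2023"):
        print(name)
```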

Example

Let’s see another example, this time with bypass_robots set to False so that the site's robots.txt rules are respected –

from pywebcopy import save_webpage

url = 'https://www.python.org/'
folder = 'Articles/March 2023'

# bypass_robots is False, so the site's robots.txt rules are respected
kwargs = {'bypass_robots': False, 'project_name': 'webpage'}
save_webpage(url, folder, **kwargs)
print("webpage saved in the location:", folder)

Output

Following is the output of saving the webpage.

webpage saved in the location: Articles/March 2023

Updated on: 09-Aug-2023
