How to download a website page from the Linux terminal?


The Linux command line provides great tools for web crawling in addition to its inherent abilities to run web servers and browsers. In this article we will look at a few tools that are either pre-installed or can be installed in the Linux environment for offline web browsing. This is achieved by downloading one or more webpages.

Wget

Wget is probably the most famous of all the download tools. It can download from HTTP, HTTPS, and FTP servers, retrieve entire websites, and work through proxies.
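Since wget can work through a proxy, one common setup is the standard proxy environment variables. A minimal sketch; the proxy host and port are placeholders, not values from this article:

```shell
# wget honours the standard proxy environment variables.
# proxy.example.com:8080 is a placeholder - substitute your own proxy.
export https_proxy="http://proxy.example.com:8080"
export http_proxy="http://proxy.example.com:8080"

# Any wget run in this shell now goes through the proxy, e.g.:
# wget https://en.wikipedia.org/wiki/Linux_distribution

# Unset the variables when done so later downloads connect directly again.
unset https_proxy http_proxy
```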

Below are the steps to get it installed and start using it.

Check if wget is already available

ubuntu@ubuntu:~$ which wget ; echo $?

Running the above code gives us the following result:

/usr/bin/wget
0

If the exit code ($?) is 1, then we run the below command to install wget.

ubuntu@ubuntu:~$ sudo apt-get install wget
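The check and the install can also be combined into one guarded step; `command -v` exits with 0 when a program is on the PATH. A minimal sketch (the install line is commented out so nothing is modified when you try it):

```shell
# Install wget only when it is missing.
if command -v wget >/dev/null 2>&1; then
    state="installed"
else
    state="missing"
    # sudo apt-get install -y wget    # run the install only in this branch
fi
echo "wget is $state"
```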

Now we run the wget command for a specific webpage or a website to be downloaded.

# Download a webpage
wget https://en.wikipedia.org/wiki/Linux_distribution
# Download an entire website recursively
wget -r abc.com

Running the above code gives us the following result. We show the result only for the web page and not the whole website. The downloaded file gets saved in the current directory.

ubuntu@ubuntu:~$ wget https://en.wikipedia.org/wiki/Linux_distribution
--2019-12-29 23:31:41-- https://en.wikipedia.org/wiki/Linux_distribution
Resolving en.wikipedia.org (en.wikipedia.org)... 103.102.166.224, 2001:df2:e500:ed1a::1
Connecting to en.wikipedia.org (en.wikipedia.org)|103.102.166.224|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 216878 (212K) [text/html]
Saving to: ‘Linux_distribution’
Linux_distribution 100%[===================>] 211.79K 1.00MB/s in 0.2s
2019-12-29 23:31:42 (1.00 MB/s) - ‘Linux_distribution’ saved [216878/216878]
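For the whole-site case, wget's mirroring flags are usually combined. The sketch below only prints the command rather than running it, since a full mirror can be a large download; example.com is a placeholder:

```shell
# Flags commonly combined for offline mirroring:
#   --mirror           recursion plus timestamping, suited for site mirrors
#   --convert-links    rewrite links in saved pages so they work offline
#   --page-requisites  also fetch the images, CSS and scripts a page needs
#   --no-parent        never ascend above the starting directory
opts="--mirror --convert-links --page-requisites --no-parent"
echo "wget $opts https://example.com/"
# Remove the echo to actually run the mirror.
```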

cURL

cURL is a client-side application. It supports downloading files over HTTP, HTTPS, FTP, FTPS, Telnet, IMAP, and several other protocols, so it covers more download scenarios than wget.
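The exact protocol list varies by build; `curl -V` prints a `Protocols:` line you can inspect. A small sketch, guarded so it still works on a machine where curl is absent:

```shell
# Print the protocol list of the local curl build, if curl is present.
if command -v curl >/dev/null 2>&1; then
    protocols=$(curl -V | grep -i '^Protocols:')
else
    protocols="curl is not installed"
fi
echo "$protocols"
```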

Below are the steps to get it installed and start using it.

Check if cURL is already available

ubuntu@ubuntu:~$ which curl ; echo $?

Running the above code gives us the following result:

1

A value of 1 indicates cURL is not available on the system, so we install it using the below command.

ubuntu@ubuntu:~$ sudo apt-get install curl

Running the above code gives us the following output, indicating that cURL was installed.

[sudo] password for ubuntu:
Reading package lists... Done
….
Get:1 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 curl amd64 7.47.0-1ubuntu2.14 [139 kB]
Fetched 139 kB in 21s (6,518 B/s)
…….
Setting up curl (7.47.0-1ubuntu2.14) ...

Next we use cURL to download a webpage.

curl -O https://en.wikipedia.org/wiki/Linux_distribution

Running the above code gives us the following result. You can locate the downloaded file in the current working directory.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  211k  100  211k    0     0   312k      0 --:--:-- --:--:-- --:--:--  311k
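Two cURL options worth knowing alongside -O: -o saves the download under a name you choose, and -L makes cURL follow HTTP redirects, which it does not do by default. The sketch below only prints the commands; distro.html is a placeholder file name:

```shell
url="https://en.wikipedia.org/wiki/Linux_distribution"
echo "curl -O $url"                 # keep the remote file name
echo "curl -L -o distro.html $url"  # choose the name and follow redirects
# Drop the echoes to actually perform the downloads.
```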
raja
Published on 03-Jan-2020 06:40:41