Get the Contents of a Web Page in a Shell Variable on Linux

One of the most useful and powerful features of the Linux command line is the ability to manipulate text. This can be especially useful when working with web pages, as web page content can often be saved as plain text and then manipulated with command-line tools. In this article, we will explore how to store the content of a web page into a shell variable in Linux.

What is a Shell Variable?

A shell variable is a named value stored in memory that can be used by the shell (the command-line interpreter) and the programs it runs. Shell variables are usually defined in the form NAME=value, where "NAME" is the name of the variable and "value" is the value it holds.

Shell variables can be used to store a wide variety of information, including the output of command-line tools, the contents of text files, and even the contents of web pages.
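Those three uses can be sketched in a short snippet; the file path below is a throwaway temporary file created just for this example:

```shell
# Store the output of a command in a variable
today=$(date +%Y-%m-%d)
echo "Today is $today"

# Store the contents of a text file in a variable
# (/tmp/greeting.txt is a hypothetical file created here for the example)
printf 'hello\nworld\n' > /tmp/greeting.txt
contents=$(cat /tmp/greeting.txt)
echo "$contents"
```

The `$( ... )` command substitution is the same mechanism used throughout the rest of this article to capture web page content.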

Using curl to Get Web Page Content

One of the easiest ways to put the content of a web page into a shell variable is the curl command. Curl is a command-line tool for transferring data to or from a server. It supports a wide range of protocols, including HTTP, HTTPS, FTP, and many more.

To get the content of a web page into a shell variable using curl, run the following command:

webcontent=$(curl -s https://www.example.com)

This command stores the content of the web page at https://www.example.com in the shell variable "webcontent". The -s flag runs curl in silent mode, suppressing the progress meter and error messages; the page content itself is still written to stdout, where the command substitution captures it.

To verify the content has been stored, display it with:

echo "$webcontent"

Using wget as an Alternative

Another common tool for fetching web content is wget. You can use it to capture web page content in a shell variable:

webcontent=$(wget -qO- https://www.example.com)

The -q flag runs wget quietly, and -O- writes the downloaded content to stdout instead of a file, so the command substitution can capture it.

Using Grep to Extract Specific Content

Once we have the web page content in a shell variable, we can use command line tools like grep to extract specific lines of text from the web page. Grep is a powerful command line tool used to search for patterns in text.

For example, to extract all links from the web page, we can use the following command:

links=$(echo "$webcontent" | grep -o 'href="[^"]*"')
echo "$links"

This command uses grep to find all occurrences of the pattern href="[^"]*" in the web page content, which matches all href attributes. The -o flag tells grep to print only the matching part of the text.
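The matches still carry the href=" wrapper; a short sed pass strips it off. Here is a self-contained sketch using a sample HTML snippet in place of a real page, so it runs without network access:

```shell
# Sample HTML stored in a variable (stands in for $webcontent)
webcontent='<a href="https://example.com/a">A</a> <a href="/b">B</a>'

# Match the href attributes, then strip the href=" prefix and closing quote
links=$(echo "$webcontent" | grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//')
echo "$links"
```

This prints the two bare URLs, https://example.com/a and /b, one per line.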

Using Awk to Extract Specific Fields

Another useful command-line tool for extracting specific information from text is awk. Awk is a programming language designed for text processing and is often used to extract specific fields from text files.

For example, to extract the title of the web page (stored between <title> tags), we can use:

title=$(echo "$webcontent" | grep -o '<title>[^<]*</title>' | sed 's/<title>\|<\/title>//g')
echo "Page title: $title"

This command first extracts the complete title element using grep, then removes the HTML tags using sed to get just the title text.
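Since this section is about awk, here is a sketch that does the extraction with awk alone, using the <title> tags themselves as the field separator. It assumes the title element fits on one line, and uses a sample HTML string in place of a real page:

```shell
# Sample HTML stored in a variable (stands in for $webcontent)
webcontent='<html><head><title>Example Domain</title></head></html>'

# Split each matching line on the opening/closing title tags;
# the title text is then the second field
title=$(echo "$webcontent" | awk -F'</?title>' '/<title>/ {print $2}')
echo "Page title: $title"
```

This prints "Page title: Example Domain". Setting -F to an extended regular expression lets awk treat both tags as delimiters at once.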

Using Cut to Extract Fields

The cut command is useful for extracting specific fields from delimited text. For example, suppose we have comma-separated data from a web API response stored in a variable:

csv_data="name,age,city
John,25,New York
Jane,30,London"

# Extract just the names (first column)
names=$(echo "$csv_data" | cut -d',' -f1)
echo "$names"

This prints:

name
John
Jane

The -d flag specifies the delimiter (comma in this case) and -f1 extracts the first field.
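Building on the same sample data, the header row can be skipped with tail and multiple fields selected at once; this is a sketch of a common combination:

```shell
csv_data="name,age,city
John,25,New York
Jane,30,London"

# Skip the header row with tail, then take the first and third fields
rows=$(echo "$csv_data" | tail -n +2 | cut -d',' -f1,3)
echo "$rows"
```

This prints "John,New York" and "Jane,London", one record per line; tail -n +2 means "start output at line 2".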

Practical Example: Weather Data

Here's a practical example that fetches weather data and extracts specific information:

# Fetch weather data (example using a weather API)
weather_data=$(curl -s "http://wttr.in/London?format=3")
echo "Current weather: $weather_data"

# Extract the temperature using awk; wttr.in's one-line format looks like
# "London: <icon> +7C", so the temperature is the last whitespace-separated
# field, which awk addresses as $NF
temperature=$(echo "$weather_data" | awk '{print $NF}')
echo "Temperature: $temperature"

Error Handling

When fetching web content, it is important to handle potential errors. By default, curl exits with status 0 even when the server returns an HTTP error page, so add the -f (--fail) flag to make HTTP responses of 400 and above count as failures, then check the exit status:

if webcontent=$(curl -fs https://www.example.com); then
    echo "Successfully fetched web content"
    echo "Content length: ${#webcontent}"
else
    echo "Failed to fetch web content"
fi
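The failure branch can also be exercised without a real outage. As a self-contained sketch, the hypothetical URL below points at a local port where nothing is listening, so the request fails without needing network access:

```shell
# Hypothetical URL chosen so the request fails: nothing listens on port 1
url="http://127.0.0.1:1/"

# -f also treats HTTP errors (status >= 400) as failures;
# --max-time bounds how long we wait for a response
if webcontent=$(curl -fs --max-time 5 "$url"); then
    echo "Successfully fetched ${#webcontent} characters"
else
    echo "Failed to fetch $url (curl exit code $?)"
fi
```

Inside the else branch, $? still holds curl's exit code (for example, 7 for "failed to connect"), which you can log or use to decide whether to retry.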

Conclusion

Shell variables provide a powerful way to store and manipulate web page content in Linux. Using tools like curl, grep, awk, and cut, you can fetch web content and extract specific information efficiently. These techniques are invaluable for web scraping, monitoring, and automated data processing tasks from the command line.

Updated on: 2026-03-17T09:01:38+05:30
