
Get the Contents of a Web Page in a Shell Variable on Linux
Introduction
One of the most useful and powerful features of the Linux command line is the ability to manipulate text. This is especially handy when working with web pages, since page content can often be captured as plain text and then processed with command-line tools. In this article, we will explore how to store the content of a web page in a shell variable on Linux.
What is a shell variable?
A shell variable is a value stored in memory that can be used by the shell (the command-line interface) and by other programs. Shell variables are usually defined in the form NAME=value, where "NAME" is the name of the variable and "value" is the value stored in it.
Shell variables can be used to store a wide variety of information, including the output of command-line tools, the contents of text files, and even the contents of web pages.
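For example, here is a minimal sketch (the variable names are purely illustrative) that stores a literal string in one variable and the output of a command in another −
$ greeting="Hello, world"
$ today=$(date +%F)
$ echo "$greeting - $today"
The $(...) syntax, called command substitution, is what lets us capture a command's output in a variable, and it is the same mechanism we will use with curl below.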
Using curl to get the content of a web page
One of the easiest ways to put the content of a web page into a shell variable is the "curl" command, a command-line tool for transferring data to or from a server. It supports a wide range of protocols, including HTTP, HTTPS, FTP, and many more.
To get the content of a web page into a shell variable using curl, we can use the following command −
$ webcontent=$(curl -s https://www.example.com)
This command stores the content of the web page at https://www.example.com in the shell variable "webcontent". The "-s" flag runs curl in silent mode, suppressing the progress meter and error messages so that only the page content itself is captured.
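Many web pages redirect to another URL, so it is often useful to add curl's "-L" flag, which follows redirects, and the "-f" flag, which makes curl produce no output on HTTP errors instead of storing an error page. A minimal sketch building on the command above −
$ webcontent=$(curl -sfL https://www.example.com)
$ [ -z "$webcontent" ] && echo "download failed or page was empty"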
Using grep to extract specific lines from a web page
Once we have the web page content in a shell variable, we can use command-line tools like grep to extract specific lines of text from it. grep is a powerful command-line tool for searching text for patterns.
For example, suppose we want to extract all links from the web page. We can use the following command to do this −
$ links=$(echo "$webcontent" | grep -o 'href="[^"]*"')
This command uses grep to find all occurrences of the pattern 'href="[^"]*"' in the web page content, which matches the href attribute of every link on the page. The "-o" flag tells grep to print only the matching part of each line rather than the whole line. The output is a list of all the links on the web page, one per line.
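If we only want the URLs themselves, without the surrounding href="..." attribute syntax, we can pipe the result through sed to strip it off. A minimal follow-up sketch −
$ urls=$(echo "$links" | sed 's/^href="//; s/"$//')
$ echo "$urls"
This leaves one bare URL per line in the "urls" variable.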
Using awk to extract specific fields from a web page
Another useful command-line tool for extracting specific information from text is "awk". awk is a programming language designed for text processing and is often used to extract specific fields from text files.
For example, suppose we want to extract the title of the web page. The web page title is usually stored in the "title" element of the HTML source code, which looks like this −
<title>Example Web Page</title>
To extract the web page title using awk, we can use the following command −
$ title=$(echo "$webcontent" | awk '/<title>/ {print $0}' | sed 's/<[^>]*>//g')
This command uses awk to find the line containing the "title" element, then pipes that line to sed, which strips the HTML tags and leaves only the title text in the "title" variable.
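As an alternative, assuming the "title" element sits on a single line as in the example above, the same result can be obtained with grep instead of awk −
$ title=$(echo "$webcontent" | grep -o '<title>[^<]*</title>' | sed 's/<[^>]*>//g')
$ echo "$title"
Example Web Page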
Using cut to extract specific fields from a web page
Another useful command-line tool for extracting specific fields from text is "cut". cut is a command-line tool that extracts specific fields from a file or from command output.
For example, suppose we want to extract the first and last names from a list of names stored one "first last" pair per line. We can use the following commands to do this −
$ names=$'John Smith\nJane Doe'
$ first_names=$(echo "$names" | cut -d' ' -f1)
$ last_names=$(echo "$names" | cut -d' ' -f2)
$ echo "$first_names"
John
Jane
$ echo "$last_names"
Smith
Doe
These commands use the "-d" flag to specify the delimiter (here, a space) and the "-f" flag to specify which field to extract. Because cut works line by line, the output is a list of first names and a list of last names, one per line.
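The same idea works on web page data. For example, assuming a variable holds URLs like the ones extracted in the grep section, cut can pull out just the host name by splitting on "/" and taking the third field −
$ echo "https://www.example.com/some/page" | cut -d'/' -f3
www.example.com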
Conclusion
In this article, we've explored how to get the content of a web page into a shell variable on Linux and how to use command-line tools like curl, grep, awk, and cut to extract specific information from it. These tools are powerful, and they can save you a lot of time and effort when working with web pages on the command line.
