
How to Search the Contents of Multiple PDF Files on Linux?
The pdfgrep command searches one or more PDF files for a pattern of characters and prints the lines that contain a match. It works much like grep, but it operates on the text inside PDF files.
The pattern that we search for is normally written as a regular expression.
Installing pdfgrep
For Ubuntu/Debian
sudo apt-get update -y
sudo apt-get install -y pdfgrep
For CentOS (the package typically comes from the EPEL repository, which may need to be enabled first)
sudo yum install -y pdfgrep
Syntax
pdfgrep [options...] pattern [files]
While there are plenty of different options available to us, some of the most used are −
-c : count the number of matches per input file.
-h : suppress the file name prefix on output.
-i : ignore case when matching.
-H : print the file name for each match.
-n : prefix each match with the number of the page where it is found.
-r : recursively search all files in each directory.
-R : same as -r, but also follow symbolic links.
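These options can be combined in a single call. As a minimal sketch, the helper below (pdf_summary is a hypothetical name chosen for illustration) first prints a per-file match count and then the matching lines with their page numbers. It assumes pdfgrep is installed and on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: summarise where PATTERN occurs in the given PDF files.
# Usage: pdf_summary PATTERN FILE...
pdf_summary() {
    pattern=$1
    shift
    echo "Matches per file:"
    # -c prints a match count instead of the matching lines; -i ignores case.
    pdfgrep -c -i -- "$pattern" "$@"
    echo "Matching lines with page numbers:"
    # -H prefixes the file name, -n prefixes the page number.
    pdfgrep -H -n -i -- "$pattern" "$@"
}
```

A call such as pdf_summary "main" report.pdf notes.pdf (both file names are placeholders) would run the two searches one after the other.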
Now, let’s consider a case where we want to find a particular pattern in all the PDF files in a particular directory, say dir1, and in its subdirectories.
Syntax
pdfgrep -HiR "word" *
In the above command, replace the “word” placeholder with the pattern that you want to search for. For example, to search for “func main()” we make use of the command shown below:
pdfgrep -HiR "func main()" *
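The above command looks for the string “func main()” in the PDF files in the current directory and in its subdirectories. The -i flag makes the match case-insensitive, -H prints the file name in front of each match, and -R makes the search recursive.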
Output (sample.pdf is an illustrative file name)
sample.pdf:func main() {}
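Because pdfgrep treats the pattern as an extended regular expression by default, the parentheses in “func main()” act as grouping operators rather than literal characters. As a minimal sketch, assuming a pdfgrep version that supports the -F (--fixed-strings) option, the hypothetical helper pdf_find_literal below searches a directory tree for the literal text instead.

```shell
#!/bin/sh
# Hypothetical helper: search DIR recursively for the literal text TEXT.
# Usage: pdf_find_literal TEXT [DIR]   (DIR defaults to the current directory)
pdf_find_literal() {
    text=$1
    dir=${2:-.}
    # -F: treat the pattern as a fixed string, so "(" and ")" match literally.
    # -H: print the file name; -n: print the page number; -r: recurse into DIR.
    pdfgrep -F -H -n -r -- "$text" "$dir"
}
```

For example, pdf_find_literal "func main()" dir1 would search dir1 and its subdirectories. Escaping the parentheses in the pattern is an alternative if -F is not available.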
In case we only want to find a particular pattern in a single directory and not in its subdirectories, we need to use the command shown below:
pdfgrep -i "func main()" *.pdf
In the above command we left out the -R flag, so pdfgrep does not descend into subdirectories. We also used *.pdf instead of *, so that subdirectories and non-PDF files are not passed to pdfgrep at all.
Output (sample.pdf is an illustrative file name)
sample.pdf:func main() {}
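Another option is to combine the find command with pdftotext and grep: find locates the PDF files, pdftotext converts each one to plain text, and grep searches that text. The file name is passed to the inner shell as an argument ($1) so that names containing spaces or quotes are handled safely.
Command
find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "func main()"' sh {} \;
Output (sample.pdf is an illustrative file name)
/path/sample.pdf:func main() {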
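The same find and pdftotext approach can be wrapped in a small function for repeated use. The sketch below is one possible way to do it, not the only one: pdftotext_grep is a hypothetical name, pdftotext is assumed to be installed, and file names are assumed not to contain newline characters.

```shell
#!/bin/sh
# Hypothetical helper: search the extracted text of every PDF under DIR.
# Usage: pdftotext_grep TEXT [DIR]   (DIR defaults to the current directory)
pdftotext_grep() {
    text=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' | while IFS= read -r file; do
        # "pdftotext FILE -" writes the extracted text to standard output.
        # grep -F matches TEXT as a literal string, not a regular expression.
        pdftotext "$file" - 2>/dev/null | grep -F -- "$text" |
        while IFS= read -r line; do
            # Prefix every matching line with the PDF it came from.
            printf '%s:%s\n' "$file" "$line"
        done
    done
}
```

Calling pdftotext_grep "func main()" /path prints one line per match in the form file:line. Prefixing the file name with printf avoids relying on the GNU-specific --label option of grep.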
