How to Crack PDF Files in Python?
Python provides powerful libraries for security testing and ethical hacking purposes. One common task is testing the strength of password-protected PDF files by attempting to crack them using dictionary attacks.
In this article, we will create a program that attempts to decrypt a password-protected PDF document using a wordlist containing common passwords. This technique is useful for security auditing and testing password strength.
Required Library
We'll use the pikepdf library, which provides a Python interface for working with PDF files, and tqdm, which displays a progress bar during the cracking process. Install both using:
pip install pikepdf tqdm
Creating the PDF Password Cracker
Here's a complete program that attempts to crack a password-protected PDF using a wordlist:
import pikepdf
from tqdm import tqdm


def crack_pdf_password(pdf_file, wordlist_file):
    # Load the password list, skipping blank lines
    try:
        with open(wordlist_file, 'r', encoding='utf-8', errors='ignore') as file:
            passwords = [line.strip() for line in file if line.strip()]
    except FileNotFoundError:
        print(f"Wordlist file '{wordlist_file}' not found!")
        return None

    print(f"Loaded {len(passwords)} passwords from wordlist")

    # Iterate over all passwords with a progress bar
    for password in tqdm(passwords, desc="Cracking PDF"):
        try:
            # Attempt to open the PDF file with the current password
            with pikepdf.open(pdf_file, password=password):
                print(f"\nPassword found: {password}")
                return password
        except pikepdf.PasswordError:
            # Password incorrect, continue to the next one
            continue
        except Exception as e:
            # Any other failure (for example a missing or corrupt PDF) ends the run
            print(f"Error opening PDF: {e}")
            return None

    print("\nPassword not found in wordlist")
    return None


# Usage example
pdf_file = "protected.pdf"
wordlist_file = "wordlist.txt"

found_password = crack_pdf_password(pdf_file, wordlist_file)
if found_password:
    print(f"Successfully cracked PDF with password: {found_password}")
else:
    print("Failed to crack PDF password")
How It Works
The program follows these steps:
- Load wordlist: Reads all passwords from the wordlist file into memory
- Iterate through passwords: Uses each password to attempt opening the PDF
- Test password: If pikepdf.open() succeeds, the password is correct
- Handle errors: Catches PasswordError for incorrect passwords and continues
- Return result: Returns the found password or None if unsuccessful
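The steps above do not depend on PDFs specifically. As a minimal sketch, the same loop can be written against any check function, which makes the control flow easy to test without a protected file. The names dictionary_attack and check are hypothetical, introduced here for illustration only; they are not part of pikepdf.

```python
from typing import Callable, Iterable, Optional


def dictionary_attack(candidates: Iterable[str],
                      check: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by check, or None.

    check is a hypothetical callable standing in for the "try to open
    the PDF" step: it returns True when the password is correct and
    False when it is rejected.
    """
    for candidate in candidates:
        if check(candidate):
            return candidate
    return None


# Stand-in check for demonstration; a real one would wrap pikepdf.open()
# and return False when PasswordError is raised.
found = dictionary_attack(["123456", "qwerty", "letmein"],
                          lambda pw: pw == "letmein")
print(found)  # letmein
```

Separating the loop from the check keeps the "test password" and "handle errors" steps in one small function that can be swapped out or unit tested.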
Creating a Sample Wordlist
For testing purposes, you can create a simple wordlist file:
# Create a sample wordlist
common_passwords = [
    "123456", "password", "123456789", "12345678", "12345",
    "1234567", "1234567890", "qwerty", "abc123", "111111",
    "admin", "letmein", "welcome", "monkey", "dragon"
]

with open("sample_wordlist.txt", "w") as f:
    for pwd in common_passwords:
        f.write(pwd + "\n")

print("Sample wordlist created with", len(common_passwords), "passwords")
Sample wordlist created with 15 passwords
Important Considerations
Legal and Ethical Use:
- Only use this on PDF files you own or have explicit permission to test
- This is for educational and security auditing purposes only
- Unauthorized access to protected files is illegal
Performance Tips:
- Use comprehensive wordlists like rockyou.txt for better success rates
- Consider using multi-threading for faster processing
- Test common passwords first before using large wordlists
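Large wordlists contain millions of lines, so they do not have to be loaded fully into memory. As a hedged sketch of the last tip, the generator below yields a short list of common passwords first and then streams the rest of the file lazily. The names iter_candidates and priority are hypothetical, and the demo writes a small temporary wordlist rather than assuming any file exists.

```python
import os
import tempfile


def iter_candidates(wordlist_file, priority=()):
    """Yield priority passwords first, then the wordlist line by line."""
    seen = set()
    for pwd in priority:
        if pwd and pwd not in seen:
            seen.add(pwd)
            yield pwd
    # Stream the file instead of building a list of every line.
    # Only the priority entries are remembered, so memory use stays small;
    # duplicates inside the file itself are not filtered.
    with open(wordlist_file, "r", encoding="utf-8", errors="ignore") as fh:
        for line in fh:
            pwd = line.strip()
            if pwd and pwd not in seen:
                yield pwd


# Demo with a small temporary wordlist
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("dragon\n\n123456\nmonkey\n")

candidates = list(iter_candidates(tmp.name, priority=["123456", "password"]))
os.remove(tmp.name)
print(candidates)  # ['123456', 'password', 'dragon', 'monkey']
```

A generator has no length, so if it is passed to tqdm the progress bar shows a running count instead of a percentage unless a total is supplied.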
Conclusion
PDF password cracking using Python and pikepdf demonstrates how dictionary attacks work in cybersecurity. This technique helps security professionals test password strength and educates users about the importance of using strong, unique passwords.
