How to Crack PDF Files in Python?

Python provides powerful libraries for security testing and ethical hacking purposes. One common task is testing the strength of password-protected PDF files by attempting to crack them using dictionary attacks.

In this article, we will create a program that attempts to decrypt a password-protected PDF document using a wordlist containing common passwords. This technique is useful for security auditing and testing password strength.

Required Libraries

We'll use the pikepdf library, which provides a Python interface for working with PDF files, and tqdm, which displays a progress bar during the cracking process. Install both using:

pip install pikepdf tqdm

Creating the PDF Password Cracker

Here's a complete program that attempts to crack a password-protected PDF using a wordlist:

import pikepdf
from tqdm import tqdm

def crack_pdf_password(pdf_file, wordlist_file):
    # Load password list
    try:
        with open(wordlist_file, 'r', encoding='utf-8', errors='ignore') as file:
            passwords = [line.strip() for line in file if line.strip()]
    except FileNotFoundError:
        print(f"Wordlist file '{wordlist_file}' not found!")
        return None
    
    print(f"Loaded {len(passwords)} passwords from wordlist")
    
    # Iterate over all passwords with progress bar
    for password in tqdm(passwords, desc="Cracking PDF"):
        try:
            # Attempt to open PDF file with current password
            with pikepdf.open(pdf_file, password=password) as pdf:
                print(f"\nPassword found: {password}")
                return password
        except pikepdf.PasswordError:
            # Password incorrect, continue to next
            continue
        except Exception as e:
            print(f"Error opening PDF: {e}")
            return None
    
    print("\nPassword not found in wordlist")
    return None

# Usage example
pdf_file = "protected.pdf"
wordlist_file = "wordlist.txt"

found_password = crack_pdf_password(pdf_file, wordlist_file)
if found_password:
    print(f"Successfully cracked PDF with password: {found_password}")
else:
    print("Failed to crack PDF password")

How It Works

The program follows these steps:

  1. Load wordlist: Reads all passwords from the wordlist file into memory
  2. Iterate through passwords: Uses each password to attempt opening the PDF
  3. Test password: If pikepdf.open() succeeds, the password is correct
  4. Handle errors: Catches PasswordError for incorrect passwords and continues; any other error (for example, a missing or corrupt PDF) is printed and stops the search
  5. Return result: Returns the found password or None if unsuccessful
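
The steps above can also be sketched with the search loop separated from the PDF check, so the loop can be exercised without a real PDF. This is a minimal sketch, not a restatement of the program above: read_candidates, dictionary_attack, and pdf_checker are hypothetical names introduced here, and the wordlist is streamed line by line instead of being loaded into memory.

```python
from typing import Callable, Iterable, Iterator, Optional


def read_candidates(path: str) -> Iterator[str]:
    """Yield non-empty, whitespace-stripped lines from the wordlist one at a time."""
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        for line in handle:
            candidate = line.strip()
            if candidate:
                yield candidate


def dictionary_attack(candidates: Iterable[str],
                      is_correct: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by is_correct, or None if none match."""
    for candidate in candidates:
        if is_correct(candidate):
            return candidate
    return None


def pdf_checker(pdf_file: str) -> Callable[[str], bool]:
    """Build a checker that reports whether a password opens pdf_file."""
    import pikepdf  # imported here so the helpers above work without pikepdf

    def is_correct(password: str) -> bool:
        try:
            with pikepdf.open(pdf_file, password=password):
                return True
        except pikepdf.PasswordError:
            # Wrong password; other exceptions are left to propagate
            return False

    return is_correct


# Example (assumes protected.pdf and wordlist.txt exist):
# found = dictionary_attack(read_candidates("wordlist.txt"),
#                           pdf_checker("protected.pdf"))
```

Passing the checker in as a function keeps the loop independent of pikepdf, which makes it easy to test with a stand-in such as a lambda that compares against a known string.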

Creating a Sample Wordlist

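For testing purposes, you can create a simple wordlist file. The snippet below writes sample_wordlist.txt, so pass that name as wordlist_file when running the cracker: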

# Create a sample wordlist
common_passwords = [
    "123456", "password", "123456789", "12345678", "12345",
    "1234567", "1234567890", "qwerty", "abc123", "111111",
    "admin", "letmein", "welcome", "monkey", "dragon"
]

with open("sample_wordlist.txt", "w") as f:
    for pwd in common_passwords:
        f.write(pwd + "\n")

print("Sample wordlist created with", len(common_passwords), "passwords")

Output:

Sample wordlist created with 15 passwords
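
To try the cracker on a file you control, you can also generate a small password-protected PDF with pikepdf itself. The following is a rough sketch under stated assumptions: make_protected_pdf and the default owner password are hypothetical names chosen for illustration, and the encryption settings are left at pikepdf's defaults.

```python
def make_protected_pdf(path, user_password, owner_password="owner-secret"):
    """Write a one-page encrypted PDF for testing (helper name is illustrative)."""
    import pikepdf  # requires: pip install pikepdf

    with pikepdf.Pdf.new() as pdf:
        # Add a blank page so the document is not empty
        pdf.add_blank_page()
        # The user password is the one needed to open the file
        pdf.save(
            path,
            encryption=pikepdf.Encryption(owner=owner_password, user=user_password),
        )


# Example: create a target whose password appears in the sample wordlist
# make_protected_pdf("protected.pdf", "letmein")
```

Choosing a user password that appears in the sample wordlist, such as "letmein", gives a known-good case for checking that the cracker reports a match.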

Important Considerations

Legal and Ethical Use:

  • Only use this on PDF files you own or have explicit permission to test
  • This is for educational and security auditing purposes only
  • Unauthorized access to protected files is illegal

Performance Tips:

  • Use comprehensive wordlists like rockyou.txt for better success rates
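  • Consider splitting the wordlist across multiple worker processes for faster processing; threads alone may not help, since checking a password is CPU-bound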
  • Test common passwords first before using large wordlists
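
Two of the tips above, trying common passwords first and dividing the work, can be sketched with small generator helpers. This is an illustrative sketch only: prioritized and batched are hypothetical names, and the batches would still need to be handed to workers (for example via concurrent.futures.ProcessPoolExecutor), which is not shown here.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def prioritized(common: Iterable[str], bulk: Iterable[str]) -> Iterator[str]:
    """Yield the short list first, then the bulk list minus entries already tried."""
    tried = set()
    for candidate in common:
        if candidate not in tried:
            tried.add(candidate)
            yield candidate
    # Only the short list is remembered, so memory use stays small even when
    # bulk is a very large wordlist; duplicates inside bulk itself are kept.
    for candidate in bulk:
        if candidate not in tried:
            yield candidate


def batched(candidates: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group candidates into lists of at most `size` items, one per worker task."""
    iterator = iter(candidates)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example: common passwords are tried before the larger list
order = list(prioritized(["123456", "password"], ["zebra", "password", "apple"]))
print(order)
```

Because both helpers are generators, they can wrap a wordlist that is read lazily from disk without loading it all at once.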

Conclusion

PDF password cracking using Python and pikepdf demonstrates how dictionary attacks work in cybersecurity. This technique helps security professionals test password strength and educates users about the importance of using strong, unique passwords.

Updated on: 2026-03-25T16:51:54+05:30
