How to extract all the .txt files from a zip file using Python?

Python provides a built-in module called zipfile that allows us to create, read, write, and extract ZIP archives. When we want to extract only specific files, such as all .txt files, we can filter file names using string methods like endswith() or the pathlib module.
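Extracting is not the only way to get at the text inside an archive. The sketch below uses zipfile to create a small archive and then read every .txt member's contents straight into memory, without writing anything to disk. The archive is kept in an io.BytesIO buffer, the member names are hypothetical, and decoding as UTF-8 is an assumption about how the text was stored.

```python
import io
import zipfile

# Create (write) a small archive in memory
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("hello.txt", "Hello from inside the archive")
    zf.writestr("data.csv", "a,b\n1,2")

# Read the .txt members back without extracting them
with zipfile.ZipFile(buffer) as zf:
    texts = {
        name: zf.read(name).decode("utf-8")  # read() returns bytes
        for name in zf.namelist()
        if name.lower().endswith(".txt")
    }

print(texts)  # {'hello.txt': 'Hello from inside the archive'}
```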

A ZIP file is an archive format that packs one or more files, usually compressed, into a single file for easy storage and transfer. Compression saves disk space and makes files faster to share over the internet, while bundling keeps related files together.
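To see the size reduction in practice, the following sketch compresses a deliberately repetitive string with ZIP_DEFLATED and compares the original size (ZipInfo.file_size) with the stored size (ZipInfo.compress_size). Note that ZipFile defaults to ZIP_STORED, which bundles files without compressing them. The member name repeat.txt is hypothetical.

```python
import io
import zipfile

buffer = io.BytesIO()
text = "hello zip " * 1000  # 10,000 characters of repetitive text

# ZIP_DEFLATED turns on compression; the default ZIP_STORED does not compress
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("repeat.txt", text)

with zipfile.ZipFile(buffer) as zf:
    info = zf.getinfo("repeat.txt")
    # file_size: uncompressed size; compress_size: size inside the archive
    print(f"{info.filename}: {info.file_size} bytes -> {info.compress_size} bytes")
```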

Steps for Extracting Specific Files

Follow these steps to extract all .txt files from a ZIP archive:

  • Import the zipfile and os modules
  • Open the ZIP file using zipfile.ZipFile()
  • Get the list of all files using namelist()
  • Filter files that end with .txt
  • Extract matched files using the extract() method
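The steps above can be wrapped in one reusable function. This is a minimal sketch: the function name extract_txt_files and the demo archive are hypothetical, and the usage example works inside a temporary directory so it leaves nothing behind.

```python
import os
import tempfile
import zipfile

def extract_txt_files(zip_path, output_dir):
    """Extract every .txt member of zip_path into output_dir; return their names."""
    os.makedirs(output_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:       # open the ZIP file
        for name in zf.namelist():               # list all members
            if name.lower().endswith(".txt"):    # keep only .txt files
                zf.extract(name, output_dir)     # extract the match
                extracted.append(name)
    return extracted

# Usage with a throwaway archive
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.txt", "first")
        zf.writestr("b.md", "second")
        zf.writestr("sub/c.TXT", "third")
    result = extract_txt_files(archive, os.path.join(tmp, "out"))

print(result)  # ['a.txt', 'sub/c.TXT']
```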

Method 1: Using endswith() Method

The endswith() string method is the simplest way to filter files by extension: it returns True if a string ends with the given suffix. Calling lower() first makes the check case-insensitive, so a name such as NOTES.TXT also matches:
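endswith() also accepts a tuple of suffixes and returns True if any of them match, which is handy when more than one extension should be extracted. The file names and the extra .md extension below are only illustrative.

```python
# Hypothetical member names, as namelist() might return them
names = ["notes.txt", "README.md", "data.csv", "LOG.TXT", "docs/"]
wanted = (".txt", ".md")  # a tuple: match any of these suffixes

matches = [n for n in names if n.lower().endswith(wanted)]
print(matches)  # ['notes.txt', 'README.md', 'LOG.TXT']
```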

import zipfile
import os

# Create output directory
output_dir = 'text_files'
os.makedirs(output_dir, exist_ok=True)

# Sample data for demonstration
with zipfile.ZipFile('sample.zip', 'w') as zip_ref:
    zip_ref.writestr('notes.txt', 'This is a text file')
    zip_ref.writestr('data.csv', 'name,age\nJohn,25')
    zip_ref.writestr('docs/readme.txt', 'Documentation file')

# Extract only .txt files
with zipfile.ZipFile('sample.zip', 'r') as zip_ref:
    for member in zip_ref.namelist():
        if member.lower().endswith('.txt'):
            zip_ref.extract(member, output_dir)
            print(f"Extracted: {member}")

Output

Extracted: notes.txt
Extracted: docs/readme.txt
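ZipFile.extractall() also takes a members argument, so the filtered list can be extracted in a single call instead of calling extract() in a loop. As with extract(), folder names inside the archive are recreated, so docs/readme.txt lands in a docs subfolder. This sketch rebuilds the same sample archive inside a temporary directory.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "sample.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("notes.txt", "This is a text file")
        zf.writestr("data.csv", "name,age\nJohn,25")
        zf.writestr("docs/readme.txt", "Documentation file")

    out = os.path.join(tmp, "text_files")
    with zipfile.ZipFile(archive) as zf:
        txt_members = [m for m in zf.namelist() if m.lower().endswith(".txt")]
        # With members= given, extractall() extracts only those entries
        zf.extractall(out, members=txt_members)

    # List the extracted files relative to out, using forward slashes
    found = sorted(
        os.path.relpath(os.path.join(root, f), out).replace(os.sep, "/")
        for root, _, files in os.walk(out)
        for f in files
    )

print(found)  # ['docs/readme.txt', 'notes.txt']
```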

Method 2: Using pathlib.Path.suffix

The pathlib module parses a file name into its components, so the extension can be compared directly instead of matching the end of a raw string. The suffix property returns the final extension, including the leading dot:
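A few edge cases show how suffix behaves. This sketch uses PurePosixPath because the ZIP specification uses forward slashes as the path separator, so member names parse the same way on every operating system. The names are made up for illustration.

```python
from pathlib import PurePosixPath

names = ["logs/error.txt", "notes.TXT", "archive.tar.gz", ".txt", "docs/"]

# suffix is only the final extension; a leading-dot name has no suffix
suffixes = {name: PurePosixPath(name).suffix for name in names}

for name, suffix in suffixes.items():
    print(f"{name!r:18} -> {suffix!r}")
```

Two differences from endswith() stand out: 'notes.TXT' keeps its case, which is why the method above calls lower(), and a file literally named '.txt' has an empty suffix even though '.txt'.endswith('.txt') is True.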

from pathlib import Path
import zipfile
import os

# Create output directory
output_dir = 'txt_files_only'
os.makedirs(output_dir, exist_ok=True)

# Create sample ZIP file
with zipfile.ZipFile('archive.zip', 'w') as zip_ref:
    zip_ref.writestr('report.txt', 'Monthly report data')
    zip_ref.writestr('config.json', '{"setting": "value"}')
    zip_ref.writestr('logs/error.txt', 'Error log entries')

# Extract using pathlib
with zipfile.ZipFile('archive.zip', 'r') as zip_ref:
    for file in zip_ref.namelist():
        if Path(file).suffix.lower() == '.txt':
            zip_ref.extract(file, output_dir)
            print(f"Extracted: {file}")

Output

Extracted: report.txt
Extracted: logs/error.txt
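Both methods above recreate the archive's folders, so logs/error.txt ends up in txt_files_only/logs/. If a single flat folder is preferred, one option is to copy each member's bytes into a file named after its base name. This is a sketch under one important assumption: no two .txt members share a base name, because a later file with the same name would silently overwrite an earlier one.

```python
import io
import shutil
import tempfile
import zipfile
from pathlib import Path, PurePosixPath

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("report.txt", "Monthly report data")
    zf.writestr("config.json", '{"setting": "value"}')
    zf.writestr("logs/error.txt", "Error log entries")

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    with zipfile.ZipFile(buffer) as zf:
        for name in zf.namelist():
            member = PurePosixPath(name)
            if member.suffix.lower() == ".txt":
                # Write to out/<basename>, dropping any folder part of the name
                with zf.open(name) as src, open(out / member.name, "wb") as dst:
                    shutil.copyfileobj(src, dst)
    flat = sorted(p.name for p in out.iterdir())

print(flat)  # ['error.txt', 'report.txt']
```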

Method 3: Extracting with Custom Filter Function

For more complex filtering rules, you can write a custom function that decides which files to extract:
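A custom filter does not have to look only at the name. ZipFile.infolist() returns ZipInfo objects that also carry metadata such as the uncompressed size, and extract() accepts a ZipInfo as well as a name. In this sketch the 1024-byte limit (MAX_BYTES), the helper is_small_txt, and the member names are all hypothetical.

```python
import io
import tempfile
import zipfile

MAX_BYTES = 1024  # hypothetical size limit for this sketch

def is_small_txt(info):
    """Keep regular .txt members whose uncompressed size is at most MAX_BYTES."""
    return (not info.is_dir()
            and info.filename.lower().endswith(".txt")
            and info.file_size <= MAX_BYTES)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("small.txt", "short note")
    zf.writestr("big.txt", "x" * 5000)   # larger than MAX_BYTES
    zf.writestr("folder/", "")           # a directory entry
    zf.writestr("image.png", "not text")

with tempfile.TemporaryDirectory() as out, zipfile.ZipFile(buffer) as zf:
    selected = []
    for info in zf.infolist():
        if is_small_txt(info):
            zf.extract(info, out)  # extract() accepts a ZipInfo object
            selected.append(info.filename)

print(selected)  # ['small.txt']
```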

import zipfile
import os

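def should_extract(filename):
    """Return True for .txt files that are not inside a 'temp' folder."""
    # ZIP member names use '/' as the separator; the last part is the file name
    parts = filename.lower().split('/')
    # Check folder names only, so a file such as 'template.txt' is not excluded
    return parts[-1].endswith('.txt') and 'temp' not in parts[:-1]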

# Create sample ZIP with various files
with zipfile.ZipFile('mixed_files.zip', 'w') as zip_ref:
    zip_ref.writestr('important.txt', 'Important document')
    zip_ref.writestr('temp/cache.txt', 'Temporary file')
    zip_ref.writestr('docs/manual.txt', 'User manual')
    zip_ref.writestr('image.png', 'Image data')

# Extract with custom filter
output_dir = 'filtered_files'
os.makedirs(output_dir, exist_ok=True)

with zipfile.ZipFile('mixed_files.zip', 'r') as zip_ref:
    extracted_count = 0
    for file in zip_ref.namelist():
        if should_extract(file):
            zip_ref.extract(file, output_dir)
            print(f"Extracted: {file}")
            extracted_count += 1
    
    print(f"\nTotal files extracted: {extracted_count}")

Output

Extracted: important.txt
Extracted: docs/manual.txt

Total files extracted: 2

Comparison

Method            Simplicity   Flexibility   Best for
endswith()        High         Low           Simple extension filtering
Path.suffix       Medium       Medium        Cross-platform path handling
Custom function   Low          High          Complex filtering rules

Conclusion

Use endswith() for simple .txt file extraction from ZIP archives. For more robust path handling, prefer pathlib.Path.suffix. Custom filter functions provide maximum flexibility for complex extraction requirements.

Updated on: 2026-03-24T18:43:07+05:30
