How to extract all the .txt files from a zip file using Python?
Python's standard library includes the zipfile module, which lets us create, read, write, and extract ZIP archives. To extract only specific files, such as all .txt files, we can filter the member names with a string method like endswith() or with the pathlib module.
A ZIP file is an archive format used to compress one or more files into a single file for easy storage and transfer. It reduces file size and keeps related files together for sharing over the internet and saving disk space.
Steps for Extracting Specific Files
Follow these steps to extract all .txt files from a ZIP archive:
- Import the zipfile and os modules
- Open the ZIP file using zipfile.ZipFile()
- Get the list of all files using namelist()
- Filter files that end with .txt
- Extract matched files using the extract() method
Method 1: Using endswith() Method
The endswith() method is the simplest approach to filter files by extension. It checks if a filename ends with the specified string:
import zipfile
import os

# Create output directory
output_dir = 'text_files'
os.makedirs(output_dir, exist_ok=True)

# Sample data for demonstration
with zipfile.ZipFile('sample.zip', 'w') as zip_ref:
    zip_ref.writestr('notes.txt', 'This is a text file')
    zip_ref.writestr('data.csv', 'name,age\nJohn,25')
    zip_ref.writestr('docs/readme.txt', 'Documentation file')

# Extract only .txt files (lower() makes the check case-insensitive)
with zipfile.ZipFile('sample.zip', 'r') as zip_ref:
    for member in zip_ref.namelist():
        if member.lower().endswith('.txt'):
            zip_ref.extract(member, output_dir)
            print(f"Extracted: {member}")
Extracted: notes.txt
Extracted: docs/readme.txt
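The same filtering can also be done in one call, because extractall() accepts a members argument that limits extraction to a subset of namelist(). The following is a minimal sketch rather than a replacement for the loop above. It reuses the sample archive contents, and the names work_dir, zip_path, and txt_members are illustrative choices, with a temporary directory assumed so that nothing is left in the current folder.

```python
import os
import tempfile
import zipfile

# Assumption: work in a throwaway directory so the sketch leaves nothing behind
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, 'sample.zip')
output_dir = os.path.join(work_dir, 'text_files')

# Same sample archive contents as in the example above
with zipfile.ZipFile(zip_path, 'w') as zip_ref:
    zip_ref.writestr('notes.txt', 'This is a text file')
    zip_ref.writestr('data.csv', 'name,age\nJohn,25')
    zip_ref.writestr('docs/readme.txt', 'Documentation file')

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    # Build the list of matching members first, then extract them in one call
    txt_members = [name for name in zip_ref.namelist()
                   if name.lower().endswith('.txt')]
    zip_ref.extractall(output_dir, members=txt_members)

print(txt_members)
```

Building the list first is convenient when you want to inspect or count the matches before anything is written to disk.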
Method 2: Using pathlib.Path.suffix
The pathlib module provides a more structured way to work with file paths and extensions. The suffix property returns the final extension of a path, including the leading dot (for example, '.gz' for 'archive.tar.gz'):
from pathlib import Path
import zipfile
import os

# Create output directory
output_dir = 'txt_files_only'
os.makedirs(output_dir, exist_ok=True)

# Create sample ZIP file
with zipfile.ZipFile('archive.zip', 'w') as zip_ref:
    zip_ref.writestr('report.txt', 'Monthly report data')
    zip_ref.writestr('config.json', '{"setting": "value"}')
    zip_ref.writestr('logs/error.txt', 'Error log entries')

# Extract using pathlib
with zipfile.ZipFile('archive.zip', 'r') as zip_ref:
    for file in zip_ref.namelist():
        if Path(file).suffix.lower() == '.txt':
            zip_ref.extract(file, output_dir)
            print(f"Extracted: {file}")
Extracted: report.txt
Extracted: logs/error.txt
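For most archives the two checks select the same files, but they are not identical. The short comparison below is a sketch using hypothetical member names (they do not come from any archive in this article) to show where endswith() and suffix can disagree. It uses PurePosixPath, since ZIP member names use forward slashes regardless of the operating system.

```python
from pathlib import PurePosixPath

# Hypothetical member names chosen to show where the two checks differ
names = ['report.txt', 'NOTES.TXT', 'backup.txt.bak', '.txt', 'logs/error.txt']

results = {}
for name in names:
    by_endswith = name.lower().endswith('.txt')
    by_suffix = PurePosixPath(name).suffix.lower() == '.txt'
    results[name] = (by_endswith, by_suffix)
    print(f"{name!r}: endswith={by_endswith}, suffix={by_suffix}")
```

The only name on which the checks disagree is '.txt': pathlib treats a name that starts with a dot and has no other dot as a hidden file without an extension, so its suffix is an empty string, while endswith() still matches it.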
Method 3: Extracting with Custom Filter Function
For more complex filtering requirements, you can create a custom function to determine which files to extract:
import zipfile
import os
from pathlib import PurePosixPath

def should_extract(filename):
    """Return True for .txt files that are not inside a 'temp' folder."""
    path = PurePosixPath(filename.lower())
    # Check folder names only, so a file such as 'template.txt' still matches
    return path.suffix == '.txt' and 'temp' not in path.parts[:-1]

# Create sample ZIP with various files
with zipfile.ZipFile('mixed_files.zip', 'w') as zip_ref:
    zip_ref.writestr('important.txt', 'Important document')
    zip_ref.writestr('temp/cache.txt', 'Temporary file')
    zip_ref.writestr('docs/manual.txt', 'User manual')
    zip_ref.writestr('image.png', 'Image data')

# Extract with custom filter
output_dir = 'filtered_files'
os.makedirs(output_dir, exist_ok=True)

extracted_count = 0
with zipfile.ZipFile('mixed_files.zip', 'r') as zip_ref:
    for file in zip_ref.namelist():
        if should_extract(file):
            zip_ref.extract(file, output_dir)
            print(f"Extracted: {file}")
            extracted_count += 1

print(f"\nTotal files extracted: {extracted_count}")
Extracted: important.txt
Extracted: docs/manual.txt

Total files extracted: 2
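If several different filters are needed, one option is to pass the filter in as an argument instead of hard-coding it. The sketch below assumes a hypothetical helper named extract_matching(), which is not part of the zipfile module. It takes any function that maps a member name to True or False and returns the names it extracted. The demonstration runs in a temporary directory.

```python
import os
import tempfile
import zipfile


def extract_matching(zip_path, output_dir, predicate):
    """Extract every member for which predicate(name) is true.

    Hypothetical helper for illustration; returns the extracted names.
    """
    extracted = []
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        for name in zip_ref.namelist():
            if predicate(name):
                zip_ref.extract(name, output_dir)
                extracted.append(name)
    return extracted


# Demonstration in a temporary directory
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, 'mixed_files.zip')
with zipfile.ZipFile(demo_zip, 'w') as zip_ref:
    zip_ref.writestr('important.txt', 'Important document')
    zip_ref.writestr('docs/manual.txt', 'User manual')
    zip_ref.writestr('image.png', 'Image data')

out_dir = os.path.join(work_dir, 'out')
txt_only = extract_matching(demo_zip, out_dir,
                            lambda name: name.lower().endswith('.txt'))
print(txt_only)
```

Any of the earlier checks, including a function like should_extract(), could be passed as the predicate without changing the helper.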
Comparison
| Method | Simplicity | Flexibility | Best For |
|---|---|---|---|
| endswith() | High | Low | Simple extension filtering |
| Path.suffix | Medium | Medium | Cross-platform path handling |
| Custom Function | Low | High | Complex filtering rules |
Conclusion
Use endswith() for simple .txt file extraction from ZIP archives. For more robust path handling, prefer pathlib.Path.suffix. Custom filter functions provide maximum flexibility for complex extraction requirements.
