How to extract all the .txt files from a zip file using Python?


ZIP archives compress multiple files and store them together, which makes them common in data manipulation and file management. Python's standard library includes the zipfile module for reading and writing these archives. A frequent task is extracting only particular files from a ZIP archive, such as all the .txt files. This article walks through how to extract every .txt file from a ZIP archive with Python, moving step by step from the basics to more selective extraction, with code examples along the way.

Extract All Files from a ZIP Archive

Let's first learn how to extract all of the files from a ZIP archive. This example lays the groundwork for the later examples, in which we filter and extract .txt files. Here is the code −

Example

Here, we define the function extract_all_files, which accepts the path to the ZIP archive and the folder to extract into as parameters. We open the archive with zipfile.ZipFile() in read mode ('r') and call the extractall() method to extract every file to the designated destination folder.

import zipfile

def extract_all_files(zip_file_path, extract_to):
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      zip_ref.extractall(extract_to)

# Example usage
zip_file_path = 'my_archive.zip'
extract_to = 'destination_folder'
extract_all_files(zip_file_path, extract_to)
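
Before extracting, it can help to see what the archive contains. The following is a minimal sketch rather than part of the example above: list_archive_members and build_demo_archive are hypothetical helper names, and the demo writes a small throwaway archive to a temporary folder so that the snippet runs on its own.

```python
import os
import tempfile
import zipfile

def list_archive_members(zip_file_path):
   # namelist() returns the stored path of every member in the archive
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      return zip_ref.namelist()

def build_demo_archive(folder):
   # Hypothetical helper: writes a tiny archive to experiment with
   zip_file_path = os.path.join(folder, 'demo_archive.zip')
   with zipfile.ZipFile(zip_file_path, 'w') as zip_ref:
      zip_ref.writestr('notes.txt', 'hello')
      zip_ref.writestr('docs/readme.txt', 'read me')
      zip_ref.writestr('images/logo.png', b'not a real image')
   return zip_file_path

with tempfile.TemporaryDirectory() as folder:
   members = list_archive_members(build_demo_archive(folder))
   print(members)
```

Member names always use forward slashes, whatever the operating system, which matters when filtering on them later.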

Extract Specific File Types

Now that we know how to extract all files from a ZIP archive, we can focus on extracting certain file types, specifically .txt files. To do this, we iterate over the list of files in the archive and pick only those with a .txt extension. Check out the code −

Example

In this code, we loop through the list of ZipInfo objects returned by zip_ref.infolist(). Using the endswith() method, we check whether each filename ends with .txt. If so, we call zip_ref.extract() to extract that particular file to the designated destination folder. Note that the comparison is case-sensitive, so a file named NOTES.TXT is not matched.

import zipfile

def extract_txt_files(zip_file_path, extract_to):
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      for file_info in zip_ref.infolist():
         if file_info.filename.endswith('.txt'):
            zip_ref.extract(file_info, extract_to)

# Example usage
zip_file_path = 'my_archive.zip'
extract_to = 'destination_folder'
extract_txt_files(zip_file_path, extract_to)
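
Because endswith('.txt') is case-sensitive, archives created on other systems may contain .TXT files that the loop skips. The sketch below is one possible variation, not the only way to do it: it lowercases each name before comparing and hands the filtered list to extractall() through its members parameter. The function name extract_txt_files_any_case and the demo archive are assumptions made for illustration.

```python
import os
import tempfile
import zipfile

def extract_txt_files_any_case(zip_file_path, extract_to):
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      # Lowercase the name so that .TXT and .Txt also match
      txt_members = [
         file_info for file_info in zip_ref.infolist()
         if file_info.filename.lower().endswith('.txt')
      ]
      # extractall() accepts a subset of members to extract
      zip_ref.extractall(extract_to, members=txt_members)
      return [file_info.filename for file_info in txt_members]

# Demo on a throwaway archive in a temporary folder
with tempfile.TemporaryDirectory() as folder:
   zip_file_path = os.path.join(folder, 'demo_archive.zip')
   with zipfile.ZipFile(zip_file_path, 'w') as zip_ref:
      zip_ref.writestr('notes.txt', 'hello')
      zip_ref.writestr('REPORT.TXT', 'upper-case extension')
      zip_ref.writestr('logo.png', b'not a real image')
   extract_to = os.path.join(folder, 'destination_folder')
   print(extract_txt_files_any_case(zip_file_path, extract_to))
   print(sorted(os.listdir(extract_to)))
```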

Extract Files to a Specific Directory Structure

Maintaining the directory structure while extracting data from a ZIP archive is crucial in many situations. For instance, we might wish to keep folders during extraction if the ZIP archive has them. Let's look at how to do this −

Example

The file's relative path within the ZIP archive is stored in file_info.filename, and zip_ref.extract() recreates that path, including any folders, under the directory passed as its second argument. We therefore pass the extraction directory itself. Joining it with file_info.filename first would nest the folders a second time inside a directory named after the file. The extract() method also returns the path of the file it created, which we print here.

import zipfile

def extract_txt_files_with_structure(zip_file_path, extract_to):
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      for file_info in zip_ref.infolist():
         if file_info.filename.endswith('.txt'):
            # extract() recreates the member's folders under extract_to
            # and returns the path of the extracted file
            extracted_path = zip_ref.extract(file_info, extract_to)
            print(extracted_path)

# Example usage
zip_file_path = 'my_archive.zip'
extract_to = 'destination_folder'
extract_txt_files_with_structure(zip_file_path, extract_to)
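
To check the resulting layout without a real archive at hand, the sketch below builds a throwaway archive containing a nested file, extracts the .txt files by passing the destination folder itself to extract(), and prints where each file ended up. The function name extract_txt_keep_folders and the demo file names are assumptions made for illustration.

```python
import os
import tempfile
import zipfile

def extract_txt_keep_folders(zip_file_path, extract_to):
   extracted = []
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      for file_info in zip_ref.infolist():
         if file_info.filename.endswith('.txt'):
            # extract() returns the path of the file it created
            extracted.append(zip_ref.extract(file_info, extract_to))
   return extracted

# Demo on a throwaway archive in a temporary folder
with tempfile.TemporaryDirectory() as folder:
   zip_file_path = os.path.join(folder, 'demo_archive.zip')
   with zipfile.ZipFile(zip_file_path, 'w') as zip_ref:
      zip_ref.writestr('docs/guide/intro.txt', 'nested file')
      zip_ref.writestr('top.txt', 'top-level file')
   extract_to = os.path.join(folder, 'destination_folder')
   for path in extract_txt_keep_folders(zip_file_path, extract_to):
      # Show each result relative to the destination folder
      print(os.path.relpath(path, extract_to))
```

The nested file lands in docs/guide inside the destination folder, mirroring its path in the archive.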

Extract Files with a Prefix

Occasionally, regardless of their extensions, we might want to extract files with a particular prefix. For instance, we could wish to extract all files whose names begin with "data_". Let's investigate how to do this −

Example

In this snippet, we use os.path.basename() to strip any folder names from each stored path and then use the startswith() method to check whether the filename begins with the requested prefix. If it does, we extract the file to the desired location, where extract() preserves its relative path structure.

import zipfile
import os

def extract_files_with_prefix(zip_file_path, extract_to, prefix):
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      for file_info in zip_ref.infolist():
         # Compare against the filename only, not its folders
         if os.path.basename(file_info.filename).startswith(prefix):
            # extract() recreates the member's folders under extract_to
            zip_ref.extract(file_info, extract_to)

# Example usage
zip_file_path = 'my_archive.zip'
extract_to = 'destination_folder'
prefix = 'data_'
extract_files_with_prefix(zip_file_path, extract_to, prefix)
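
When a prefix alone is not enough, a shell-style pattern can combine a prefix with an extension. The sketch below is one possible approach using the standard fnmatch module. The function name extract_files_matching and the pattern 'data_*.txt' are assumptions chosen for illustration. It matches against the filename without its folders and uses fnmatchcase() so that matching is case-sensitive on every platform.

```python
import fnmatch
import os
import posixpath
import tempfile
import zipfile

def extract_files_matching(zip_file_path, extract_to, pattern):
   matched = []
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      for file_info in zip_ref.infolist():
         # ZIP member names use forward slashes, so posixpath splits them
         base_name = posixpath.basename(file_info.filename)
         if fnmatch.fnmatchcase(base_name, pattern):
            zip_ref.extract(file_info, extract_to)
            matched.append(file_info.filename)
   return matched

# Demo on a throwaway archive in a temporary folder
with tempfile.TemporaryDirectory() as folder:
   zip_file_path = os.path.join(folder, 'demo_archive.zip')
   with zipfile.ZipFile(zip_file_path, 'w') as zip_ref:
      zip_ref.writestr('data_2023.txt', 'a')
      zip_ref.writestr('reports/data_summary.txt', 'b')
      zip_ref.writestr('reports/notes.txt', 'c')
      zip_ref.writestr('data_raw.csv', 'd')
   extract_to = os.path.join(folder, 'destination_folder')
   print(extract_files_matching(zip_file_path, extract_to, 'data_*.txt'))
```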

Extract Files with a Custom Extraction Function

What if we need more complicated extraction criteria? We can handle this by passing in a custom function that decides which files to extract. See how to put it into practice −

Example

In this example, we define a custom extraction function called custom_extraction_func(). It accepts a file_info object as input and returns True or False depending on certain criteria. Here, the custom conditions are a .txt extension and an uncompressed file size greater than 1024 bytes.

The ZIP archive path, the destination folder, and the custom extraction function are passed as parameters to the custom_extract() function. It calls the extraction function for each file in the ZIP archive and extracts the file only if the function returns True.

import zipfile

def custom_extraction_func(file_info):
   # Your custom condition here
   return file_info.filename.endswith('.txt') and file_info.file_size > 1024

def custom_extract(zip_file_path, extract_to, extraction_func):
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      for file_info in zip_ref.infolist():
         if extraction_func(file_info):
            # Pass the folder itself; extract() appends the member's path
            zip_ref.extract(file_info, extract_to)

# Example usage with the custom_extraction_func
zip_file_path = 'my_archive.zip'
extract_to = 'destination_folder'
custom_extract(zip_file_path, extract_to, custom_extraction_func)
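
Since the condition is just a function of a ZipInfo object, short conditions can also be written inline as lambdas. The sketch below is an illustrative variation under stated assumptions: extract_where is a hypothetical name, and unlike custom_extract() above it returns the names it extracted so the result can be inspected. The demo archive is created in a temporary folder.

```python
import os
import tempfile
import zipfile

def extract_where(zip_file_path, extract_to, condition):
   extracted = []
   with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
      for file_info in zip_ref.infolist():
         # condition receives the ZipInfo object and returns True or False
         if condition(file_info):
            zip_ref.extract(file_info, extract_to)
            extracted.append(file_info.filename)
   return extracted

# Demo on a throwaway archive in a temporary folder
with tempfile.TemporaryDirectory() as folder:
   zip_file_path = os.path.join(folder, 'demo_archive.zip')
   with zipfile.ZipFile(zip_file_path, 'w') as zip_ref:
      zip_ref.writestr('small.txt', 'x')
      zip_ref.writestr('large.txt', 'x' * 2048)
      zip_ref.writestr('large.bin', b'\0' * 2048)
   extract_to = os.path.join(folder, 'destination_folder')
   # .txt files whose uncompressed size exceeds 1024 bytes
   large_txt = lambda info: (
      info.filename.endswith('.txt') and info.file_size > 1024
   )
   print(extract_where(zip_file_path, extract_to, large_txt))
```

Other ZipInfo attributes, such as compress_size or date_time, can be used in the condition in the same way.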

Conclusion

In this article, we examined how to use Python's zipfile module to extract all the .txt files from a ZIP archive. We started with the basics of opening an archive and extracting everything in it.

Then, we worked through code examples covering a range of situations, including extracting certain file types, maintaining directory structures, extracting files with prefixes, and writing custom extraction functions.

With these techniques, you can handle most selective-extraction tasks on ZIP archives in Python. The zipfile module covers reading, writing, and inspecting archives, so the same patterns extend to other file operations. Try the examples on your own archives and adapt the filter conditions to your needs.

Updated on: 22-Aug-2023
