How are files extracted from a tar file using Python?
TAR stands for Tape Archive, an archive format used mainly in Unix and Linux environments. A tar file collects multiple files into a single file, which makes them easier to share or back up together.
In Python, when we want to work with TAR files, we can use the tarfile module, which allows us to create, read and extract TAR archives programmatically.
In this article, we will explore how to extract files from a tar file using Python.
Extracting All Files from a Tar File
The extractall() method extracts every file and folder in the archive into the current working directory, unless another path is specified.
Syntax
tar.extractall(path='.', members=None)
Parameters:
- path: Directory where files will be extracted (default: current directory)
- members: Optional subset of the members returned by getmembers() to extract (default: all members)
Example
The following example uses the extractall() method to extract all available files from a tar archive:
import tarfile
import os

# Create some sample files
with open('file1.txt', 'w') as f:
    f.write("This is file 1 content")
with open('file2.txt', 'w') as f:
    f.write("This is file 2 content")

# Create a sample tar file and add the files to it
with tarfile.open('sample.tar', 'w') as tar:
    tar.add('file1.txt')
    tar.add('file2.txt')

# Now extract all files
with tarfile.open('sample.tar', 'r') as tar:
    tar.extractall(path='extracted_files')
    print("All files extracted successfully.")

# List extracted files
for root, dirs, files in os.walk('extracted_files'):
    for file in files:
        print(f"Extracted: {os.path.join(root, file)}")
All files extracted successfully.
Extracted: extracted_files/file1.txt
Extracted: extracted_files/file2.txt
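The members parameter described above can be sketched as follows. This is a minimal illustration, not the only way to do it: the helper name extract_matching, the demo file names, and the temporary directory are assumptions made for the example. It filters the TarInfo objects returned by getmembers() and passes only the matching ones to extractall().

```python
import os
import tarfile
import tempfile


def extract_matching(archive_path, dest_dir, suffix):
    """Extract only the members whose names end with `suffix` (hypothetical helper)."""
    with tarfile.open(archive_path, 'r') as tar:
        # `members` expects TarInfo objects taken from getmembers()
        wanted = [m for m in tar.getmembers() if m.name.endswith(suffix)]
        tar.extractall(path=dest_dir, members=wanted)
    return [m.name for m in wanted]


# Build a small throwaway archive in a temporary directory
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, 'demo.tar')
for name, text in [('notes.txt', 'notes'), ('data.csv', 'a,b\n1,2\n')]:
    with open(os.path.join(work_dir, name), 'w') as f:
        f.write(text)
with tarfile.open(archive, 'w') as tar:
    tar.add(os.path.join(work_dir, 'notes.txt'), arcname='notes.txt')
    tar.add(os.path.join(work_dir, 'data.csv'), arcname='data.csv')

# Extract only the .txt member
out_dir = os.path.join(work_dir, 'only_txt')
extracted = extract_matching(archive, out_dir, '.txt')
print(extracted)            # ['notes.txt']
print(os.listdir(out_dir))  # ['notes.txt']
```

Filtering on TarInfo objects rather than plain names also gives access to attributes such as size and type, should the selection need them.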
Extracting a Single File from a Tar Archive
The extract() method in Python is used to extract a specific file from a TAR archive. This is useful when you don't want to extract the entire archive but only one particular file.
Syntax
tar.extract(member, path='', set_attrs=True)
Example
The example below uses the extract() method to extract a single file from the tar archive created in the previous example:
import tarfile
import os

# Extract only file1.txt
with tarfile.open('sample.tar', 'r') as tar:
    tar.extract('file1.txt', path='single_extract')
    print("Single file extracted successfully.")

# Verify the extracted file
if os.path.exists('single_extract/file1.txt'):
    with open('single_extract/file1.txt', 'r') as f:
        print(f"Content: {f.read()}")
Single file extracted successfully.
Content: This is file 1 content
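If the requested name is not in the archive, the lookup raises KeyError. The sketch below shows one way to guard against that; the helper name extract_one, the demo archive, and the temporary directory are assumptions made for illustration. It looks the member up with getmember() first and only then calls extract().

```python
import os
import tarfile
import tempfile


def extract_one(archive_path, member_name, dest_dir):
    """Extract one member; return its path, or None if it is absent (hypothetical helper)."""
    with tarfile.open(archive_path, 'r') as tar:
        try:
            member = tar.getmember(member_name)  # raises KeyError if not found
        except KeyError:
            return None
        tar.extract(member, path=dest_dir)
    return os.path.join(dest_dir, member_name)


# Build a one-file throwaway archive in a temporary directory
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, 'demo.tar')
source = os.path.join(work_dir, 'file1.txt')
with open(source, 'w') as f:
    f.write('This is file 1 content')
with tarfile.open(archive, 'w') as tar:
    tar.add(source, arcname='file1.txt')

dest = os.path.join(work_dir, 'out')
found = extract_one(archive, 'file1.txt', dest)
missing = extract_one(archive, 'no_such_file.txt', dest)
print(found is not None and os.path.exists(found))  # True
print(missing)                                      # None
```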
Listing Files Before Extraction
You can list the members of a tar archive before extracting them by calling the getnames() method:
import tarfile

# List all files in the tar archive
with tarfile.open('sample.tar', 'r') as tar:
    file_list = tar.getnames()
    print("Files in archive:")
    for file in file_list:
        print(f" - {file}")
Files in archive:
 - file1.txt
 - file2.txt
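When names alone are not enough for a preview, getmembers() returns TarInfo objects that also carry each member's size and type. The following is a small sketch under assumed names (describe_archive, a.txt, b.txt); it builds the demo archive in memory with addfile() so it does not depend on any existing files.

```python
import io
import os
import tarfile
import tempfile


def describe_archive(archive_path):
    """Return (name, size in bytes, is regular file) for each member (hypothetical helper)."""
    with tarfile.open(archive_path, 'r') as tar:
        return [(m.name, m.size, m.isfile()) for m in tar.getmembers()]


# Build a throwaway archive from in-memory byte strings
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, 'demo.tar')
payloads = {'a.txt': b'hello', 'b.txt': b'hello world'}
with tarfile.open(archive, 'w') as tar:
    for name, data in payloads.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)  # size must be set before calling addfile()
        tar.addfile(info, io.BytesIO(data))

summary = describe_archive(archive)
for name, size, is_file in summary:
    print(f"{name}: {size} bytes, regular file: {is_file}")
```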
Comparison of Extraction Methods
| Method | Purpose | Use Case |
|---|---|---|
| extractall() | Extract all files | Complete archive extraction |
| extract() | Extract single file | Selective file extraction |
| getnames() | List archive contents | Preview before extraction |
Conclusion
Python's tarfile module provides methods to extract files from TAR archives. Use extractall() to extract every member, extract() for a specific file, and getnames() to preview the contents first. Always open archives with tarfile.open() in a with statement so that they are closed properly.
