How to compress files with the zipfile module in Python


Problem

You want to create compressed files in Python.

Introduction

ZIP files can hold the compressed contents of many other files. Compressing a file reduces its size on disk, which is useful when transferring it over the internet or between systems using Control-M AFT, Connect:Direct, or even scp.

Python programs create ZIP files using functions in the zipfile module.

How to do it...

1. We will be using the zipfile and io modules. Both are part of the Python standard library, so there is nothing to install with pip.

2. We will write a function to write sample data to a file. The function write_data_to_files below takes data as input and creates a file with the given name in the current directory.

Example

import io

# Function : write_data_to_files
def write_data_to_files(inp_data, file_name):
    """
    function : create a csv file with the data passed to this code
    args : inp_data : data to be written to the target file
           file_name : target file name to store the data
    return : none
    assumption : File to be created and this code are in same directory.
    """
    print(f" *** Writing the data to - {file_name}")
    throwaway_storage = io.StringIO(inp_data)
    with open(file_name, 'w') as f:
        for line in throwaway_storage:
            f.write(line)

3. We will now write a function file_compress to zip the files created in the step above. This function accepts a list of files, goes through them, and compresses each one into a zip file. A detailed explanation of each step is provided in the comments.

To create your own compressed ZIP files, you must open the ZipFile object in write mode by passing 'w' as the second argument.

When you pass a path to the write() method of a ZipFile object, Python will compress the file at that path and add it into the ZIP file.

The first argument of the write() method is a string containing the path of the file to add.

The second argument, arcname, is the name the file gets inside the archive. The compress_type argument sets the compression type, which tells Python which algorithm to use to compress the file.
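The points above can be sketched with a with statement, which closes the archive automatically when the block ends. This is only a minimal sketch, not the function used in this article; the file names sample.txt and sample.zip and the temporary working directory are assumptions made for illustration.

```python
import os
import tempfile
import zipfile

# Hypothetical file names, created in a throwaway directory for illustration.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "sample.txt")
archive_path = os.path.join(work_dir, "sample.zip")

# Create a small file to compress.
with open(source_path, "w") as f:
    f.write("hello zip\n" * 100)

# 'w' opens the archive in write mode (an existing archive is overwritten).
with zipfile.ZipFile(archive_path, mode="w") as zf:
    # 1st argument: path of the file to add
    # 2nd argument (arcname): name stored inside the archive
    # compress_type: algorithm used for this member
    zf.write(source_path, arcname="sample.txt",
             compress_type=zipfile.ZIP_DEFLATED)

# Re-open the archive in the default read mode and list its members.
with zipfile.ZipFile(archive_path) as zf:
    archived_names = zf.namelist()

print(archived_names)
```

Passing a short arcname keeps the temporary directory path out of the archive, so the member is stored simply as sample.txt.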

Example

import zipfile

# Function : file_compress
def file_compress(inp_file_names, out_zip_file):
    """
    function : file_compress
    args : inp_file_names : list of filenames to be zipped
           out_zip_file : output zip file
    return : none
    assumption : Input file paths and this code is in same directory.
    """
    # Select the compression mode ZIP_DEFLATED for compression
    # or zipfile.ZIP_STORED to just store the file
    compression = zipfile.ZIP_DEFLATED
    print(f" *** Input File name passed for zipping - {inp_file_names}")

    # create the zip file first parameter path/name, second mode
    print(f' *** out_zip_file is - {out_zip_file}')
    zf = zipfile.ZipFile(out_zip_file, mode="w")

    try:
        for file_to_write in inp_file_names:
            # Add file to the zip file
            # first parameter file to zip, second filename in zip
            print(f' *** Processing file {file_to_write}')
            zf.write(file_to_write, file_to_write, compress_type=compression)

    except FileNotFoundError as e:
        print(f' *** Exception occurred during zip process - {e}')
    finally:
        # Don't forget to close the file!
        zf.close()
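To see what the choice between ZIP_DEFLATED and ZIP_STORED means in practice, the rough sketch below builds the same archive twice in memory and compares the resulting sizes. The helper zipped_size and the repetitive sample data are hypothetical and exist only for this illustration; real files may compress much less.

```python
import io
import zipfile


def zipped_size(data, compression):
    """Return the size in bytes of an in-memory ZIP archive holding `data`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=compression) as zf:
        # writestr() adds a member directly from a string, no file on disk needed
        zf.writestr("sample.csv", data)
    return len(buffer.getvalue())


# Hypothetical, highly repetitive sample data so the difference is visible.
sample_data = "player,titles\n" + "Federer,20\n" * 500

stored_size = zipped_size(sample_data, zipfile.ZIP_STORED)
deflated_size = zipped_size(sample_data, zipfile.ZIP_DEFLATED)

print(f"ZIP_STORED  : {stored_size} bytes")
print(f"ZIP_DEFLATED: {deflated_size} bytes")
```

With ZIP_STORED the archive is slightly larger than the raw data because of the ZIP headers, while ZIP_DEFLATED shrinks repetitive text considerably.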

4. We will call the functions to create two csv files and then zip them. We will write the tennis players who have won more than one Grand Slam title to one file, temporary_file1_for_zip.csv, and the players who have won one or fewer to another file, temporary_file2_for_zip.csv. We will then zip both files into temporary.zip.

Example

import zipfile
import io

file_name1 = "temporary_file1_for_zip.csv"
file_name2 = "temporary_file2_for_zip.csv"
file_name_list = [file_name1, file_name2]
zip_file_name = "temporary.zip"

# data for file 1
file_data_1 = """player,titles
Federer,20
Nadal,20
Djokovic,17
Murray,3
"""

# data for file 2
file_data_2 = """player,titles
Thiem,1
Zverev,0
Medvedev,0
Rublev,0
"""

# write the file_data to file_name
write_data_to_files(file_data_1, file_name1)
write_data_to_files(file_data_2, file_name2)

# zip the file_name to zip_file_name
file_compress(file_name_list, zip_file_name)

5. Putting together everything discussed in the steps above.

Example

# Create two csv files and compress them into a single zip file

import zipfile
import io

# Function : write_data_to_files
def write_data_to_files(inp_data, file_name):
    """
    function : create a csv file with the data passed to this code
    args : inp_data : data to be written to the target file
           file_name : target file name to store the data
    return : none
    assumption : File to be created and this code are in same directory.
    """
    print(f" *** Writing the data to - {file_name}")
    throwaway_storage = io.StringIO(inp_data)
    with open(file_name, 'w') as f:
        for line in throwaway_storage:
            f.write(line)

# Function : file_compress
def file_compress(inp_file_names, out_zip_file):
    """
    function : file_compress
    args : inp_file_names : list of filenames to be zipped
           out_zip_file : output zip file
    return : none
    assumption : Input file paths and this code is in same directory.
    """
    # Select the compression mode ZIP_DEFLATED for compression
    # or zipfile.ZIP_STORED to just store the file
    compression = zipfile.ZIP_DEFLATED
    print(f" *** Input File name passed for zipping - {inp_file_names}")

    # create the zip file first parameter path/name, second mode
    print(f' *** out_zip_file is - {out_zip_file}')
    zf = zipfile.ZipFile(out_zip_file, mode="w")

    try:
        for file_to_write in inp_file_names:
            # Add file to the zip file
            # first parameter file to zip, second filename in zip
            print(f' *** Processing file {file_to_write}')
            zf.write(file_to_write, file_to_write, compress_type=compression)

    except FileNotFoundError as e:
        print(f' *** Exception occurred during zip process - {e}')
    finally:
        # Don't forget to close the file!
        zf.close()

# __main__ program
if __name__ == '__main__':
    # Define your file name and data
    file_name1 = "temporary_file1_for_zip.csv"
    file_name2 = "temporary_file2_for_zip.csv"

    file_name_list = [file_name1, file_name2]
    zip_file_name = "temporary.zip"
    file_data_1 = """player,titles
Federer,20
Nadal,20
Djokovic,17
Murray,3
"""

    file_data_2 = """player,titles
Thiem,1
Zverev,0
Medvedev,0
Rublev,0
"""
    # write the file_data to file_name
    write_data_to_files(file_data_1, file_name1)
    write_data_to_files(file_data_2, file_name2)

    # zip the file_name to zip_file_name
    file_compress(file_name_list, zip_file_name)

Output

When the above code is executed, it prints

 *** Writing the data to - temporary_file1_for_zip.csv
 *** Writing the data to - temporary_file2_for_zip.csv
 *** Input File name passed for zipping - ['temporary_file1_for_zip.csv', 'temporary_file2_for_zip.csv']
 *** out_zip_file is - temporary.zip
 *** Processing file temporary_file1_for_zip.csv
 *** Processing file temporary_file2_for_zip.csv

and produces the following files:

  • temporary_file1_for_zip.csv created in current directory.

  • temporary_file2_for_zip.csv created in current directory.

  • temporary.zip file is created in current directory.
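One way to check the result is to re-open the archive in read mode and inspect its members. The sketch below is illustrative only: to stay self-contained it rebuilds a small stand-in for temporary.zip in a temporary directory with shortened sample data, which is an assumption and not the exact archive produced by the program above.

```python
import os
import tempfile
import zipfile

# Build a small stand-in archive in a temporary directory (illustrative only).
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "temporary.zip")
with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("temporary_file1_for_zip.csv", "player,titles\nFederer,20\nNadal,20\n")
    zf.writestr("temporary_file2_for_zip.csv", "player,titles\nThiem,1\nZverev,0\n")

# Re-open the archive in read mode and inspect it.
with zipfile.ZipFile(zip_path, mode="r") as zf:
    # testzip() returns None when every member passes its CRC check.
    first_bad_member = zf.testzip()
    members = zf.namelist()
    for info in zf.infolist():
        print(f"{info.filename}: {info.file_size} -> {info.compress_size} bytes")
    # read() returns the uncompressed bytes of one member.
    first_file_text = zf.read(members[0]).decode("utf-8")

print(first_file_text)
```

For very small files the compressed size can be close to, or even above, the original size, so the per-member figures are mainly useful on larger inputs.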

Updated on: 09-Nov-2020
