How to Save a Pandas DataFrame as a gzip/zip File?

A pandas DataFrame can be written directly to a gzip- or zip-compressed file, which reduces its size on disk. You can do this with the compression parameter built into pandas writer methods such as to_csv() and to_pickle(), or with Python's standard gzip and zipfile modules.

Method 1: Using to_csv() with Built-in Compression

The simplest approach is to pass the compression parameter to to_csv():

Saving as Gzip File

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Save DataFrame as gzip file using built-in compression
df.to_csv('data.csv.gz', index=False, compression='gzip')

print("DataFrame saved as data.csv.gz")
DataFrame saved as data.csv.gz

Saving as Zip File

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Save DataFrame as zip file using built-in compression
df.to_csv('data.csv.zip', index=False, compression='zip')

print("DataFrame saved as data.csv.zip")
DataFrame saved as data.csv.zip
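
As a minimal sketch of how these files can be checked, the example below reads both formats back with read_csv(). It assumes a reasonably recent pandas version in which compression defaults to 'infer' (so the file suffix selects the codec) and in which the dict form of compression accepts an archive_name for zip output. The temporary directory and variable names are illustrative only.

```python
import os
import tempfile
import zipfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Salary': [50000, 60000, 70000]})

# Write into a throwaway directory so the example leaves no files behind
out_dir = tempfile.mkdtemp()
gz_path = os.path.join(out_dir, 'data.csv.gz')
zip_path = os.path.join(out_dir, 'data.csv.zip')

# No compression argument: the '.gz' suffix is used to infer gzip
df.to_csv(gz_path, index=False)

# Dict form: also sets the name of the CSV member stored inside the archive
df.to_csv(zip_path, index=False,
          compression={'method': 'zip', 'archive_name': 'data.csv'})

# read_csv() infers the compression from the suffix as well
gz_roundtrip = pd.read_csv(gz_path)
zip_roundtrip = pd.read_csv(zip_path)

# List the members of the zip archive to confirm the chosen name
with zipfile.ZipFile(zip_path) as zf:
    member_names = zf.namelist()

print(gz_roundtrip)
print(member_names)
```

Setting archive_name is optional; it is useful when the person unpacking the archive expects a particular file name inside it.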

Method 2: Using gzip Module Directly

For more control over the compression process, you can use the gzip module directly:

import pandas as pd
import gzip

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

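# Save DataFrame as gzip file using gzip module
# Open in text mode ('wt'); newline='' is what pandas recommends for
# text-mode file objects passed to to_csv(), so line endings are not doubled
with gzip.open('manual_data.gz', 'wt', newline='') as f:
    df.to_csv(f, index=False)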

print("DataFrame saved using gzip module")
DataFrame saved using gzip module
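
The extra control mentioned above includes the compression level. The following sketch, which uses an illustrative temporary path, sets compresslevel explicitly and then reads the file back through the same module to confirm the round trip.

```python
import gzip
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Salary': [50000, 60000, 70000]})

path = os.path.join(tempfile.mkdtemp(), 'manual_data.csv.gz')

# compresslevel runs from 0 (no compression) to 9 (smallest output, slowest)
with gzip.open(path, 'wt', compresslevel=9, encoding='utf-8', newline='') as f:
    df.to_csv(f, index=False)

# Read the file back in text mode through the gzip module
with gzip.open(path, 'rt', encoding='utf-8', newline='') as f:
    restored = pd.read_csv(f)

print(restored)
```

A lower compresslevel trades a larger file for faster writes, which can matter for large DataFrames.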

Method 3: Using zipfile Module

For zip compression with more options, use the zipfile module:

import pandas as pd
import zipfile
import io

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Save DataFrame as zip file using zipfile module
with zipfile.ZipFile('manual_data.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zipf:
    csv_buffer = io.StringIO()
    df.to_csv(csv_buffer, index=False)
    zipf.writestr('data.csv', csv_buffer.getvalue())

print("DataFrame saved using zipfile module")
DataFrame saved using zipfile module
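
One option the zipfile module offers is storing several CSV files in a single archive. The sketch below is one possible way to do that; the member names and the filtered DataFrame are made up for illustration. It relies on to_csv() returning a string when no path is given.

```python
import os
import tempfile
import zipfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Salary': [50000, 60000, 70000]})

# Hypothetical set of frames to bundle: member name -> DataFrame
frames = {
    'employees.csv': df,
    'over_25.csv': df[df['Age'] > 25],
}

path = os.path.join(tempfile.mkdtemp(), 'bundle.zip')

# to_csv() without a path returns the CSV text, which writestr() stores directly
with zipfile.ZipFile(path, 'w', compression=zipfile.ZIP_DEFLATED) as zipf:
    for member_name, frame in frames.items():
        zipf.writestr(member_name, frame.to_csv(index=False))

# Read a single member back without extracting the archive to disk
with zipfile.ZipFile(path) as zipf:
    names = zipf.namelist()
    with zipf.open('over_25.csv') as member:
        over_25 = pd.read_csv(member)

print(names)
print(over_25)
```

Note that read_csv() with compression='zip' expects an archive containing exactly one file, so a multi-member archive like this one has to be opened through zipfile as shown.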

Method 4: Using to_pickle() with Compression

The pickle format stores the DataFrame object itself, so the index and column data types survive a round trip, which CSV does not guarantee:

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Save as compressed pickle files
df.to_pickle('data.pkl.gz', compression='gzip')
df.to_pickle('data.pkl.zip', compression='zip')

print("DataFrame saved as compressed pickle files")

# Read back to verify
df_loaded = pd.read_pickle('data.pkl.gz', compression='gzip')
print(df_loaded)
DataFrame saved as compressed pickle files
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
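
The difference in data type preservation is easiest to see with a column that CSV cannot represent natively. The sketch below adds a hypothetical Joined datetime column (not part of the example data above) and compares what comes back from each format. It assumes compression is inferred from the file suffix, which is the default for both writers in recent pandas versions.

```python
import os
import tempfile

import pandas as pd

# 'Joined' is an illustrative column added to show a non-text dtype
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Joined': pd.to_datetime(['2020-01-15', '2021-06-01', '2022-11-30']),
})

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, 'data.csv.gz')
pkl_path = os.path.join(out_dir, 'data.pkl.gz')

# Both writers infer gzip from the '.gz' suffix
df.to_csv(csv_path, index=False)
df.to_pickle(pkl_path)

from_csv = pd.read_csv(csv_path)
from_pickle = pd.read_pickle(pkl_path)

# CSV stores dates as text, so the column is not datetime unless parse_dates is used
print('original:', df['Joined'].dtype)
print('from CSV:', from_csv['Joined'].dtype)
print('from pickle:', from_pickle['Joined'].dtype)
```

With CSV, the datetime type can be recovered by passing parse_dates=['Joined'] to read_csv(), but that information has to be supplied by the reader rather than being stored in the file.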

Comparison of Methods

Method                       | Format | Data Preservation  | File Size    | Best For
-----------------------------|--------|--------------------|--------------|-------------------------
to_csv() with compression    | CSV    | Text only          | Larger       | Human-readable data
to_pickle() with compression | Pickle | Complete DataFrame | Smaller      | Python-specific storage
gzip/zipfile modules         | Any    | Depends on method  | Customizable | Custom compression needs
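
File sizes depend heavily on the data, so it is worth measuring them for your own DataFrame. The following sketch is one way to do that: it repeats the sample rows an arbitrary 5000 times to get files large enough to compare, writes each variant to a temporary directory, and reports the sizes with os.path.getsize(). The file names and the repeat count are illustrative choices.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Salary': [50000, 60000, 70000]})

# Repeat the rows (an arbitrary 5000 times) so the files are large enough to compare
big = pd.concat([df] * 5000, ignore_index=True)

out_dir = tempfile.mkdtemp()

# Map each output file name to the call that writes it
writers = {
    'data.csv': lambda p: big.to_csv(p, index=False),
    'data.csv.gz': lambda p: big.to_csv(p, index=False, compression='gzip'),
    'data.csv.zip': lambda p: big.to_csv(p, index=False, compression='zip'),
    'data.pkl': lambda p: big.to_pickle(p),
    'data.pkl.gz': lambda p: big.to_pickle(p, compression='gzip'),
}

sizes = {}
for file_name, write in writers.items():
    path = os.path.join(out_dir, file_name)
    write(path)
    sizes[file_name] = os.path.getsize(path)

for file_name, size in sizes.items():
    print(f'{file_name}: {size} bytes')
```

Highly repetitive data such as this compresses unusually well, so treat the numbers as an upper bound on the savings rather than a typical result.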

Conclusion

Use to_csv(compression='gzip') when the file should remain a portable, text-based CSV. Use to_pickle(compression='gzip') when the file will only be read back by Python and the DataFrame's structure and data types must be preserved. Both approaches reduce file size for storage and transmission; how much depends on the data.

Updated on: 2026-03-27T07:27:49+05:30
