How to Save Pandas Dataframe as gzip/zip File?
Pandas DataFrames can be saved in compressed gzip or zip format to reduce file size. pandas supports this directly through the compression parameter of methods such as to_csv() and to_pickle(), and Python's standard gzip and zipfile modules offer finer control over how the archive is written.
Method 1: Using to_csv() with Built-in Compression
The simplest approach is to pass the built-in compression parameter to to_csv():
Saving as Gzip File
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)
# Save DataFrame as gzip file using built-in compression
df.to_csv('data.csv.gz', index=False, compression='gzip')
print("DataFrame saved as data.csv.gz")
DataFrame saved as data.csv.gz
Saving as Zip File
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)
# Save DataFrame as zip file using built-in compression
df.to_csv('data.csv.zip', index=False, compression='zip')
print("DataFrame saved as data.csv.zip")
DataFrame saved as data.csv.zip
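Reading the compressed files back is a quick way to confirm that the save worked. The sketch below assumes the default compression='infer' behaviour, where both to_csv() and read_csv() pick the codec from the file extension. The temporary directory and the variable names are illustrative choices, not part of the original example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Salary': [50000, 60000, 70000]})

# Write into a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, 'data.csv.gz')
    zip_path = os.path.join(tmp_dir, 'data.csv.zip')

    # compression defaults to 'infer', so the extension selects the codec
    df.to_csv(gz_path, index=False)
    df.to_csv(zip_path, index=False)

    # read_csv infers the compression from the extension as well
    df_from_gz = pd.read_csv(gz_path)
    df_from_zip = pd.read_csv(zip_path)

print(df_from_gz)
print(df_from_zip)
```

Passing compression='gzip' or compression='zip' explicitly, as in the examples above, is still useful when the file name does not carry a recognisable extension.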
Method 2: Using gzip Module Directly
For more control over the compression process, you can use the gzip module directly:
import pandas as pd
import gzip
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)
# Save DataFrame as gzip file using gzip module
with gzip.open('manual_data.gz', 'wt') as f:
    df.to_csv(f, index=False)
print("DataFrame saved using gzip module")
DataFrame saved using gzip module
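One form of that extra control is the compression level. The following is a minimal sketch, assuming a larger, repetitive DataFrame (made up here for illustration) so that the difference between levels is measurable. It writes the same data with compresslevel=1 and compresslevel=9 and reports the resulting file sizes.

```python
import gzip
import os
import tempfile

import pandas as pd

# A larger, repetitive frame (illustrative data) so the level has a visible effect
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'] * 10000,
                   'Age': [25, 30, 35] * 10000,
                   'Salary': [50000, 60000, 70000] * 10000})

with tempfile.TemporaryDirectory() as tmp_dir:
    fast_path = os.path.join(tmp_dir, 'fast.csv.gz')
    small_path = os.path.join(tmp_dir, 'small.csv.gz')

    # compresslevel=1 favours speed, compresslevel=9 favours smaller output
    with gzip.open(fast_path, 'wt', compresslevel=1, newline='') as f:
        df.to_csv(f, index=False)
    with gzip.open(small_path, 'wt', compresslevel=9, newline='') as f:
        df.to_csv(f, index=False)

    fast_size = os.path.getsize(fast_path)
    small_size = os.path.getsize(small_path)

    # Confirm that the compressed file can be read back
    df_loaded = pd.read_csv(small_path)

print(f"compresslevel=1: {fast_size} bytes")
print(f"compresslevel=9: {small_size} bytes")
```

Recent pandas versions also accept a dict for the compression parameter, for example compression={'method': 'gzip', 'compresslevel': 1} in to_csv(), which gives similar control without opening the file manually.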
Method 3: Using zipfile Module
For zip compression with more options, use the zipfile module:
import pandas as pd
import zipfile
import io
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)
# Save DataFrame as zip file using zipfile module
with zipfile.ZipFile('manual_data.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zipf:
    csv_buffer = io.StringIO()
    df.to_csv(csv_buffer, index=False)
    zipf.writestr('data.csv', csv_buffer.getvalue())
print("DataFrame saved using zipfile module")
DataFrame saved using zipfile module
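A case where the zipfile module is hard to avoid is packing several DataFrames into one archive. The sketch below is illustrative: the two DataFrames and the member names employees.csv and salaries.csv are assumptions made up for this example. It writes both tables to a single zip file and reads one of them back.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Two small illustrative tables
employees = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                          'Age': [25, 30, 35]})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                         'Salary': [50000, 60000, 70000]})

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, 'tables.zip')

    # to_csv() without a path returns the CSV text, which writestr() stores as a member
    with zipfile.ZipFile(archive_path, 'w', compression=zipfile.ZIP_DEFLATED) as zipf:
        zipf.writestr('employees.csv', employees.to_csv(index=False))
        zipf.writestr('salaries.csv', salaries.to_csv(index=False))

    # Read one member back; read_csv accepts the binary handle from ZipFile.open()
    with zipfile.ZipFile(archive_path) as zipf:
        member_names = zipf.namelist()
        with zipf.open('salaries.csv') as member:
            salaries_loaded = pd.read_csv(member)

print(member_names)
print(salaries_loaded)
```

Note that pd.read_csv() with compression='zip' expects an archive containing exactly one file, so multi-member archives like this one need to be opened through zipfile as shown.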
Method 4: Using to_pickle() with Compression
Pickle format preserves DataFrame structure and data types better than CSV:
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)
# Save as compressed pickle files
df.to_pickle('data.pkl.gz', compression='gzip')
df.to_pickle('data.pkl.zip', compression='zip')
print("DataFrame saved as compressed pickle files")
# Read back to verify
df_loaded = pd.read_pickle('data.pkl.gz', compression='gzip')
print(df_loaded)
DataFrame saved as compressed pickle files
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
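The difference in data type preservation only shows up with columns that are not plain numbers or strings. The sketch below assumes a DataFrame with a categorical column and a datetime column (both added here for illustration) and compares the dtypes after a round trip through compressed CSV and compressed pickle.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame with a categorical and a datetime column
df = pd.DataFrame({
    'Name': pd.Categorical(['Alice', 'Bob', 'Charlie']),
    'Joined': pd.to_datetime(['2020-01-15', '2021-06-01', '2022-03-20']),
    'Salary': [50000, 60000, 70000],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'data.csv.gz')
    pkl_path = os.path.join(tmp_dir, 'data.pkl.gz')

    # Compression is inferred from the .gz extension in both cases
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    from_csv = pd.read_csv(csv_path)
    from_pickle = pd.read_pickle(pkl_path)

print("After CSV round trip:")
print(from_csv.dtypes)
print("After pickle round trip:")
print(from_pickle.dtypes)
```

The CSV round trip returns the categorical and datetime columns as plain text unless read_csv() is given options such as parse_dates or dtype, while the pickle round trip returns them with their original dtypes.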
Comparison of Methods
| Method | Format | Data Preservation | File Size | Best For |
|---|---|---|---|---|
| to_csv() with compression | CSV | Values only; data types are re-inferred on load | Varies with the data | Human-readable, portable data |
| to_pickle() with compression | Pickle | Complete DataFrame, including data types | Varies with the data | Python-specific storage |
| gzip/zipfile modules | Any | Depends on the format written | Tunable via compression level | Custom compression needs |
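File sizes depend heavily on the data, so it is worth measuring them on a representative sample. The following is a rough sketch, assuming a synthetic 100,000-row DataFrame (the column names and values are made up for illustration), that saves the same data in each format and prints the size on disk.

```python
import os
import tempfile

import pandas as pd

# Synthetic data for illustration only; real data will compress differently
n_rows = 100_000
df = pd.DataFrame({'id': range(n_rows),
                   'group': ['a', 'b', 'c', 'd'] * (n_rows // 4),
                   'value': [i * 0.5 for i in range(n_rows)]})

sizes = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    # CSV variants: plain, gzip, zip (compression inferred from the extension)
    for name in ['data.csv', 'data.csv.gz', 'data.csv.zip']:
        path = os.path.join(tmp_dir, name)
        df.to_csv(path, index=False)
        sizes[name] = os.path.getsize(path)

    # Pickle variants: plain, gzip, zip
    for name in ['data.pkl', 'data.pkl.gz', 'data.pkl.zip']:
        path = os.path.join(tmp_dir, name)
        df.to_pickle(path)
        sizes[name] = os.path.getsize(path)

for name, size in sizes.items():
    print(f"{name:<14}{size:>10} bytes")
```

Running this against a slice of your own DataFrame gives a more reliable basis for choosing a format than any general rule.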
Conclusion
Use to_csv(compression='gzip') for simple, portable text output. Use to_pickle(compression='gzip') when the DataFrame's structure and data types must survive the round trip. Both approaches typically reduce file size for storage and transmission, with the largest gains on big or repetitive data.
