Python Pandas - Sparse Calculation and Conversion
Sparse data structures in Pandas store data efficiently by keeping only the values that differ from a designated fill value (NaN by default), so the repeated fill values themselves are not stored. This can save significant memory in large datasets where most entries share the same value. In the previous tutorial, we learned about the creation and benefits of using Sparse Arrays and Sparse DataFrames.
Pandas allows you to perform calculations on sparse arrays while keeping the results in sparse format. This preserves memory efficiency during operations, and the .sparse accessor makes it easy to switch between sparse and dense representations as needed.
In this tutorial, we will learn how to apply NumPy universal functions to sparse data and how to convert between sparse and dense formats.
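To make the memory claim concrete, here is a minimal sketch that compares the byte size of a mostly missing Series in dense and sparse form. The length of 10,000 and the one-in-100 density are arbitrary assumptions chosen for illustration, and the exact byte counts may vary between pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): 10,000 floats, only every 100th value present
dense = pd.Series(np.nan, index=range(10_000))
dense.iloc[::100] = 1.0

# Same data in sparse form; NaN is the fill value, so missing entries are not stored
sparse = dense.astype(pd.SparseDtype("float", np.nan))

print("dense bytes: ", dense.nbytes)
print("sparse bytes:", sparse.nbytes)
print("density:     ", sparse.sparse.density)  # fraction of values actually stored
```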
Sparse Calculation with NumPy Universal Functions
Sparse arrays in Pandas support element-wise operations using NumPy's universal functions (ufuncs), which perform efficient, vectorized calculations. These operations return a sparse array, so memory efficiency is maintained.
Example
The following example applies a NumPy universal function (np.add) to a sparse array.
import pandas as pd
import numpy as np
# Create a SparseArray containing NaN values (NaN is the default fill value)
sparse_series = pd.arrays.SparseArray([8, np.nan, 4, np.nan, 6, np.nan])
# Display the input sparse object
print("Input sparse object:")
print(sparse_series)
# Add 2 to every element with the np.add ufunc
result = np.add(sparse_series, 2)
print("Output after calling the universal function:")
print(result)
Following is the output of the above code −
Input sparse object:
[8.0, nan, 4.0, nan, 6.0, nan]
Fill: nan
IntIndex
Indices: array([0, 2, 4], dtype=int32)
Output after calling the universal function:
[10.0, nan, 6.0, nan, 8.0, nan]
Fill: nan
IntIndex
Indices: array([0, 2, 4], dtype=int32)
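Ufuncs are not limited to adding a scalar. The sketch below, using made-up values, applies the unary ufunc np.abs and then the binary ufunc np.multiply to two sparse arrays that have NaN at the same positions; both results remain SparseArray objects.

```python
import numpy as np
import pandas as pd

# Two sparse arrays with NaN (the default fill value) at the same positions
arr = pd.arrays.SparseArray([-8.0, np.nan, 4.0, np.nan, -6.0, np.nan])
other = pd.arrays.SparseArray([1.0, np.nan, 2.0, np.nan, 3.0, np.nan])

# Unary ufunc: absolute value of each stored element
abs_arr = np.abs(arr)

# Binary ufunc between two sparse arrays
product = np.multiply(abs_arr, other)

print(type(abs_arr).__name__, abs_arr.to_dense())
print(type(product).__name__, product.to_dense())
print("stored points:", product.npoints)
```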
Applying Ufuncs with Custom Fill Values
A ufunc is applied to the fill value as well as to the stored values. This matters when you specify a fill_value other than NaN: it ensures that converting the result to a dense array gives the correct values.
Example
This example applies the ufunc to a sparse array that has a custom fill value of -4.
import pandas as pd
import numpy as np
# Create a SparseArray with a custom fill value of -4; the NaN values differ from it, so they are stored
sparse_series = pd.arrays.SparseArray([8, np.nan, 4, np.nan, 6, np.nan], fill_value=-4)
# Display the input sparse object
print("Input sparse object:")
print(sparse_series)
# Add 2 to every element; np.add is also applied to the fill value
result = np.add(sparse_series, 2)
print("Output after calling the universal function:")
print(result)
print('Convert to a dense array', result.to_dense())
Following is the output of the above code −
Input sparse object:
[8.0, nan, 4.0, nan, 6.0, nan]
Fill: -4
IntIndex
Indices: array([0, 1, 2, 3, 4, 5], dtype=int32)
Output after calling the universal function:
[10.0, nan, 6.0, nan, 8.0, nan]
Fill: -2.0
IntIndex
Indices: array([0, 1, 2, 3, 4, 5], dtype=int32)
Convert to a dense array [10. nan 6. nan 8. nan]
In the above output, you can observe that the fill value of the result is -2.0: np.add was applied to the original fill value of -4 as well as to the stored values.
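A quick way to see why the ufunc must also act on the fill value is to compare the sparse result with the same operation on a plain NumPy array. This sketch assumes illustrative data with a fill value of 0.0; the dense form of the sparse result should match the dense calculation exactly.

```python
import numpy as np
import pandas as pd

# Illustrative dense data in which 0.0 is the most common value
dense = np.array([1.0, 0.0, 0.0, 2.0, 0.0])
sparse = pd.arrays.SparseArray(dense, fill_value=0.0)  # stores only 1.0 and 2.0

# np.add shifts the stored values and the fill value (0.0 -> 1.0)
result = np.add(sparse, 1.0)

print("fill value:", result.fill_value)
print("stored points:", result.npoints)
print("sparse result as dense:", result.to_dense())
print("matches dense np.add:", np.array_equal(result.to_dense(), np.add(dense, 1.0)))
```

If the fill value were left at 0.0, the three unstored positions would densify to 0.0 instead of 1.0, and the comparison would fail.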
Converting Dense DataFrames to Sparse
To create a sparse DataFrame from a dense one, pass a SparseDtype to the astype() method.
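The SparseDtype compresses the data by storing only the elements that differ from its fill value, reducing memory usage. Here, the data type is float with a fill value of 0, so only the zero is left out; the NaN values differ from the fill value and are stored explicitly.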
Example
This example converts a dense DataFrame to a sparse DataFrame by calling the astype() method with a SparseDtype.
import pandas as pd
import numpy as np
# Define a dense DataFrame
dense_df = pd.DataFrame({"A": [8, np.nan, 4, 0, 6, np.nan]})
# Convert to sparse using SparseDtype
sparse_df = dense_df.astype(pd.SparseDtype(float, fill_value=0))
# Display output sparse DataFrame
print("Output sparse DataFrame:")
print(sparse_df)
Following is the output of the above code −
Output sparse DataFrame:
A
0 8.0
1 NaN
2 4.0
3 0.0
4 6.0
5 NaN
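Because only values equal to the fill value are dropped, the choice of fill value decides what is actually stored. The sketch below recreates the DataFrame from the example above and compares fill values of 0 and NaN using the .sparse.density attribute and the stored values (sp_values) of column A. It is an illustration, not a rule for picking a fill value.

```python
import numpy as np
import pandas as pd

# Same dense data as in the example above
dense_df = pd.DataFrame({"A": [8, np.nan, 4, 0, 6, np.nan]})

# Fill value 0: the zero is dropped, but both NaN values must be stored
zero_fill = dense_df.astype(pd.SparseDtype(float, fill_value=0))

# Fill value NaN: the NaN values are dropped, but the zero must be stored
nan_fill = dense_df.astype(pd.SparseDtype(float, fill_value=np.nan))

print("density with fill 0:  ", zero_fill.sparse.density)
print("density with fill NaN:", nan_fill.sparse.density)
print("stored (fill 0):  ", zero_fill["A"].array.sp_values)
print("stored (fill NaN):", nan_fill["A"].array.sp_values)
```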
Converting Sparse Data to Dense Format
Pandas allows you to convert sparse data structures back to dense format, in which every value is stored, including the fill values. For sparse Series and DataFrame objects, this is done through the .sparse accessor.
Example
To convert a sparse DataFrame to a dense format, call the to_dense() method through the .sparse accessor, as in sparse_df.sparse.to_dense().
import pandas as pd
import numpy as np
# Create a 100 x 4 DataFrame of random numbers and set its first 90 rows to NaN
df = pd.DataFrame(np.random.randn(100, 4))
df.iloc[:90] = np.nan
# Convert to a sparse DataFrame with NaN as the fill value
sparse_df = df.astype(pd.SparseDtype("float", np.nan))
# Display the input sparse object
print("Input sparse object:")
print(sparse_df)
print('\nConvert to a dense format:')
print(sparse_df.sparse.to_dense())
Following is the output of the above code −
Input sparse object:
0 1 2 3
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
.. ... ... ... ...
95 -0.864730 0.077271 2.986417 1.088242
96 1.552542 0.877490 -0.443526 1.043148
97 0.865025 -0.083762 -1.278229 -1.246637
98 0.258184 -2.738429 1.442518 1.185237
99 -0.333820 1.283771 -0.023755 0.710814
[100 rows x 4 columns]
Convert to a dense format:
0 1 2 3
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
.. ... ... ... ...
95 -0.864730 0.077271 2.986417 1.088242
96 1.552542 0.877490 -0.443526 1.043148
97 0.865025 -0.083762 -1.278229 -1.246637
98 0.258184 -2.738429 1.442518 1.185237
99 -0.333820 1.283771 -0.023755 0.710814
[100 rows x 4 columns]
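The same conversion works for a single column. As a minimal sketch with made-up values, a sparse Series is converted with s.sparse.to_dense(), while a bare SparseArray (reached here through s.array) has to_dense() as an ordinary method and returns a NumPy array.

```python
import numpy as np
import pandas as pd

# A sparse Series with 0.0 as the fill value (illustrative values)
s = pd.Series([0.0, 0.0, 1.5, 0.0, 2.5], dtype=pd.SparseDtype(float, 0.0))

# Series: go through the .sparse accessor; the result is an ordinary float Series
dense_s = s.sparse.to_dense()

# SparseArray: to_dense() is a method on the array itself and returns an ndarray
arr = s.array
dense_arr = arr.to_dense()

print(s.dtype, "->", dense_s.dtype)
print(type(dense_arr).__name__, dense_arr)
```

Keep in mind that the DataFrame-level .sparse accessor expects every column to have a sparse dtype; on a DataFrame that mixes sparse and dense columns, it is not available.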