- Python Pandas - Home
- Python Pandas - Introduction
- Python Pandas - Environment Setup
- Python Pandas - Basics
- Python Pandas - Introduction to Data Structures
- Python Pandas - Index Objects
- Python Pandas - Panel
- Python Pandas - Basic Functionality
- Python Pandas - Indexing & Selecting Data
- Python Pandas - Series
- Python Pandas - Series
- Python Pandas - Slicing a Series Object
- Python Pandas - Attributes of a Series Object
- Python Pandas - Arithmetic Operations on Series Object
- Python Pandas - Converting Series to Other Objects
- Python Pandas - DataFrame
- Python Pandas - DataFrame
- Python Pandas - Accessing DataFrame
- Python Pandas - Slicing a DataFrame Object
- Python Pandas - Modifying DataFrame
- Python Pandas - Removing Rows from a DataFrame
- Python Pandas - Arithmetic Operations on DataFrame
- Python Pandas - IO Tools
- Python Pandas - IO Tools
- Python Pandas - Working with CSV Format
- Python Pandas - Reading & Writing JSON Files
- Python Pandas - Reading Data from an Excel File
- Python Pandas - Writing Data to Excel Files
- Python Pandas - Working with HTML Data
- Python Pandas - Clipboard
- Python Pandas - Working with HDF5 Format
- Python Pandas - Comparison with SQL
- Python Pandas - Data Handling
- Python Pandas - Sorting
- Python Pandas - Reindexing
- Python Pandas - Iteration
- Python Pandas - Concatenation
- Python Pandas - Statistical Functions
- Python Pandas - Descriptive Statistics
- Python Pandas - Working with Text Data
- Python Pandas - Function Application
- Python Pandas - Options & Customization
- Python Pandas - Window Functions
- Python Pandas - Aggregations
- Python Pandas - Merging/Joining
- Python Pandas - MultiIndex
- Python Pandas - Basics of MultiIndex
- Python Pandas - Indexing with MultiIndex
- Python Pandas - Advanced Reindexing with MultiIndex
- Python Pandas - Renaming MultiIndex Labels
- Python Pandas - Sorting a MultiIndex
- Python Pandas - Binary Operations
- Python Pandas - Binary Comparison Operations
- Python Pandas - Boolean Indexing
- Python Pandas - Boolean Masking
- Python Pandas - Data Reshaping & Pivoting
- Python Pandas - Pivoting
- Python Pandas - Stacking & Unstacking
- Python Pandas - Melting
- Python Pandas - Computing Dummy Variables
- Python Pandas - Categorical Data
- Python Pandas - Categorical Data
- Python Pandas - Ordering & Sorting Categorical Data
- Python Pandas - Comparing Categorical Data
- Python Pandas - Handling Missing Data
- Python Pandas - Missing Data
- Python Pandas - Filling Missing Data
- Python Pandas - Interpolation of Missing Values
- Python Pandas - Dropping Missing Data
- Python Pandas - Calculations with Missing Data
- Python Pandas - Handling Duplicates
- Python Pandas - Duplicated Data
- Python Pandas - Counting & Retrieving Unique Elements
- Python Pandas - Duplicated Labels
- Python Pandas - Grouping & Aggregation
- Python Pandas - GroupBy
- Python Pandas - Time-series Data
- Python Pandas - Date Functionality
- Python Pandas - Timedelta
- Python Pandas - Sparse Data Structures
- Python Pandas - Sparse Data
- Python Pandas - Visualization
- Python Pandas - Visualization
- Python Pandas - Additional Concepts
- Python Pandas - Caveats & Gotchas
Python Pandas Flatten Nested JSON
Flattening nested JSON is a common technique used to simplify semi-structured data for analysis. This process is also called as JSON normalization, it converts complex, nested JSON structures into a flat tabular format. Python's Pandas library provides the json_normalize() method, which simplifies this process by converting nested JSON data into a flat table.
This conversion technique is particularly useful when you need to analyze or manipulate semi-structured JSON data using Pandas DataFrames without additional processing.
In this tutorial, we will explore how to flatten nested JSON data using the pandas.json_normalize() function.
The json_normalize() Method
The pandas.json_normalize() method takes a nested JSON structure and converts it into a flat table, represented as a Pandas DataFrame. It supports customization for handling metadata, prefixes, and more.
Syntax
Following is the syntax of the json_normalize() method −
pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)
Where,
data: The input nested JSON data to be flattened.
record_path: Specifies the path to a nested list of records.
meta: Defines additional fields to include as metadata in the resulting DataFrame.
meta_prefix: Adds a prefix to metadata columns.
record_prefix: Adds a prefix to record columns.
errors: Determines how to handle missing keys in metadata.
sep: A string that defines the separator for nested names in the resulting DataFrame.
max_level: Controls how deeply nested data should be flattened.
Example
Let's see an example of flattening the nested JSON using Pandas. In this example a nested JSON data is converted into a Pandas DataFrame using the json_normalize() method and represented as a flat table.
import pandas as pd
# Sample nested JSON data
data = [
{"person_id": 1, "Details": {"name": "Kiran", "age": 18, "gender": 'male'}},
{"person_id": 2, "Details": {"name": "Riya", "age": 22, "gender": 'female'}},
]
# Flatten nested JSON
result = pd.json_normalize(data)
print('Flattened DataFrame from nested JSON data:')
print(result)
When we run above program, it produces following result −
Flattened DataFrame from nested JSON data:
| person_id | Details.name | Details.age | Details.gender | |
|---|---|---|---|---|
| 0 | 1 | Kiran | 18 | male |
| 1 | 2 | Riya | 22 | female |
Flattening a Nested JSON with Limited Levels
If you want to flatten up to a specific depth in a nested JSON you can use the max_level parameter in the json_normalize() method, which allows you to control this. For example, if you set the max_level=1, which means that only the first level of nesting is flattened.
Example
Following is the example that demonstrates flattening a nested JSON data with specified depth using the max_level parameter in the json_normalize() method.
import pandas as pd
# Input nested JSON data
data = [
{"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'}},
{"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'}},
]
# Flatten nested JSON with specified depth
result1 = pd.json_normalize(data, max_level=0)
print('Flattened DataFrame with max_level=0:')
print(result1)
result2 = pd.json_normalize(data, max_level=1)
print('\nFlattened DataFrame with max_level=1:')
print(result2)
Following is an output of the above code −
Flattened DataFrame with max_level=0:
| person_id | Details | |
|---|---|---|
| 0 | 1 | {'name': 'Kiran', 'age': 18, 'Gender': 'male'} |
| 1 | 2 | {'name': 'Riya', 'age': 22, 'Gender': 'female'} |
| person_id | Details.name | Details.age | Details.gender | |
|---|---|---|---|---|
| 0 | 1 | Kiran | 18 | male |
| 1 | 2 | Riya | 22 | female |
Flattening a Nested JSON with Metadata
To include additional metadata in the flattened DataFrame, you can use the meta parameter of the json_normalize() method. This is useful for adding context to the data, such as state or other identifiers.
Example
This example shows how to flatten the nested JSON data with the additional meta information using the meta parameter.
import pandas as pd
# Input nested JSON data
data = [
{"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'},
"profession": [{'field': "Doctor", "salary": 90000}]},
{"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'},
"profession": [{'field': "Teacher", "salary": 20000}]}
]
# Flatten nested JSON with metadata
result = pd.json_normalize(data, "profession", ["person_id", ["Details", "name"], ["Details", "age"]])
print('Flattened DataFrame with metadata:')
print(result)
Output of the above code is as follows −
Flattened DataFrame with metadata:
| field | salary | person_id | Details.name | Details.age | |
|---|---|---|---|---|---|
| 0 | Doctor | 90000 | 1 | Kiran | 18 |
| 1 | Teacher | 20000 | 2 | Riya | 22 |
Adding Prefixes While Flattening a Nested JSON
To avoid column name conflicts or customize the flattened data column names, you can use the record_prefix and meta_prefix parameters.
Example
This example demonstrates flattening the nested JSON data into a Pandas DataFrame with added prefixes for the column names using the record_prefix parameter.
import pandas as pd
# Input nested JSON data
data = [
{
"product": "Laptop",
"specs": {"processor": "Intel i5", "RAM": "8GB"},
},
{
"product": "Tablet",
"specs": {"processor": "ARM", "RAM": "4GB"},
}
]
# Adding Prefixes to Column Names While Flattening
result = pd.json_normalize(data, max_level=1, record_prefix="Specs_")
print('Flatten DataFrame with added prefixes:')
print(result)
When we run above program, it produces following result −
Flatten DataFrame with added prefixes:
| product | specs.processor | specs.RAM | |
|---|---|---|---|
| 0 | Laptop | Intel i5 | 8GB |
| 1 | Tablet | ARM | 4GB |
Customizing the Record Names while Flattening
The sep parameter in the json_normalize() method, allows you to customize the separator for nested field names. Which is useful for customizing the column names with the specified separator.
Example
This example shows how to flatten the nested JSON data with the customizing the column names with a specified separator using the sep parameter.
import pandas as pd
# Input nested JSON data
import pandas as pd
# Input nested JSON data
data = [
{"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'},
"profession": [{'field': "Doctor", "salary": 90000}]},
{"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'},
"profession": [{'field': "Teacher", "salary": 20000}]}
]
# Customizing Separator
result = pd.json_normalize(data, "profession", ["person_id", ["Details", "name"], ["Details", "age"]], sep="@")
print('Flatten DataFrame with custom separator:')
print(result)
Output of the above code is as follows −
Flatten DataFrame with custom separator:
| field | salary | person_id | Details@name | Details@age | |
|---|---|---|---|---|---|
| 0 | Doctor | 90000 | 1 | Kiran | 18 |
| 1 | Teacher | 20000 | 2 | Riya | 22 |