Python Pandas - Home
Python Pandas - Introduction
Python Pandas - Environment Setup
Python Pandas - Basics
Python Pandas - Introduction to Data Structures
Python Pandas - Index Objects
Python Pandas - Panel
Python Pandas - Basic Functionality
Python Pandas - Indexing & Selecting Data
Python Pandas - Series
Python Pandas - Series
Python Pandas - Slicing a Series Object
Python Pandas - Attributes of a Series Object
Python Pandas - Arithmetic Operations on Series Object
Python Pandas - Converting Series to Other Objects
Python Pandas - DataFrame
Python Pandas - DataFrame
Python Pandas - Accessing DataFrame
Python Pandas - Slicing a DataFrame Object
Python Pandas - Modifying DataFrame
Python Pandas - Removing Rows from a DataFrame
Python Pandas - Arithmetic Operations on DataFrame
Python Pandas - IO Tools
Python Pandas - IO Tools
Python Pandas - Working with CSV Format
Python Pandas - Reading & Writing JSON Files
Python Pandas - Reading Data from an Excel File
Python Pandas - Writing Data to Excel Files
Python Pandas - Working with HTML Data
Python Pandas - Clipboard
Python Pandas - Working with HDF5 Format
Python Pandas - Comparison with SQL
Python Pandas - Data Handling
Python Pandas - Sorting
Python Pandas - Reindexing
Python Pandas - Iteration
Python Pandas - Concatenation
Python Pandas - Statistical Functions
Python Pandas - Descriptive Statistics
Python Pandas - Working with Text Data
Python Pandas - Function Application
Python Pandas - Options & Customization
Python Pandas - Window Functions
Python Pandas - Aggregations
Python Pandas - Merging/Joining
Python Pandas - MultiIndex
Python Pandas - Basics of MultiIndex
Python Pandas - Indexing with MultiIndex
Python Pandas - Advanced Reindexing with MultiIndex
Python Pandas - Renaming MultiIndex Labels
Python Pandas - Sorting a MultiIndex
Python Pandas - Binary Operations
Python Pandas - Binary Comparison Operations
Python Pandas - Boolean Indexing
Python Pandas - Boolean Masking
Python Pandas - Data Reshaping & Pivoting
Python Pandas - Pivoting
Python Pandas - Stacking & Unstacking
Python Pandas - Melting
Python Pandas - Computing Dummy Variables
Python Pandas - Categorical Data
Python Pandas - Categorical Data
Python Pandas - Ordering & Sorting Categorical Data
Python Pandas - Comparing Categorical Data
Python Pandas - Handling Missing Data
Python Pandas - Missing Data
Python Pandas - Filling Missing Data
Python Pandas - Interpolation of Missing Values
Python Pandas - Dropping Missing Data
Python Pandas - Calculations with Missing Data
Python Pandas - Handling Duplicates
Python Pandas - Duplicated Data
Python Pandas - Counting & Retrieving Unique Elements
Python Pandas - Duplicated Labels
Python Pandas - Grouping & Aggregation
Python Pandas - GroupBy
Python Pandas - Time-series Data
Python Pandas - Date Functionality
Python Pandas - Timedelta
Python Pandas - Sparse Data Structures
Python Pandas - Sparse Data
Python Pandas - Visualization
Python Pandas - Visualization
Python Pandas - Additional Concepts
Python Pandas - Caveats & Gotchas

Python Pandas Flatten Nested JSON

Quiz

Flattening nested JSON is a common technique used to simplify semi-structured data for analysis. This process is also called as JSON normalization, it converts complex, nested JSON structures into a flat tabular format. Python's Pandas library provides the json_normalize() method, which simplifies this process by converting nested JSON data into a flat table.

This conversion technique is particularly useful when you need to analyze or manipulate semi-structured JSON data using Pandas DataFrames without additional processing.

In this tutorial, we will explore how to flatten nested JSON data using the pandas.json_normalize() function.

The json_normalize() Method

The pandas.json_normalize() method takes a nested JSON structure and converts it into a flat table, represented as a Pandas DataFrame. It supports customization for handling metadata, prefixes, and more.

Syntax

Following is the syntax of the json_normalize() method −

pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)

Where,

data: The input nested JSON data to be flattened.
record_path: Specifies the path to a nested list of records.
meta: Defines additional fields to include as metadata in the resulting DataFrame.
meta_prefix: Adds a prefix to metadata columns.
record_prefix: Adds a prefix to record columns.
errors: Determines how to handle missing keys in metadata.
sep: A string that defines the separator for nested names in the resulting DataFrame.
max_level: Controls how deeply nested data should be flattened.

Example

Let's see an example of flattening the nested JSON using Pandas. In this example a nested JSON data is converted into a Pandas DataFrame using the json_normalize() method and represented as a flat table.

import pandas as pd

# Sample nested JSON data
data = [
    {"person_id": 1, "Details": {"name": "Kiran", "age": 18, "gender": 'male'}},
    {"person_id": 2, "Details": {"name": "Riya", "age": 22, "gender": 'female'}},
]

# Flatten nested JSON
result = pd.json_normalize(data)
print('Flattened DataFrame from nested JSON data:')
print(result)

When we run above program, it produces following result −

Flattened DataFrame from nested JSON data:

	person_id	Details.name	Details.age	Details.gender
0	1	Kiran	18	male
1	2	Riya	22	female

Flattening a Nested JSON with Limited Levels

If you want to flatten up to a specific depth in a nested JSON you can use the max_level parameter in the json_normalize() method, which allows you to control this. For example, if you set the max_level=1, which means that only the first level of nesting is flattened.

Example

Following is the example that demonstrates flattening a nested JSON data with specified depth using the max_level parameter in the json_normalize() method.

import pandas as pd

# Input nested JSON data
data = [
    {"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'}},
    {"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'}},
]

# Flatten nested JSON with specified depth
result1 = pd.json_normalize(data, max_level=0)

print('Flattened DataFrame with max_level=0:')
print(result1)

result2 = pd.json_normalize(data, max_level=1)

print('\nFlattened DataFrame with max_level=1:')
print(result2)

Following is an output of the above code −

Flattened DataFrame with max_level=0:

	person_id	Details
0	1	{'name': 'Kiran', 'age': 18, 'Gender': 'male'}
1	2	{'name': 'Riya', 'age': 22, 'Gender': 'female'}

Flattened DataFrame with max_level=1:

	person_id	Details.name	Details.age	Details.gender
0	1	Kiran	18	male
1	2	Riya	22	female

Flattening a Nested JSON with Metadata

To include additional metadata in the flattened DataFrame, you can use the meta parameter of the json_normalize() method. This is useful for adding context to the data, such as state or other identifiers.

Example

This example shows how to flatten the nested JSON data with the additional meta information using the meta parameter.

import pandas as pd

# Input nested JSON data
data = [
    {"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'},
     "profession": [{'field': "Doctor", "salary": 90000}]},
    {"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'},
     "profession": [{'field': "Teacher", "salary": 20000}]}
]

# Flatten nested JSON with metadata
result = pd.json_normalize(data, "profession", ["person_id", ["Details", "name"], ["Details", "age"]])

print('Flattened DataFrame with metadata:')
print(result)

Output of the above code is as follows −

Flattened DataFrame with metadata:

	field	salary	person_id	Details.name	Details.age
0	Doctor	90000	1	Kiran	18
1	Teacher	20000	2	Riya	22

Adding Prefixes While Flattening a Nested JSON

To avoid column name conflicts or customize the flattened data column names, you can use the record_prefix and meta_prefix parameters.

Example

This example demonstrates flattening the nested JSON data into a Pandas DataFrame with added prefixes for the column names using the record_prefix parameter.

import pandas as pd

# Input nested JSON data
data = [
    {
        "product": "Laptop",
        "specs": {"processor": "Intel i5", "RAM": "8GB"},
    },
    {
        "product": "Tablet",
        "specs": {"processor": "ARM", "RAM": "4GB"},
    }
]

# Adding Prefixes to Column Names While Flattening 
result = pd.json_normalize(data, max_level=1, record_prefix="Specs_")


print('Flatten DataFrame with added prefixes:')
print(result)

When we run above program, it produces following result −

Flatten DataFrame with added prefixes:

	product	specs.processor	specs.RAM
0	Laptop	Intel i5	8GB
1	Tablet	ARM	4GB

Customizing the Record Names while Flattening

The sep parameter in the json_normalize() method, allows you to customize the separator for nested field names. Which is useful for customizing the column names with the specified separator.

Example

This example shows how to flatten the nested JSON data with the customizing the column names with a specified separator using the sep parameter.

import pandas as pd

# Input nested JSON data
import pandas as pd

# Input nested JSON data
data = [
    {"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'},
     "profession": [{'field': "Doctor", "salary": 90000}]},
    {"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'},
     "profession": [{'field': "Teacher", "salary": 20000}]}
]

# Customizing Separator
result = pd.json_normalize(data, "profession", ["person_id", ["Details", "name"], ["Details", "age"]], sep="@")

print('Flatten DataFrame with custom separator:')
print(result)

Output of the above code is as follows −

Flatten DataFrame with custom separator:

	field	salary	person_id	Details@name	Details@age
0	Doctor	90000	1	Kiran	18
1	Teacher	20000	2	Riya	22

Print Page