Python Pandas Flatten Nested JSON



Flattening nested JSON is a common technique used to simplify semi-structured data for analysis. This process is also called as JSON normalization, it converts complex, nested JSON structures into a flat tabular format. Python's Pandas library provides the json_normalize() method, which simplifies this process by converting nested JSON data into a flat table.

This conversion technique is particularly useful when you need to analyze or manipulate semi-structured JSON data using Pandas DataFrames without additional processing.

In this tutorial, we will explore how to flatten nested JSON data using the pandas.json_normalize() function.

The json_normalize() Method

The pandas.json_normalize() method takes a nested JSON structure and converts it into a flat table, represented as a Pandas DataFrame. It supports customization for handling metadata, prefixes, and more.

Syntax

Following is the syntax of the json_normalize() method −

pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)

Where,

  • data: The input nested JSON data to be flattened.

  • record_path: Specifies the path to a nested list of records.

  • meta: Defines additional fields to include as metadata in the resulting DataFrame.

  • meta_prefix: Adds a prefix to metadata columns.

  • record_prefix: Adds a prefix to record columns.

  • errors: Determines how to handle missing keys in metadata.

  • sep: A string that defines the separator for nested names in the resulting DataFrame.

  • max_level: Controls how deeply nested data should be flattened.

Example

Let's see an example of flattening the nested JSON using Pandas. In this example a nested JSON data is converted into a Pandas DataFrame using the json_normalize() method and represented as a flat table.

import pandas as pd

# Sample nested JSON data
data = [
    {"person_id": 1, "Details": {"name": "Kiran", "age": 18, "gender": 'male'}},
    {"person_id": 2, "Details": {"name": "Riya", "age": 22, "gender": 'female'}},
]

# Flatten nested JSON
result = pd.json_normalize(data)
print('Flattened DataFrame from nested JSON data:')
print(result)

When we run above program, it produces following result −

Flattened DataFrame from nested JSON data:
person_id Details.name Details.age Details.gender
0 1 Kiran 18 male
1 2 Riya 22 female

Flattening a Nested JSON with Limited Levels

If you want to flatten up to a specific depth in a nested JSON you can use the max_level parameter in the json_normalize() method, which allows you to control this. For example, if you set the max_level=1, which means that only the first level of nesting is flattened.

Example

Following is the example that demonstrates flattening a nested JSON data with specified depth using the max_level parameter in the json_normalize() method.

import pandas as pd

# Input nested JSON data
data = [
    {"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'}},
    {"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'}},
]

# Flatten nested JSON with specified depth
result1 = pd.json_normalize(data, max_level=0)

print('Flattened DataFrame with max_level=0:')
print(result1)

result2 = pd.json_normalize(data, max_level=1)

print('\nFlattened DataFrame with max_level=1:')
print(result2)

Following is an output of the above code −

Flattened DataFrame with max_level=0:
 
person_id Details
0 1 {'name': 'Kiran', 'age': 18, 'Gender': 'male'}
1 2 {'name': 'Riya', 'age': 22, 'Gender': 'female'}
Flattened DataFrame with max_level=1:
person_id Details.name Details.age Details.gender
0 1 Kiran 18 male
1 2 Riya 22 female

Flattening a Nested JSON with Metadata

To include additional metadata in the flattened DataFrame, you can use the meta parameter of the json_normalize() method. This is useful for adding context to the data, such as state or other identifiers.

Example

This example shows how to flatten the nested JSON data with the additional meta information using the meta parameter.

import pandas as pd

# Input nested JSON data
data = [
    {"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'},
     "profession": [{'field': "Doctor", "salary": 90000}]},
    {"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'},
     "profession": [{'field': "Teacher", "salary": 20000}]}
]

# Flatten nested JSON with metadata
result = pd.json_normalize(data, "profession", ["person_id", ["Details", "name"], ["Details", "age"]])

print('Flattened DataFrame with metadata:')
print(result)

Output of the above code is as follows −

Flattened DataFrame with metadata:
field salary person_id Details.name Details.age
0 Doctor 90000 1 Kiran 18
1 Teacher 20000 2 Riya 22

Adding Prefixes While Flattening a Nested JSON

To avoid column name conflicts or customize the flattened data column names, you can use the record_prefix and meta_prefix parameters.

Example

This example demonstrates flattening the nested JSON data into a Pandas DataFrame with added prefixes for the column names using the record_prefix parameter.

import pandas as pd

# Input nested JSON data
data = [
    {
        "product": "Laptop",
        "specs": {"processor": "Intel i5", "RAM": "8GB"},
    },
    {
        "product": "Tablet",
        "specs": {"processor": "ARM", "RAM": "4GB"},
    }
]

# Adding Prefixes to Column Names While Flattening 
result = pd.json_normalize(data, max_level=1, record_prefix="Specs_")


print('Flatten DataFrame with added prefixes:')
print(result)

When we run above program, it produces following result −

Flatten DataFrame with added prefixes:
product specs.processor specs.RAM
0 Laptop Intel i5 8GB
1 Tablet ARM 4GB

Customizing the Record Names while Flattening

The sep parameter in the json_normalize() method, allows you to customize the separator for nested field names. Which is useful for customizing the column names with the specified separator.

Example

This example shows how to flatten the nested JSON data with the customizing the column names with a specified separator using the sep parameter.

import pandas as pd

# Input nested JSON data
import pandas as pd

# Input nested JSON data
data = [
    {"person_id": 1, "Details": {"name": "Kiran", "age": 18, "Gender": 'male'},
     "profession": [{'field': "Doctor", "salary": 90000}]},
    {"person_id": 2, "Details": {"name": "Riya", "age": 22, "Gender": 'female'},
     "profession": [{'field': "Teacher", "salary": 20000}]}
]

# Customizing Separator
result = pd.json_normalize(data, "profession", ["person_id", ["Details", "name"], ["Details", "age"]], sep="@")

print('Flatten DataFrame with custom separator:')
print(result)

Output of the above code is as follows −

Flatten DataFrame with custom separator:
field salary person_id Details@name Details@age
0 Doctor 90000 1 Kiran 18
1 Teacher 20000 2 Riya 22
Advertisements