Load JSON String into Pandas DataFrame


Introduction

Data science and machine learning depend on cleaning, understanding, and manipulating data in order to draw useful conclusions. Python makes this easier with the Pandas library and the built-in json module. JSON (JavaScript Object Notation) is a widely used format for exchanging data on the web, while a Pandas DataFrame is an efficient structure for storing and modifying tabular data in Python.

This article is a step-by-step tutorial, with worked examples, on how to load JSON strings into a Pandas DataFrame.

Prerequisites

Make sure Pandas is installed in your Python environment. The json module is part of Python's standard library, so it needs no separate installation. You can install Pandas with pip:

pip install pandas

Loading JSON Strings into Pandas DataFrame

Example 1: Loading Simple JSON String

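Let's begin with a simple JSON string that holds a single object. We first parse the JSON text into a Python dictionary with json.loads, then load that dictionary into a DataFrame. Because every value in the dictionary is a scalar, we pass index=[0] so that Pandas knows to build a single row.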

import pandas as pd
import json

# JSON string
json_string = '{"name": "John", "age": 30, "city": "New York"}'

# Convert JSON string to Python dictionary
data = json.loads(json_string)

# Convert dictionary to DataFrame
df = pd.DataFrame(data, index=[0])
print(df)

Output

   name  age      city
0  John   30  New York
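
The index=[0] argument matters here: a dictionary whose values are all scalars gives Pandas no row labels to work with. The following is a minimal sketch of what happens without it, along with one common alternative (wrapping the dictionary in a list so it is treated as a single record). The names failed and df_alt are introduced only for this illustration.

```python
import json
import pandas as pd

json_string = '{"name": "John", "age": 30, "city": "New York"}'
data = json.loads(json_string)

# A dict of scalars carries no row labels, so this call raises ValueError.
try:
    pd.DataFrame(data)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Alternative: wrap the dict in a list so it is read as one record (one row).
df_alt = pd.DataFrame([data])
print(df_alt)
```

Both approaches should produce the same single-row table; which one reads better is largely a matter of taste.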

Example 2: Loading JSON String with Multiple Objects

Now let's handle a JSON string that contains an array of objects. Each object in the array becomes one row of the DataFrame, and the object keys become the column names.

import pandas as pd
import json

# JSON string
json_string = '[{"name": "John", "age": 30, "city": "New York"},{"name": "Jane", "age": 25, "city": "Chicago"}]'

# Convert JSON string to Python list of dictionaries
data = json.loads(json_string)

# Convert list of dictionaries to DataFrame
df = pd.DataFrame(data)
print(df)

Output

   name  age      city
0  John   30  New York
1  Jane   25   Chicago
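
As an alternative to calling json.loads first, Pandas can parse JSON itself through pd.read_json. The sketch below assumes a reasonably recent Pandas version, where the input is expected to be a path or file-like object, so the string is wrapped in io.StringIO. The name df_direct is used only for this illustration.

```python
from io import StringIO
import pandas as pd

json_string = '[{"name": "John", "age": 30, "city": "New York"},{"name": "Jane", "age": 25, "city": "Chicago"}]'

# read_json takes a path or file-like object, so wrap the string in StringIO.
# orient="records" states explicitly that the input is a list of row objects.
df_direct = pd.read_json(StringIO(json_string), orient="records")
print(df_direct)
```

Note that read_json applies its own type inference, so for unusual data the resulting column types may differ from those produced by the json.loads route.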

Example 3: Loading Nested JSON String

Nested JSON strings need a little more handling. One approach is to treat each nested object as its own DataFrame and then concatenate those DataFrames side by side into a single one.

import pandas as pd
import json

# Nested JSON string
json_string = '{"employee":{"name": "John", "age": 30, "city": "New York"}, "company":{"name": "ABC Corp", "location": "USA"}}'

# Convert JSON string to Python dictionary
data = json.loads(json_string)

# Convert each nested dictionary to a DataFrame and concatenate column-wise
df_employee = pd.DataFrame(data['employee'], index=[0])
df_company = pd.DataFrame(data['company'], index=[0])
df = pd.concat([df_employee, df_company], axis=1)
print(df)

Output

   name  age      city      name  location
0  John   30  New York  ABC Corp       USA

Because the 'name' key is present in both nested dictionaries, the 'name' column appears twice in this DataFrame. Duplicate column labels make selections such as df['name'] return both columns, so rename or prefix the columns to keep them distinct.
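
Two ways of avoiding the duplicate column name can be sketched as follows. The first adds a prefix to each nested DataFrame before concatenating; the second uses pd.json_normalize, which flattens nested dictionaries and names each column after its path. The prefixes "employee_" and "company_" and the names df_prefixed and df_flat are choices made for this illustration only.

```python
import json
import pandas as pd

json_string = '{"employee":{"name": "John", "age": 30, "city": "New York"}, "company":{"name": "ABC Corp", "location": "USA"}}'
data = json.loads(json_string)

# Option 1: prefix each nested frame's columns before concatenating.
df_employee = pd.DataFrame(data["employee"], index=[0]).add_prefix("employee_")
df_company = pd.DataFrame(data["company"], index=[0]).add_prefix("company_")
df_prefixed = pd.concat([df_employee, df_company], axis=1)
print(df_prefixed.columns.tolist())

# Option 2: let Pandas flatten the nesting; columns are named "parent.child".
df_flat = pd.json_normalize(data)
print(df_flat.columns.tolist())
```

The json_normalize route tends to scale better when the JSON has many nested keys, since no per-key DataFrame has to be built by hand.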

Conclusion

Loading JSON strings into a Pandas DataFrame is a common task when working with web data in Python. With Pandas and the json module, flat, multi-object, and nested JSON strings can all be loaded into a DataFrame for further analysis and manipulation.

Knowing how to load JSON strings into DataFrames gives you a solid basis for analysing JSON data. Because JSON is so widely used for exchanging data on the web, this skill applies to many data-driven applications and projects, whether you work as a data scientist, data analyst, or machine learning engineer.

Updated on: 18-Jul-2023
