How to convert a pandas DataFrame into JSON in Python?


Pandas is a popular Python library for data manipulation and analysis. A common task when working with pandas is converting a DataFrame to JSON (JavaScript Object Notation), a lightweight data interchange format widely used in web applications. Converting a DataFrame to JSON is useful for sharing, storing, and transferring data between programs written in different languages.

In this tutorial, we will discuss how to convert a pandas DataFrame to JSON using built-in Pandas functions, explore different options and parameters for the conversion, and provide examples of how to handle specific scenarios.

Converting Pandas DataFrame into JSON

In pandas, the DataFrame.to_json() method converts a DataFrame to JSON. It accepts several parameters that control the layout and formatting of the output. The sections below describe the most commonly used parameters and show how they affect the result.

Here are some of the important parameters and their possible values that can be used in the DataFrame.to_json() function for converting a pandas DataFrame to JSON −

  • path_or_buf − The output location where the resulting JSON will be saved. It can be a file path or a buffer object. The default value is None, in which case the JSON is returned as a string instead of being written anywhere.

  • orient − The layout of the resulting JSON. Possible values for a DataFrame are 'split', 'records', 'index', 'columns', 'values', and 'table'. The default value is 'columns'.

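  • date_format − How datetime values are written. It accepts 'epoch' (time elapsed since 1970-01-01, as a number) or 'iso' (ISO 8601 strings); it is not an arbitrary datetime format string. The default value is None, which is treated as 'epoch' for every orient except 'table', where it is 'iso'.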

  • double_precision − The precision of floating-point numbers in the JSON. It can be an integer value that specifies the number of decimal places to include. The default value is 10.

  • force_ascii − Whether to encode non-ASCII characters as their Unicode escape sequences. The default value is True.

  • date_unit − The time unit used when encoding timestamps, which also sets the precision of ISO 8601 output. Possible values are 's' for seconds, 'ms' for milliseconds, 'us' for microseconds, and 'ns' for nanoseconds. The default value is 'ms'.
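To make the orient parameter described above more concrete, here is a minimal sketch that converts one small DataFrame with five of the orient values and parses each result back into Python objects so the layouts can be compared. The DataFrame and the layouts dictionary are illustrative names, not part of the original examples.

```python
import json
import pandas as pd

# illustrative DataFrame (an assumption, chosen only for this sketch)
df = pd.DataFrame({'col1': [1, 3], 'col2': [2, 4]})

# convert with each orient, then parse the JSON string back into
# Python objects so the different layouts are easy to compare
layouts = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ['split', 'records', 'index', 'columns', 'values']
}

for orient, parsed in layouts.items():
    print(orient, '->', parsed)
```

With 'records' the result is a list with one object per row, with 'values' it is a bare list of rows, and with 'columns' and 'index' the outer keys are the column names and the row labels respectively.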

Let's examine a few examples to gain a better understanding of how the DataFrame.to_json() function is used.

Example 1: Basic Usage

Consider the code shown below. In this code, we create a 2×2 NumPy array called array_data, containing four string values. We then convert this array into a pandas DataFrame called df, with column names 'col1' and 'col2'. Finally, we use the to_json() function to convert the DataFrame into a JSON string, which we print to the console using the print() function.

import numpy as np
import pandas as pd

# create a NumPy array with two rows and two columns
array_data = np.array([['1', '2'], ['3', '4']])

# convert the NumPy array into a pandas DataFrame with column names
df = pd.DataFrame(array_data, columns=['col1', 'col2'])

# convert the DataFrame to a JSON string
json_data = df.to_json()

# print the resulting JSON string
print(json_data)

Output

On execution, it will produce the following output:

{"col1":{"0":"1","1":"3"},"col2":{"0":"2","1":"4"}}
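Building on the basic call, the following sketch shows how two of the parameters listed earlier, double_precision and force_ascii, change the returned string. The column names and values are made up for illustration.

```python
import json
import pandas as pd

# hypothetical data: one non-ASCII string and one float with many decimals
df = pd.DataFrame({'city': ['Zürich'], 'price': [3.14159]})

# force_ascii=True (the default) writes non-ASCII characters as \uXXXX escapes
escaped = df.to_json(orient='records', double_precision=2)

# force_ascii=False keeps the original characters in the output
raw = df.to_json(orient='records', double_precision=2, force_ascii=False)

print(escaped)
print(raw)

# both strings decode to the same data; the float is rounded to 2 decimals
print(json.loads(escaped) == json.loads(raw))
```

Both strings are valid JSON and decode to the same values, so force_ascii mainly matters when the consumer of the file or the transport cannot handle non-ASCII text.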

Example 2: Writing JSON to a File with Custom Parameters

Now let's look at an example that sets several of these parameters in a single DataFrame.to_json() call.

Consider the code shown below. In this example, we set the path_or_buf parameter to 'output.json' to save the JSON data to a file named 'output.json'. We set the orient parameter to 'records' to format the JSON as a list of records.

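We also set the date_format parameter to 'iso' to request ISO 8601 output for datetime columns, and the double_precision parameter to 2 to keep at most two decimal places for floating-point numbers. Finally, we set force_ascii to False to preserve non-ASCII characters, and date_unit to 'ms' to use milliseconds as the unit for timestamps. Note that the Join_date column in this DataFrame holds plain strings rather than datetime values, so date_format and date_unit have no visible effect here; the strings are written unchanged.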

import pandas as pd

# create a sample DataFrame
df = pd.DataFrame({
   'Name': ['John', 'Jane', 'Bob'],
   'Age': [30, 25, 40],
   'Salary': [50000.0, 60000.0, 70000.0],
   'Join_date': ['2022-01-01', '2021-06-15', '2020-11-30']
})

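# write the DataFrame to 'output.json'
# (to_json() returns None when path_or_buf is given)
df.to_json(
   path_or_buf='output.json',
   orient='records',
   date_format='iso',
   double_precision=2,
   force_ascii=False,
   date_unit='ms'
)

# read the file back and print its contents
with open('output.json', encoding='utf-8') as f:
   print(f.read())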

Output

On execution, it will create a new file named "output.json". The contents of the file are shown below, formatted here for readability; pandas writes the JSON compactly on a single line.

[
   { "Name": "John", "Age": 30, "Salary": 50000.0, "Join_date": "2022-01-01" },
   { "Name": "Jane", "Age": 25, "Salary": 60000.0, "Join_date": "2021-06-15" },
   { "Name": "Bob", "Age": 40, "Salary": 70000.0, "Join_date": "2020-11-30" }
]
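Because Join_date is stored as strings in the example above, date_format and date_unit do not come into play there. The sketch below assumes the column is first converted with pd.to_datetime() (a step not in the original example) and compares the 'epoch' and 'iso' settings.

```python
import json
import pandas as pd

# same idea as the example above, but Join_date holds real datetime values
# (the pd.to_datetime() conversion is an assumption added for this sketch)
df = pd.DataFrame({
    'Name': ['John', 'Jane'],
    'Join_date': pd.to_datetime(['2022-01-01', '2021-06-15']),
})

# 'epoch' writes each date as a number counted from 1970-01-01,
# here in seconds because date_unit='s'
epoch_json = df.to_json(orient='records', date_format='epoch', date_unit='s')

# 'iso' writes each date as an ISO 8601 string
iso_json = df.to_json(orient='records', date_format='iso', date_unit='s')

print(epoch_json)
print(iso_json)
```

The exact ISO string can differ slightly between pandas versions (for example, whether a trailing "Z" is appended), so it is safer for consumers to parse the value as ISO 8601 than to compare it as a literal string.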

Conclusion

In conclusion, converting a pandas DataFrame to JSON format in Python is a straightforward process that can be accomplished using the to_json() method provided by the pandas library.

This method allows for a variety of customizations, such as specifying the JSON layout, date formatting, and floating-point precision. It can also write the resulting JSON directly to a file, making it easy to share data with other systems. With a basic understanding of the to_json() method and its parameters, you can convert your pandas DataFrames to JSON for use in a wide range of applications.

Updated on: 18-Apr-2023
