Python Plotly: How to define the structure of a Sankey diagram using a Pandas dataframe?
A Sankey diagram is used to visualize flow between nodes by defining a "source" and "target" relationship. It effectively represents the movement of objects or data between different points in a system.
In this tutorial, we'll learn how to define the structure of a Sankey diagram using a Pandas dataframe. We will use the plotly.graph_objects module to create interactive flow diagrams.
Required Libraries
First, import the necessary libraries:
import plotly.graph_objects as go
import pandas as pd
Creating Node Data Structure
Define nodes with their IDs, labels, and colors. The first row holds the column headers:
nodes = [
['id', 'label', 'color'],
[ 0, 'A1', 'blue'],
[ 1, 'A2', 'green'],
[ 2, 'B1', 'red'],
[ 3, 'B2', 'brown'],
[ 4, 'C1', 'cyan'],
[ 5, 'C2', 'yellow']
]
print("Node structure:", nodes[:3]) # Show first 3 rows
Node structure: [['id', 'label', 'color'], [0, 'A1', 'blue'], [1, 'A2', 'green']]
Creating Link Data Structure
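Define links between nodes with source, target, value, and color. The Source and Target columns hold node IDs; Plotly reads them as zero-based positions in the node label list, so each ID here matches its node's row position: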
links = [
['Source', 'Target', 'Value', 'Link Color'],
[ 0, 2, 4, 'grey'],
[ 0, 3, 4, 'grey'],
[ 1, 3, 4, 'grey'],
[ 2, 4, 4, 'grey'],
[ 3, 4, 4, 'grey'],
[ 3, 5, 4, 'grey']
]
print("Link structure:", links[:3]) # Show first 3 rows
Link structure: [['Source', 'Target', 'Value', 'Link Color'], [0, 2, 4, 'grey'], [0, 3, 4, 'grey']]
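Because Source and Target refer to nodes by ID, a link that points at a missing ID is worth catching before plotting. The following is a minimal sketch of such a check, assuming the list-of-lists layout used above; the helper name `find_invalid_links` is hypothetical and not part of Plotly or Pandas.

```python
nodes = [
    ['id', 'label', 'color'],
    [0, 'A1', 'blue'],
    [1, 'A2', 'green'],
    [2, 'B1', 'red'],
    [3, 'B2', 'brown'],
    [4, 'C1', 'cyan'],
    [5, 'C2', 'yellow'],
]
links = [
    ['Source', 'Target', 'Value', 'Link Color'],
    [0, 2, 4, 'grey'],
    [0, 3, 4, 'grey'],
    [1, 3, 4, 'grey'],
    [2, 4, 4, 'grey'],
    [3, 4, 4, 'grey'],
    [3, 5, 4, 'grey'],
]


def find_invalid_links(nodes, links):
    """Return link rows whose source or target is not a known node id.

    Hypothetical helper: assumes the first row of each list is a header.
    """
    valid_ids = {row[0] for row in nodes[1:]}  # skip the header row
    return [
        row for row in links[1:]
        if row[0] not in valid_ids or row[1] not in valid_ids
    ]


print(find_invalid_links(nodes, links))  # [] because every link is valid
```

An empty list means every link connects two defined nodes.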
Converting to DataFrames
Extract headers and create Pandas DataFrames from the data:
import plotly.graph_objects as go
import pandas as pd
nodes = [
['id', 'label', 'color'],
[ 0, 'A1', 'blue'],
[ 1, 'A2', 'green'],
[ 2, 'B1', 'red'],
[ 3, 'B2', 'brown'],
[ 4, 'C1', 'cyan'],
[ 5, 'C2', 'yellow']
]
links = [
['Source', 'Target', 'Value', 'Link Color'],
[ 0, 2, 4, 'grey'],
[ 0, 3, 4, 'grey'],
[ 1, 3, 4, 'grey'],
[ 2, 4, 4, 'grey'],
[ 3, 4, 4, 'grey'],
[ 3, 5, 4, 'grey']
]
# Extract headers
nodes_headers = nodes.pop(0)
links_headers = links.pop(0)
# Create DataFrames
df_nodes = pd.DataFrame(nodes, columns=nodes_headers)
df_links = pd.DataFrame(links, columns=links_headers)
print("Nodes DataFrame:")
print(df_nodes)
print("\nLinks DataFrame:")
print(df_links)
Nodes DataFrame:
   id label   color
0   0    A1    blue
1   1    A2   green
2   2    B1     red
3   3    B2   brown
4   4    C1    cyan
5   5    C2  yellow

Links DataFrame:
   Source  Target  Value Link Color
0       0       2      4       grey
1       0       3      4       grey
2       1       3      4       grey
3       2       4      4       grey
4       3       4      4       grey
5       3       5      4       grey
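The links table is hard to read because it contains only numeric IDs. As a rough sketch, the IDs can be translated back to labels with `Series.map`; the names `id_to_label` and `readable` are illustrative choices, and the DataFrames are rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df_nodes = pd.DataFrame(
    [[0, 'A1', 'blue'], [1, 'A2', 'green'], [2, 'B1', 'red'],
     [3, 'B2', 'brown'], [4, 'C1', 'cyan'], [5, 'C2', 'yellow']],
    columns=['id', 'label', 'color'],
)
df_links = pd.DataFrame(
    [[0, 2, 4, 'grey'], [0, 3, 4, 'grey'], [1, 3, 4, 'grey'],
     [2, 4, 4, 'grey'], [3, 4, 4, 'grey'], [3, 5, 4, 'grey']],
    columns=['Source', 'Target', 'Value', 'Link Color'],
)

# Lookup Series: index is the node id, value is the node label
id_to_label = df_nodes.set_index('id')['label']

# Replace numeric ids with labels for a human-readable view of each flow
readable = pd.DataFrame({
    'from': df_links['Source'].map(id_to_label),
    'to': df_links['Target'].map(id_to_label),
    'value': df_links['Value'],
})
print(readable)
```

The first row of `readable` should describe a flow of 4 from A1 to B1, matching the first link defined earlier.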
Creating the Sankey Diagram
Build the complete Sankey diagram using the DataFrames:
import plotly.graph_objects as go
import pandas as pd
# Data setup
nodes = [
['id', 'label', 'color'],
[ 0, 'A1', 'blue'],
[ 1, 'A2', 'green'],
[ 2, 'B1', 'red'],
[ 3, 'B2', 'brown'],
[ 4, 'C1', 'cyan'],
[ 5, 'C2', 'yellow']
]
links = [
['Source', 'Target', 'Value', 'Link Color'],
[ 0, 2, 4, 'grey'],
[ 0, 3, 4, 'grey'],
[ 1, 3, 4, 'grey'],
[ 2, 4, 4, 'grey'],
[ 3, 4, 4, 'grey'],
[ 3, 5, 4, 'grey']
]
# Create DataFrames
nodes_headers = nodes.pop(0)
links_headers = links.pop(0)
df_nodes = pd.DataFrame(nodes, columns=nodes_headers)
df_links = pd.DataFrame(links, columns=links_headers)
# Create Sankey diagram
fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 15,
        thickness = 20,
        line = dict(color = "black", width = 0.5),
        label = df_nodes['label'].dropna(axis=0, how='any'),
        color = df_nodes['color']
    ),
    link = dict(
        source = df_links['Source'].dropna(axis=0, how='any'),
        target = df_links['Target'].dropna(axis=0, how='any'),
        value = df_links['Value'].dropna(axis=0, how='any'),
        color = df_links['Link Color'].dropna(axis=0, how='any')
    )
)])
# Update layout
fig.update_layout(
    title_text="DataFrame-Sankey Diagram",
    font_size=10
)
fig.show()
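Before plotting, it can help to sanity-check how much flow enters and leaves each node. The sketch below is one possible way to do that with `groupby`, using the same link values as above; the `totals` table is an illustrative addition and is not required by Plotly.

```python
import pandas as pd

df_links = pd.DataFrame(
    [[0, 2, 4], [0, 3, 4], [1, 3, 4], [2, 4, 4], [3, 4, 4], [3, 5, 4]],
    columns=['Source', 'Target', 'Value'],
)

# Sum of link values leaving each node and arriving at each node
outflow = df_links.groupby('Source')['Value'].sum()
inflow = df_links.groupby('Target')['Value'].sum()

# Align both Series on node id; nodes with no flow in one direction get 0
totals = pd.DataFrame({'inflow': inflow, 'outflow': outflow}).fillna(0).astype(int)
totals.index.name = 'node id'
print(totals)
```

With this data, nodes 0 and 1 have no inflow and nodes 4 and 5 have no outflow, which matches their roles as the start and end of the diagram.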
Key Components
| Component | Purpose | Required Fields |
|---|---|---|
| Nodes | Define diagram elements | id, label, color |
| Links | Define flow connections | source, target, value |
| Node styling | Configure node appearance | pad, thickness, line |
| Layout | Configure the figure | title_text, font_size |
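To show how these components map onto the two DataFrames, here is a minimal sketch that collects the node and link settings into plain dictionaries. The helper `sankey_kwargs` is hypothetical, and it assumes node IDs run from 0 to n-1 so that sorting by `id` puts each label at the position its links refer to.

```python
import pandas as pd

df_nodes = pd.DataFrame(
    [[0, 'A1', 'blue'], [1, 'A2', 'green'], [2, 'B1', 'red'],
     [3, 'B2', 'brown'], [4, 'C1', 'cyan'], [5, 'C2', 'yellow']],
    columns=['id', 'label', 'color'],
)
df_links = pd.DataFrame(
    [[0, 2, 4, 'grey'], [0, 3, 4, 'grey'], [1, 3, 4, 'grey'],
     [2, 4, 4, 'grey'], [3, 4, 4, 'grey'], [3, 5, 4, 'grey']],
    columns=['Source', 'Target', 'Value', 'Link Color'],
)


def sankey_kwargs(df_nodes, df_links):
    """Hypothetical helper: gather node and link settings as plain dicts."""
    # Assumption: ids are 0..n-1, so sorted row position equals node id
    ordered = df_nodes.sort_values('id')
    node = dict(
        pad=15,
        thickness=20,
        label=ordered['label'].tolist(),
        color=ordered['color'].tolist(),
    )
    link = dict(
        source=df_links['Source'].tolist(),
        target=df_links['Target'].tolist(),
        value=df_links['Value'].tolist(),
        color=df_links['Link Color'].tolist(),
    )
    return dict(node=node, link=link)


kwargs = sankey_kwargs(df_nodes, df_links)
print(kwargs['node']['label'])
print(kwargs['link']['source'])
# With Plotly installed, the result could be passed on as go.Sankey(**kwargs).
```

Keeping this step separate from the plotting call makes the data easy to inspect before a figure is created.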
Conclusion
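Pandas DataFrames are a convenient way to organize Sankey diagram data: one DataFrame holds the nodes (id, label, color) and another holds the links (source, target, value, color). Keeping the two separate makes it easy to filter, relabel, or recolor the data before passing the columns to go.Sankey.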
