Python Plotly: How to define the structure of a Sankey diagram using a Pandas dataframe?

A Sankey diagram is used to visualize flow between nodes by defining a "source" and "target" relationship. It effectively represents the movement of objects or data between different points in a system.

In this tutorial, we'll learn how to define the structure of a Sankey diagram using a Pandas dataframe. We will use the plotly.graph_objects module to create interactive flow diagrams.

Required Libraries

First, import the necessary libraries:

import plotly.graph_objects as go
import pandas as pd

Creating Node Data Structure

Define nodes with their IDs, labels, and colors:

nodes = [
    ['id', 'label', 'color'],
    [ 0,    'A1',    'blue'],
    [ 1,    'A2',    'green'],
    [ 2,    'B1',    'red'],
    [ 3,    'B2',    'brown'],
    [ 4,    'C1',    'cyan'],
    [ 5,    'C2',    'yellow']
]

print("Node structure:", nodes[:3])  # Show first 3 rows

Output:

Node structure: [['id', 'label', 'color'], [0, 'A1', 'blue'], [1, 'A2', 'green']]

Creating Link Data Structure

Define links between nodes with source, target, value, and color. Source and Target are zero-based positions in the node list, so 0 refers to A1 and 2 refers to B1:

links = [
    ['Source', 'Target', 'Value', 'Link Color'],
    [  0,          2,       4,       'grey'],
    [  0,          3,       4,       'grey'],
    [  1,          3,       4,       'grey'],
    [  2,          4,       4,       'grey'],
    [  3,          4,       4,       'grey'],
    [  3,          5,       4,       'grey']
]

print("Link structure:", links[:3])  # Show first 3 rows

Output:

Link structure: [['Source', 'Target', 'Value', 'Link Color'], [0, 2, 4, 'grey'], [0, 3, 4, 'grey']]
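Writing links as integer positions is error-prone once there are more than a few nodes. A minimal sketch of one way around that, assuming the same list-of-lists layout as above: write the links by label and translate them to positions with a dictionary. The names `links_by_label` and `label_to_id` are illustrative and not part of Plotly or pandas.

```python
# Nodes in the same list-of-lists layout used above (header row first)
nodes = [
    ['id', 'label', 'color'],
    [0, 'A1', 'blue'],
    [1, 'A2', 'green'],
    [2, 'B1', 'red'],
]

# Hypothetical links written with labels instead of integer positions
links_by_label = [
    ['A1', 'B1', 4],
    ['A2', 'B1', 4],
]

# Map each label to its id, skipping the header row
label_to_id = {label: node_id for node_id, label, _color in nodes[1:]}

# Rebuild the links in the Source/Target/Value layout with integer ids
links = [['Source', 'Target', 'Value']] + [
    [label_to_id[src], label_to_id[dst], value]
    for src, dst, value in links_by_label
]
print(links)
```

This only works if every label is unique; duplicate labels would silently overwrite each other in the dictionary.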

Converting to DataFrames

Extract headers and create Pandas DataFrames from the data:

import plotly.graph_objects as go
import pandas as pd

nodes = [
    ['id', 'label', 'color'],
    [ 0,    'A1',    'blue'],
    [ 1,    'A2',    'green'],
    [ 2,    'B1',    'red'],
    [ 3,    'B2',    'brown'],
    [ 4,    'C1',    'cyan'],
    [ 5,    'C2',    'yellow']
]

links = [
    ['Source', 'Target', 'Value', 'Link Color'],
    [  0,          2,       4,       'grey'],
    [  0,          3,       4,       'grey'],
    [  1,          3,       4,       'grey'],
    [  2,          4,       4,       'grey'],
    [  3,          4,       4,       'grey'],
    [  3,          5,       4,       'grey']
]

# Extract headers (pop(0) removes the header row from each list in place,
# so run this step only once per list)
nodes_headers = nodes.pop(0)
links_headers = links.pop(0)

# Create DataFrames
df_nodes = pd.DataFrame(nodes, columns=nodes_headers)
df_links = pd.DataFrame(links, columns=links_headers)

print("Nodes DataFrame:")
print(df_nodes)
print("\nLinks DataFrame:")
print(df_links)

Output:

Nodes DataFrame:
   id label   color
0   0    A1    blue
1   1    A2   green
2   2    B1     red
3   3    B2   brown
4   4    C1    cyan
5   5    C2  yellow

Links DataFrame:
   Source  Target  Value Link Color
0       0       2      4       grey
1       0       3      4       grey
2       1       3      4       grey
3       2       4      4       grey
4       3       4      4       grey
5       3       5      4       grey
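Once the data is in DataFrames, it is easy to check that every link points at a node that actually exists. The following is a minimal sketch of such a check, assuming the column names used above; `find_invalid_links` is a hypothetical helper written for this example, not a Plotly or pandas function.

```python
import pandas as pd

def find_invalid_links(df_nodes: pd.DataFrame, df_links: pd.DataFrame) -> pd.DataFrame:
    """Return the link rows whose Source or Target is not a valid node position."""
    n_nodes = len(df_nodes)
    # between() is inclusive on both ends by default
    source_ok = df_links["Source"].between(0, n_nodes - 1)
    target_ok = df_links["Target"].between(0, n_nodes - 1)
    return df_links[~(source_ok & target_ok)]

df_nodes = pd.DataFrame({"id": [0, 1, 2], "label": ["A1", "A2", "B1"]})
df_links = pd.DataFrame({
    "Source": [0, 1, 0],
    "Target": [2, 2, 7],   # 7 is deliberately out of range
    "Value": [4, 4, 4],
})

bad = find_invalid_links(df_nodes, df_links)
print(bad)
```

An empty result means all links refer to existing nodes. The check assumes node positions run from 0 to the number of nodes minus one, as in this tutorial.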

Creating the Sankey Diagram

Build the complete Sankey diagram using the DataFrames:

import plotly.graph_objects as go
import pandas as pd

# Data setup
nodes = [
    ['id', 'label', 'color'],
    [ 0,    'A1',    'blue'],
    [ 1,    'A2',    'green'],
    [ 2,    'B1',    'red'],
    [ 3,    'B2',    'brown'],
    [ 4,    'C1',    'cyan'],
    [ 5,    'C2',    'yellow']
]

links = [
    ['Source', 'Target', 'Value', 'Link Color'],
    [  0,          2,       4,       'grey'],
    [  0,          3,       4,       'grey'],
    [  1,          3,       4,       'grey'],
    [  2,          4,       4,       'grey'],
    [  3,          4,       4,       'grey'],
    [  3,          5,       4,       'grey']
]

# Create DataFrames
nodes_headers = nodes.pop(0)
links_headers = links.pop(0)

df_nodes = pd.DataFrame(nodes, columns=nodes_headers)
df_links = pd.DataFrame(links, columns=links_headers)

# Drop incomplete links as whole rows so that source, target, value,
# and color stay aligned with each other
df_links = df_links.dropna(axis=0, how='any')

# Create Sankey diagram
fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 15,
        thickness = 20,
        line = dict(color = "black", width = 0.5),
        # Node rows are not dropped: links refer to nodes by position
        label = df_nodes['label'],
        color = df_nodes['color']
    ),

    link = dict(
        source = df_links['Source'],
        target = df_links['Target'],
        value = df_links['Value'],
        color = df_links['Link Color']
    )
)])

# Update layout
fig.update_layout(
    title_text="DataFrame-Sankey Diagram",
    font_size=10
)

fig.show()
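Every link in this tutorial is grey, but keeping nodes and links in separate DataFrames makes it simple to derive link colors from node colors. Here is a minimal sketch, assuming the same column names as above, that gives each link the color of the node it starts from. The choice to color by source node is an illustration, not something the Sankey trace requires.

```python
import pandas as pd

df_nodes = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "label": ["A1", "A2", "B1", "B2"],
    "color": ["blue", "green", "red", "brown"],
})
df_links = pd.DataFrame({
    "Source": [0, 0, 1],
    "Target": [2, 3, 3],
    "Value": [4, 4, 4],
})

# Look up each link's color from the node it starts at
node_colors = df_nodes.set_index("id")["color"]
df_links["Link Color"] = df_links["Source"].map(node_colors)
print(df_links)
```

The resulting 'Link Color' column can then be passed as the link color in the same way as the grey column in the code above.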

Key Components

Component   Purpose                    Fields
Nodes       Define diagram elements    id, label, color (pad and thickness control spacing and width)
Links       Define flow connections    source, target, value (color is optional)
Layout      Configure appearance       title_text, font_size
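To see how the value field shapes the diagram, it helps to total the flow entering and leaving each node. The sketch below does this with groupby on the link data from this tutorial; the `inflow` and `outflow` names are chosen for this example only.

```python
import pandas as pd

df_links = pd.DataFrame(
    [[0, 2, 4], [0, 3, 4], [1, 3, 4], [2, 4, 4], [3, 4, 4], [3, 5, 4]],
    columns=["Source", "Target", "Value"],
)

# Total value leaving each node and total value arriving at each node
outflow = df_links.groupby("Source")["Value"].sum()
inflow = df_links.groupby("Target")["Value"].sum()

# Combine on node id; nodes with no links in one direction get 0
flows = pd.DataFrame({"inflow": inflow, "outflow": outflow}).fillna(0).astype(int)
print(flows)
```

Nodes 0 and 1 have no inflow because they are the starting points, and nodes 4 and 5 have no outflow because they are the end points.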

Conclusion

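Pandas DataFrames are a convenient way to organize Sankey diagram data: one DataFrame holds the nodes (labels and colors) and another holds the links (source, target, value, and color). Keeping the two separate makes it easy to filter, extend, or recolor the data before passing the columns to go.Sankey.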

Updated on: 2026-03-26T22:26:07+05:30
