Sankey diagrams are nice. Plotly is also nice, but at least for me it was a little challenging to understand how to use it, as the documentation is not very user friendly, especially if you want to position labels yourself. Below is a "beautiful" example of a Sankey diagram depicting the beer cycle; the color scheme was chosen purely for demonstration. This page shows how to force Plotly to put labels where you want them and how to specify a color for each connection.

Labels

I reckon the easiest way to prepare the data is to use LibreOffice/Excel (example) and to create two sheets. The first one contains a legend (the Python script at the end of the page assumes it is named "Legend") that maps human-readable names such as "beer" to the indices used by Plotly. The gotcha here, at least for those who don't think like programmers, is that index numbering starts from 0 instead of 1. This is also the place where you specify label positions as X and Y values. The positions run from 0.01 to 1 as far as I know. Flows may get clipped if your labels are too close to the edges.

Labels                      Index  X     Y
Water                       0      0.1   0.2
Hops                        1      0.1   0.5
Wort                        2      0.2   0.2
Yeast                       3      0.2   0.5
Beer                        4      0.33  0.2
Other Energy                5      0.33  0.5
Me                          6      0.5   0.2
Wastewater treatment plant  7      0.8   0.2
CO2                         8      0.8   0.5
Atmosphere                  9      0.95  0.5
Compost                     10     0.95  0.2
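
If you would rather not number the labels by hand, the zero-based indices can be generated instead. The snippet below is only a sketch: the names LEGEND, POS_MIN, POS_MAX and build_index are made up for illustration, the rows are the first four from the Legend table above, and the 0.01 to 1 range is the working assumption stated earlier rather than something taken from the Plotly documentation.

```python
# Legend rows in the order Plotly should number them: (label, x, y).
# Values copied from the first four rows of the Legend table.
LEGEND = [
    ("Water", 0.1, 0.2),
    ("Hops", 0.1, 0.5),
    ("Wort", 0.2, 0.2),
    ("Yeast", 0.2, 0.5),
]

# Assumed valid position range (the author's working assumption, not from Plotly's docs).
POS_MIN, POS_MAX = 0.01, 1.0


def build_index(rows):
    """Return {label: index} with indices starting at 0, checking positions on the way."""
    index = {}
    for i, (label, x, y) in enumerate(rows):
        if label in index:
            raise ValueError(f"duplicate label: {label!r}")
        for name, pos in (("X", x), ("Y", y)):
            if not POS_MIN <= pos <= POS_MAX:
                raise ValueError(f"{label!r}: {name}={pos} is outside {POS_MIN}..{POS_MAX}")
        index[label] = i
    return index


label_to_index = build_index(LEGEND)
print(label_to_index)
```

Because enumerate starts counting at 0, the first label automatically gets index 0, which takes care of the gotcha mentioned above.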

Connections

Name the second sheet "Connections" and describe the connections between labels in it. The easiest way is to write human-readable labels in the FromText and ToText columns and then use the VLOOKUP function to pull the indices for these labels from the Legend sheet into the From and To columns. Thickness is the thickness of the flow line, and Color specifies its color in "rgba(red, green, blue, alpha)" format, where each color channel ranges from 0 to 255 and alpha from 0 to 1, with 0 fully transparent and 1 solid. It is a good idea to use some transparency, as flow lines often overlap.

FromText                    ToText                      From  To  Thickness  Color
Water                       Wort                        0     2   1          rgba(51,195,225,0.9)
Hops                        Wort                        1     2   1          rgba(12,49,7,0.9)
Wort                        Beer                        2     4   2          rgba(142,96,29,0.9)
Yeast                       Beer                        3     4   1          rgba(61,195,223,0.9)
Beer                        Me                          4     6   3          rgba(97,235,185,0.9)
Other energy                Me                          5     6   1          rgba(251,177,202,0.9)
Me                          CO2                         6     8   1          rgba(49,85,228,0.9)
Me                          Wastewater treatment plant  6     7   3          rgba(75,188,156,0.9)
CO2                         Atmosphere                  8     9   1          rgba(162,20,173,0.9)
Atmosphere                  Other energy                9     5   1          rgba(205,115,240,0.9)
Wastewater treatment plant  Water                       7     0   1          rgba(131,234,185,0.9)
Atmosphere                  Hops                        9     1   1          rgba(153,30,254,0.9)
Wastewater treatment plant  Atmosphere                  7     9   1          rgba(17,42,17,0.9)
Wastewater treatment plant  Compost                     7     10  1          rgba(147,4,195,0.9)
Compost                     Hops                        10    1   1          rgba(76,159,47,0.9)
Wastewater treatment plant  Hops                        7     1   1          rgba(234,173,63,0.9)
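
For those who prefer code to spreadsheet formulas, the lookup and the color strings can also be produced in Python. This is a rough sketch rather than part of the script further down: the helper rgba and the variable names are hypothetical, and only the first three rows of the Connections table are used.

```python
def rgba(red, green, blue, alpha):
    """Format a color the way the Color column expects: 'rgba(r,g,b,a)'."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"color channel {channel} is outside 0..255")
    if not 0 <= alpha <= 1:
        raise ValueError(f"alpha {alpha} is outside 0..1")
    return f"rgba({red},{green},{blue},{alpha})"


# Label -> index mapping, as in the Legend sheet (only the rows needed here).
label_to_index = {"Water": 0, "Hops": 1, "Wort": 2, "Beer": 4}

# (FromText, ToText, Thickness, Color), copied from the first rows of the Connections table.
flows = [
    ("Water", "Wort", 1, rgba(51, 195, 225, 0.9)),
    ("Hops", "Wort", 1, rgba(12, 49, 7, 0.9)),
    ("Wort", "Beer", 2, rgba(142, 96, 29, 0.9)),
]

# A dictionary lookup plays the role of VLOOKUP; an unknown label raises KeyError.
source = [label_to_index[src] for src, _, _, _ in flows]
target = [label_to_index[dst] for _, dst, _, _ in flows]
value = [thickness for _, _, thickness, _ in flows]
color = [c for _, _, _, c in flows]

print(source, target, value)
print(color[0])
```

One difference from the spreadsheet route: the dictionary lookup is case-sensitive, so a label must be spelled exactly as in the legend.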

The code

The Python script below pulls data from an .ods file (LibreOffice), from the sheets "Legend" and "Connections", and spits out a Sankey diagram similar to the one at the top of this page. The script requires the plotly, pandas and pandas_ods_reader modules, all of which can be installed using pip.

import plotly.graph_objects as go
import pandas as pd
from pandas_ods_reader import read_ods

path = "/path/to/your/sourcefile.ods"

# Names of the two sheets in the spreadsheet
label_sheet = "Legend"
connections = "Connections"

# Read each sheet into its own DataFrame
df_labels = read_ods(path, label_sheet)
df_connections = read_ods(path, connections)

# Spreadsheet numbers may be read as floats; node indices must be integers
convert_dict = {'From': int,
                'To': int,
                'Thickness': int}

df = df_connections.astype(convert_dict)

# Node labels, in index order (row 0 of the Legend sheet is node 0)
labels = df_labels["Labels"].values.tolist()

# One list entry per connection
source = df["From"].values.tolist()
target = df["To"].values.tolist()
color = df["Color"].values.tolist()
value = df["Thickness"].values.tolist()

link = dict(source=source, target=target, value=value, color=color)

# x and y carry the label positions from the Legend sheet
node = dict(label=labels, pad=15, thickness=1, color="red",
            x=df_labels["X"].values.tolist(),
            y=df_labels["Y"].values.tolist())

# arrangement='fixed' keeps the nodes at the given positions
fig = go.Figure(data=[go.Sankey(arrangement='fixed', link=link, node=node)])

fig.show()
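
Since the From and To columns are plain numbers, it is easy to end up with an index that has no label behind it, for example after counting from 1 instead of 0. A small check along the following lines could be run before building the figure. It is only a sketch: check_links is a hypothetical helper, not something Plotly or pandas provides.

```python
def check_links(n_labels, source, target):
    """Return a list of problems found in the From/To index columns."""
    problems = []
    if len(source) != len(target):
        problems.append(f"{len(source)} From values but {len(target)} To values")
    # Connections are counted from 0 here, just like the node indices.
    for row, (src, dst) in enumerate(zip(source, target)):
        for name, idx in (("From", src), ("To", dst)):
            if not 0 <= idx < n_labels:
                problems.append(f"connection {row}: {name}={idx} has no matching label")
    return problems


# Three labels (indices 0, 1, 2), but the last connection starts at index 3.
for problem in check_links(3, [0, 1, 3], [2, 2, 1]):
    print(problem)
```

With the variables from the script above, the call would be check_links(len(labels), source, target); an empty list means every index points at an existing label.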

Last updated on 21 April 2023.