Build a Pipeline in Python#
What does this guide solve?
Although the primary interface for building a Lumen dashboard is the YAML specification file, this guide shows you an alternate approach: building with Python. To learn more, visit the Lumen in Python Conceptual Guide.
Overview#
When building with Lumen in Python, the main object that defines a dashboard is the Pipeline. With this Pipeline object, you can specify the data source, filters, and transforms. There are two approaches to adding these specifications to a Pipeline object: declaratively or programmatically. While the declarative approach is more compact, the programmatic approach allows you to separate the pipeline creation steps.
Declaratively specifying a pipeline#
The declarative specification approach looks similar to a YAML file hierarchy, but consists of nested Python dictionary and list objects.
import lumen as lm
data_url = 'https://datasets.holoviz.org/penguins/v1/penguins.csv'
pipeline = lm.Pipeline.from_spec({
    'source': {
        'type': 'file',
        'tables': {
            'penguins': data_url
        }
    },
    'filters': [
        {'type': 'widget', 'field': 'species'},
        {'type': 'widget', 'field': 'island'},
        {'type': 'widget', 'field': 'sex'},
        {'type': 'widget', 'field': 'year'}
    ],
    'transforms': [
        {'type': 'aggregate', 'method': 'mean', 'by': ['species', 'sex', 'year']}
    ]
})
Programmatically specifying a pipeline#
The programmatic specification approach uses Lumen objects to build the pipeline step by step.
Add source#
First, add a valid Source to your Pipeline. A common choice is FileSource, which can load CSV, Excel, JSON and Parquet files, but see the Source Reference for all options.
import lumen as lm
from lumen.sources import FileSource
data_url = 'https://datasets.holoviz.org/penguins/v1/penguins.csv'
pipeline = lm.Pipeline(source=FileSource(tables={'penguins': data_url}), table='penguins')
Preview the data
At any point after defining the source in your pipeline, you can inspect the data in a notebook with pipeline.data:
pipeline.data.head()
Add filter#
Next, you can add widgets for certain columns of your source. When displaying the dashboard, these widgets will allow your dashboard users to filter the data. See the Filter Reference for all options.
pipeline.add_filter('widget', field='species')
pipeline.add_filter('widget', field='island')
pipeline.add_filter('widget', field='sex')
pipeline.add_filter('widget', field='year')
Add transform#
Now you can apply a transform to the data, such as computing the mean or selecting certain columns. See the Transform Reference for more.
columns = ['species', 'island', 'sex', 'year', 'bill_length_mm', 'bill_depth_mm']
pipeline.add_transform('columns', columns=columns)
pipeline.data.head()
Manually update dashboard
By default, every interaction will update the dashboard. If this behavior is unwanted, for instance if you want to select multiple filter widgets without the dashboard updating after every individual selection, set auto_update=False on the Pipeline. This will require you to manually trigger an update by clicking a button.
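For example, a minimal sketch that disables automatic updates when constructing the Pipeline, reusing the FileSource and data_url from above:
pipeline = lm.Pipeline(
    source=FileSource(tables={'penguins': data_url}),
    table='penguins',
    auto_update=False  # updates are now triggered manually via an update button
)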
Display the pipeline#
Once you have built your pipeline, it is easy to view it interactively. As long as you have loaded the Panel extension with pn.extension('tabulator'), simply displaying a pipeline in a notebook cell will render it:
import panel as pn
pn.extension('tabulator')
pipeline
If you are working in a local REPL or from a script, you can also use pipeline.show() to preview it:
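pipeline.show()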
You can also easily render the control panel containing the filter widgets and variables separately:
pipeline.control_panel