Build a Pipeline in Python

What does this guide solve?

Although the primary interface for building a Lumen dashboard is the YAML specification file, this guide shows you an alternative approach: building with Python. To learn more, visit the Lumen in Python Conceptual Guide.

Overview

When building with Lumen in Python, the main object that defines a dashboard is the Pipeline. With this Pipeline object, you can specify the data source, filters, and transforms. There are two ways to add these specifications to a Pipeline object: declaratively or programmatically. While the declarative approach is more compact, the programmatic approach allows you to separate the pipeline creation steps.

Pipeline Diagram

Declaratively specifying a pipeline

The declarative specification approach looks similar to a YAML file hierarchy, but consists of nested Python dictionary and list objects.

import lumen as lm

data_url = 'https://datasets.holoviz.org/penguins/v1/penguins.csv'

pipeline = lm.Pipeline.from_spec({
    'source': {
        'type': 'file',
        'tables': {
            'penguins': data_url
        }
    },
    'filters': [
        {'type': 'widget', 'field': 'species'},
        {'type': 'widget', 'field': 'island'},
        {'type': 'widget', 'field': 'sex'},
        {'type': 'widget', 'field': 'year'}
    ],
    'transforms': [
        {'type': 'aggregate', 'method': 'mean', 'by': ['species', 'sex', 'year']}
    ]
})
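Because the specification is ordinary nested Python data, parts of it can be generated rather than typed out. The following is a minimal sketch of that idea using the same source, filters, and transform as above. The names filter_fields and build_pipeline are hypothetical, and the Lumen import sits inside the function only so that the dictionary can be built and inspected without loading any data.

```python
# Columns that should each get a widget filter (same columns as the example above).
filter_fields = ['species', 'island', 'sex', 'year']

spec = {
    'source': {
        'type': 'file',
        'tables': {
            'penguins': 'https://datasets.holoviz.org/penguins/v1/penguins.csv',
        },
    },
    # One widget filter per column, generated instead of written out by hand.
    'filters': [{'type': 'widget', 'field': field} for field in filter_fields],
    'transforms': [
        {'type': 'aggregate', 'method': 'mean', 'by': ['species', 'sex', 'year']},
    ],
}


def build_pipeline(spec):
    """Create a Pipeline from the dictionary (hypothetical helper)."""
    # Imported lazily so that Lumen is only needed when the pipeline is built.
    import lumen as lm
    return lm.Pipeline.from_spec(spec)
```

Calling build_pipeline(spec) should then give the same pipeline as the literal dictionary shown above, since the generated filters list contains the same four entries.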

Programmatically specifying a pipeline

The programmatic specification approach uses Lumen objects to build the pipeline step by step.

Add source

First, add a valid Source to your Pipeline. A common choice is FileSource, which can load CSV, Excel, JSON and Parquet files, but see the Source Reference for all options.

import lumen as lm
from lumen.sources import FileSource

data_url = 'https://datasets.holoviz.org/penguins/v1/penguins.csv'

pipeline = lm.Pipeline(source=FileSource(tables={'penguins': data_url}), table='penguins')

Preview the data

At any point after defining the source in your pipeline, you can inspect the data in a notebook with pipeline.data:

pipeline.data.head()
   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007

Add filter

Next, you can add widgets for certain columns of your source. When the dashboard is displayed, these widgets allow your users to filter the data. See the Filter Reference for all options.

pipeline.add_filter('widget', field='species')
pipeline.add_filter('widget', field='island')
pipeline.add_filter('widget', field='sex')
pipeline.add_filter('widget', field='year')
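When many columns need the same kind of filter, the repeated calls can be folded into a loop. This is a small sketch; add_widget_filters is a hypothetical helper name, not part of Lumen, and it relies only on the add_filter call shown above.

```python
def add_widget_filters(pipeline, fields):
    """Add one widget filter per column name to an existing pipeline."""
    for field in fields:
        # Same call as in the step above, repeated for each column.
        pipeline.add_filter('widget', field=field)
    return pipeline


# Usage, assuming `pipeline` is the Pipeline created in the "Add source" step:
# add_widget_filters(pipeline, ['species', 'island', 'sex', 'year'])
```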

Add transform

Now you can apply a transform to the data, such as computing the mean or selecting certain columns. See the Transform Reference for more.

columns = ['species', 'island', 'sex', 'year', 'bill_length_mm', 'bill_depth_mm']

pipeline.add_transform('columns', columns=columns)

pipeline.data.head()
   species     island     sex  year  bill_length_mm  bill_depth_mm
0   Adelie  Torgersen    male  2007            39.1           18.7
1   Adelie  Torgersen  female  2007            39.5           17.4
2   Adelie  Torgersen  female  2007            40.3           18.0
3   Adelie  Torgersen     NaN  2007             NaN            NaN
4   Adelie  Torgersen  female  2007            36.7           19.3

Manually update dashboard

By default, every interaction updates the dashboard. If this behavior is unwanted, for instance because you want to change several filter widgets without the dashboard updating after each individual selection, set auto_update=False on the Pipeline. You will then need to trigger each update manually by clicking a button.
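As a minimal sketch of this setting, the pipeline from the "Add source" step can be created with auto_update=False passed to the constructor. The names DATA_URL and build_manual_pipeline are hypothetical, and the construction is wrapped in a function only so that Lumen is imported and the data loaded when you call it.

```python
DATA_URL = 'https://datasets.holoviz.org/penguins/v1/penguins.csv'


def build_manual_pipeline(data_url=DATA_URL):
    """Return a penguins pipeline that does not update on every interaction."""
    # Imported lazily so that Lumen is only needed when the function is called.
    import lumen as lm
    from lumen.sources import FileSource

    return lm.Pipeline(
        source=FileSource(tables={'penguins': data_url}),
        table='penguins',
        auto_update=False,  # widget changes wait for a manual update
    )
```

Filters and transforms can be added to the returned pipeline exactly as in the earlier steps.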

Display the pipeline

Once you have built your pipeline, viewing it interactively is straightforward. As long as you have loaded the Panel extension with pn.extension('tabulator'), simply displaying the pipeline in a notebook cell will render it:

import panel as pn

pn.extension('tabulator')

pipeline

If you are working in a local REPL or from a script, you can also use pipeline.show() to preview it.

You can also easily render the control panel containing the filter widgets and variables separately:

pipeline.control_panel