DerivedSource  type: derived

class lumen.sources.base.DerivedSource(*, filters, source, tables, transforms, cache_dir, cache_per_query, cache_with_dask, root, shared, name)

DerivedSource applies filtering and transforms to tables from other sources.

A DerivedSource references tables on other sources and can optionally apply filters and transforms to the returned data. The result is made available as a new (derived) table.

The DerivedSource has two modes:

Table mode

When an explicit tables specification is provided, you have full control over exactly which tables are filtered and transformed. This is referred to as ‘table’ mode.

In ‘table’ mode, each entry in tables can reference any table on any source using the reference syntax and declare the filters and transforms to apply to that specific table. For example, a table specification might look like this:

{
  'derived_table': {
    'source': 'original_source',
    'table': 'original_table',
    'filters': [
      ...
    ],
    'transforms': [
      ...
    ]
  }
}
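
To make the shape concrete, the sketch below fills the elided lists with one hypothetical filter and one hypothetical transform, then checks that each entry names a source and a table. The `constant` filter, the `columns` transform, the column names, and the `missing_keys` helper are illustrative assumptions rather than part of the specification above. Treating `source` and `table` as required keys is also an assumption based on the example.

```python
# Hypothetical table-mode specification. The source, table and column names
# are placeholders; the filter and transform entries are assumed examples.
tables = {
    "derived_table": {
        "source": "original_source",
        "table": "original_table",
        "filters": [
            {"type": "constant", "field": "species", "value": "Adelie"},
        ],
        "transforms": [
            {"type": "columns", "columns": ["species", "body_mass_g"]},
        ],
    }
}

# Assumption: every entry must at least say where its data comes from.
REQUIRED_KEYS = {"source", "table"}


def missing_keys(tables):
    """Return {derived table name: sorted missing keys} for incomplete entries."""
    problems = {}
    for name, entry in tables.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems[name] = sorted(missing)
    return problems


print(missing_keys(tables))                      # {}
print(missing_keys({"broken": {"table": "t"}}))  # {'broken': ['source']}
```

A check like this catches an entry that omits its source before the specification is handed to the source.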

Mirror mode

When a source is declared, all tables on that source are mirrored, then filtered and transformed according to the supplied filters and transforms. This is referred to as ‘mirror’ mode.

In mirror mode, the DerivedSource may reference an existing source directly, e.g.:

{
    'type': 'derived',
    'source': 'original_source',
    'filters': [...],
    'transforms': [...],
}
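
The difference between the two modes can be sketched in plain Python. This is a simplified illustration, not Lumen's implementation: the `resolve` helper is hypothetical, the filters are stand-in strings, and the order in which per-table and source-level filters and transforms are combined is an assumption.

```python
def resolve(table, source=None, tables=None, filters=(), transforms=()):
    """Work out where a derived table comes from and what is applied to it.

    Returns a tuple (source, source_table, filters, transforms).
    """
    if tables:
        # Table mode: the entry names the source and table explicitly and
        # may add its own filters and transforms.
        entry = tables[table]
        return (
            entry["source"],
            entry["table"],
            list(entry.get("filters", [])) + list(filters),
            list(entry.get("transforms", [])) + list(transforms),
        )
    if source is None:
        raise ValueError("Declare either 'tables' or 'source'.")
    # Mirror mode: the table keeps its name and only the source-level
    # filters and transforms apply.
    return (source, table, list(filters), list(transforms))


table_mode = resolve(
    "derived_table",
    tables={"derived_table": {"source": "original_source",
                              "table": "original_table",
                              "filters": ["per_table_filter"]}},
    filters=["shared_filter"],
)
mirror_mode = resolve("original_table", source="original_source",
                      filters=["shared_filter"])

print(table_mode)
# ('original_source', 'original_table', ['per_table_filter', 'shared_filter'], [])
print(mirror_mode)
# ('original_source', 'original_table', ['shared_filter'], [])
```

In table mode the derived name can differ from the original table name, whereas in mirror mode every table is exposed under the name it has on the mirrored source.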

Parameters

filters

type: list[Any]
default: []
A list of filters to apply to all tables of this source.

source

type: lumen.Source
default: None
The source whose tables are mirrored.

tables

type: dict
default: {}
A dictionary mapping each derived table to its source table and the associated filters and transforms.

transforms

type: list[Any]
default: []
A list of transforms to apply to all tables of this source.


Methods

DerivedSource.clear_cache()

Clears any cached data.

DerivedSource.get(table: str, **query) → DataFrame

Returns a table, optionally filtered by the given query.

Parameters:
  • table (str) – The name of the table to query

  • query (dict) – A dictionary containing all the query parameters

Returns:

A DataFrame containing the queried table.

Return type:

DataFrame
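
As a rough illustration of what filtering by a query can mean, the sketch below filters a list of row dictionaries instead of a DataFrame. The `get_rows` helper and the sample data are hypothetical, and the assumed semantics (a scalar selects equal values, a list selects any of its members) should be checked against Lumen's own documentation.

```python
def get_rows(rows, **query):
    """Return the rows that satisfy every column condition in the query."""
    def matches(row):
        for column, wanted in query.items():
            if isinstance(wanted, list):
                # Assumed: a list means "any of these values".
                if row[column] not in wanted:
                    return False
            elif row[column] != wanted:
                # Assumed: a scalar means "equal to this value".
                return False
        return True
    return [row for row in rows if matches(row)]


rows = [
    {"species": "Adelie", "island": "Dream"},
    {"species": "Gentoo", "island": "Biscoe"},
    {"species": "Chinstrap", "island": "Dream"},
]

print(len(get_rows(rows)))                                # 3
print(get_rows(rows, island="Dream", species="Adelie"))
# [{'species': 'Adelie', 'island': 'Dream'}]
print(len(get_rows(rows, species=["Adelie", "Gentoo"])))  # 2
```

With no query arguments the full table comes back, which mirrors the "optionally filtered" wording above.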

DerivedSource.get_schema(table: str | None = None, limit: int | None = None) → Dict[str, Dict[str, Any]] | Dict[str, Any]

Returns the JSON schema describing the tables returned by the Source.

Parameters:
  • table (str | None) – The name of the table to return the schema for. If None, returns the schema for all available tables.

  • limit (int | None) – Limits the number of rows considered when calculating the schema.

Returns:

JSON schema(s) for one or all the tables.

Return type:

dict
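
The two return shapes in the signature can be illustrated without Lumen. In this sketch, the `schema_for` and `get_schema` helpers and the sample rows are hypothetical, and the schema keys used (`type`, `minimum`, `maximum`, `enum`) are standard JSON Schema keywords chosen for illustration; the exact keys Lumen emits may differ.

```python
def schema_for(rows, limit=None):
    """Build a small JSON-schema-like description of each column."""
    sample = rows if limit is None else rows[:limit]
    schema = {}
    for column in (sample[0] if sample else {}):
        values = [row[column] for row in sample]
        if all(isinstance(v, (int, float)) for v in values):
            schema[column] = {"type": "number",
                              "minimum": min(values), "maximum": max(values)}
        else:
            schema[column] = {"type": "string", "enum": sorted(set(values))}
    return schema


def get_schema(data, table=None, limit=None):
    """Return one table's schema, or a {table: schema} mapping if table is None."""
    if table is None:
        return {name: schema_for(rows, limit) for name, rows in data.items()}
    return schema_for(data[table], limit)


data = {"penguins": [{"species": "Adelie", "mass": 3700},
                     {"species": "Gentoo", "mass": 5000}]}

print(get_schema(data, "penguins")["species"])
# {'type': 'string', 'enum': ['Adelie', 'Gentoo']}
print(list(get_schema(data)))
# ['penguins']
print(get_schema(data, "penguins", limit=1)["mass"])
# {'type': 'number', 'minimum': 3700, 'maximum': 3700}
```

The last call shows why `limit` matters: a schema computed from only the first rows may understate the range of values in the full table.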

DerivedSource.get_tables() → List[str]

Returns the list of tables available on this source.

Returns:

The list of available tables on this source.

Return type:

list

DerivedSource.to_spec(context: Dict[str, Any] | None = None) → Dict[str, Any]

Exports the full specification to reconstruct this component.

Returns:

A dictionary specification from which this component can be reconstructed.

Return type:

dict