DerivedSource type: derived
- class lumen.sources.base.DerivedSource(*, filters, source, tables, transforms, cache_dir, cache_per_query, cache_with_dask, root, shared, name)
DerivedSource applies filtering and transforms to tables from other sources.
A DerivedSource references tables on other sources and can optionally apply filters and transforms to the returned data, which is then made available as a new (derived) table.
The DerivedSource has two modes:
Table mode
When an explicit tables specification is provided, you have full control over exactly which tables are filtered and transformed. This is referred to as ‘table’ mode.
In ‘table’ mode the tables can reference any table on any source using the reference syntax and declare filters and transforms to apply to that specific table, e.g. a table specification might look like this:
{
    'derived_table': {
        'source': 'original_source',
        'table': 'original_table',
        'filters': [ ... ],
        'transforms': [ ... ]
    }
}
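As a rough illustration of what a table-mode specification expresses, the sketch below resolves one in plain Python. It is not Lumen's implementation: the `SOURCES` registry, the `resolve_table` helper, the sample rows, and the use of plain callables as filters and transforms are all assumptions made for the example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical stand-in for real sources: source name -> table name -> rows.
SOURCES: Dict[str, Dict[str, List[Row]]] = {
    "original_source": {
        "original_table": [
            {"city": "Oslo", "temp": 4},
            {"city": "Rome", "temp": 18},
            {"city": "Cairo", "temp": 29},
        ]
    }
}

# Table-mode style specification. Filters and transforms are plain callables
# here (an assumption for illustration, not Lumen's filter/transform objects).
spec: Dict[str, Dict[str, Any]] = {
    "derived_table": {
        "source": "original_source",
        "table": "original_table",
        "filters": [lambda row: row["temp"] > 10],
        "transforms": [
            lambda rows: sorted(rows, key=lambda r: r["temp"], reverse=True)
        ],
    }
}


def resolve_table(name: str) -> List[Row]:
    """Look up the referenced table, then apply its filters and transforms."""
    entry = spec[name]
    rows = SOURCES[entry["source"]][entry["table"]]
    for keep in entry.get("filters", []):
        rows = [row for row in rows if keep(row)]
    for transform in entry.get("transforms", []):
        rows = transform(rows)
    return rows


derived = resolve_table("derived_table")
```

Each derived table carries its own source reference, filters, and transforms, which is what gives table mode its per-table control.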
Mirror mode
When a source is declared, all tables on that source are mirrored, then filtered and transformed according to the supplied filters and transforms. This is referred to as ‘mirror’ mode.
In mirror mode the DerivedSource may reference an existing source directly, e.g.:
{
    'type': 'derived',
    'source': 'original_source',
    'filters': [...],
    'transforms': [...],
}
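One way to read mirror mode is as shorthand for a table-mode mapping with one entry per table on the mirrored source. The sketch below makes that reading concrete in plain Python. The `expand_mirror_spec` helper and the table names are hypothetical, and Lumen may implement mirroring differently.

```python
from typing import Any, Dict, List


def expand_mirror_spec(
    mirror_spec: Dict[str, Any], table_names: List[str]
) -> Dict[str, Dict[str, Any]]:
    """Build a table-mode style mapping that mirrors every table on one source."""
    return {
        table: {
            "source": mirror_spec["source"],
            "table": table,
            # Copy the lists so each derived table owns its own configuration.
            "filters": list(mirror_spec.get("filters", [])),
            "transforms": list(mirror_spec.get("transforms", [])),
        }
        for table in table_names
    }


mirror_spec = {
    "type": "derived",
    "source": "original_source",
    "filters": [],
    "transforms": [],
}

# Hypothetical table names assumed to exist on 'original_source'.
tables = expand_mirror_spec(mirror_spec, ["sales", "customers"])
```

Under this reading, every mirrored table keeps its original name and shares the same filters and transforms.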
Parameters
filters
type: list[Any]
default: []
A list of filters to apply to all tables of this source.
source
type: lumen.Source
default: None
A source to mirror the tables on.
tables
type: dict
default: {}
The dictionary of tables and associated filters and transforms.
transforms
type: list[Any]
default: []
A list of transforms to apply to all tables of this source.
Methods
- DerivedSource.clear_cache()
Clears any cached data.
- DerivedSource.get(table: str, **query) → DataFrame
Return a table, optionally filtered by the given query.
- Parameters:
table (str) – The name of the table to query
query (dict) – A dictionary containing all the query parameters
- Returns:
A DataFrame containing the queried table.
- Return type:
DataFrame
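To show the calling pattern of `get`, the sketch below mimics it over in-memory rows. It is a simplified stand-in, not Lumen's code: it returns a list of dicts rather than a DataFrame, the `TABLES` data is invented, and matching each query parameter by equality is an assumption made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical in-memory tables standing in for a source.
TABLES: Dict[str, List[Row]] = {
    "penguins": [
        {"species": "Adelie", "island": "Dream"},
        {"species": "Gentoo", "island": "Biscoe"},
        {"species": "Adelie", "island": "Biscoe"},
    ]
}


def get(table: str, **query: Any) -> List[Row]:
    """Return rows of `table` whose columns equal every value in `query`."""
    rows = TABLES[table]
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in query.items())
    ]


# With no query parameters the full table is returned.
everything = get("penguins")

# Each keyword argument narrows the result further.
adelie_on_biscoe = get("penguins", species="Adelie", island="Biscoe")
```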
- DerivedSource.get_schema(table: str | None = None, limit: int | None = None) → Dict[str, Dict[str, Any]] | Dict[str, Any]
Returns the JSON schema describing the tables returned by the Source.
- Parameters:
table (str | None) – The name of the table to return the schema for. If None, returns the schema for all available tables.
limit (int | None) – Limits the number of rows considered when calculating the schema.
- Returns:
JSON schema(s) for one or all the tables.
- Return type:
dict
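The two return shapes of `get_schema` (a single schema for a named table, or a mapping of table name to schema when `table` is None) can be sketched as follows. This is an illustrative stand-in under stated assumptions: the `TABLES` data, the `table_schema` helper, and the small type mapping are hypothetical, and the schemas Lumen produces are likely to carry more detail.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Map Python types to JSON schema type names (a deliberately small subset).
JSON_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string"}

# Hypothetical in-memory tables standing in for a source.
TABLES: Dict[str, List[Row]] = {
    "orders": [{"id": 1, "total": 9.5}, {"id": 2, "total": 20.0}],
    "users": [{"name": "ada", "active": True}],
}


def table_schema(rows: List[Row], limit: Optional[int] = None) -> Dict[str, Any]:
    """Infer one {'type': ...} entry per column from at most `limit` rows."""
    sample = rows if limit is None else rows[:limit]
    schema: Dict[str, Any] = {}
    for row in sample:
        for column, value in row.items():
            # The first value seen for a column decides its type.
            schema.setdefault(
                column, {"type": JSON_TYPES.get(type(value), "object")}
            )
    return schema


def get_schema(
    table: Optional[str] = None, limit: Optional[int] = None
) -> Dict[str, Any]:
    """Return the schema for one table, or for all tables when `table` is None."""
    if table is None:
        return {name: table_schema(rows, limit) for name, rows in TABLES.items()}
    return table_schema(TABLES[table], limit)


one = get_schema("orders", limit=1)
all_tables = get_schema()
```

Here `limit` only bounds how many rows are inspected, so a small limit trades schema accuracy for speed on large tables.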
- DerivedSource.get_tables() → List[str]
Returns the list of tables available on this source.
- Returns:
The list of available tables on this source.
- Return type:
list
- DerivedSource.to_spec(context: Dict[str, Any] | None = None) → Dict[str, Any]
Exports the full specification to reconstruct this component.
- Return type:
Dict[str, Any]