JoinedSource  type: join

class lumen.sources.base.JoinedSource(*, sources, tables, cache_dir, cache_per_query, cache_with_dask, root, shared, name)

JoinedSource performs a join on tables from one or more sources.

A JoinedSource applies a join on two or more sources, returning new table(s) with data from all sources. It iterates over the tables specification and merges the specified tables from the declared sources on the supplied index.

In this way multiple tables from multiple sources can be merged. Individual tables from sources that should not be joined may also be surfaced by declaring a single source and table in the specification.

As a simple example, we may have sources A and B, which contain tables ‘foo’ and ‘bar’ respectively. We now want to merge these tables on column ‘a’ in table ‘foo’ with column ‘b’ in table ‘bar’:

{'new_table': [
  {'source': 'A', 'table': 'foo', 'index': 'a'},
  {'source': 'B', 'table': 'bar', 'index': 'b'}
]}

The joined source now publishes a table named “new_table” containing all columns from tables “foo” and “bar”, except for the index column “b” from table “bar”, which is merged with the index column “a” from table “foo”.
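The merge described above can be sketched in plain Python, with each table held as a list of row dictionaries. This is an illustrative stand-in rather than Lumen's implementation: the helper `join_on_index` is hypothetical, the rows are invented, and an inner join is assumed because the text above does not state which join type is used.

```python
def join_on_index(left_rows, left_index, right_rows, right_index):
    """Inner-join two lists of row dicts (hypothetical helper, not Lumen API)."""
    # Group the right-hand rows by their index value for fast lookup.
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[right_index], []).append(row)

    joined = []
    for left in left_rows:
        for right in lookup.get(left[left_index], []):
            merged = dict(left)
            # Copy every right-hand column except its index column,
            # which is already represented by the left-hand index column.
            merged.update({k: v for k, v in right.items() if k != right_index})
            joined.append(merged)
    return joined


# Invented sample rows for table 'foo' (source A) and table 'bar' (source B).
foo = [{"a": 1, "x": "one"}, {"a": 2, "x": "two"}]
bar = [{"b": 1, "y": 10.0}, {"b": 3, "y": 30.0}]

new_table = join_on_index(foo, "a", bar, "b")
print(new_table)
```

Under these assumptions `new_table` holds a single row with columns ‘a’, ‘x’ and ‘y’: only the index value 1 occurs in both tables, and column ‘b’ is absent because it was merged with ‘a’.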


Parameters

sources

type: list | dict
default: None
A dictionary of sources indexed by their assigned name.

tables

type: dict
default: {}
A dictionary with the names of the joined tables as keys and a specification of the source, table and index to merge on.

{'new_table': [
  {'source': <source_name>, 'table': <table_name>, 'index': <index_name>},
  {'source': <source_name>, 'table': <table_name>, 'index': <index_name>},
  ...
]}
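As a rough illustration, a tables specification that mixes a joined table with a single-source table might look like the sketch below. The source names, table names and the `referenced_sources` helper are hypothetical, and leaving out 'index' for the single-entry table is an assumption based on there being nothing to merge on.

```python
# Hypothetical names throughout; only the dictionary layout follows the
# parameter description above.
tables = {
    # Joined table: 'foo' from source A merged with 'bar' from source B.
    "new_table": [
        {"source": "A", "table": "foo", "index": "a"},
        {"source": "B", "table": "bar", "index": "b"},
    ],
    # Single source and table: surfaced as-is, nothing is merged.
    # Omitting 'index' here is an assumption.
    "foo_only": [
        {"source": "A", "table": "foo"},
    ],
}


def referenced_sources(tables):
    """Return the sorted source names that a tables specification refers to."""
    return sorted(
        {entry["source"] for entries in tables.values() for entry in entries}
    )


print(referenced_sources(tables))
```

A check like `referenced_sources` can help confirm that every name used in the specification is also present in the `sources` dictionary before the source is constructed.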


Methods

JoinedSource.clear_cache()

Clears any cached data.

JoinedSource.get(table: str, **query) → DataFrame

Returns a table, optionally filtered by the given query.

Parameters:
  • table (str) – The name of the table to query

  • query (dict) – A dictionary containing all the query parameters

Returns:

A DataFrame containing the queried table.

Return type:

DataFrame

JoinedSource.get_schema(table: str | None = None, limit: int | None = None) → Dict[str, Dict[str, Any]] | Dict[str, Any]

Returns JSON schema describing the tables returned by the Source.

Parameters:
  • table (str | None) – The name of the table to return the schema for. If None, returns the schema for all available tables.

  • limit (int | None) – Limits the number of rows considered for the schema calculation.

Returns:

JSON schema(s) for one or all the tables.

Return type:

dict
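The two return shapes of `get_schema` can be illustrated with a small stand-in written in plain Python. This is a sketch under stated assumptions, not Lumen's implementation: the function below is hypothetical, it works on invented rows held as lists of dictionaries, and the per-column `{"type": ...}` entries only approximate what a JSON schema might contain.

```python
from typing import Any, Dict, List, Optional

Rows = List[Dict[str, Any]]

# Mapping from Python types to JSON Schema type names.
JSON_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string"}


def get_schema(data: Dict[str, Rows], table: Optional[str] = None,
               limit: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical stand-in mirroring the documented parameters."""
    def table_schema(rows: Rows) -> Dict[str, Any]:
        # Only the first `limit` rows are inspected when a limit is given.
        sample = rows if limit is None else rows[:limit]
        schema: Dict[str, Any] = {}
        for row in sample:
            for column, value in row.items():
                json_type = JSON_TYPES.get(type(value), "object")
                schema.setdefault(column, {"type": json_type})
        return schema

    if table is None:
        # No table requested: return one schema per available table.
        return {name: table_schema(rows) for name, rows in data.items()}
    return table_schema(data[table])


data = {"new_table": [{"a": 1, "x": "one", "y": 10.0}]}  # invented rows
single = get_schema(data, "new_table")   # schema keyed by column name
everything = get_schema(data)            # schemas keyed by table name
```

With a table name the result is keyed by column, while with `table=None` it is keyed by table name and each value is a per-column schema, which matches the two alternatives in the return annotation.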

JoinedSource.get_tables() → List[str]

Returns the list of tables available on this source.

Returns:

The list of available tables on this source.

Return type:

list

JoinedSource.to_spec(context: Dict[str, Any] | None = None) → Dict[str, Any]

Exports the full specification to reconstruct this component.

Return type:

dict