FileSource  type: file

class lumen.sources.base.FileSource(**params)

FileSource loads CSV, Excel and Parquet files using pandas and dask read_* functions.

The FileSource can declare a list or dictionary of local or remote files which are then loaded using either pandas.read_* or dask.dataframe.read_* functions depending on whether use_dask is enabled.
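The choice between the two reader families can be pictured with a small sketch. This is not Lumen's implementation: the helper `reader_name` and its extension table are hypothetical, and the function only returns the dotted name of the reader that would be called, so it runs without pandas or dask installed.

```python
import os

# Hypothetical mapping from file extension to the suffix of a read_* function.
# Assumption: Lumen's real extension table may differ.
READ_SUFFIX = {
    "csv": "csv",
    "xlsx": "excel",
    "xls": "excel",
    "parquet": "parquet",
    "json": "json",
}


def reader_name(path: str, use_dask: bool = True) -> str:
    """Return the dotted name of the read_* function for a file path."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in READ_SUFFIX:
        raise ValueError(f"No reader known for extension {ext!r}")
    suffix = READ_SUFFIX[ext]
    # dask.dataframe provides no read_excel, so Excel files fall back to pandas.
    if use_dask and suffix != "excel":
        return f"dask.dataframe.read_{suffix}"
    return f"pandas.read_{suffix}"


print(reader_name("/home/user/local_file.csv", use_dask=True))
print(reader_name("/home/user/local_file.csv", use_dask=False))
```

Returning a name rather than the function itself keeps the sketch dependency-free; a real loader would look the function up on the imported module instead.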



dask
type: bool
default: False
Whether to return a Dask dataframe.


kwargs
type: dict
default: None
Keyword arguments to the pandas/dask loading function.


tables
type: list | dict
default: None
List or dictionary of tables to load. If a list is supplied, the names are computed from the filenames; otherwise the keys are the names. The values must be filepaths or URLs to the data:

{
    'local' : '/home/user/local_file.csv',
    'remote': ''
}

If the filepath does not have a declared extension, an extension may be provided in a list or tuple, e.g.:

{'table': ['', 'json']}
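The naming rules above can be sketched in plain Python. The helper `normalize_tables` is hypothetical and not part of Lumen; it assumes that a name computed from a filename is the base name without its extension, and the path `/data/export` is made up for illustration.

```python
import os
from typing import Dict, List, Optional, Tuple, Union

# A table entry is either a path/URL or a (path, extension) pair.
TableSpec = Union[str, List[str], Tuple[str, str]]


def normalize_tables(
    tables: Union[List[TableSpec], Dict[str, TableSpec]],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each table name to a (path, extension) pair."""
    if isinstance(tables, dict):
        # Dictionary input: the keys are the table names.
        items = list(tables.items())
    else:
        # List input: derive each name from the filename.
        items = []
        for spec in tables:
            path = spec if isinstance(spec, str) else spec[0]
            # Assumption: the name is the base name without its extension.
            name = os.path.splitext(os.path.basename(path))[0]
            items.append((name, spec))

    normalized = {}
    for name, spec in items:
        if isinstance(spec, (list, tuple)):
            # An explicitly supplied extension is used as given.
            path, ext = spec
        else:
            path = spec
            ext = os.path.splitext(path)[1].lstrip(".") or None
        normalized[name] = (path, ext)
    return normalized


print(normalize_tables(["/home/user/local_file.csv"]))
print(normalize_tables({"table": ["/data/export", "json"]}))
```

An entry whose path has no extension and no explicit override ends up with `None`, which a loader would have to reject or guess at.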


use_dask
type: bool
default: True
Whether to use dask to load files.


FileSource.clear_cache(*events: Event)

Clears any cached data.

FileSource.get(table: str, **query) → DataFrame

Return a table, optionally filtered by the given query.

Parameters:

  • table (str) – The name of the table to query

  • query (dict) – A dictionary containing all the query parameters

Returns:

A DataFrame containing the queried table.

Return type:

DataFrame
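To illustrate what filtering by a query can look like, here is a minimal sketch that uses lists of dictionaries in place of DataFrames. It assumes each query key names a column and each value is matched by equality; Lumen's actual filter semantics may be richer, and the `penguins` data is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def get(tables: Dict[str, List[Row]], table: str, **query: Any) -> List[Row]:
    """Return the rows of `table` whose columns equal every query value."""
    rows = tables[table]
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in query.items())
    ]


# Hypothetical data standing in for a loaded table.
tables = {
    "penguins": [
        {"species": "Adelie", "island": "Torgersen"},
        {"species": "Gentoo", "island": "Biscoe"},
        {"species": "Adelie", "island": "Biscoe"},
    ]
}

everything = get(tables, "penguins")                # no query: all rows
adelie = get(tables, "penguins", species="Adelie")  # one equality filter
```

With no query arguments the table comes back unfiltered, which matches the "optionally filtered" wording of the method description.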


FileSource.get_schema(table: str | None = None) → Dict[str, Dict[str, Any]] | Dict[str, Any]

Returns JSON schema describing the tables returned by the Source.


Parameters:

table (str or None) – The name of the table to return the schema for. If None, returns the schema for all available tables.

Returns:

JSON schema(s) for one or all the tables.

Return type:

Dict[str, Dict[str, Any]] | Dict[str, Any]
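The two return shapes in the signature can be illustrated with a small sketch. It is not Lumen's schema generator: the helpers below are hypothetical, tables are lists of dictionaries rather than DataFrames, and each column is described only by an assumed `type` key.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def column_schema(values: List[Any]) -> Dict[str, Any]:
    """Describe one column with a JSON-schema-like dictionary."""
    if all(isinstance(v, bool) for v in values):
        return {"type": "boolean"}
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return {"type": "number"}
    return {"type": "string"}


def get_schema(
    tables: Dict[str, List[Row]], table: Optional[str] = None
) -> Dict[str, Any]:
    """Return the schema of one table, or of every table when `table` is None."""
    schemas = {}
    for name, rows in tables.items():
        # Take the column names from the first row; empty tables have none.
        columns = list(rows[0].keys()) if rows else []
        schemas[name] = {c: column_schema([r[c] for r in rows]) for c in columns}
    # One table: its column schemas. All tables: a dict keyed by table name.
    return schemas if table is None else schemas[table]


# Hypothetical data standing in for a loaded table.
tables = {"sales": [{"region": "north", "amount": 10.5},
                    {"region": "south", "amount": 7}]}

one = get_schema(tables, "sales")
everything = get_schema(tables)
```

Passing a table name returns the column schemas directly, while omitting it wraps them in one more level keyed by table name.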


FileSource.get_tables() → List[str]

Returns the list of tables available on this source.


Returns:

The list of available tables on this source.

Return type:

List[str]


FileSource.to_spec(context: Dict[str, Any] | None = None) → Dict[str, Any]

Exports the full specification to reconstruct this component.

Returns:

A dictionary specification from which this component can be reconstructed.

Return type:

Dict[str, Any]