FileSource (type: file)

class lumen.sources.base.FileSource(*, dask, kwargs, tables, use_dask, cache_dir, cache_per_query, cache_with_dask, root, shared, name)

FileSource loads CSV, Excel and Parquet files using pandas and dask read_* functions.

A FileSource declares a list or dictionary of local or remote files, which are loaded with either the pandas.read_* or the dask.dataframe.read_* functions, depending on whether use_dask is enabled.
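The dispatch described above can be sketched without pandas or dask installed. resolve_reader is a hypothetical helper, not part of Lumen, and its extension-to-reader table is an assumption: it covers the formats named on this page plus JSON from the tables example. Because dask.dataframe provides read_csv, read_parquet and read_json but no read_excel, the sketch assumes Excel files fall back to pandas even when use_dask is enabled.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Assumed extension -> read_* function names (hypothetical; Lumen's own
# table of supported extensions may differ).
READERS = {
    "csv": "read_csv",
    "parq": "read_parquet",
    "parquet": "read_parquet",
    "json": "read_json",
    "xls": "read_excel",
    "xlsx": "read_excel",
}

# dask.dataframe has no read_excel, so Excel files always go through pandas.
DASK_READERS = {"read_csv", "read_parquet", "read_json"}


def resolve_reader(path: str, use_dask: bool = True, ext: Optional[str] = None) -> str:
    """Return the dotted name of the read_* function used for `path`."""
    if ext is None:
        # Derive the extension from the URL path or the local filename.
        ext = os.path.splitext(urlparse(path).path)[1].lstrip(".")
    reader = READERS.get(ext.lower())
    if reader is None:
        raise ValueError(f"unsupported file extension: {ext!r}")
    module = "dask.dataframe" if use_dask and reader in DASK_READERS else "pandas"
    return f"{module}.{reader}"


print(resolve_reader("https://test.com/test.csv"))                  # dask.dataframe.read_csv
print(resolve_reader("/home/user/local_file.csv", use_dask=False))  # pandas.read_csv
print(resolve_reader("report.xlsx"))                                # pandas.read_excel
print(resolve_reader("http://test.com/api", ext="json"))            # dask.dataframe.read_json
```

The helper only returns the name of the function that would be called; a real loader would then import that module and call the function with the path plus any kwargs.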


Parameters

dask

type: bool
default: False
Whether to return a Dask dataframe.

kwargs

type: dict
default: None
Keyword arguments to the pandas/dask loading function.

tables

type: list | dict
default: None
List or dictionary of tables to load. If a list is supplied, the table names are computed from the filenames; otherwise the dictionary keys are the names. The values must be filepaths or URLs pointing to the data:

{
    'local': '/home/user/local_file.csv',
    'remote': 'https://test.com/test.csv'
}

If the filepath does not have a declared extension, an extension may be provided in a list or tuple, e.g.:

{'table': ['http://test.com/api', 'json']}

use_dask

type: bool
default: True
Whether to use dask to load files.
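The naming rules described for tables can be sketched in plain Python. normalize_tables is a hypothetical helper, not a Lumen function, and it assumes that a list entry's table name is its filename with the extension stripped (this page says only that names are computed from the filenames). Each entry is reduced to a (path_or_url, explicit_extension) pair.

```python
import os
from typing import Dict, List, Optional, Tuple, Union
from urllib.parse import urlparse

# A table value is a path/URL, or a [path, extension] list or tuple.
TableSpec = Union[str, List[str], Tuple[str, str]]


def normalize_tables(
    tables: Union[List[TableSpec], Dict[str, TableSpec]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each table name to a (path_or_url, explicit_extension) pair."""
    if isinstance(tables, dict):
        # Dictionary form: the keys are the table names.
        items = list(tables.items())
    else:
        items = []
        for spec in tables:
            path = spec if isinstance(spec, str) else spec[0]
            # Assumed naming rule: the filename without its extension.
            filename = os.path.basename(urlparse(path).path)
            items.append((os.path.splitext(filename)[0], spec))
    normalized = {}
    for name, spec in items:
        if isinstance(spec, (list, tuple)):
            path, ext = spec
        else:
            path, ext = spec, None
        normalized[name] = (path, ext)
    return normalized


print(normalize_tables(["/home/user/local_file.csv", "https://test.com/test.csv"]))
# {'local_file': ('/home/user/local_file.csv', None), 'test': ('https://test.com/test.csv', None)}
print(normalize_tables({"table": ["http://test.com/api", "json"]}))
# {'table': ('http://test.com/api', 'json')}
```

With Lumen installed, the same list or dictionary is what would be passed as the tables argument, e.g. FileSource(tables={'local': '/home/user/local_file.csv'}).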


Methods

FileSource.clear_cache(*events: Event)

Clears any cached data.

FileSource.get(table: str, **query) → DataFrame

Returns a table, optionally filtered by the given query.

Parameters:
  • table (str) – The name of the table to query

  • query (dict) – A dictionary containing all the query parameters

Returns:

A DataFrame containing the queried table.

Return type:

DataFrame
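Each keyword in **query names a column and a condition. The sketch below assumes a common Lumen convention rather than anything stated on this page: a scalar means equality, a (low, high) tuple means an inclusive range, and a list means a set of allowed values. It applies those rules to a list of row dicts instead of a DataFrame; matches and filter_rows are hypothetical names.

```python
from typing import Any, Dict, List


def matches(value: Any, condition: Any) -> bool:
    """Test one column value against one query condition."""
    if isinstance(condition, tuple):
        # Assumed: a (low, high) tuple is an inclusive range.
        low, high = condition
        return low <= value <= high
    if isinstance(condition, list):
        # Assumed: a list holds the allowed values.
        return value in condition
    # Anything else must match exactly.
    return value == condition


def filter_rows(rows: List[Dict[str, Any]], **query: Any) -> List[Dict[str, Any]]:
    """Keep only the rows that satisfy every column condition in query."""
    return [
        row for row in rows
        if all(matches(row[column], condition) for column, condition in query.items())
    ]


rows = [
    {"city": "Oslo", "year": 2020, "value": 3},
    {"city": "Lima", "year": 2021, "value": 5},
    {"city": "Oslo", "year": 2022, "value": 8},
]
print(filter_rows(rows, city="Oslo", year=(2021, 2022)))
# [{'city': 'Oslo', 'year': 2022, 'value': 8}]
```

Against a real source, the equivalent call would follow the get signature above, e.g. source.get('table_name', city='Oslo', year=(2021, 2022)), where 'table_name' is a placeholder; the condition semantics remain the assumption stated here.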

FileSource.get_schema(table: str | None = None) → Dict[str, Dict[str, Any]] | Dict[str, Any]

Returns a JSON schema describing the tables returned by the Source.

Parameters:

table (str or None) – The name of the table to return the schema for. If None, returns the schema for all available tables.

Returns:

JSON schema(s) for one or all the tables.

Return type:

dict
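The two return shapes in the signature (one schema, or a mapping of table name to schema) can be illustrated with a standalone sketch over in-memory rows. Everything here is hypothetical: get_schema below is a plain function, not FileSource.get_schema, and the per-column keys (type, enum, inclusiveMinimum, inclusiveMaximum) are an assumption about the schema's shape rather than something this page specifies.

```python
from typing import Any, Dict, List, Optional


def column_schema(values: List[Any]) -> Dict[str, Any]:
    """Describe one column as a JSON-schema-like dict."""
    # bool is a subclass of int, so booleans are handled first.
    if values and all(isinstance(v, bool) for v in values):
        return {"type": "boolean"}
    numeric = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if values and len(numeric) == len(values):
        kind = "integer" if all(isinstance(v, int) for v in numeric) else "number"
        return {"type": kind, "inclusiveMinimum": min(numeric), "inclusiveMaximum": max(numeric)}
    # Anything else is described by its distinct values.
    return {"type": "string", "enum": sorted({str(v) for v in values})}


def table_schema(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build a per-column schema from a list of row dicts."""
    columns = list(rows[0]) if rows else []
    return {col: column_schema([row[col] for row in rows]) for col in columns}


def get_schema(tables: Dict[str, List[Dict[str, Any]]], table: Optional[str] = None) -> Dict[str, Any]:
    """Schema for one table, or a {table_name: schema} dict when table is None."""
    if table is None:
        return {name: table_schema(rows) for name, rows in tables.items()}
    return table_schema(tables[table])


data = {"sales": [{"region": "north", "units": 3}, {"region": "south", "units": 7}]}
print(get_schema(data, "sales"))
```

Passing no table name returns the nested form keyed by table, which matches the Dict[str, Dict[str, Any]] branch of the signature.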

FileSource.get_tables() → List[str]

Returns the list of tables available on this source.

Returns:

The list of available tables on this source.

Return type:

list

FileSource.to_spec(context: Dict[str, Any] | None = None) → Dict[str, Any]

Exports the full specification to reconstruct this component.

Returns:

A dictionary specification from which this component can be reconstructed.

Return type:

dict
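According to its signature, to_spec returns a plain dictionary. The sketch below shows one plausible shape for a FileSource specification: the component type (file, as in the page title) plus its parameters. sketch_spec and its DEFAULTS table are hypothetical, and leaving out parameters that still hold their default values is an assumption about to_spec, not documented behavior.

```python
import json
from typing import Any, Dict

# Defaults as listed on this page for a subset of FileSource parameters.
DEFAULTS = {"dask": False, "kwargs": None, "tables": None, "use_dask": True}


def sketch_spec(params: Dict[str, Any], type_name: str = "file") -> Dict[str, Any]:
    """Build a spec dict: the component type plus every non-default parameter."""
    spec: Dict[str, Any] = {"type": type_name}
    missing = object()  # sentinel so parameters without a known default are kept
    for name, value in params.items():
        # Assumption: parameters equal to their documented default are omitted.
        if DEFAULTS.get(name, missing) != value:
            spec[name] = value
    return spec


spec = sketch_spec({"tables": {"local": "/home/user/local_file.csv"}, "use_dask": True, "dask": False})
print(json.dumps(spec))
# {"type": "file", "tables": {"local": "/home/user/local_file.csv"}}
```

Serializing the dictionary to JSON (or YAML) gives a text form of the specification from which the component could be rebuilt.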