FileSource type: file
- class lumen.sources.base.FileSource(*, dask, kwargs, tables, use_dask, cache_dir, cache_per_query, cache_with_dask, root, shared, name)
FileSource loads CSV, Excel and Parquet files using pandas and dask read_* functions.
A FileSource declares a list or dictionary of local or remote files, which are loaded with either the pandas.read_* or the dask.dataframe.read_* functions, depending on whether use_dask is enabled.
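To show how the choice of loader depends on the file extension and on use_dask, here is a minimal sketch. It is not FileSource's internal code. The helper `choose_reader` and the `READERS` table are hypothetical, and the sketch only returns the dotted name of the function that would be called. One assumption is grounded in dask itself: dask.dataframe provides no read_excel, so Excel files fall back to pandas here.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical table of file extensions and the read_* function used for each;
# an illustration only, not FileSource's internal lookup.
READERS = {
    "csv": "read_csv",
    "json": "read_json",
    "parquet": "read_parquet",
    "xls": "read_excel",
    "xlsx": "read_excel",
}


def choose_reader(path: str, use_dask: bool = True, ext: Optional[str] = None) -> str:
    """Return the dotted name of the read_* function that would load ``path``."""
    if ext is None:
        # Works for local paths and URLs alike: only the path component is inspected.
        ext = PurePosixPath(urlparse(path).path).suffix.lstrip(".").lower()
    if ext not in READERS:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    reader = READERS[ext]
    # dask.dataframe has no read_excel, so Excel always falls back to pandas here.
    if use_dask and reader != "read_excel":
        return f"dask.dataframe.{reader}"
    return f"pandas.{reader}"
```

For example, `choose_reader('https://test.com/test.csv')` gives `'dask.dataframe.read_csv'`, while passing `use_dask=False` gives `'pandas.read_csv'`.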
Parameters
- dask
type: bool
default: False
Whether to return a Dask dataframe.
- kwargs
type: dict
default: None
Keyword arguments to the pandas/dask loading function.
- tables
type: list | dict
default: None
List or dictionary of tables to load. If a list is supplied, the names are computed from the filenames; otherwise the keys are the names. The values must be filepaths or URLs to the data, e.g.:
{'local': '/home/user/local_file.csv', 'remote': 'https://test.com/test.csv'}
If the filepath does not have a declared extension, an extension may be provided in a list or tuple, e.g.:
{'table': ['http://test.com/api', 'json']}
- use_dask
type: bool
default: True
Whether to use dask to load files.
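The two accepted forms of tables can be normalized into a single name-to-file mapping. The sketch below is a hypothetical helper (`normalize_tables` is not part of Lumen). It assumes that, for the list form, a table's name is its filename with the extension stripped, and that an explicit `[path, extension]` pair overrides whatever extension the path carries.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Sequence, Tuple, Union
from urllib.parse import urlparse

TableSpec = Union[str, Sequence[str]]  # "path" or [path, extension]


def _split(spec: TableSpec) -> Tuple[str, str]:
    """Return (path, extension) for a single table entry."""
    if isinstance(spec, str):
        path = spec
        ext = PurePosixPath(urlparse(path).path).suffix.lstrip(".")
    else:
        # Explicit extension, e.g. ['http://test.com/api', 'json'].
        path, ext = spec
    return path, ext


def normalize_tables(
    tables: Union[List[TableSpec], Dict[str, TableSpec]]
) -> Dict[str, Tuple[str, str]]:
    """Map each table name to a (path, extension) pair."""
    if isinstance(tables, dict):
        # Dictionary form: the keys are already the table names.
        return {name: _split(spec) for name, spec in tables.items()}
    normalized = {}
    for spec in tables:
        path, ext = _split(spec)
        # Assumed naming rule for the list form: filename without its extension.
        name = PurePosixPath(urlparse(path).path).stem
        normalized[name] = (path, ext)
    return normalized
```

With the list `['/home/user/local_file.csv', 'https://test.com/test.csv']`, this sketch produces the table names `local_file` and `test`.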
Methods
- FileSource.clear_cache(*events: Event)
Clears any cached data.
- FileSource.get(table: str, **query) → DataFrame
Return a table, optionally filtered by the given query.
- Parameters:
table (str) – The name of the table to query
query (dict) – A dictionary containing all the query parameters
- Returns:
A DataFrame containing the queried table.
- Return type:
DataFrame
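The query is passed as keyword arguments, one per column. The filtering rules are not spelled out here, so the following sketch assumes a common convention: a scalar means equality, a list means membership and a tuple means an inclusive (low, high) range. `filter_rows` is a hypothetical helper that works on plain lists of dicts instead of a DataFrame, and the real filtering rules may differ.

```python
from typing import Any, Dict, List


def _matches(value: Any, condition: Any) -> bool:
    """Assumed semantics: tuple = inclusive range, list = membership, else equality."""
    if isinstance(condition, tuple):
        low, high = condition
        return low <= value <= high
    if isinstance(condition, list):
        return value in condition
    return value == condition


def filter_rows(rows: List[Dict[str, Any]], **query: Any) -> List[Dict[str, Any]]:
    """Keep only the rows that satisfy every query parameter."""
    return [
        row for row in rows
        if all(_matches(row[column], cond) for column, cond in query.items())
    ]
```

A call such as `filter_rows(rows, species=['Adelie', 'Gentoo'], mass=(3500, 4000))` mirrors the shape of `source.get('table', species=[...], mass=(...))` under these assumptions.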
- FileSource.get_schema(table: str | None = None, limit: int | None = None) → Dict[str, Dict[str, Any]] | Dict[str, Any]
Returns a JSON schema describing the tables returned by the Source.
- Parameters:
table (str | None) – The name of the table to return the schema for. If None, returns the schema for all available tables.
limit (int | None) – Limits the number of rows considered for the schema calculation.
- Returns:
JSON schema(s) for one or all the tables.
- Return type:
dict
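To show what a per-column schema and the limit parameter mean, here is a simplified sketch. `infer_schema` is hypothetical and works on a list of dict rows that all share the same columns. It uses the standard JSON Schema keywords minimum and maximum for numeric columns and enum for everything else. The exact keys Lumen emits may differ.

```python
from typing import Any, Dict, List, Optional


def infer_schema(rows: List[Dict[str, Any]], limit: Optional[int] = None) -> Dict[str, Dict[str, Any]]:
    """Describe each column of ``rows``, considering at most ``limit`` rows."""
    sample = rows if limit is None else rows[:limit]
    schema: Dict[str, Dict[str, Any]] = {}
    columns = sample[0].keys() if sample else []
    for column in columns:
        values = [row[column] for row in sample]
        numeric = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if len(numeric) == len(values):
            # Numeric column: record its type and the observed range.
            kind = "integer" if all(isinstance(v, int) for v in numeric) else "number"
            schema[column] = {"type": kind, "minimum": min(numeric), "maximum": max(numeric)}
        else:
            # Anything else is treated as a string column with its distinct values.
            schema[column] = {"type": "string", "enum": sorted({str(v) for v in values})}
    return schema
```

Because limit truncates the sample, a smaller limit can report a narrower range or a different numeric type than the full table would.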
- FileSource.get_tables() → List[str]
Returns the list of tables available on this source.
- Returns:
The list of available tables on this source.
- Return type:
list
- FileSource.to_spec(context: Dict[str, Any] | None = None) → Dict[str, Any]
Exports the full specification to reconstruct this component.
- Returns:
A dictionary specification from which this component can be reconstructed.
- Return type:
dict
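This page does not describe the contents of the exported specification. As an illustration only, assume the spec is a plain dictionary holding the component's type ('file', per the heading above) plus every parameter that differs from its default. `sketch_spec` and `FILESOURCE_DEFAULTS` are hypothetical names, and the defaults are copied from the Parameters section.

```python
from typing import Any, Dict, Optional

# Defaults copied from the Parameters section above; inherited parameters
# such as cache_dir or root are left out of this illustration.
FILESOURCE_DEFAULTS: Dict[str, Any] = {
    "dask": False,
    "kwargs": None,
    "tables": None,
    "use_dask": True,
}


def sketch_spec(params: Dict[str, Any], defaults: Optional[Dict[str, Any]] = None,
                type_name: str = "file") -> Dict[str, Any]:
    """Hypothetical export: the component type plus every non-default parameter."""
    defaults = FILESOURCE_DEFAULTS if defaults is None else defaults
    spec: Dict[str, Any] = {"type": type_name}
    for name, value in params.items():
        # Parameters without a known default are always kept.
        if name not in defaults or defaults[name] != value:
            spec[name] = value
    return spec
```

Keeping only non-default values makes such a spec compact. Check the actual output of to_spec before relying on this shape.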