IntakeSource  type: intake

class lumen.sources.intake.IntakeSource(*, catalog, dask, uri, load_schema, cache_dir, cache_per_query, cache_with_dask, root, shared, name)

An IntakeSource loads data from an Intake catalog.

Intake is a lightweight set of tools for loading and sharing data in data science projects using convenient catalog specifications.

The IntakeSource can be given either an inline dictionary catalog specification (catalog) or a URI pointing to a catalog.yaml file on disk (uri).
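As a rough illustration of the two input styles, the sketch below writes an inline catalog as a Python dict that mirrors the sources: layout of a catalog.yaml file. It also uses a hypothetical resolve_catalog_input helper, which is not part of Lumen, to check that exactly one of catalog or uri is supplied. The entry name, driver arguments, and path are placeholders.

```python
from typing import Any, Dict, Optional

# Inline Intake catalog spec as a dict. It mirrors a catalog.yaml file:
# each key under "sources" names one catalog entry. The entry below is a
# placeholder (hypothetical name, driver args, and path).
INLINE_CATALOG: Dict[str, Any] = {
    "sources": {
        "penguins": {
            "description": "Placeholder CSV entry",
            "driver": "csv",
            "args": {"urlpath": "data/penguins.csv"},
        }
    }
}


def resolve_catalog_input(
    catalog: Optional[Dict[str, Any]] = None, uri: str = ""
) -> str:
    """Hypothetical helper (not a Lumen API): report which input style is used.

    Exactly one of an inline `catalog` dict or a catalog `uri` is accepted.
    """
    if catalog and uri:
        raise ValueError("Specify either an inline catalog or a uri, not both.")
    if catalog:
        return "inline"
    if uri:
        return "uri"
    raise ValueError("Specify either an inline catalog or a uri.")


print(resolve_catalog_input(catalog=INLINE_CATALOG))  # inline
print(resolve_catalog_input(uri="catalog.yaml"))      # uri
```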


Parameters

catalog

type: dict
default: None
An inline Intake catalog specification.

dask

type: bool
default: False
Whether to return a dask DataFrame.

load_schema

type: bool
default: True
Whether to load the schema.

uri

type: str
default: ''
URI of the catalog file.
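The four parameters above can be pictured as a plain spec dict. The sketch below uses a hypothetical build_intake_spec helper, not a Lumen function. It fills in the documented defaults, checks each documented parameter against its listed type, and passes any other keys (for example the caching options in the class signature) through unchecked. The "type": "intake" key is taken from this page's heading.

```python
from typing import Any, Dict, Tuple, Type, Union

# Documented defaults for the parameters listed above.
DEFAULTS: Dict[str, Any] = {
    "catalog": None,
    "dask": False,
    "load_schema": True,
    "uri": "",
}

# Documented types; catalog may also be None (its default).
EXPECTED_TYPES: Dict[str, Union[Type, Tuple[Type, ...]]] = {
    "catalog": (dict, type(None)),
    "dask": bool,
    "load_schema": bool,
    "uri": str,
}


def build_intake_spec(**params: Any) -> Dict[str, Any]:
    """Hypothetical helper: type-check params, then merge them over the defaults."""
    for name, value in params.items():
        expected = EXPECTED_TYPES.get(name)
        # Parameters not documented on this page are passed through as-is.
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"{name} should be {expected}, got {type(value).__name__}")
    return {"type": "intake", **DEFAULTS, **params}


spec = build_intake_spec(uri="catalog.yaml", dask=True)
print(spec["dask"], spec["load_schema"])  # True True
```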


Methods

IntakeSource.clear_cache(*events: Event)

Clears any cached data.
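Lumen's own caching is governed by the cache_dir, cache_per_query, and cache_with_dask parameters, whose details are not covered on this page. As a generic illustration of what clearing a per-query cache means, the sketch below uses a hypothetical QueryCache class, not a Lumen API. It keys results on the table name plus the sorted query items, and clear() drops every stored entry, so the next request reloads from the loader.

```python
from typing import Any, Callable, Dict, List, Tuple

Rows = List[Dict[str, Any]]


class QueryCache:
    """Hypothetical per-query cache; not part of Lumen."""

    def __init__(self, loader: Callable[..., Rows]) -> None:
        self._loader = loader
        self._store: Dict[Tuple[Any, ...], Rows] = {}
        self.misses = 0  # number of times the loader was actually called

    @staticmethod
    def _key(table: str, query: Dict[str, Any]) -> Tuple[Any, ...]:
        # Sorting makes the key independent of keyword order; query values
        # are assumed hashable (e.g. scalars or tuples, not lists).
        return (table, tuple(sorted(query.items())))

    def get(self, table: str, **query: Any) -> Rows:
        key = self._key(table, query)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._loader(table, **query)
        return self._store[key]

    def clear(self) -> None:
        # Analogous in spirit to clear_cache(): forget every stored result.
        self._store.clear()


def fake_loader(table: str, **query: Any) -> Rows:
    return [{"table": table, **query}]


cache = QueryCache(fake_loader)
cache.get("penguins", species="Adelie")
cache.get("penguins", species="Adelie")  # served from the cache
print(cache.misses)  # 1
cache.clear()
cache.get("penguins", species="Adelie")  # reloaded after clearing
print(cache.misses)  # 2
```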

IntakeSource.get(table, **query)

Return a table, optionally filtered by the given query.

Parameters:
  • table (str) – The name of the table to query.

  • query (dict) – The query parameters, passed as keyword arguments and collected into a dictionary.

Returns:

A DataFrame containing the queried table.

Return type:

DataFrame
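The query keywords are typically column/value pairs. The sketch below applies such a query to a list of row dicts with a hypothetical filter_rows function. Its value semantics (a tuple as an inclusive range, a list as a set of allowed values, anything else as equality) are an assumption modelled on common Lumen usage, so check them against your Lumen version. Note also that the real get() returns a DataFrame rather than a list.

```python
from typing import Any, Dict, List

Rows = List[Dict[str, Any]]


def filter_rows(rows: Rows, **query: Any) -> Rows:
    """Hypothetical sketch of query filtering over row dicts.

    Assumed semantics: tuple (lo, hi) -> inclusive range, list -> membership,
    any other value -> equality. All conditions must hold (logical AND).
    """

    def matches(row: Dict[str, Any]) -> bool:
        for column, value in query.items():
            cell = row.get(column)
            if isinstance(value, tuple):
                lo, hi = value
                if cell is None or not lo <= cell <= hi:
                    return False
            elif isinstance(value, list):
                if cell not in value:
                    return False
            elif cell != value:
                return False
        return True

    return [row for row in rows if matches(row)]


# Toy rows for illustration only.
rows = [
    {"species": "Adelie", "bill_length_mm": 39.1},
    {"species": "Gentoo", "bill_length_mm": 46.1},
    {"species": "Chinstrap", "bill_length_mm": 49.5},
]
print(filter_rows(rows, species=["Adelie", "Gentoo"], bill_length_mm=(40, 50)))
# [{'species': 'Gentoo', 'bill_length_mm': 46.1}]
```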

IntakeSource.get_schema(table: str | None = None, limit: int | None = None) → Dict[str, Dict[str, Any]] | Dict[str, Any]

Returns JSON schema describing the tables returned by the Source.

Parameters:
  • table (str | None) – The name of the table to return the schema for. If None, returns the schemas for all available tables.

  • limit (int | None) – Limits the number of rows considered for the schema calculation.

Returns:

The JSON schema for the requested table or, if table is None, a dictionary of schemas keyed by table name.

Return type:

dict
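To make the table and limit parameters concrete, the sketch below computes a JSON-Schema-like description from row dicts. The helper names (infer_schema, get_schema_for) are hypothetical. The layout (numeric columns as type "number" with inclusiveMinimum/inclusiveMaximum, other columns as a string enum) is an assumption built from standard JSON Schema keywords, not a guarantee of Lumen's exact output. Matching the return type in the signature, passing table=None returns a dict of schemas keyed by table name.

```python
from typing import Any, Dict, List, Optional

Rows = List[Dict[str, Any]]
Schema = Dict[str, Dict[str, Any]]


def infer_schema(rows: Rows, limit: Optional[int] = None) -> Schema:
    """Hypothetical: describe each column of `rows`, inspecting at most `limit` rows."""
    sample = rows if limit is None else rows[:limit]
    columns: Dict[str, List[Any]] = {}
    for row in sample:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    schema: Schema = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if numeric:
            # Numeric column: report the observed range.
            schema[name] = {
                "type": "number",
                "inclusiveMinimum": min(present),
                "inclusiveMaximum": max(present),
            }
        else:
            # Anything else is treated as a string; keep first-seen order.
            schema[name] = {
                "type": "string",
                "enum": list(dict.fromkeys(str(v) for v in present)),
            }
    return schema


def get_schema_for(
    tables: Dict[str, Rows], table: Optional[str] = None, limit: Optional[int] = None
) -> Any:
    """Hypothetical: one table's schema, or all schemas keyed by table name."""
    if table is None:
        return {name: infer_schema(rows, limit) for name, rows in tables.items()}
    return infer_schema(tables[table], limit)


# Toy table for illustration only.
tables = {
    "penguins": [
        {"species": "Adelie", "body_mass_g": 3750},
        {"species": "Gentoo", "body_mass_g": 5000},
        {"species": "Adelie", "body_mass_g": 3800},
    ]
}
print(get_schema_for(tables, "penguins", limit=2))
```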

IntakeSource.get_tables()

Returns the list of tables available on this source.

Returns:

The list of available tables on this source.

Return type:

list
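Assuming each catalog entry is exposed as one table (an assumption about how the catalog maps to tables), listing the tables of an inline catalog amounts to reading the keys of its sources mapping. The list_tables helper below is hypothetical; each name it returns would be a candidate for the table argument of get().

```python
from typing import Any, Dict, List


def list_tables(catalog: Dict[str, Any]) -> List[str]:
    """Hypothetical: table names are the entry names under "sources"."""
    return list(catalog.get("sources", {}))


# Placeholder catalog; entry names, drivers, and paths are illustrative.
catalog = {
    "sources": {
        "penguins": {"driver": "csv", "args": {"urlpath": "data/penguins.csv"}},
        "flights": {"driver": "parquet", "args": {"urlpath": "data/flights.parquet"}},
    }
}
print(list_tables(catalog))  # ['penguins', 'flights']
```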

IntakeSource.to_spec(context: Dict[str, Any] | None = None) → Dict[str, Any]

Exports the full specification to reconstruct this component.

Returns:

A dictionary specification from which this component can be reconstructed.

Return type:

dict
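The sketch below illustrates the round trip that an exported specification enables, using a hypothetical IntakeSourceSpec dataclass limited to the four parameters documented above. Emitting only non-default values, and omitting the context argument, are simplifying assumptions of this sketch rather than documented behaviour of Lumen's to_spec().

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class IntakeSourceSpec:
    """Hypothetical stand-in for IntakeSource holding only the documented parameters."""

    catalog: Optional[Dict[str, Any]] = None
    dask: bool = False
    load_schema: bool = True
    uri: str = ""

    def to_spec(self) -> Dict[str, Any]:
        # Start from the source type, then add only non-default parameters.
        spec: Dict[str, Any] = {"type": "intake"}
        for f in fields(self):
            value = getattr(self, f.name)
            if value != f.default:
                spec[f.name] = value
        return spec

    @classmethod
    def from_spec(cls, spec: Dict[str, Any]) -> "IntakeSourceSpec":
        # Drop the "type" key and rebuild the object from the remaining params.
        return cls(**{k: v for k, v in spec.items() if k != "type"})


source = IntakeSourceSpec(uri="catalog.yaml", dask=True)
spec = source.to_spec()
print(spec)  # {'type': 'intake', 'dask': True, 'uri': 'catalog.yaml'}
print(IntakeSourceSpec.from_spec(spec) == source)  # True
```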