Custom Data Sources

What does this guide solve?

This guide shows you how to configure custom data sources for Lumen AI.

Overview

We will use a local LLM to demonstrate how to load custom data sources in Lumen. You do not have to use a local LLM; you can instead use a provider you have an API key for. If you do, ensure the API key is set in the environment of the terminal you run your commands in.
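For example, a cloud provider's key is typically picked up from an environment variable. The variable name below assumes the OpenAI provider; other providers read their own variables.

```shell
# Set the API key in the shell session you will launch Lumen AI from.
# OPENAI_API_KEY is the variable the OpenAI provider reads; replace the
# placeholder value with your actual key.
export OPENAI_API_KEY="your-key-here"
```

With the key in place, start Lumen AI with the matching provider flag, e.g. lumen-ai serve penguins.csv --provider openai --show (substitute the provider name that matches your key).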

Local and remote files using the command line

You can download the standard penguins data set here. To start Lumen AI, run the following command, replacing penguins.csv with the path you downloaded the data to.

lumen-ai serve penguins.csv --provider llama-cpp --show

If you do not want to download the data, you can instead point Lumen at its location on the web and start a chat.

lumen-ai serve "https://datasets.holoviz.org/penguins/v1/penguins.csv" --provider llama-cpp --show

Local and remote files using a Panel app

Download the earthquakes dataset and make a note of where it is saved on your system. Create a file called app.py and update the path below to point at the downloaded earthquakes data. The app will use both a local file and a remote file.

# app.py
import lumen.ai as lmai
import panel as pn

pn.extension("vega")
llm = lmai.llm.LlamaCpp()

lmai.ExplorerUI(
    data=[
        "/LOCAL/PATH/TO/earthquakes.csv",
        "https://datasets.holoviz.org/penguins/v1/penguins.csv",
    ],
    llm=llm,
    agents=[lmai.agents.SQLAgent, lmai.agents.VegaLiteAgent],
).servable()

Run the app.py file with the following command.

panel serve app.py --show

Custom data sources

You can also create apps using custom Lumen sources. Below are examples connecting to different sources.

Snowflake

import lumen.ai as lmai
from lumen.sources.snowflake import SnowflakeSource

source = SnowflakeSource(
    account="...",
    authenticator="externalbrowser",
    database="...",
    user="...",
)
lmai.ExplorerUI(source).servable()

DuckDB

Note that when using DuckDB as a custom data source, if your Parquet files have a non-standard extension, e.g. .parq, you need to wrap the path to those files in read_parquet(...). DuckDB has many native ways of reading Parquet files; see https://duckdb.org/docs/data/parquet/overview.html for an overview of the available methods and of which cases require the read_parquet(...) wrapper when used with Lumen AI.

import lumen.ai as lmai
from lumen.sources.duckdb import DuckDBSource

# Use a list
tables = ["path/to/parquet/dataset/file.parquet", "read_parquet('file.parq')", "penguins.csv"]
# Use a dictionary
#tables = {
#    "penguins": "path/to/penguins.csv",
#    "earthquakes": "read_parquet('path/to/earthquakes.parq')",
#}

source = DuckDBSource(tables=tables)
lmai.ExplorerUI(source).servable()

No local or remote data files

The Panel apps and terminal commands above use local paths or URIs to hosted data files. However, you are not required to specify data files when starting Lumen AI; you can run the following command in your terminal.

lumen-ai serve --provider llama-cpp --show

Lumen AI will start up in your default browser with no data loaded. You can use the Drag & Drop area to upload local data files, or select the Browse link to open a file dialog and choose the data you wish to upload to Lumen AI.