# LLM Providers
Configure which AI model powers Lumen.
> **First time setup?** See the Installation guide for step-by-step instructions on setting up API keys and environment variables for each provider.
## Quick start
Set your API key and launch:
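For example, a minimal sketch (assuming an OpenAI key; any supported provider's key works the same way):

```python
import os

# Hypothetical key; in practice, export this in your shell rather than hardcoding it
os.environ["OPENAI_API_KEY"] = "sk-..."

import lumen.ai as lmai

# No explicit llm argument: the provider is inferred from the available API key
ui = lmai.ExplorerUI(data='penguins.csv')
ui.servable()
```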
Lumen auto-detects the provider from environment variables.
## Different models per agent
Use cheap models for simple tasks, powerful models for complex tasks:
```python
import lumen.ai as lmai

model_config = {
    "default": {"model": "gpt-4.1-mini"},  # Cheap for most agents
    "sql": {"model": "gpt-4.1"},           # Powerful for SQL
    "vega_lite": {"model": "gpt-4.1"},     # Powerful for charts
    "deck_gl": {"model": "gpt-4.1"},       # Powerful for 3D maps
    "analyst": {"model": "gpt-4.1"},       # Powerful for analysis
}

llm = lmai.llm.OpenAI(model_kwargs=model_config)
ui = lmai.ExplorerUI(data='penguins.csv', llm=llm)
ui.servable()
```
Agent names map to model types: `SQLAgent` → `"sql"`, `VegaLiteAgent` → `"vega_lite"`, and so on.
## Configure temperature
Lower temperature = more deterministic. Higher = more creative.
```python
model_config = {
    "sql": {
        "model": "gpt-4.1",
        "temperature": 0.1,  # Deterministic SQL
    },
    "chat": {
        "model": "gpt-4.1-mini",
        "temperature": 0.4,  # Natural conversation
    },
}
```
Recommended ranges: 0.1 (SQL) to 0.4 (chat).
## Supported providers
For installation and API key setup instructions, see the Installation guide.
### Cloud providers
| Provider | Default Model | Popular Models |
|---|---|---|
| OpenAI | `gpt-4.1-mini` | `gpt-4.1`, `gpt-4-turbo`, `gpt-4` |
| Anthropic | `claude-haiku-4-5` | `claude-sonnet-4-5`, `claude-opus-4-5` |
| Google | `gemini-3-flash-preview` | `gemini-3-pro-preview`, `gemini-2.5-flash`, `gemini-2.0-flash` |
| Mistral | `mistral-small-latest` | `mistral-large-latest`, `ministral-8b-latest` |
| Azure OpenAI | `gpt-4.1-mini` | `gpt-4.1`, `gpt-4-turbo`, `gpt-4` |
| Azure Mistral | `azureai` | `mistral-large`, `mistral-small` |
> **Reasoning Models Not Suitable for Dialog**
>
> Reasoning models like `gpt-5`, `o4-mini`, and `gemini-2.0-flash-thinking` are significantly slower than standard models. They are designed for single, complex queries that require deep thinking, not interactive chat interfaces. For dialog-based applications like Lumen, use standard models for better user experience.
### Local providers

| Provider | Default Model | Notes |
|---|---|---|
| Ollama | `qwen3:32b` | Requires Ollama installed and models pulled locally |
| Llama.cpp | `unsloth/Qwen3-32B-GGUF` | Auto-downloads models on first use |
Recommended local models:

- General purpose: `qwen3:32b`, `llama3.3:70b`, `qwen3:30b-a3b`, `nemotron-3-nano:30b`
- Coding: `qwen3-coder:32b`, `qwen2.5-coder:32b`
- Reasoning: `nemotron-3-nano:30b`
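For example, a minimal sketch pinning Ollama to some of these models (same pattern as the remote Ollama example further below; assumes Ollama is running locally and the models have already been pulled):

```python
import lumen.ai as lmai

llm = lmai.llm.Ollama(
    model_kwargs={
        "default": {"model": "qwen3:32b"},    # general purpose
        "sql": {"model": "qwen3-coder:32b"},  # coding model for SQL generation
    }
)
ui = lmai.ExplorerUI(data='penguins.csv', llm=llm)
ui.servable()
```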
### Router / Gateway providers

| Provider | Purpose | Setup |
|---|---|---|
| AWS Bedrock | Gateway to Anthropic, Meta, Mistral, Amazon models | Installation guide |
| LiteLLM | Router for 100+ models across all providers | Installation guide |
AWS Bedrock options:

- `AnthropicBedrock` - Optimized for Claude models using Anthropic's SDK
- `Bedrock` - Universal access to all Bedrock models (Claude, Llama, Mistral, Titan, etc.)
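A hedged sketch of choosing between them (assumes AWS credentials are already configured; the model ID follows the Bedrock format listed under "Model string formats" below):

```python
import lumen.ai as lmai

# Claude via Anthropic's SDK on Bedrock; swap in lmai.llm.Bedrock for non-Claude models
llm = lmai.llm.AnthropicBedrock(
    model_kwargs={
        "default": {"model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0"}
    }
)
```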
## Advanced configuration

### Custom endpoints

Override default API endpoints:
```python
llm = lmai.llm.OpenAI(
    api_key='...',
    endpoint='https://your-custom-endpoint.com/v1'
)
```
### Managed Identity (Azure)
Use Azure Active Directory authentication:
```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

llm = lmai.llm.AzureOpenAI(
    api_version="2024-02-15-preview",
    endpoint="https://your-resource.openai.azure.com/",
    model_kwargs={
        "default": {
            "model": "gpt4o-mini",  # your Azure deployment name
            "azure_ad_token_provider": token_provider
        }
    }
)
```
### Fallback models (LiteLLM)
Automatically retry with backup models if primary fails:
```python
llm = lmai.llm.LiteLLM(
    model_kwargs={
        "default": {"model": "gpt-4.1-mini"}
    },
    fallback_models=[
        "gpt-4.1-mini",
        "claude-haiku-4-5",
        "gemini/gemini-2.5-flash"
    ]
)
```
### Remote Ollama server
Connect to Ollama running on another machine:
```python
llm = lmai.llm.Ollama(
    endpoint='http://your-server:11434/v1',
    model_kwargs={"default": {"model": "qwen3:32b"}}
)
```
## Model types
Agent class names convert to model types automatically:
| Agent | Model type |
|---|---|
| `SQLAgent` | `sql` |
| `VegaLiteAgent` | `vega_lite` |
| `DeckGLAgent` | `deck_gl` |
| `ChatAgent` | `chat` |
| `AnalysisAgent` | `analysis` |
| (others) | `default` |
Conversion rule: remove "Agent" suffix, convert to snake_case.
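A minimal sketch of that rule (a hypothetical helper for illustration, not part of the Lumen API):

```python
import re

def to_model_type(agent_name: str) -> str:
    """Strip the 'Agent' suffix, then snake_case the remainder (acronym-aware)."""
    base = agent_name.removesuffix("Agent")
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "_", base).lower()

assert to_model_type("SQLAgent") == "sql"
assert to_model_type("VegaLiteAgent") == "vega_lite"
assert to_model_type("DeckGLAgent") == "deck_gl"
```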
Additional model types:

- `edit` - Used when fixing errors
- `ui` - Used for UI initialization
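These follow the same `model_kwargs` pattern as the agent model types; for example (the model choices here are illustrative assumptions):

```python
model_config = {
    "default": {"model": "gpt-4.1-mini"},
    "edit": {"model": "gpt-4.1"},      # stronger model when repairing failed outputs
    "ui": {"model": "gpt-4.1-mini"},   # lightweight model for UI initialization
}
llm = lmai.llm.OpenAI(model_kwargs=model_config)
```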
## Model string formats

Different providers use different model string formats:

- OpenAI: `"gpt-4.1"`, `"gpt-4.1-mini"`, `"gpt-4-turbo"`, `"gpt-4"`
- Anthropic: `"claude-sonnet-4-5"`, `"claude-haiku-4-5"`, `"claude-opus-4-5"`
- Google: `"gemini-3-flash-preview"`, `"gemini-2.5-flash"`
- Mistral: `"mistral-large-latest"`, `"mistral-small-latest"`
- Azure: `"your-deployment-name"` (use your Azure deployment name)
- Bedrock: `"us.anthropic.claude-sonnet-4-5-20250929-v1:0"`, `"meta.llama3-70b-instruct-v1:0"`
- LiteLLM: `"gpt-4.1-mini"` (OpenAI), `"anthropic/claude-sonnet-4-5"` (Anthropic), `"gemini/gemini-2.5-flash"` (Google)

For LiteLLM, use the `provider/model` format for non-OpenAI models, as in the sketch below.
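For example, routing different model types to different providers through LiteLLM (the per-type model choices are illustrative):

```python
import lumen.ai as lmai

llm = lmai.llm.LiteLLM(
    model_kwargs={
        "default": {"model": "gpt-4.1-mini"},             # OpenAI: no prefix needed
        "sql": {"model": "anthropic/claude-sonnet-4-5"},  # provider/model format
        "vega_lite": {"model": "gemini/gemini-2.5-flash"},
    }
)
```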
## Troubleshooting

**"API key not found"** - Set the environment variable or pass `api_key=` in Python.

**Wrong model used** - Model type names must be snake_case: `"sql"`, not `"SQLAgent"`.

**High costs** - Use `gpt-4.1-mini` or `claude-haiku-4-5` for `default`; reserve `gpt-4.1` or `claude-sonnet-4-5` for critical tasks (`sql`, `vega_lite`, `analyst`).

**Slow responses** - Local models are slower than cloud APIs. Use cloud providers when speed matters.

**AWS credentials not found** - For Bedrock, ensure AWS credentials are configured. See the Installation guide.
## Best practices

**Use efficient models for non-critical tasks:**

- `default` - Simple tasks work well with `gpt-4.1-mini` or `claude-haiku-4-5`
- `chat` - Conversation works with smaller models

**Avoid reasoning models in dialog interfaces:**

- Reasoning models (`o1`, `o1-mini`, `gemini-3.0-pro`) are significantly slower
- They're designed for single, complex queries, not interactive chat
- For Lumen's dialog interface, use standard models (`mistral-small-latest`, `gpt-4.1-mini`, `claude-sonnet-4-5`)
- Reserve reasoning models for batch processing or one-off complex analyses
**Set temperature by task:**
- 0.1 for SQL (deterministic)
- 0.3-0.4 for analysis and chat
- 0.5-0.7 for creative tasks
**Test before deploying:**
- Different models behave differently. Test with real queries.