iModel

class iModel(Element)

Uniform interface to any LLM provider with rate limiting, queuing, and hooks.

Constructor

model = li.iModel(
    provider="openai",
    model="gpt-4o",
    api_key=None,           # falls back to OPENAI_API_KEY env var
    limit_requests=100,
    limit_tokens=100_000,
)

| Param | Type | Default | Notes |
| --- | --- | --- | --- |
| provider | str \| None | None | "openai", "anthropic", etc. Inferred from model if set |
| base_url | str \| None | None | Custom API URL (for proxies, local endpoints) |
| endpoint | str \| Endpoint | "chat" | Endpoint type (see tables below) |
| api_key | str \| None | None | Explicit key; falls back to env var |
| queue_capacity | int \| None | auto | Max queued requests before backpressure |
| capacity_refresh_time | float | 60 | Seconds between queue capacity refreshes |
| interval | float \| None | auto | Queue processing interval in seconds |
| limit_requests | int \| None | None | Max requests per rate-limit cycle |
| limit_tokens | int \| None | None | Max tokens per rate-limit cycle |
| concurrency_limit | int \| None | None | Max concurrent streams |
| streaming_process_func | Callable \| None | None | Custom chunk processor for streaming responses |
| provider_metadata | dict \| None | None | Provider-specific metadata (e.g., CLI session IDs) |
| hook_registry | HookRegistry \| dict \| None | HookRegistry() | Pre/post invocation hooks |
| **kwargs | | | Provider-specific config (e.g., model="gpt-4o", temperature=0.7) |
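
The api_key row says an explicit key wins and the environment is the fallback. A minimal sketch of that lookup order follows. The helper `resolve_api_key` and the `ENV_VARS` mapping are hypothetical names made up for illustration (the variable names themselves are taken from the endpoint tables on this page); this is not lionagi's actual code.

```python
import os

# Env var names as listed in the endpoint tables on this page.
# The mapping and helper are illustrative assumptions, not lionagi API.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "perplexity": "PERPLEXITY_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "nvidia_nim": "NVIDIA_NIM_API_KEY",
    "exa": "EXA_API_KEY",
    "tavily": "TAVILY_API_KEY",
    "firecrawl": "FIRECRAWL_API_KEY",
}


def resolve_api_key(provider, api_key=None, environ=None):
    """Return the explicit key if given, else the provider's env var, else None."""
    environ = os.environ if environ is None else environ
    if api_key is not None:
        return api_key  # an explicit key always wins
    var_name = ENV_VARS.get(provider)
    if var_name is None:
        return None  # e.g. ollama (local) or a provider with no known env var
    return environ.get(var_name)
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.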

Endpoint types

Chat / LLM

| provider= | Default endpoint= | Key env var |
| --- | --- | --- |
| "openai" | "chat" | OPENAI_API_KEY |
| "anthropic" | "chat" | ANTHROPIC_API_KEY |
| "gemini" | "chat" | GOOGLE_API_KEY |
| "ollama" | "chat" | none (local) |
| "groq" | "chat" | GROQ_API_KEY |
| "deepseek" | "chat" | DEEPSEEK_API_KEY |
| "perplexity" | "chat" | PERPLEXITY_API_KEY |
| "openrouter" | "chat" | OPENROUTER_API_KEY |
| "nvidia_nim" | "chat" | NVIDIA_NIM_API_KEY |

Embed

| provider= | endpoint= | Key env var |
| --- | --- | --- |
| "nvidia_nim" | "embed" | NVIDIA_NIM_API_KEY |

Both OpenaiEmbedEndpoint and NvidiaNimEmbedEndpoint exist as classes, but match_endpoint() routes only the nvidia_nim embed endpoint. For any other embed endpoint, pass an Endpoint instance directly.

OpenAI responses API

| provider= | endpoint= | Notes |
| --- | --- | --- |
| "openai" | "response" | Stateful Responses API (/v1/responses) |

CLI / Agentic

| provider= | Aliases | Notes |
| --- | --- | --- |
| "claude_code" | "claude", "claude-code" | Claude Code CLI |
| "codex" | | OpenAI Codex CLI |
| "gemini_code" | "gemini-code", "gemini_cli", "gemini-cli" | Gemini CLI |
| "pi" | "pi-code", "pi_code" | Pi CLI |
| "ag2" | | AG2 GroupChat (stream-only; requires pip install lionagi[ag2]) |

CLI endpoints set is_cli = True. For these endpoints, Branch.operate() routes to run_and_collect instead of communicate. See operations.md#middle-protocol.

Search / Extract

| provider= | endpoint= | Key env var |
| --- | --- | --- |
| "exa" | "search" | EXA_API_KEY |
| "tavily" | "search" | TAVILY_API_KEY |
| "tavily" | "extract" | TAVILY_API_KEY |

Scrape / Crawl

| provider= | endpoint= | Key env var |
| --- | --- | --- |
| "firecrawl" | "scrape" | FIRECRAWL_API_KEY |
| "firecrawl" | "map" | FIRECRAWL_API_KEY |

Fallback

Any unrecognized provider falls back to an OpenAI-compatible generic chat endpoint. Pass base_url= to point at your custom host.

Endpoint matching

iModel(provider="openai", endpoint="chat")
  → match_endpoint("openai", "chat")
  → OpenaiChatEndpoint

match_endpoint() dispatches on (provider, endpoint) string containment:

  • Default endpoint="chat" resolves to the provider's chat class.
  • Single-endpoint providers (claude_code, codex, gemini_code, pi) ignore the endpoint argument and always return their only class.
  • Unrecognized providers fall back to a generic OpenAI-compatible Endpoint.
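
The three rules above can be pictured with a small lookup function. This is only a sketch under stated assumptions: `resolve_route`, `ALIASES`, `SINGLE_ENDPOINT`, and `ROUTES` are hypothetical names, the function returns plain tuples rather than endpoint classes, and it uses exact lookups instead of the string containment that match_endpoint() is described as using. The "ag2" provider is left out because the text does not say how it is dispatched.

```python
# Aliases and routes are copied from the tables on this page.
# All names below are illustrative; none of this is lionagi's code.
ALIASES = {
    "claude": "claude_code", "claude-code": "claude_code",
    "gemini-code": "gemini_code", "gemini_cli": "gemini_code",
    "gemini-cli": "gemini_code",
    "pi-code": "pi", "pi_code": "pi",
}

SINGLE_ENDPOINT = {"claude_code", "codex", "gemini_code", "pi"}

ROUTES = {
    ("openai", "chat"), ("openai", "response"), ("anthropic", "chat"),
    ("gemini", "chat"), ("ollama", "chat"), ("groq", "chat"),
    ("deepseek", "chat"), ("perplexity", "chat"), ("openrouter", "chat"),
    ("nvidia_nim", "chat"), ("nvidia_nim", "embed"),
    ("exa", "search"), ("tavily", "search"), ("tavily", "extract"),
    ("firecrawl", "scrape"), ("firecrawl", "map"),
}

KNOWN_PROVIDERS = {provider for provider, _ in ROUTES}


def resolve_route(provider, endpoint="chat"):
    """Return a (provider, endpoint) label, or None when the text gives no rule."""
    provider = ALIASES.get(provider, provider)  # normalize accepted aliases
    if provider in SINGLE_ENDPOINT:
        return (provider, "cli")  # the endpoint argument is ignored
    if (provider, endpoint) in ROUTES:
        return (provider, endpoint)
    if provider not in KNOWN_PROVIDERS:
        return ("generic", "chat")  # OpenAI-compatible fallback
    return None  # recognized provider, unrouted endpoint: not specified here
```

For example, `resolve_route("claude", "anything")` yields `("claude_code", "cli")`, and `resolve_route("my_provider")` yields the generic fallback.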

Common construction patterns

import lionagi as li

# OpenAI (default)
model = li.iModel(model="gpt-4o")

# Anthropic
model = li.iModel(provider="anthropic", model="claude-opus-4-7-20251001")

# With rate limits
model = li.iModel(model="gpt-4o", limit_requests=100, limit_tokens=100_000)

# Ollama local
model = li.iModel(
    provider="ollama",
    base_url="http://localhost:11434",
    model="llama3",
)

# NVIDIA NIM
model = li.iModel(provider="nvidia_nim", model="meta/llama-3.1-70b-instruct")

# DeepSeek
model = li.iModel(provider="deepseek", model="deepseek-chat")

# OpenAI Responses API
model = li.iModel(provider="openai", endpoint="response", model="gpt-4o")

# CLI endpoints (stream-only — use with Branch.run())
model = li.iModel(provider="claude_code", model="sonnet")
model = li.iModel(provider="codex", model="codex-mini-latest")
model = li.iModel(provider="gemini_code", model="gemini-2.5-pro")
model = li.iModel(provider="pi", model="pi")

# Search
exa   = li.iModel(provider="exa", endpoint="search")
tvly  = li.iModel(provider="tavily", endpoint="search")

# Scrape / crawl
crawl = li.iModel(provider="firecrawl", endpoint="scrape")
cmap  = li.iModel(provider="firecrawl", endpoint="map")

# OpenAI-compatible custom host
model = li.iModel(
    provider="my_provider",
    base_url="https://my-api.example.com/v1",
    model="my-model",
)

Public methods

invoke()

api_call = await model.invoke(
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.7,
)
response_text = api_call.response

Sends a rate-limited request and returns an APICalling object; the result is available on its .response attribute.
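
To make "rate-limited" concrete, here is a rough sketch of a per-cycle budget in the spirit of limit_requests and limit_tokens. The class `CycleBudget` is hypothetical, and treating the cycle length as a fixed number of seconds is an assumption; lionagi's real executor and queue may work quite differently.

```python
import time


class CycleBudget:
    """Hypothetical per-cycle budget; not lionagi's actual rate limiter."""

    def __init__(self, limit_requests, limit_tokens,
                 refresh_time=60.0, clock=time.monotonic):
        self.limit_requests = limit_requests
        self.limit_tokens = limit_tokens
        self.refresh_time = refresh_time
        self._clock = clock  # injectable so tests can control time
        self._cycle_start = clock()
        self._requests_left = limit_requests
        self._tokens_left = limit_tokens

    def _maybe_refresh(self):
        # Start a new cycle once refresh_time seconds have elapsed.
        now = self._clock()
        if now - self._cycle_start >= self.refresh_time:
            self._cycle_start = now
            self._requests_left = self.limit_requests
            self._tokens_left = self.limit_tokens

    def try_acquire(self, tokens):
        """Charge the budget and return True if the request fits, else False."""
        self._maybe_refresh()
        if self._requests_left < 1 or self._tokens_left < tokens:
            return False
        self._requests_left -= 1
        self._tokens_left -= tokens
        return True
```

A caller that gets `False` would typically wait or queue the request until the next cycle; that waiting logic is omitted here.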

stream()

async for chunk in await model.stream(messages=[...]):
    print(chunk, end="", flush=True)

Sends a streaming request. Prefer Branch.run() for managed streaming with message history.
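
For readers new to async streams, the sketch below shows the general pattern of consuming chunks and optionally passing each one through a processor, similar in spirit to streaming_process_func. Everything here is a stand-in: `fake_stream` and `collect` are hypothetical helpers, and no lionagi call is made.

```python
import asyncio


async def fake_stream():
    """Stand-in for a provider stream; yields text chunks one at a time."""
    for chunk in ["Hel", "lo", ", ", "world"]:
        await asyncio.sleep(0)  # give control back to the event loop
        yield chunk


async def collect(stream, process=None):
    """Join chunks into one string, optionally transforming each chunk first."""
    parts = []
    async for chunk in stream:
        if process is not None:
            chunk = process(chunk)
        if chunk is not None:  # a processor may drop a chunk by returning None
            parts.append(chunk)
    return "".join(parts)


text = asyncio.run(collect(fake_stream(), process=str.upper))
print(text)
```

The processor here is synchronous for brevity; whether a real chunk processor must be a coroutine depends on the library and is not assumed.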

create_api_calling()

api_call = model.create_api_calling(
    messages=[{"role": "user", "content": "hello"}],
)
# inspect before invoking
result = await model.invoke(api_call)

Constructs an APICalling object without sending the request.

copy()

model2 = model.copy(share_session=False)

Creates a fresh iModel with the same config but a new ID and executor. Use when you need independent rate-limit buckets for parallel workflows.

close()

await model.close()

Stops the executor and releases resources. Not needed when the model is used as a context manager.

Context manager

async with li.iModel(model="gpt-4o") as model:
    api_call = await model.invoke(messages=[{"role": "user", "content": "hello"}])
    print(api_call.response)
# executor closed automatically
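
The automatic cleanup relies on Python's async context manager protocol. The sketch below shows that protocol with a hypothetical `ManagedClient` class standing in for the model; it only illustrates why close() runs on exit, including when the body raises, and makes no claim about how iModel implements it.

```python
import asyncio


class ManagedClient:
    """Hypothetical stand-in showing the async context manager protocol."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()  # runs on normal exit and on exceptions
        return False  # do not swallow exceptions from the body


async def demo():
    async with ManagedClient() as client:
        open_inside = not client.closed
    return open_inside, client.closed


async def demo_error():
    client = ManagedClient()
    try:
        async with client:
            raise RuntimeError("boom")
    except RuntimeError:
        pass
    return client.closed
```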

Properties

| Property | Type | Notes |
| --- | --- | --- |
| model_name | str | Model identifier string |
| is_cli | bool | True for CLI endpoints (claude_code, codex, gemini_code, pi) |
| request_options | type[BaseModel] \| None | Endpoint-specific request schema |
| provider_session_id | str \| None | CLI session ID for resumption |

Provider resolution

The provider is inferred from the model kwarg when it contains a slash (e.g., "anthropic/claude-opus-4-7"); otherwise, set provider explicitly. The provider string must match exactly (the CLI table above lists the accepted aliases).
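
As a rough illustration of the slash rule, the hypothetical helper below splits a "provider/model" string when no provider is given. Two points are assumptions rather than documented behavior: that the prefix is stripped from the model name after inference, and that a model containing a slash is left untouched when provider is explicit.

```python
def resolve_provider(provider=None, model=None):
    """Return (provider, model). Illustrative helper, not lionagi's code."""
    if provider is not None:
        # Explicit provider: leave the model string alone, slash or not.
        return provider, model
    if model is not None and "/" in model:
        inferred, _, rest = model.partition("/")  # split on the first slash only
        return inferred, rest
    # Nothing to infer from; the caller decides how to handle a missing provider.
    return None, model
```

With this sketch, `resolve_provider(model="anthropic/claude-opus-4-7")` gives `("anthropic", "claude-opus-4-7")`, while `resolve_provider("nvidia_nim", "meta/llama-3.1-70b-instruct")` keeps the full model string.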

| provider string | API | Key env var |
| --- | --- | --- |
| "openai" | OpenAI | OPENAI_API_KEY |
| "anthropic" | Anthropic | ANTHROPIC_API_KEY |
| "gemini" | Google AI (OpenAI-compat) | GOOGLE_API_KEY |
| "ollama" | Ollama local | none (no key needed) |
| "nvidia_nim" | NVIDIA NIM | NVIDIA_NIM_API_KEY |
| "perplexity" | Perplexity Sonar | PERPLEXITY_API_KEY |
| "groq" | Groq | GROQ_API_KEY |
| "openrouter" | OpenRouter | OPENROUTER_API_KEY |
| "deepseek" | DeepSeek | DEEPSEEK_API_KEY |
| "exa" | Exa Search | EXA_API_KEY |
| "tavily" | Tavily | TAVILY_API_KEY |
| "firecrawl" | Firecrawl | FIRECRAWL_API_KEY |
| "claude_code" | Claude Code CLI | |
| "codex" | OpenAI Codex CLI | |
| "gemini_code" | Gemini CLI | |
| "pi" | Pi CLI | |

HookRegistry

Pre/post invocation hooks for logging, caching, or metrics:

from lionagi.service.hooks import HookRegistry, HookEventTypes

async def log_pre(event, **kw):
    print(f"Sending: {type(event).__name__}")

async def log_post(event, **kw):
    print(f"Received: {type(event).__name__}")

hooks = HookRegistry(
    hooks={
        HookEventTypes.PreInvocation: log_pre,
        HookEventTypes.PostInvocation: log_post,
    }
)

model = li.iModel(model="gpt-4o", hook_registry=hooks)
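
Beyond logging, a pre/post hook pair can collect metrics. The sketch below measures latency using hooks with the same `(event, **kw)` shape as above. `LatencyHooks` and `fake_invocation` are hypothetical names, and the sketch assumes the pre and post hooks receive the same event object, which the text does not confirm.

```python
import asyncio
import time


class LatencyHooks:
    """Pairs a pre hook with a post hook to record how long each event took."""

    def __init__(self):
        self._started = {}
        self.durations = []

    async def pre(self, event, **kw):
        # Remember the start time, keyed by the event object's identity.
        self._started[id(event)] = time.perf_counter()

    async def post(self, event, **kw):
        started = self._started.pop(id(event), None)
        if started is not None:
            self.durations.append(time.perf_counter() - started)


async def fake_invocation(hooks):
    """Stand-in driver: pre hook, simulated work, post hook. Not lionagi code."""
    event = object()
    await hooks.pre(event)
    await asyncio.sleep(0.01)  # pretend to wait on the provider
    await hooks.post(event)


latency = LatencyHooks()
asyncio.run(fake_invocation(latency))
print(f"{len(latency.durations)} call(s) timed")
```

In real use, `latency.pre` and `latency.post` would be registered under HookEventTypes.PreInvocation and HookEventTypes.PostInvocation in the same way as log_pre and log_post.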

Serialization

data = model.to_dict()
restored = li.iModel.from_dict(data)
