iModel
class iModel(Element)
Uniform interface to any LLM provider with rate limiting, queuing, and hooks.
Constructor

```python
model = li.iModel(
    provider="openai",
    model="gpt-4o",
    api_key=None,  # falls back to OPENAI_API_KEY env var
    limit_requests=100,
    limit_tokens=100_000,
)
```
| Param | Type | Default | Notes |
|---|---|---|---|
| provider | str \| None | None | "openai", "anthropic", etc. If omitted, inferred from a provider/ prefix in model |
| base_url | str \| None | None | Custom API URL (for proxies, local endpoints) |
| endpoint | str \| Endpoint | "chat" | Endpoint type (see the tables below) |
| api_key | str \| None | None | Explicit key; falls back to the provider's env var |
| queue_capacity | int \| None | auto | Max queued requests before backpressure |
| capacity_refresh_time | float | 60 | Seconds between queue capacity refreshes |
| interval | float \| None | auto | Queue processing interval in seconds |
| limit_requests | int \| None | None | Max requests per rate-limit cycle |
| limit_tokens | int \| None | None | Max tokens per rate-limit cycle |
| concurrency_limit | int \| None | None | Max concurrent streams |
| streaming_process_func | Callable \| None | None | Custom chunk processor for streaming responses |
| provider_metadata | dict \| None | None | Provider-specific metadata (e.g., CLI session IDs) |
| hook_registry | HookRegistry \| dict \| None | HookRegistry() | Pre/post invocation hooks |
| **kwargs | — | — | Provider-specific config (e.g., model="gpt-4o", temperature=0.7) |
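limit_requests, limit_tokens and capacity_refresh_time together describe a per-cycle budget. The standalone sketch below illustrates that idea only; it is not lionagi's executor. WindowLimiter and try_acquire are hypothetical names, and the sketch assumes a request is admitted only when both the request budget and the token budget still have room in the current cycle.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WindowLimiter:
    """Hypothetical per-cycle budget: at most `limit_requests` calls and
    `limit_tokens` tokens per `refresh_time` seconds (None = unlimited)."""

    limit_requests: int | None = None
    limit_tokens: int | None = None
    refresh_time: float = 60.0
    _requests: int = field(default=0, init=False)
    _tokens: int = field(default=0, init=False)
    _window_start: float = field(default=0.0, init=False)

    def try_acquire(self, tokens: int, now: float) -> bool:
        """Admit a request costing `tokens` at time `now` if both budgets allow it."""
        # Start a new cycle once `refresh_time` seconds have passed.
        if now - self._window_start >= self.refresh_time:
            self._requests = 0
            self._tokens = 0
            self._window_start = now
        if self.limit_requests is not None and self._requests + 1 > self.limit_requests:
            return False
        if self.limit_tokens is not None and self._tokens + tokens > self.limit_tokens:
            return False
        self._requests += 1
        self._tokens += tokens
        return True
```

In real code `now` would come from time.monotonic(); passing it explicitly keeps the sketch deterministic. Whichever budget runs out first blocks further requests until the cycle refreshes.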
Endpoint types
Chat / LLM
provider= | Default endpoint= | Key env var |
|---|---|---|
"openai" | "chat" | OPENAI_API_KEY |
"anthropic" | "chat" | ANTHROPIC_API_KEY |
"gemini" | "chat" | GOOGLE_API_KEY |
"ollama" | "chat" | — (local) |
"groq" | "chat" | GROQ_API_KEY |
"deepseek" | "chat" | DEEPSEEK_API_KEY |
"perplexity" | "chat" | PERPLEXITY_API_KEY |
"openrouter" | "chat" | OPENROUTER_API_KEY |
"nvidia_nim" | "chat" | NVIDIA_NIM_API_KEY |
Embed
provider= | endpoint= | Key env var |
|---|---|---|
"nvidia_nim" | "embed" | NVIDIA_NIM_API_KEY |
OpenaiEmbedEndpoint and NvidiaNimEmbedEndpoint both exist as classes, but match_endpoint() only routes the nvidia_nim + embed pair. For OpenAI embeddings, pass an OpenaiEmbedEndpoint instance directly as endpoint=.
OpenAI Responses API
provider= | endpoint= | Notes |
|---|---|---|
"openai" | "response" | Stateful Responses API (/v1/responses) |
CLI / Agentic
provider= | Aliases | Notes |
|---|---|---|
"claude_code" | "claude", "claude-code" | Claude Code CLI |
"codex" | — | OpenAI Codex CLI |
"gemini_code" | "gemini-code", "gemini_cli", "gemini-cli" | Gemini CLI |
"pi" | "pi-code", "pi_code" | Pi CLI |
"ag2" | — | AG2 GroupChat (stream-only; requires pip install lionagi[ag2]) |
CLI endpoints set is_cli = True. Branch.operate() routes to run_and_collect instead of communicate. See operations.md#middle-protocol.
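Because CLI endpoints are stream-only, a caller has to drain the stream and assemble the result rather than make one request/response call. The sketch below shows that "collect" idea in plain asyncio; fake_cli_stream, collect and respond are hypothetical stand-ins, not lionagi's run_and_collect or communicate, and the non-CLI branch simply fakes a single-call reply.

```python
import asyncio
from typing import AsyncIterator


async def fake_cli_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stream-only backend standing in for a CLI endpoint."""
    for word in f"echo: {prompt}".split():
        await asyncio.sleep(0)  # yield control, as a real subprocess reader would
        yield word + " "


async def collect(stream: AsyncIterator[str]) -> str:
    """Drain a stream into one string (the idea behind a run-and-collect path)."""
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts).strip()


async def respond(prompt: str, is_cli: bool) -> str:
    if is_cli:
        # Stream-only backends are drained and collected.
        return await collect(fake_cli_stream(prompt))
    # Non-CLI backends would issue a single request/response call instead.
    return f"echo: {prompt}"
```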
Search
provider= | endpoint= | Key env var |
|---|---|---|
"exa" | "search" | EXA_API_KEY |
"tavily" | "search" | TAVILY_API_KEY |
"tavily" | "extract" | TAVILY_API_KEY |
Scrape / Crawl
provider= | endpoint= | Key env var |
|---|---|---|
"firecrawl" | "scrape" | FIRECRAWL_API_KEY |
"firecrawl" | "map" | FIRECRAWL_API_KEY |
Fallback
Any unrecognized provider falls back to an OpenAI-compatible generic chat endpoint. Pass base_url= to point at your custom host.
Endpoint matching

```text
iModel(provider="openai", endpoint="chat")
  → match_endpoint("openai", "chat")
  → OpenaiChatEndpoint
```

match_endpoint() dispatches on (provider, endpoint) string containment:

- The default endpoint="chat" resolves to the provider's chat class.
- Single-endpoint providers (claude_code, codex, gemini_code, pi) ignore the endpoint argument and always return their only class.
- Unrecognized providers fall back to a generic OpenAI-compatible Endpoint.
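A standalone sketch of the dispatch rules above, assuming endpoint keywords are matched by substring and the first match wins. match_endpoint_sketch, ROUTES and CLI_ALIASES are hypothetical, and every class name except OpenaiChatEndpoint and NvidiaNimEmbedEndpoint is a placeholder. The function returns names rather than classes, and raising on an unknown endpoint for a known provider is this sketch's choice, not documented behavior.

```python
# Aliases from the CLI table; all map to a single-endpoint provider.
CLI_ALIASES = {
    "claude_code": "claude_code", "claude": "claude_code", "claude-code": "claude_code",
    "codex": "codex",
    "gemini_code": "gemini_code", "gemini-code": "gemini_code",
    "gemini_cli": "gemini_code", "gemini-cli": "gemini_code",
    "pi": "pi", "pi-code": "pi", "pi_code": "pi",
}

# provider -> ordered (endpoint keyword, class name) pairs; names partly placeholders.
ROUTES = {
    "openai": [("response", "OpenaiResponseEndpoint"), ("chat", "OpenaiChatEndpoint")],
    "nvidia_nim": [("embed", "NvidiaNimEmbedEndpoint"), ("chat", "NvidiaNimChatEndpoint")],
    "tavily": [("extract", "TavilyExtractEndpoint"), ("search", "TavilySearchEndpoint")],
}


def match_endpoint_sketch(provider: str, endpoint: str = "chat") -> str:
    # 1. Single-endpoint CLI providers ignore `endpoint` and resolve aliases.
    if provider in CLI_ALIASES:
        return f"cli:{CLI_ALIASES[provider]}"
    # 2. Known providers: the first keyword contained in `endpoint` wins.
    if provider in ROUTES:
        for keyword, cls_name in ROUTES[provider]:
            if keyword in endpoint:
                return cls_name
        raise ValueError(f"{provider!r} has no route for endpoint {endpoint!r}")
    # 3. Unrecognized providers fall back to a generic OpenAI-compatible endpoint.
    return "GenericOpenAICompatibleEndpoint"
```

The ordered list per provider makes the containment check predictable when one keyword could appear inside another endpoint string.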
Common construction patterns

```python
import lionagi as li

# OpenAI
model = li.iModel(provider="openai", model="gpt-4o")

# Anthropic
model = li.iModel(provider="anthropic", model="claude-opus-4-7-20251001")

# With rate limits
model = li.iModel(
    provider="openai",
    model="gpt-4o",
    limit_requests=100,
    limit_tokens=100_000,
)

# Ollama local
model = li.iModel(
    provider="ollama",
    base_url="http://localhost:11434",
    model="llama3",
)

# NVIDIA NIM
model = li.iModel(provider="nvidia_nim", model="meta/llama-3.1-70b-instruct")

# DeepSeek
model = li.iModel(provider="deepseek", model="deepseek-chat")

# OpenAI Responses API
model = li.iModel(provider="openai", endpoint="response", model="gpt-4o")

# CLI endpoints (stream-only; use with Branch.run())
model = li.iModel(provider="claude_code", model="sonnet")
model = li.iModel(provider="codex", model="codex-mini-latest")
model = li.iModel(provider="gemini_code", model="gemini-2.5-pro")
model = li.iModel(provider="pi", model="pi")

# Search
exa = li.iModel(provider="exa", endpoint="search")
tvly = li.iModel(provider="tavily", endpoint="search")

# Scrape / crawl
crawl = li.iModel(provider="firecrawl", endpoint="scrape")
cmap = li.iModel(provider="firecrawl", endpoint="map")

# OpenAI-compatible custom host
model = li.iModel(
    provider="my_provider",
    base_url="https://my-api.example.com/v1",
    model="my-model",
)
```
Public methods

invoke()

```python
api_call = await model.invoke(
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.7,
)
response = api_call.response  # the provider's response
```

Sends a rate-limited request and waits for it to complete. Returns an APICalling whose .response attribute holds the provider's response.
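Behind a call like invoke(), a request is typically queued and the caller awaits its result while a background worker drains the queue. The sketch below illustrates that queue-then-await pattern in plain asyncio. QueuedInvoker is hypothetical; its capacity and interval only loosely echo queue_capacity and interval, and it does not reproduce lionagi's executor or its rate-limit accounting.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class QueuedInvoker:
    """Hypothetical sketch: invoke() enqueues a payload and awaits its result;
    a background worker handles up to `capacity` items every `interval` seconds."""

    def __init__(self, handler: Callable[[Any], Awaitable[Any]],
                 capacity: int = 2, interval: float = 0.01) -> None:
        self._handler = handler
        self._capacity = capacity
        self._interval = interval
        self._queue: asyncio.Queue | None = None
        self._worker: asyncio.Task | None = None

    async def invoke(self, payload: Any) -> Any:
        if self._worker is None:
            # Create loop-bound objects lazily, inside the running event loop.
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        self._queue.put_nowait((payload, fut))
        return await fut

    async def _run(self) -> None:
        while True:
            # Drain at most `capacity` queued calls per tick, then wait.
            for _ in range(self._capacity):
                if self._queue.empty():
                    break
                payload, fut = self._queue.get_nowait()
                try:
                    fut.set_result(await self._handler(payload))
                except Exception as exc:
                    fut.set_exception(exc)
            await asyncio.sleep(self._interval)

    async def close(self) -> None:
        # Stop the background worker and drop the queue.
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass
            self._worker = None
            self._queue = None
```

Each caller awaits its own future, so results come back to the right caller even though a single worker processes them in FIFO order.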
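stream()

```python
async for chunk in model.stream(messages=[{"role": "user", "content": "hello"}]):
    print(chunk, end="", flush=True)
```

Streaming request. stream() is an async generator, so iterate it directly with async for rather than awaiting it. Prefer Branch.run() for managed streaming with message history.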
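The sketch below illustrates why a streaming method is consumed with async for, and how an optional per-chunk processor (in the spirit of streaming_process_func) can reshape raw chunks. fake_provider_stream, stream and the {"delta": ...} chunk shape are hypothetical assumptions for illustration, not lionagi's chunk format.

```python
import asyncio
from typing import AsyncIterator, Callable, Optional


async def fake_provider_stream(text: str) -> AsyncIterator[dict]:
    """Hypothetical provider that emits one delta dict per character."""
    for ch in text:
        await asyncio.sleep(0)
        yield {"delta": ch}


async def stream(
    text: str, process_chunk: Optional[Callable[[dict], str]] = None
) -> AsyncIterator[str]:
    """Async generator: consume with `async for`, do not await it."""
    async for raw in fake_provider_stream(text):
        # An optional processor reshapes each raw chunk; otherwise take the delta.
        yield process_chunk(raw) if process_chunk else raw["delta"]
```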
create_api_calling()

```python
api_call = model.create_api_calling(
    messages=[{"role": "user", "content": "hello"}],
)
# inspect before invoking
result = await model.invoke(api_call)
```

Constructs an APICalling object without sending the request.
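Separating construction from sending lets you inspect or adjust a request before any network traffic happens. The sketch below shows that build-then-send split with hypothetical names (PendingCall, create_call, send); send() is a local stand-in that echoes the last user message, not a real transport.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class PendingCall:
    """Hypothetical request object: built first, inspected, then sent."""
    payload: dict[str, Any]
    sent: bool = False
    response: Any = None


def create_call(**payload: Any) -> PendingCall:
    # Nothing is sent here; the object only captures the request payload.
    return PendingCall(payload=dict(payload))


def send(call: PendingCall) -> PendingCall:
    # Stand-in transport: echo the last user message as the "response".
    call.response = {"echo": call.payload["messages"][-1]["content"]}
    call.sent = True
    return call
```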
copy()

```python
model2 = model.copy(share_session=False)
```

Creates a fresh iModel with the same config but a new ID and executor. Use when you need independent rate-limit buckets for parallel workflows.
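A minimal sketch of what "same config, independent budget" means, assuming a copy duplicates configuration by value but builds its own limiter state. ModelSketch, Budget and spend are hypothetical; session sharing (share_session) is not modeled.

```python
from __future__ import annotations

import uuid
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Budget:
    remaining: int


@dataclass
class ModelSketch:
    """Hypothetical wrapper: config is copied by value, budget is per instance."""
    config: dict
    limit_requests: int
    model_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    budget: Budget = field(init=False)

    def __post_init__(self) -> None:
        self.budget = Budget(self.limit_requests)

    def copy(self) -> ModelSketch:
        # Same (deep-copied) config, but a new id and a fresh budget.
        return ModelSketch(config=deepcopy(self.config), limit_requests=self.limit_requests)

    def spend(self) -> bool:
        # Consume one request from this instance's own budget.
        if self.budget.remaining == 0:
            return False
        self.budget.remaining -= 1
        return True
```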
close()

```python
await model.close()
```

Stops the executor and releases resources. Not needed when the model is used as an async context manager.
Context manager

```python
async with li.iModel(provider="openai", model="gpt-4o") as model:
    api_call = await model.invoke(messages=[{"role": "user", "content": "hello"}])
    print(api_call.response)
# executor closed automatically
```
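The automatic cleanup comes from the async context manager protocol: __aexit__ runs on normal exit and when the body raises. ManagedClient below is a hypothetical stand-in whose close() plays the role of stopping an executor; it is a sketch of the protocol, not lionagi's implementation.

```python
import asyncio


class ManagedClient:
    """Hypothetical resource holder whose close() runs on context exit."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        await asyncio.sleep(0)  # stand-in for stopping a background executor
        self.closed = True

    async def __aenter__(self) -> "ManagedClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and on exceptions; returning False re-raises.
        await self.close()
        return False
```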
Properties

| Property | Type | Notes |
|---|---|---|
| model_name | str | Model identifier string |
| is_cli | bool | True for CLI endpoints (claude_code, codex, gemini_code, pi) |
| request_options | type[BaseModel] \| None | Endpoint-specific request schema |
| provider_session_id | str \| None | CLI session ID for resumption |
Provider resolution
Provider is inferred from model kwarg when it contains a slash (e.g., "anthropic/claude-opus-4-7"). Otherwise set provider explicitly. The provider string must match exactly (see aliases in the CLI table above for accepted variants).
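The inference rule can be sketched as follows. resolve_provider is a hypothetical helper, not lionagi's code; it assumes the split happens on the first slash only, that an explicit provider= leaves model untouched, and that a bare model name with no provider is an error. Under those assumptions, a slash-containing ID such as NVIDIA NIM's "meta/llama-3.1-70b-instruct" needs an explicit provider= so it is not read as provider "meta".

```python
from __future__ import annotations


def resolve_provider(model: str, provider: str | None = None) -> tuple[str, str]:
    """Hypothetical: return (provider, model), splitting 'provider/model' if needed."""
    if provider is not None:
        # Explicit provider wins; the model ID is passed through unchanged.
        return provider, model
    if "/" in model:
        # Split on the first slash only: "anthropic/x" -> ("anthropic", "x").
        prefix, _, rest = model.partition("/")
        return prefix, rest
    raise ValueError(f"cannot infer provider from {model!r}; pass provider=")
```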
provider string | API | Key env var |
|---|---|---|
"openai" | OpenAI | OPENAI_API_KEY |
"anthropic" | Anthropic | ANTHROPIC_API_KEY |
"gemini" | Google AI (OpenAI-compat) | GOOGLE_API_KEY |
"ollama" | Ollama local | — (no key needed) |
"nvidia_nim" | NVIDIA NIM | NVIDIA_NIM_API_KEY |
"perplexity" | Perplexity Sonar | PERPLEXITY_API_KEY |
"groq" | Groq | GROQ_API_KEY |
"openrouter" | OpenRouter | OPENROUTER_API_KEY |
"deepseek" | DeepSeek | DEEPSEEK_API_KEY |
"exa" | Exa Search | EXA_API_KEY |
"tavily" | Tavily | TAVILY_API_KEY |
"firecrawl" | Firecrawl | FIRECRAWL_API_KEY |
"claude_code" | Claude Code CLI | — |
"codex" | OpenAI Codex CLI | — |
"gemini_code" | Gemini CLI | — |
"pi" | Pi CLI | — |
HookRegistry

Pre/post invocation hooks for logging, caching, or metrics:

```python
import lionagi as li
from lionagi.service.hooks import HookRegistry, HookEventTypes


async def log_pre(event, **kw):
    print(f"Sending: {type(event).__name__}")


async def log_post(event, **kw):
    print(f"Received: {type(event).__name__}")


hooks = HookRegistry(
    hooks={
        HookEventTypes.PreInvocation: log_pre,
        HookEventTypes.PostInvocation: log_post,
    }
)
model = li.iModel(provider="openai", model="gpt-4o", hook_registry=hooks)
```
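To show where pre/post hooks sit relative to the actual call, here is a self-contained sketch of a tiny registry wrapped around an invocation. MiniHookRegistry, Phase and invoke_with_hooks are hypothetical and deliberately simpler than lionagi's HookRegistry; the assumption is one async hook per phase, with the post hook receiving the result as a keyword argument.

```python
from __future__ import annotations

import asyncio
from enum import Enum
from typing import Any, Awaitable, Callable


class Phase(Enum):
    PRE = "pre_invocation"
    POST = "post_invocation"


Hook = Callable[..., Awaitable[None]]


class MiniHookRegistry:
    """Hypothetical registry mapping each phase to one async hook."""

    def __init__(self, hooks: dict[Phase, Hook] | None = None) -> None:
        self.hooks = dict(hooks or {})

    async def fire(self, phase: Phase, event: Any, **kw: Any) -> None:
        hook = self.hooks.get(phase)
        if hook is not None:
            await hook(event, **kw)


async def invoke_with_hooks(
    registry: MiniHookRegistry, event: Any, call: Callable[[Any], Awaitable[Any]]
) -> Any:
    await registry.fire(Phase.PRE, event)  # before the request goes out
    result = await call(event)
    await registry.fire(Phase.POST, event, result=result)  # after it returns
    return result
```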
Serialization

```python
data = model.to_dict()
restored = li.iModel.from_dict(data)
```
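A round-trip means configuration survives to_dict() and from_dict(), while runtime state (executor, queue) is rebuilt rather than stored. The sketch below illustrates that contract with a hypothetical ModelConfig dataclass; which fields lionagi actually serializes is not stated on this page, so the field list here is an assumption.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelConfig:
    """Hypothetical serializable config; runtime state is not part of it."""
    provider: str
    model: str
    limit_requests: int | None = None
    limit_tokens: int | None = None
    extra: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # Plain, JSON-friendly dict of the configuration fields.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> ModelConfig:
        return cls(**data)
```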