Large Language Models

RealTimeX lets you choose a system-wide language model provider, then override it per workspace when a specific chat flow needs something different.

Open the main selector from Settings > AI Providers > LLM.

How the system LLM setting works

The Language Model page controls the default provider used by the instance.

That default matters for:

workspace chats that inherit the system setting
features that rely on the shared system LLM preference
any workflow that does not explicitly choose a different provider

When you select a provider, RealTimeX shows that provider's configuration fields directly below the selector. Depending on the provider, that can include:

API keys
base URLs
deployment names
token limits
model selectors
provider-specific advanced options
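
As a rough illustration, the kinds of fields listed above could be modeled as a single record with optional values, since no provider uses all of them. This is only a sketch: `ProviderConfig`, `missing_fields`, the provider identifier, and the example URL are hypothetical names chosen for this example, not RealTimeX's actual settings schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProviderConfig:
    """Hypothetical container for the field types a provider may expose."""
    provider: str
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    deployment_name: Optional[str] = None
    token_limit: Optional[int] = None
    model: Optional[str] = None
    # Provider-specific advanced options, kept as free-form key/value pairs.
    advanced: dict = field(default_factory=dict)


def missing_fields(config: ProviderConfig, required: list) -> list:
    """Return the required field names that are unset or empty."""
    return [name for name in required if getattr(config, name) in (None, "")]


# Example: a provider that (by assumption) needs a key, endpoint, and deployment.
config = ProviderConfig(provider="example-provider", base_url="https://example.invalid")
print(missing_fields(config, ["api_key", "base_url", "deployment_name"]))
```

A check like this mirrors what the settings form does visually: the selected provider determines which fields matter, and the rest stay empty.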

Provider types

The current product supports several kinds of language model provider.

RealTimeX-managed options

RealTimeX Cloud for hosted models without local inference setup
RealTimeX Local for GGUF models running on your machine through the managed llama-server runtime

Hosted cloud providers

Built-in cloud providers currently include:

OpenAI
Azure OpenAI
Anthropic
Gemini
DeepSeek
Groq
Mistral
Cohere
Perplexity
OpenRouter
Together AI
Fireworks AI
AWS Bedrock
xAI
Moonshot AI
Novita AI
PPIO
APIpie

Self-hosted and local endpoints

Current built-in local or self-hosted choices include:

Ollama
LM Studio
llama.cpp
LocalAI
KoboldCPP
Oobabooga Web UI
Dell Pro AI Studio
LiteLLM
Generic OpenAI

Plugin providers

Plugins can register additional LLM providers. When that happens, those providers appear in the same selector as the built-in options.
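
Conceptually, plugin registration extends one shared list of selector options. The sketch below assumes a simple in-memory registry; `ProviderRegistry`, `register`, `selector_options`, and the plugin name "Acme LLM" are hypothetical and do not describe the real plugin API.

```python
class ProviderRegistry:
    """Hypothetical registry that merges built-in and plugin providers."""

    def __init__(self, built_in):
        self._built_in = list(built_in)
        self._plugins = []

    def register(self, name: str) -> None:
        """Add a plugin provider, rejecting names that already exist."""
        if name in self._built_in or name in self._plugins:
            raise ValueError(f"provider already registered: {name}")
        self._plugins.append(name)

    def selector_options(self) -> list:
        """Built-in providers first, then plugin providers, in one list."""
        return self._built_in + self._plugins


registry = ProviderRegistry(["OpenAI", "Anthropic", "Ollama"])
registry.register("Acme LLM")  # hypothetical plugin-supplied provider
print(registry.selector_options())
```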

Workspace overrides

Each workspace can either inherit the system default or choose its own provider.

In workspace chat settings, the current flow supports:

System default to inherit the instance-wide provider
a workspace-specific provider selection
a workspace-specific model selection when that provider supports it

Some providers do not yet support full multi-model workspace selection. In those cases, the workspace can still point at that provider, but the actual model comes from the system-level configuration for that provider.
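
The inheritance rules above can be summarized as a small resolution function. This is a minimal sketch of the described behavior, not RealTimeX's implementation: `SystemSettings`, `WorkspaceSettings`, `resolve_llm`, and the placeholder provider and model names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemSettings:
    provider: str   # instance-wide default provider
    models: dict    # provider name -> model configured at system level


@dataclass
class WorkspaceSettings:
    provider: Optional[str] = None  # None means "System default"
    model: Optional[str] = None     # only honored when the provider allows it


def resolve_llm(system, workspace, multi_model_providers):
    """Return the (provider, model) pair a workspace chat would use."""
    provider = workspace.provider or system.provider
    # A workspace-specific model applies only when the workspace picked its
    # own provider and that provider supports per-workspace model selection.
    if workspace.provider and workspace.model and provider in multi_model_providers:
        return provider, workspace.model
    # Otherwise the model comes from the system-level config for that provider.
    return provider, system.models.get(provider)


system = SystemSettings(
    provider="provider-a",
    models={"provider-a": "model-a1", "provider-b": "model-b1"},
)
multi_model = {"provider-a"}  # assumed: only provider-a supports workspace models

print(resolve_llm(system, WorkspaceSettings(), multi_model))
print(resolve_llm(system, WorkspaceSettings("provider-a", "model-a2"), multi_model))
print(resolve_llm(system, WorkspaceSettings("provider-b", "model-b2"), multi_model))
```

The third call shows the limited case: the workspace points at `provider-b`, but because that provider is assumed not to support workspace model selection, the system-level model for `provider-b` is used instead of the requested one.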

Local model management

If you use local inference, there are two related settings areas.

`Settings > AI Providers > RealTimeX Local`

Use this page when you want the built-in managed local workflow. The current product uses it to manage:

the llama-server runtime
runtime download and update status
hardware/backend availability
the default RealTimeX Local model

For the full workflow, see RealTimeX Local.

`Settings > AI Providers > Local Model Management`

Use this page when you want to inspect or manage model inventories across local backends.

The current UI covers:

Ollama
LocalAI
LM Studio
llama.cpp
RealTimeX Local

Depending on the backend, you can inspect status, review model counts, pull or download models, delete models, warm models into memory, and set defaults.

For the cross-provider guide, see Local Models.

Choosing the right option

Use RealTimeX Cloud when you want the simplest hosted setup.
Use RealTimeX Local when you want an integrated on-device GGUF workflow without running a separate model server yourself.
Use a hosted provider like OpenAI, Anthropic, or Gemini when your team already operates around that vendor.
Use Generic OpenAI or LiteLLM when you want a compatibility layer in front of multiple model backends.
Use Ollama, LM Studio, or llama.cpp when you already run local model infrastructure and want RealTimeX to connect to it.

Credentials and access

Many hosted providers require keys, endpoints, or deployment details. Those are configured inside the provider section on the Language Model page.

Use API Access & Keys when outside clients need to call into RealTimeX.

Use Credentials when RealTimeX agents or tools need reusable outbound secrets for other systems.
