Ollama
Ollama lets RealTimeX use a model served by an Ollama instance instead of the built-in managed local runtime.
Open it from Settings > AI Providers > LLM, then choose Ollama.
What you need
Before RealTimeX can use Ollama:
- Ollama must already be installed and running
- the Ollama server URL must be reachable from the RealTimeX instance
- at least one chat model must already exist in Ollama
This page selects from existing Ollama models. It does not pull the model for you from the main LLM selector.
If you still need to pull a model first, use the Ollama CLI or Local Models.
Current setup flow
- Start Ollama.
- Make sure the model you want is already installed in Ollama.
- Open
Settings > AI Providers > LLM. - Choose
Ollama. - Select the model from the dropdown.
- Expand
Advancedif you need to change the endpoint or runtime behavior. - Save the provider settings.
- Test a normal chat.
Model selection behavior
The model dropdown is populated dynamically from the Ollama server.
Important behavior:
- if the URL is missing or invalid, the model list stays unavailable
- if the server is reachable but has no models, there is nothing to select
- if you recently pulled a new model, refresh or reopen the page so RealTimeX can load it
Endpoint setup
The current advanced controls expose the Ollama base URL.
Common local values include:
http://127.0.0.1:11434http://host.docker.internal:11434http://172.17.0.1:11434
The UI can also try endpoint auto-detection against common local addresses.
Advanced controls
The current Ollama LLM setup supports more than just URL plus model.
Maximum tokens
Use Maximum Tokens to set the context budget RealTimeX should assume for this Ollama provider.
Performance mode
The current dropdown supports:
BaseMaximum
Leave it on the default unless you have measured a reason to change it.
Keep-alive memory behavior
The current keep-alive options control how long Ollama should keep a model in memory:
- unload immediately
- keep warm for a few minutes
- keep warm for an hour
- keep warm indefinitely
This is useful when you want faster repeated chats at the cost of keeping more memory occupied.
Auth token
The auth token field is optional. Use it only when your Ollama endpoint is behind authentication and expects a bearer token.
When to use Ollama instead of RealTimeX Local
- Use
Ollamawhen you already run Ollama and want RealTimeX to attach to it. - Use
RealTimeX Localwhen you want the fully managed built-in GGUF flow instead of a separate external runtime.
Troubleshooting
The model dropdown says it is waiting for the URL
Enter a working Ollama URL first or use auto-detect.
The URL is correct but no models appear
The Ollama server may be up with no models installed yet. Pull a model first, then return to the selector.
RealTimeX can reach Ollama only on some machines
Double-check whether you are connecting to a local desktop instance, Docker host bridge, or a remote/self-hosted Ollama service. The correct URL differs by deployment.