RealTimeX Setup
LLM Setup
Local
Ollama

Ollama

Ollama lets RealTimeX use a model served by an Ollama instance instead of the built-in managed local runtime.

Open it from Settings > AI Providers > LLM, then choose Ollama.

What you need

Before RealTimeX can use Ollama:

  • Ollama must already be installed and running
  • the Ollama server URL must be reachable from the RealTimeX instance
  • at least one chat model must already exist in Ollama

This page selects from existing Ollama models. It does not pull the model for you from the main LLM selector.

If you still need to pull a model first, use the Ollama CLI or Local Models.

Current setup flow

  1. Start Ollama.
  2. Make sure the model you want is already installed in Ollama.
  3. Open Settings > AI Providers > LLM.
  4. Choose Ollama.
  5. Select the model from the dropdown.
  6. Expand Advanced if you need to change the endpoint or runtime behavior.
  7. Save the provider settings.
  8. Test a normal chat.

Model selection behavior

The model dropdown is populated dynamically from the Ollama server.

Important behavior:

  • if the URL is missing or invalid, the model list stays unavailable
  • if the server is reachable but has no models, there is nothing to select
  • if you recently pulled a new model, refresh or reopen the page so RealTimeX can load it

Endpoint setup

The current advanced controls expose the Ollama base URL.

Common local values include:

  • http://127.0.0.1:11434
  • http://host.docker.internal:11434
  • http://172.17.0.1:11434

The UI can also try endpoint auto-detection against common local addresses.

Advanced controls

The current Ollama LLM setup supports more than just URL plus model.

Maximum tokens

Use Maximum Tokens to set the context budget RealTimeX should assume for this Ollama provider.

Performance mode

The current dropdown supports:

  • Base
  • Maximum

Leave it on the default unless you have measured a reason to change it.

Keep-alive memory behavior

The current keep-alive options control how long Ollama should keep a model in memory:

  • unload immediately
  • keep warm for a few minutes
  • keep warm for an hour
  • keep warm indefinitely

This is useful when you want faster repeated chats at the cost of keeping more memory occupied.

Auth token

The auth token field is optional. Use it only when your Ollama endpoint is behind authentication and expects a bearer token.

When to use Ollama instead of RealTimeX Local

  • Use Ollama when you already run Ollama and want RealTimeX to attach to it.
  • Use RealTimeX Local when you want the fully managed built-in GGUF flow instead of a separate external runtime.

Troubleshooting

The model dropdown says it is waiting for the URL

Enter a working Ollama URL first or use auto-detect.

The URL is correct but no models appear

The Ollama server may be up with no models installed yet. Pull a model first, then return to the selector.

RealTimeX can reach Ollama only on some machines

Double-check whether you are connecting to a local desktop instance, Docker host bridge, or a remote/self-hosted Ollama service. The correct URL differs by deployment.