Transcription Models

The Transcription page controls how RealTimeX transcribes audio recordings.

Open it from Settings > Transcription.

What this page is for

This page is for transcription workflows that process recorded or imported audio.

It is not the same as the Speech to Text section inside Settings > Voice & Speech, which controls chat-thread dictation and voice chat.

Use Transcription when you are configuring the general audio-transcription pipeline.

Use Voice & Speech when you are configuring live mic input in chat.

The current product exposes two transcription providers on this page: RealTimeX Local and OpenAI.

RealTimeX Local uses browser-based Whisper models for on-device transcription.

The setup flow lets you choose which Whisper model to download and run on your device.

Smaller models are faster and lighter on device resources.

Larger models are usually more accurate, but they need more device resources and take longer to download.

OpenAI uses OpenAI's cloud-hosted Whisper transcription.

The setup is simple.

Use this when you want cloud transcription instead of downloading local models.

Use RealTimeX Local when privacy, offline use, or avoiding cloud calls matters most.
Use OpenAI when you want a managed cloud transcription path and do not want to handle local model downloads.

RealTimeX has two separate settings areas for audio input:

Settings > Transcription: controls the transcription pipeline for recorded audio workflows.
Settings > Voice & Speech: controls the live Speech to Text and Text to Speech behavior used in chat.

If you change dictation or voice-chat settings and nothing happens, you are probably on Settings > Transcription instead of Settings > Voice & Speech.