Transcription Models
The Transcription page controls how RealTimeX transcribes audio recordings.
Open it from Settings > Transcription.
What this page is for
This page is for transcription workflows that process recorded or imported audio.
It is not the same as the Speech to Text section inside Settings > Voice & Speech, which controls chat-thread dictation and voice chat.
Use Transcription when you are configuring the general audio-transcription pipeline.
Use Voice & Speech when you are configuring live mic input in chat.
Current provider choices
The current product exposes two transcription providers on this page:
- RealTimeX Local
- OpenAI
RealTimeX Local
RealTimeX Local uses browser-based Whisper models for on-device transcription.
The current setup flow lets you:
- choose a Whisper model
- download models on first use or ahead of time
- trade speed against accuracy based on your device
- keep transcription local to the machine running RealTimeX
Smaller models are faster and lighter.
Larger models are usually more accurate, but they require more device resources and more download time.
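The speed-versus-accuracy trade-off above can be sketched as a small selection helper. This is only an illustration, not RealTimeX code: `pickWhisperModel` and `WhisperModelOption` are hypothetical names, the model names and approximate parameter counts come from the open-source Whisper model family, and the memory thresholds are placeholder assumptions rather than measured requirements. The models actually offered on the page may differ.

```typescript
// Hypothetical helper, not part of RealTimeX.
// Names and parameter counts follow the open-source Whisper model family;
// minMemoryGb values are illustrative assumptions, not measured requirements.
interface WhisperModelOption {
  name: string;
  approxParamsMillions: number;
  minMemoryGb: number;
}

const TINY: WhisperModelOption = { name: "tiny", approxParamsMillions: 39, minMemoryGb: 1 };

// Ordered from smallest (fastest) to largest (usually most accurate).
const WHISPER_MODELS: WhisperModelOption[] = [
  TINY,
  { name: "base", approxParamsMillions: 74, minMemoryGb: 2 },
  { name: "small", approxParamsMillions: 244, minMemoryGb: 4 },
  { name: "medium", approxParamsMillions: 769, minMemoryGb: 8 },
];

// Pick the largest model whose assumed threshold fits the available memory.
// Falls back to the smallest model when nothing else fits.
function pickWhisperModel(availableMemoryGb: number): WhisperModelOption {
  let choice = TINY;
  for (const model of WHISPER_MODELS) {
    if (model.minMemoryGb <= availableMemoryGb) {
      choice = model;
    }
  }
  return choice;
}
```

Under these assumed thresholds, a device with 4 GB available would land on `small`, while a device with very little memory would fall back to `tiny`.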
OpenAI
OpenAI uses cloud Whisper transcription.
The current setup is simple:
- add an OpenAI API key
- use the built-in OpenAI Whisper configuration shown on the page
Use this when you want cloud transcription instead of downloading local models.
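For context, a direct call to OpenAI's public transcription endpoint looks roughly like the sketch below. It is not RealTimeX's internal implementation, and the page does not say which request options the built-in configuration uses. The function names `buildTranscriptionRequest` and `transcribe` are hypothetical, and `whisper-1` is assumed as the model name. The sketch relies on the standard `fetch`, `FormData`, and `Blob` globals available in browsers and recent Node.js versions.

```typescript
// Sketch of a direct call to OpenAI's public transcription endpoint.
// NOT RealTimeX's internal code; function names here are hypothetical.
const OPENAI_TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

interface TranscriptionRequest {
  url: string;
  headers: Record<string, string>;
  body: FormData;
}

// Build the multipart request without sending it, so it can be inspected.
function buildTranscriptionRequest(
  apiKey: string,
  audio: Blob,
  filename: string
): TranscriptionRequest {
  const body = new FormData();
  body.append("file", audio, filename);
  body.append("model", "whisper-1"); // assumed model name
  return {
    url: OPENAI_TRANSCRIPTION_URL,
    // Content-Type is left unset so fetch can add the multipart boundary.
    headers: { Authorization: `Bearer ${apiKey}` },
    body,
  };
}

// Send the request and return the transcript text from the JSON response.
async function transcribe(apiKey: string, audio: Blob, filename: string): Promise<string> {
  const req = buildTranscriptionRequest(apiKey, audio, filename);
  const res = await fetch(req.url, { method: "POST", headers: req.headers, body: req.body });
  if (!res.ok) {
    throw new Error(`Transcription failed with HTTP ${res.status}`);
  }
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

Splitting request construction from sending keeps the API key handling and form fields easy to check without making a network call.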
Choosing between local and cloud transcription
- Use RealTimeX Local when privacy, offline use, or avoiding cloud calls matters most.
- Use OpenAI when you want a managed cloud transcription path and do not want to handle local model downloads.
Important distinction: Transcription vs Speech to Text
RealTimeX now has two separate audio input settings areas:
- Settings > Transcription: This page controls the transcription pipeline for recorded audio workflows.
- Settings > Voice & Speech: This controls live Speech to Text and Text to Speech behavior used in chat.
If you are trying to change dictation or voice-chat behavior and nothing changes, you are probably on the wrong page.