Scribe is a speech-to-text model from ElevenLabs designed for accurate transcription of recorded audio and video.
What it does
Scribe is built for real-world recordings, including multiple speakers and background noise. It’s available through the ElevenLabs web interface and via API, with a free option to try basic text transcription.
- Transcribes audio and video files (not live audio)
- Supports 99 languages, including Russian, Serbian, Cantonese, and Malayalam
- Reported speech recognition accuracy above 95%
- Outputs structured JSON with word-level timestamps
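Word-level timestamps make it straightforward to derive subtitle-style segments. The sketch below is illustrative only: the response shape and the field names `words`, `text`, `start`, and `end` are assumptions, not a documented schema, and `split_on_pauses` is a hypothetical helper that starts a new segment whenever the silence between two words exceeds a threshold.

```python
import json

# ASSUMPTION: this sample mimics a transcription response. The field names
# ("words", "text", "start", "end") are illustrative, not a documented schema.
SAMPLE = json.loads("""
{
  "text": "Hello world. How are you?",
  "words": [
    {"text": "Hello",  "start": 0.00, "end": 0.42},
    {"text": "world.", "start": 0.48, "end": 0.95},
    {"text": "How",    "start": 2.10, "end": 2.30},
    {"text": "are",    "start": 2.34, "end": 2.50},
    {"text": "you?",   "start": 2.55, "end": 2.90}
  ]
}
""")


def split_on_pauses(words, max_gap=0.8):
    """Group words into segments, starting a new segment whenever the
    silence between consecutive words exceeds max_gap seconds."""
    groups = []
    current = []
    for word in words:
        if current and word["start"] - current[-1]["end"] > max_gap:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    # Collapse each group into one record with its own time span.
    return [
        {
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["text"] for w in group),
        }
        for group in groups
    ]


segments = split_on_pauses(SAMPLE["words"])
for seg in segments:
    print(f'{seg["start"]:.2f}-{seg["end"]:.2f}  {seg["text"]}')
```

With the sample data, the pause between "world." and "How" is longer than 0.8 seconds, so the words fall into two segments.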
Speaker and event detection
Scribe includes features aimed at complex recordings and analysis.
- Performs speaker diarization for up to 32 speakers per file
- Tags non-speech events such as laughter, sighs, applause, music, and background noise
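To show how diarization labels and event tags might be used together, here is a minimal sketch that folds a flat list of transcript entries into speaker turns and collects the non-speech events separately. The entry layout, the `type` values `word` and `audio_event`, and the `speaker_id` field are all assumptions made for illustration; `build_turns` is a hypothetical helper.

```python
# ASSUMPTION: each entry carries a "type" ("word" or "audio_event"), a "text",
# and a "speaker_id". These names are illustrative, not a documented schema.
ENTRIES = [
    {"type": "word", "text": "Welcome", "speaker_id": "speaker_0"},
    {"type": "word", "text": "everyone.", "speaker_id": "speaker_0"},
    {"type": "audio_event", "text": "(applause)", "speaker_id": None},
    {"type": "word", "text": "Thank", "speaker_id": "speaker_1"},
    {"type": "word", "text": "you.", "speaker_id": "speaker_1"},
    {"type": "audio_event", "text": "(laughter)", "speaker_id": None},
    {"type": "word", "text": "Let's", "speaker_id": "speaker_0"},
    {"type": "word", "text": "begin.", "speaker_id": "speaker_0"},
]


def build_turns(entries):
    """Return (turns, events): turns is a list of (speaker, text) pairs in
    order of appearance; events is the list of non-speech event tags."""
    turns = []   # list of [speaker_id, [words...]]
    events = []
    for entry in entries:
        if entry["type"] == "audio_event":
            events.append(entry["text"])
            continue
        if turns and turns[-1][0] == entry["speaker_id"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1][1].append(entry["text"])
        else:
            # Speaker changed: open a new turn.
            turns.append([entry["speaker_id"], [entry["text"]]])
    return [(speaker, " ".join(words)) for speaker, words in turns], events


turns, events = build_turns(ENTRIES)
for speaker, text in turns:
    print(f"{speaker}: {text}")
print("events:", ", ".join(events))
```

Keeping events out of the turns is a design choice for this sketch; an analysis that cares about when the applause happened would keep them inline with their timestamps instead.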
Pricing and limitations
Paid usage is priced at $0.40 per hour of audio, with a 50% discount for the first six weeks after release. At launch, Scribe works only with pre-recorded files; ElevenLabs has announced a low-latency version for real-time transcription. In Russia, the service may be affected by regional restrictions.
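As a worked example of the pricing above, the sketch below estimates the cost of a recording from its duration. It assumes billing is strictly proportional to audio length with no minimum charge and rounds to whole cents; both are assumptions, and `transcription_cost` is a hypothetical helper rather than part of any ElevenLabs tooling.

```python
from decimal import Decimal

RATE_PER_HOUR = Decimal("0.40")    # USD per hour of audio, from the text above
LAUNCH_DISCOUNT = Decimal("0.50")  # 50% off during the launch period


def transcription_cost(duration_seconds, discounted=False):
    """Estimate the cost in USD for a recording of the given length.

    ASSUMPTION: cost scales linearly with duration and has no minimum charge.
    """
    hours = Decimal(str(duration_seconds)) / Decimal(3600)
    cost = hours * RATE_PER_HOUR
    if discounted:
        cost *= 1 - LAUNCH_DISCOUNT
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"))


# A 2.5-hour recording (9000 seconds):
print(transcription_cost(9000))                   # 1.00
print(transcription_cost(9000, discounted=True))  # 0.50
```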

