Description

Scribe is a speech-to-text model from ElevenLabs designed for accurate transcription of recorded audio and video.

What it does

Scribe is built for real-world recordings, including multiple speakers and background noise. It’s available through the ElevenLabs web interface and via API, with a free option to try basic text transcription.

  • Transcribes audio and video files (not live audio)
  • Supports 99 languages, including Russian, Serbian, Cantonese, and Malayalam
  • Achieves a reported speech recognition accuracy above 95%
  • Outputs structured JSON with word-level timestamps
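Word-level timestamps make it straightforward to build captions or detect pauses. The sketch below is illustrative only: the sample payload and its field names ("words", "text", "start", "end") are assumptions made for this example, not a confirmed Scribe schema, and the split_on_pauses helper is hypothetical.

```python
import json

# Hypothetical sample payload. The field names ("words", "text", "start",
# "end") are assumptions for illustration, not a confirmed Scribe schema.
SAMPLE = """
{
  "text": "Hello world again",
  "words": [
    {"text": "Hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 0.9},
    {"text": "again", "start": 2.0, "end": 2.4}
  ]
}
"""


def split_on_pauses(words, max_gap=1.0):
    """Group words into segments, starting a new segment whenever the
    silence between two consecutive words exceeds max_gap seconds."""
    groups = []
    current = []
    for word in words:
        if current and word["start"] - current[-1]["end"] > max_gap:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return [
        {
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["text"] for w in group),
        }
        for group in groups
    ]


segments = split_on_pauses(json.loads(SAMPLE)["words"])
for seg in segments:
    print(f'{seg["start"]:.1f}-{seg["end"]:.1f}: {seg["text"]}')
```

With the sample data, the 1.1-second silence before "again" starts a second segment. The max_gap threshold is an arbitrary choice and would need tuning for real recordings.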

Speaker and event detection

Scribe includes features aimed at complex recordings and analysis.

  • Speaker diarization for up to 32 speakers per file
  • Tags non-speech events such as laughter, sighs, applause, music, and background noise
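Speaker labels and event tags lend themselves to simple post-processing, such as measuring how long each speaker talked or listing where laughter occurred. The following is a minimal sketch under stated assumptions: the entry fields ("type", "speaker_id", "start", "end") and the "audio_event" label are hypothetical and chosen for illustration; the actual output format may differ.

```python
from collections import defaultdict

# Hypothetical transcript entries. The "type" and "speaker_id" fields and
# the "(laughter)" event text are assumptions made for this example.
WORDS = [
    {"text": "Hi", "type": "word", "speaker_id": "speaker_0",
     "start": 0.0, "end": 0.3},
    {"text": "(laughter)", "type": "audio_event", "speaker_id": "speaker_1",
     "start": 0.4, "end": 1.0},
    {"text": "Hello", "type": "word", "speaker_id": "speaker_1",
     "start": 1.1, "end": 1.5},
    {"text": "there", "type": "word", "speaker_id": "speaker_1",
     "start": 1.6, "end": 1.9},
]


def summarize(words):
    """Return (talk_time, events): seconds of speech per speaker, and a
    list of (start_time, label) pairs for non-speech events."""
    talk_time = defaultdict(float)
    events = []
    for entry in words:
        if entry["type"] == "audio_event":
            events.append((entry["start"], entry["text"]))
        else:
            talk_time[entry["speaker_id"]] += entry["end"] - entry["start"]
    return dict(talk_time), events


talk_time, events = summarize(WORDS)
for speaker, seconds in sorted(talk_time.items()):
    print(f"{speaker}: {seconds:.1f}s of speech")
for start, label in events:
    print(f"{start:.1f}s: {label}")
```

Event entries are kept out of the talk-time totals so that laughter or music does not inflate a speaker's share.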

Pricing and limitations

Paid usage is priced at $0.40 per hour of audio, with a 50% discount for the first six weeks after release. At launch, Scribe works only with pre-recorded files; ElevenLabs has announced a low-latency version for real-time transcription. In Russia, the service may be affected by regional restrictions.
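As a rough worked example of the pricing above, the sketch below estimates the cost of a batch of recordings. It assumes billing is prorated linearly by the second, which the listing does not state; the estimate_cost helper is hypothetical, and actual invoices may round differently.

```python
PRICE_PER_HOUR = 0.40    # USD per hour of audio, as quoted above
LAUNCH_DISCOUNT = 0.50   # 50% off during the first six weeks after release


def estimate_cost(audio_seconds, discounted=False):
    """Estimate the cost in USD for a given amount of audio.

    Assumes linear per-second proration, which is an assumption of this
    sketch rather than a documented billing rule.
    """
    hours = audio_seconds / 3600
    rate = PRICE_PER_HOUR
    if discounted:
        rate *= 1 - LAUNCH_DISCOUNT
    return round(hours * rate, 4)


# 2.5 hours of audio (9000 seconds)
regular = estimate_cost(9000)
launch = estimate_cost(9000, discounted=True)
print(f"regular: ${regular:.2f}, launch period: ${launch:.2f}")
```

Under these assumptions, 2.5 hours of audio costs $1.00 at the regular rate and $0.50 during the launch discount.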
