Salad Transcription API is a cloud speech recognition API built on Whisper Large v3, designed for accurate transcription of audio and video.
What it does
Speech-to-text transcription for audio and video
Speech translation
Speech summarization
Basic content analysis
Accuracy for production use
According to the developers, the API ranks among the most accurate options on the market and performs strongly in independent benchmarks. That makes it a fit for workflows where transcript quality matters, including:
Call centers and customer support
Podcasts and media production
Video platforms
E-learning and training products
Internal enterprise systems
Pricing and integration
Salad uses a pay-as-you-go model priced at $0.10 to $0.16 per hour of audio. Transcription, translation, and summarization are available through a single API, with no extra charges for enabling additional features. The API is designed to plug into existing apps and backends, so teams do not have to run their own speech infrastructure. It targets both startups and larger teams that need predictable costs and scalable throughput.
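To make the pay-as-you-go pricing concrete, here is a minimal sketch that estimates a cost range from the $0.10 to $0.16 per hour figures quoted above. The function name estimate_cost_range and the constants are hypothetical and used only for illustration; they are not part of the Salad API, and the sketch assumes cost scales linearly with hours of audio, with no minimums, tiers, or rounding rules beyond whole cents.

```python
from decimal import Decimal

# Price range quoted in the text above, in USD per hour of audio.
# Assumption: billing is linear in audio duration.
LOW_RATE_PER_HOUR = Decimal("0.10")
HIGH_RATE_PER_HOUR = Decimal("0.16")


def estimate_cost_range(audio_hours):
    """Return (low, high) estimated cost in USD for the given hours of audio.

    Hypothetical helper for budgeting; not part of any Salad SDK.
    """
    hours = Decimal(str(audio_hours))
    if hours < 0:
        raise ValueError("audio_hours must be non-negative")
    cent = Decimal("0.01")
    low = (hours * LOW_RATE_PER_HOUR).quantize(cent)
    high = (hours * HIGH_RATE_PER_HOUR).quantize(cent)
    return low, high


low, high = estimate_cost_range(1000)
print(f"1,000 hours of audio: ${low} to ${high}")
# prints: 1,000 hours of audio: $100.00 to $160.00
```

Decimal is used instead of float so that cent-level totals are not affected by binary rounding. The rate that actually applies within the quoted range is not specified here, so the sketch reports both ends.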

