SaluteSpeech is Sber’s neural network for speech recognition and speech synthesis. It converts audio to text and text to audio, helping automate customer communications, transcription, and voice content production.
What SaluteSpeech can do
- Real-time speech recognition
- Text-to-speech with seven voices
- Languages: Russian, English, Kazakh
- Emotion detection: positive, neutral, negative
- Background noise and profanity filtering
- SSML support to control pronunciation, pauses, and emotion
- Speaker diarization for multi-speaker recordings
- Text generation via the GigaChat API
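
As an illustration of the SSML item above, here is a minimal sketch that builds a short SSML string with a pause between two sentences. It uses only the standard `speak` and `break` elements from the W3C SSML specification. Which SSML tags and attributes SaluteSpeech actually accepts is an assumption here, and `build_ssml` is a hypothetical helper, not part of any SaluteSpeech SDK.

```python
import xml.etree.ElementTree as ET


def build_ssml(first_sentence: str, second_sentence: str, pause_ms: int = 500) -> str:
    """Wrap two sentences in <speak> with a <break> pause between them."""
    speak = ET.Element("speak")
    speak.text = first_sentence
    # <break time="..."/> is a standard SSML element; support by the
    # service for this exact attribute format is assumed, not confirmed.
    pause = ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    pause.tail = second_sentence
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello.", "How can I help you?", pause_ms=300)
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps special characters in the text (such as `&` or `<`) correctly escaped.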
How to use it
SaluteSpeech is available as a desktop app for Windows and macOS, as a Telegram bot, and through the SaluteSpeech API. A typical workflow:
- Install the app from the official website
- Sign up for a Studio account
- Create SaluteSpeech and GigaChat projects
- Get API tokens for authorization
- Choose a mode: recognition, synthesis, or generation
- Upload audio or enter text
- Set options (language, voice, emotions)
- Export the result in the required format
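
The option-selection steps above can be sketched in code as a small validation layer that runs before any request is sent. This is only an illustration: `Job` and `plan_job` are hypothetical names, the language codes are ISO 639-1 codes rather than identifiers confirmed for the service, and no real SaluteSpeech endpoint or token handling is shown.

```python
from dataclasses import dataclass

# Modes and languages as listed in the text above.
MODES = {"recognition", "synthesis", "generation"}
# ISO 639-1 codes for Russian, English, and Kazakh; the service's own
# language identifiers may differ (assumption).
LANGUAGES = {"ru", "en", "kk"}


@dataclass(frozen=True)
class Job:
    mode: str
    language: str
    source: str  # path to an audio file, or the input text


def plan_job(mode: str, language: str, source: str) -> Job:
    """Validate the chosen options before any token or network call is made."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if not source:
        raise ValueError("source must be a non-empty path or text")
    return Job(mode=mode, language=language, source=source)


job = plan_job("synthesis", "ru", "Здравствуйте!")
print(job)
```

Rejecting a bad mode or language locally avoids spending a request, and free-tier quota, on a job that cannot succeed.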
Pricing for individuals follows a freemium model: each month includes 100 minutes of recognition and 200,000 characters of synthesis for free. Paid plans start at 600 ₽/month.
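
To check whether a planned workload fits the free allowance, a quick calculation like the one below can help. The two limits come from the figures above. The usage numbers are made up for illustration, and the sketch assumes usage is simply summed over the month, which may not match how the service actually meters it.

```python
FREE_RECOGNITION_MINUTES = 100   # free recognition minutes per month
FREE_SYNTHESIS_CHARS = 200_000   # free synthesis characters per month


def remaining_free_quota(used_minutes, used_chars):
    """Return (minutes, characters) left in the free allowance, floored at zero."""
    return (
        max(0, FREE_RECOGNITION_MINUTES - used_minutes),
        max(0, FREE_SYNTHESIS_CHARS - used_chars),
    )


# Hypothetical month: 20 recordings of 4 minutes, 150 texts of 1,000 characters.
minutes_left, chars_left = remaining_free_quota(20 * 4, 150 * 1_000)
print(minutes_left, chars_left)  # 20 50000
```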

