Speech 2.5 is an AI model by MiniMax that generates natural-sounding speech and music from text. It’s used to voice videos, audiobooks, podcasts, and educational content.
Key features
- Text-to-speech generation in seconds
- Voice cloning from a 10-second audio sample
- Support for 51 languages, including Russian
- Controls for tone, emotion, and accent
- Music generation from a text description
- Voice isolation and noise reduction for cleaner audio
- Voices covering different age groups
- High-quality audio export
- Up to 10 minutes of audio per request
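For scripts longer than the 10-minute per-request limit, the text has to be split before generation. Below is a minimal sketch of one way to do that. It estimates spoken length from word count, and the speaking rate of 150 words per minute is an assumption rather than a documented Speech 2.5 value. The function name split_for_limit is hypothetical and not part of any MiniMax tool.

```python
import re

MAX_MINUTES = 10          # per-request limit stated in the feature list
WORDS_PER_MINUTE = 150    # assumed speaking rate, not a documented value


def split_for_limit(text, max_minutes=MAX_MINUTES, wpm=WORDS_PER_MINUTE):
    """Split text into chunks whose estimated spoken length fits the limit.

    Chunks break only at sentence boundaries. A single sentence longer
    than the limit is kept whole, so it may still exceed the estimate.
    """
    max_words = int(max_minutes * wpm)
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue  # skip empty fragments (e.g. empty input)
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be submitted as a separate request. Actual audio length depends on the chosen voice, language, and pacing, so the word budget is best treated as a rough guide.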
How to use
Speech 2.5 is available via a web platform and iOS/Android apps.
- Open minimax.io or the mobile app
- Enter the text you want to convert to audio
- Choose a language and voice
- Adjust settings like tone and emotion
- Generate and download the audio file
The free version includes 5 trial generations. The paid plan costs $48 per year and includes 1.2 million credits. The voice library includes 300 voices and styles, and developers can access the model through an API.
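To put the paid plan in perspective, the annual figures can be broken down per month and per credit. The sketch below uses only the price and credit count quoted above. How many credits a single generation consumes is not stated here, so the script does not try to estimate minutes of audio.

```python
# Figures taken from the pricing paragraph above.
ANNUAL_PRICE_USD = 48
ANNUAL_CREDITS = 1_200_000


def plan_breakdown(price_usd=ANNUAL_PRICE_USD, credits=ANNUAL_CREDITS):
    """Return per-month and per-1,000-credit figures for an annual plan."""
    return {
        "usd_per_month": price_usd / 12,
        "credits_per_month": credits // 12,
        "usd_per_1000_credits": price_usd * 1000 / credits,
    }


if __name__ == "__main__":
    for name, value in plan_breakdown().items():
        print(f"{name}: {value}")
```

This works out to $4 per month, 100,000 credits per month, and $0.04 per 1,000 credits.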

