Parakeet is an NVIDIA speech-to-text model that converts English audio into text with high accuracy. It supports punctuation and capitalization, and can process up to 24 minutes of audio in a single pass.
Built on the FastConformer architecture, Parakeet focuses on fast transcription while preserving speech details. It’s designed to handle long recordings and noisy audio, and can be used for tasks like subtitles, voice assistants, and call analytics.
Parakeet is listed on the Hugging Face Open ASR Leaderboard with a 6.05% word error rate (WER), or roughly 6 errors per 100 reference words.
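To make the word error rate figure concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the sample sentences are illustrative assumptions, and the sketch skips the text normalization (lowercasing, punctuation handling) that benchmark pipelines usually apply first, so it will not reproduce leaderboard numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not real model output): one substitution
# ("sat" -> "sit") and one deletion ("the") against six reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{example:.2%}")  # 33.33%
```

By the same measure, a 6.05% WER corresponds to about six word-level errors for every hundred words in the reference transcript.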
Parakeet is available on Hugging Face as a web demo and as a model you can run locally with NVIDIA NeMo. Input should be WAV or FLAC audio sampled at 16,000 Hz (16 kHz). The Hugging Face demo is free but has processing limits; running locally is also free but requires an NVIDIA GPU. The model supports English only.
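Before sending a recording to the model, it can help to confirm that it matches the input requirements described here. The following is a rough sketch using only Python's standard `wave` module; the helper name `check_wav` and the two constants are assumptions made for illustration, not part of NeMo or Hugging Face. It only handles uncompressed WAV files, so FLAC input would need a third-party audio library instead.

```python
import wave

TARGET_RATE_HZ = 16_000   # sample rate quoted in this listing
MAX_SECONDS = 24 * 60     # single-pass limit quoted in this listing


def check_wav(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate

    if rate != TARGET_RATE_HZ:
        problems.append(
            f"sample rate is {rate} Hz, expected {TARGET_RATE_HZ} Hz"
        )
    if seconds > MAX_SECONDS:
        problems.append(
            f"duration is {seconds:.0f} s, over the {MAX_SECONDS} s limit"
        )
    return problems
```

Calling `check_wav` on a hypothetical file such as `meeting.wav` returns an empty list when the file is at 16 kHz and within 24 minutes. Otherwise it returns short messages indicating whether the file needs resampling or splitting before transcription.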