AvatarFX is a multimodal model from Character.AI that animates a single photo into a longer-form, photorealistic video. It focuses on natural-looking facial expressions, speech synchronization, and consistent motion across the face, hands, and body, including scenes with more than one speaker.
How to access
AvatarFX is currently in a closed beta. To try it:
- Go to the official site and sign in
- Request access to the closed beta
- Upload a source image and an audio file, or use built-in text-to-speech
- Click Generate to receive the video
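The steps above can be sketched as a small request-building helper. Character.AI has not published an AvatarFX API, so the endpoint, field names, and parameters below are invented purely for illustration; the only grounded detail is the choice between uploading an audio file and using built-in text-to-speech.

```python
# Hypothetical sketch of the upload-and-generate flow described above.
# The endpoint name and all field names are assumptions, not a real API.
import json

def build_generation_request(image_path, audio_path=None, tts_text=None):
    """Assemble a JSON body for a hypothetical /avatarfx/generate endpoint.

    Exactly one of `audio_path` or `tts_text` must be given, mirroring the
    choice between uploading audio and using built-in text-to-speech.
    """
    if (audio_path is None) == (tts_text is None):
        raise ValueError("provide either an audio file or TTS text, not both")
    body = {"source_image": image_path}
    if audio_path is not None:
        body["audio"] = audio_path
    else:
        body["tts_text"] = tts_text
    return json.dumps(body)

print(build_generation_request("portrait.png", tts_text="Hello!"))
```

This only assembles the payload; the actual upload and generation happen on Character.AI's servers behind the closed beta.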
Key capabilities
- Generate video from one photo with realistic expressions, gestures, and synchronized audio
- Maintain consistent face/hand/body motion in longer clips
- Support multiple speakers and dynamic dialogue in a single video
Tech, availability, and safety
AvatarFX uses flow-based diffusion built on a DiT (diffusion transformer) backbone and is trained on diverse video data with low-quality content filtered out. Inference is accelerated via distillation, which cuts the number of sampling steps without sacrificing quality. Early full access is planned for C.ai+ subscribers ($10/month), with a waitlist at character.ai/video. Character.AI applies policy filters (including blocks for minors, public figures, and prohibited content) and adds a watermark to generated videos.
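To make the "flow-based diffusion" idea concrete, here is a minimal toy sketch of flow-matching sampling, the general technique that family of models builds on. This is not Character.AI's model or code: the velocity field below is hand-derived for the straight-line probability path x_t = (1 - t) * x0 + t * mu, which transports Gaussian noise to a fixed target point `mu`. Real models learn the velocity field with a neural network; distillation, as mentioned above, trains a model to cover the same trajectory in far fewer integration steps.

```python
import numpy as np

# Toy flow-matching sampler: integrate dx/dt = v(x, t) from noise (t = 0)
# to data (t = 1) with Euler steps. Everything here is a pedagogical
# assumption, not AvatarFX's actual architecture.

def velocity(x, t, mu):
    # Exact velocity for the linear path x_t = (1 - t) * x0 + t * mu.
    return (mu - x) / (1.0 - t)

def sample(mu, n_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mu.shape)  # start from pure noise at t = 0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        # Guard against the singularity as t approaches 1.
        x = x + velocity(x, min(t, 1.0 - 1e-3), mu) * dt
    return x

mu = np.array([2.0, -1.0, 0.5])
out = sample(mu)
print(np.allclose(out, mu, atol=1e-2))  # → True
```

In a generative model the target is a data distribution rather than a single point, and `velocity` is a learned network conditioned on the input image and audio; the integration loop, however, has the same shape.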

