VideoPoet is a Google research model that generates realistic or stylized videos, most commonly from text prompts. It’s built on a large language model adapted for zero-shot video generation, so a single model can handle a range of tasks without task-specific training.
Generation modes
VideoPoet supports multiple ways to create video content:
- Text-to-video generation from a written prompt
- Image-to-video generation that animates a single still image
- Short scenes and visual story snippets based on minimal input
Editing and stylization
In addition to generating videos from scratch, VideoPoet can modify existing footage:
- Stylization to change the visual look of a scene, producing creative variations of the same footage
- Inpainting to fill in, replace, or redraw parts of a video
Who it’s for
VideoPoet is primarily a research demonstration of a new approach: using an autoregressive language model to drive video generation. It’s aimed at researchers, developers, and enthusiasts exploring multimodal models and creative AI workflows.

