Gemini Omni Reference Images to Video: supports 1 to 5 reference images for video generation.
Gemini Omni Extend supports video-to-video generation (extending or editing a clip) and can take 1 to 5 reference images.
Google Gemini Omni Text to Video Model supports video generation guided by the first frame, by the first and last frames, or by three images.
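The mode descriptions above can be summarized as a small request-body builder. This is a sketch under assumptions: the model IDs follow the `gemini-omni-t2v` / `-i2v` / `-r2v` / `-extend` naming from the request example on this page, but the `reference_images` and `video` keys are hypothetical field names, not confirmed APIPod parameters. It only enforces the limits stated on this page (1 to 5 reference images, used by the r2v and extend modes, and a source video for extend).

```python
from typing import Optional

# Model IDs follow the "gemini-omni-t2v  # or -i2v / -r2v / -extend" pattern
# from the request example; confirm them against the APIPod model list.
MODEL_IDS = {
    "t2v": "gemini-omni-t2v",
    "i2v": "gemini-omni-i2v",
    "r2v": "gemini-omni-r2v",
    "extend": "gemini-omni-extend",
}
MAX_REFERENCE_IMAGES = 5


def build_payload(mode: str, prompt: str,
                  reference_images: Optional[list[str]] = None,
                  source_video: Optional[str] = None) -> dict:
    """Build a request body for one Gemini Omni mode (hypothetical field names)."""
    if mode not in MODEL_IDS:
        raise ValueError(f"unknown mode: {mode!r}")
    refs = reference_images or []
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most 5 reference images are supported")
    if refs and mode not in ("r2v", "extend"):
        raise ValueError("reference images apply to the r2v and extend modes")
    if mode == "r2v" and not refs:
        raise ValueError("r2v needs 1 to 5 reference images")
    if mode == "extend" and source_video is None:
        raise ValueError("extend needs a source video to continue or edit")
    if source_video is not None and mode != "extend":
        raise ValueError("only the extend mode takes a source video")

    payload = {"model": MODEL_IDS[mode], "prompt": prompt}
    if refs:
        payload["reference_images"] = refs  # hypothetical field name
    if source_video is not None:
        payload["video"] = source_video  # hypothetical field name
    return payload
```

For example, `build_payload("r2v", "a knight crossing a bridge", ["knight.png"])` yields a body for the `gemini-omni-r2v` model; validating locally catches an empty or oversized reference list before any request is sent.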
Google Gemini Omni Text to Video Model.
Where Gemini's ability to reason meets the ability to create. One model for text-to-video, real-world physics, and conversational editing.
Multimodal Prompt
A marble rolling fast on a chain-reaction track, continuous smooth shot, accurate physics...
Gemini Omni blends Gemini's world understanding with generative media — covering the full journey from idea to final cut.
Describe any scene in natural language and watch it come to life with coherent motion, accurate physics, and cinematic detail.
Animate a still image from a first frame, or from a first and a last frame. Control the start and end states while the model fills in believable motion.
Supply up to 5 reference images to lock characters, styles, and scenes, and keep them consistent from shot to shot.
Continue an existing clip or edit it step by step. Each turn builds on the last, keeping the scene coherent throughout.
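The step-by-step workflow described above can be sketched as a loop in which each turn's output clip becomes the next turn's source, so only the new instruction is sent. This is an illustration under assumptions: `generate` stands in for any caller-supplied function that submits a request body (for example, a wrapper around the `/v1/videos/generations` request on this page) and returns the finished clip's URL, and the `video` key is a hypothetical field name.

```python
from typing import Callable


def edit_step_by_step(source_video: str, instructions: list[str],
                      generate: Callable[[dict], str]) -> list[str]:
    """Apply edit instructions one turn at a time and return each turn's clip.

    `generate` is a hypothetical caller-supplied function: it takes a request
    body and returns the URL of the resulting clip.
    """
    history = []
    current = source_video
    for instruction in instructions:
        payload = {
            "model": "gemini-omni-extend",
            "prompt": instruction,   # only the new change, not the full prompt
            "video": current,        # hypothetical field for the previous clip
        }
        current = generate(payload)  # this turn's output feeds the next turn
        history.append(current)
    return history
```

Keeping `generate` injectable lets the same loop run against the real API or a stub, and the returned history makes it easy to roll back to any earlier turn.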
Capabilities that turn Gemini Omni into a practical tool — not just a demo.
Refine a scene step by step in natural language — change the environment, adjust the action, swap objects, shift the camera, or add effects — while keeping the original scene coherent. No need to rebuild the full prompt each time.
Gemini Omni connects visual creation with knowledge of physics, history, biology, culture, and narrative logic — so scenes, objects, and actions behave the way they should. Outputs feel intentional, not random.
Blend text, up to 5 reference images, and source video into one controllable creation. Start from real creative materials instead of a blank text prompt, and keep subjects and styles consistent across shots.
Bring a subject's presence, expression, and delivery into an integrated scene — not a flat visual layer. Ideal for presenter clips, character-led storytelling, and interactive media.
Integrate Gemini Omni into your SaaS, game engine, or creative tool. Async generation with task polling keeps heavy workloads smooth and predictable.
```python
# Gemini Omni video generation
import requests

payload = {
    "model": "gemini-omni-t2v",  # or -i2v / -r2v / -extend
    "prompt": "A marble rolling fast on a chain-reaction track, continuous smooth shot",
    "aspect_ratio": "16:9",
}

response = requests.post(
    "https://api.apipod.ai/v1/videos/generations",
    json=payload,
    headers={"Authorization": "Bearer Key"},  # replace "Key" with your APIPod API key
)
response.raise_for_status()  # surface HTTP errors early
```
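The async flow mentioned above (submit a task, then poll until it finishes) might look like the sketch below. This page does not document the response schema or a status endpoint, so the `status` field, the status values, and the `GET /v1/videos/generations/{id}` path in the docstring are assumptions to check against the APIPod API reference. The HTTP call and the sleep function are injected so the loop can be tested without a network.

```python
import time
from typing import Callable

# Assumed terminal states; the real status values may differ.
DONE_STATES = {"succeeded", "completed"}
FAILED_STATES = {"failed", "cancelled"}


def wait_for_task(task_id: str,
                  fetch_status: Callable[[str], dict],
                  interval_s: float = 5.0,
                  timeout_s: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll a generation task until it succeeds, fails, or times out.

    fetch_status(task_id) should return the task JSON as a dict with a
    "status" key (assumed schema). One possible wiring with requests:
        lambda tid: requests.get(
            f"https://api.apipod.ai/v1/videos/generations/{tid}",  # assumed path
            headers={"Authorization": "Bearer YOUR_API_KEY"},
        ).json()
    """
    waited = 0.0
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status in DONE_STATES:
            return task
        if status in FAILED_STATES:
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} still {status!r} after {waited:.0f}s")
        sleep(interval_s)  # wait before asking again
        waited += interval_s
```

A fixed interval keeps the sketch simple; for many concurrent tasks, a longer interval or exponential backoff reduces load on the API.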
Everything you need to plan production workloads on the Gemini Omni series.
Model variants: t2v · i2v · r2v · extend (one model, four generation modes)
Clip duration: 10s (extend clips for longer content)
Resolution: 720p · 1080p (HD and Full HD output)
Aspect ratios: 16:9 · 9:16 · 1:1 (landscape, vertical, and square)
Reference images: up to 5 (for the r2v and extend modes)
Prompt length: 4,000 characters (rich, descriptive prompting)
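The limits in the spec summary above translate directly into a client-side check that can reject a request before it is sent. This is a sketch: the limits are copied from this page, while the function name and the `resolution` parameter are illustrative assumptions (only `aspect_ratio` appears in the request example).

```python
# Limits as listed in the Gemini Omni spec summary on this page.
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
RESOLUTIONS = {"720p", "1080p"}
MAX_PROMPT_CHARS = 4000
MAX_REFERENCE_IMAGES = 5


def check_request(prompt: str, aspect_ratio: str, resolution: str,
                  num_reference_images: int = 0) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt is {len(prompt)} chars; limit is {MAX_PROMPT_CHARS}")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution {resolution!r}")
    if not 0 <= num_reference_images <= MAX_REFERENCE_IMAGES:
        problems.append(
            f"{num_reference_images} reference images; limit is {MAX_REFERENCE_IMAGES}"
        )
    return problems
```

Returning every problem at once, rather than raising on the first, lets a UI show all the fixes a user needs to make in one pass.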
Gemini Omni is now available for production workloads via the APIPod Platform.