Google Gemini Omni

4 models available

Provider: Google

Updated: Recently

Google Gemini Omni is a multimodal video generation and editing model. It can convert text, images and video references into coherent videos, and delivers stable scene consistency, world understanding capabilities and natural language control.

Available Models

Gemini Omni Reference to video

gemini-omni-r2v

Gemini Omni Reference to Video model. Generates a video from 1 to 5 reference images.

Gemini Omni Extend

gemini-omni-extend

Gemini Omni Extend model. Supports video-to-video generation (extend or edit an existing clip) and accepts 1 to 5 attached reference images.

Gemini Omni Image to Video

gemini-omni-i2v

Google Gemini Omni Image to Video model. Supports video generation anchored to a first frame, to first and last frames, or to three images.

Gemini Omni Text to Video

gemini-omni-t2v

Google Gemini Omni Text to Video Model.

Multimodal AI Video

Gemini Omni API: Create Anything from Anything

Where Gemini's ability to reason meets the ability to create. One model for text to video, real-world physics, and conversational editing.

Clip Duration: 10s+

HD Resolution: 1080p

Generation Modes

Multimodal Prompt

A marble rolling fast on a chain-reaction track, continuous smooth shot, accurate physics...

Four Modes, One Intelligent Engine

Gemini Omni blends Gemini's world understanding with generative media — covering the full journey from idea to final cut.

Text → Video

Text to Video

Describe any scene in natural language and watch it come to life with coherent motion, accurate physics, and cinematic detail.

Intuitive gravity & fluid dynamics
Grounded in real-world knowledge
Leading on MovieGenBench

Image → Video

Image to Video

Animate a still image with first and last frames. Control the start and end state while the model fills in believable motion.

First & last frame anchoring
Natural, lifelike motion
Precise start & end control

Reference → Video

Reference to Video

Supply up to 5 reference images to lock characters, styles, and scenes. Maintain perfect consistency across every shot.

Up to 5 reference images
Character & style consistency
Leading reference adherence

Video → Video

Video Extension

Continue an existing clip or edit it step by step. Each turn builds on the last, keeping the scene coherent throughout.

Seamless clip continuation
Step-by-step editing
Consistent scene coherence

Built for Real Creative Work

Capabilities that turn Gemini Omni into a practical tool — not just a demo.

Edit Videos Through Conversation

Refine a scene step by step in natural language — change the environment, adjust the action, swap objects, shift the camera, or add effects — while keeping the original scene coherent. No need to rebuild the full prompt each time.

Real-World Logic in Every Frame

Gemini Omni connects visual creation with knowledge of physics, history, biology, culture, and narrative logic — so scenes, objects, and actions behave the way they should. Outputs feel intentional, not random.

Multimodal References, Made Practical

Blend text, up to 5 reference images, and source video into one controllable creation. Start from real creative materials instead of a blank text prompt, and keep subjects and styles consistent across shots.

Avatar & Character Performance

Bring a subject's presence, expression, and delivery into an integrated scene — not a flat visual layer. Ideal for presenter clips, character-led storytelling, and interactive media.

Scale with the Gemini Omni API

Integrate Gemini Omni into your SaaS, game engine, or creative tool. Async generation with task polling keeps heavy workloads smooth and predictable.

Clean REST API

OpenAI-compatible /v1/videos endpoint

Async Task Polling

Queue, track, and retrieve with ease

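# Gemini Omni video generation
import requests

payload = {
    "model": "gemini-omni-t2v",  # or gemini-omni-i2v / gemini-omni-r2v / gemini-omni-extend
    "prompt": "A marble rolling fast on a chain-reaction track, continuous smooth shot",
    "aspect_ratio": "16:9",
}

# Submit the generation task; replace YOUR_API_KEY with your own API key
response = requests.post(
    "https://api.apipod.ai/v1/videos/generations",
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors instead of failing silently
print(response.json())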
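
Because generation is asynchronous, a client usually has to poll until the task finishes. The page does not document the status endpoint or the response fields, so the sketch below keeps the HTTP call out of the helper: `fetch_status` is a caller-supplied function, and the `status` key and the `completed` / `failed` values are assumptions for illustration, not the documented API.

```python
import time
from typing import Callable, Dict


def wait_for_video(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status(task_id) repeatedly until the task reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        status = task.get("status")  # assumed field name
        if status in ("completed", "failed"):  # assumed terminal values
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)  # wait before asking again
```

In practice `fetch_status` would wrap whatever task-lookup request the API documentation specifies. Passing it in as a function also makes the loop easy to exercise with a fake in tests.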

Technical Specifications

Everything you need to plan production workloads on the Gemini Omni series.

Model Variants: t2v · i2v · r2v · extend (one model, four generation modes)

Clip Duration: 10s (extend clips for longer content)

Resolution: 720p · 1080p (HD and Full HD output)

Aspect Ratios: 16:9 · 9:16 · 1:1 (landscape, vertical and square)

Reference Images: Up to 5 (for the r2v and extend modes)

Prompt Length: 4,000 chars (rich, descriptive prompting)
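
The limits above can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: `aspect_ratio` appears in the sample payload on this page, but `resolution` and `reference_images` are hypothetical parameter names, and the API may enforce these limits differently.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
MAX_PROMPT_CHARS = 4000
MAX_REFERENCE_IMAGES = 5
MODELS_WITH_REFERENCES = {"gemini-omni-r2v", "gemini-omni-extend"}


def validate_request(model, prompt, aspect_ratio="16:9",
                     resolution="1080p", reference_images=()):
    """Return a list of problems found; an empty list means the request fits the specs."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt has {len(prompt)} chars, limit is {MAX_PROMPT_CHARS}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    # Reference images are listed only for the r2v and extend modes.
    if reference_images and model not in MODELS_WITH_REFERENCES:
        problems.append(f"{model} does not take reference images")
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"{len(reference_images)} reference images, limit is {MAX_REFERENCE_IMAGES}")
    return problems
```

Returning a list rather than raising on the first failure lets a UI show every problem at once.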

Questions & Answers

What is Gemini Omni?

Gemini Omni is Google's multimodal video generation and editing model. It combines Gemini's reasoning and world understanding with generative media, turning any mix of text, images, and video into coherent, physically plausible video output.

What generation modes are available?

Four modes: text-to-video (gemini-omni-t2v), image-to-video with first/last frames (gemini-omni-i2v), reference-to-video with up to 5 images (gemini-omni-r2v), and video extension/editing (gemini-omni-extend).
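
One way to route a request to the right model ID is to look at which inputs are present. The model IDs below come from this page. The function name, its parameters, and the precedence order (source video first, then first frame, then reference images) are assumptions for illustration.

```python
def choose_model(has_source_video=False, has_first_frame=False, reference_image_count=0):
    """Pick a Gemini Omni model ID from the kinds of input supplied."""
    if has_source_video:
        return "gemini-omni-extend"  # continue or edit an existing clip
    if has_first_frame:
        return "gemini-omni-i2v"  # animate from a first (and optional last) frame
    if reference_image_count > 0:
        return "gemini-omni-r2v"  # lock characters, styles, and scenes
    return "gemini-omni-t2v"  # prompt only


print(choose_model(reference_image_count=3))
```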

How many reference images can I use?

The reference-to-video (r2v) mode accepts up to 5 reference images to anchor characters, styles, and scenes. The image-to-video (i2v) mode takes a first frame and an optional last frame.

Does Gemini Omni generate native audio?

Yes. Native audio generation is a core capability of the Gemini Omni family, producing synchronized soundscapes that match the action on screen.

How long can each clip be?

Each clip is 10 seconds. For longer pieces, use the video extension mode to continue a clip while preserving scene coherence.
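
To plan a longer piece, it helps to estimate how many extension calls are needed. The page does not say how much footage one extension adds, so the 10-second default for `seconds_per_extension` below is an assumption to adjust once the real figure is known.

```python
import math


def extensions_needed(target_seconds, clip_seconds=10.0, seconds_per_extension=10.0):
    """Estimate how many extend calls follow the first clip to reach target_seconds."""
    if target_seconds <= clip_seconds:
        return 0  # the initial clip already covers the target
    remaining = target_seconds - clip_seconds
    return math.ceil(remaining / seconds_per_extension)


print(extensions_needed(25))
```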

Which aspect ratios are supported?

Gemini Omni supports 16:9 for cinematic landscape, 9:16 for vertical and short-form content, and 1:1 for square — all at 720p or 1080p resolution.

Is there a commercial license for APIPod-generated content?

All content generated via our API passes through our safety filters. Commercial ownership typically belongs to the account holder who generated the content. Please refer to our Terms of Service for full details.

Create Anything from Anything.

Gemini Omni is now available for production workloads via the APIPod Platform.
