
How to prompt Gemini Omni Flash for 10-second video

Gemini Omni Flash has an unusual prompt surface: no negative-prompt parameter, only two aspect ratios, full support for English only, and two distinct prompting modes. Here's how to write for both modes well.

OmniArt Team, Jul 1, 2026

Most AI video prompt guides teach you to write one thing: a rich, detailed paragraph you hand to the model once. Gemini Omni Flash breaks that assumption. Its developer API (live since June 30) is built around two different prompting acts — the first generation, and then a running conversation of edits that each reshape the same clip. Write for one and ignore the other and you leave most of the model on the table.

Omni Flash's prompt surface is also unusual in what it removes. There's no negative-prompt field, no temperature dial, no system instruction, and only two aspect ratios. Those aren't gaps to work around blindly — each one changes how you should phrase a prompt. This guide covers both modes and the constraints that shape them.

Note

As of July 1, 2026, Gemini Omni Flash is available through Google AI Studio, the Gemini API, the Gemini app, and Google Flow — not inside OmniArt's workspace yet. The sections below describe prompting Google's own tools directly; the closing section maps which habits carry over to the video models that are live on OmniArt today.

Two prompt modes, not one

Every Omni Flash session has two kinds of prompt, and they reward different writing.

The first-generation prompt is a complete brief for a single 10-second beat: subject, motion, camera, light, sound, style. It behaves like any strong text-to-video or image-to-video prompt — front-load the detail, be specific, describe the whole shot at once.

The conversational-edit instruction is the opposite. It's short, it names exactly one change, and it assumes the model already holds the prior clip in context: "Make the lighting golden hour." "Swap the sedan for a pickup." The model applies the change and preserves everything you didn't mention. The Interactions API carries session state from one turn to the next through previous_interaction_id, for up to three sequential edits. If you pile three changes into one edit instruction, you lose the precision that makes this mode worth using.
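The chaining described above can be tracked on the client side. The sketch below is minimal and assumes that each successful turn returns an interaction id. The `EditChain` class and the request dict shape are hypothetical illustrations, not the Interactions API's actual schema. Only `previous_interaction_id` and the three-edit limit come from the text. No API is called.

```python
from dataclasses import dataclass, field

# Hypothetical client-side helper: it only builds request payloads and
# enforces the three-edit budget. It does not call any real API.
MAX_EDITS = 3


@dataclass
class EditChain:
    base_interaction_id: str  # id returned by the first generation (assumed)
    last_interaction_id: str = ""
    edits: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The first edit chains from the base clip.
        self.last_interaction_id = self.base_interaction_id

    def next_request(self, instruction: str) -> dict:
        """Build the payload for the next edit turn (shape is illustrative)."""
        if len(self.edits) >= MAX_EDITS:
            raise RuntimeError("edit budget spent: at most three sequential edits")
        self.edits.append(instruction)
        return {
            "input": instruction,
            # Points the model at the prior clip so unmentioned elements persist.
            "previous_interaction_id": self.last_interaction_id,
        }

    def record_response(self, interaction_id: str) -> None:
        """Store the id returned for the latest turn so the next edit builds on it."""
        self.last_interaction_id = interaction_id
```

Each edit builds on the most recent interaction rather than on the base clip. This keeps the "one note at a time" workflow linear. A fourth request fails loudly, so you don't silently run past the budget.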

The mental model: compose in the first prompt, direct in the follow-ups. Get a solid base clip, then refine it the way you'd brief a director mid-shoot — one note at a time.

The API constraints that shape your wording

Omni Flash's parameter list is short by design. Each omission has a prompting consequence:

Constraint	What it means for the prompt
No negative-prompt field	Phrase exclusions inside the prompt itself — "an empty street, no pedestrians, no traffic" rather than a separate negative list
No temperature / top_p / system instruction	You can't dial variance or set a persistent style rule — bake tone and style into the prompt text every time
Aspect ratio: 9:16 or 16:9 only	Choose orientation up front; there's no square or cinematic-wide option, so frame for vertical or horizontal from the first word
Audio described, never uploaded	You can't hand it a track to match — you describe the sound you want in words (see below)
English fully supported; other languages untested	Write prompts in English for predictable results
10-second hard cap	One clear action per generation — not a shot list
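Two rows of that table lend themselves to a pre-send check: exclusions folded into the prompt text, and orientation limited to two values. The following is a minimal Python sketch. The function names are hypothetical, and no Omni Flash SDK call is made. The code only shapes the prompt string and validates the ratio before you submit it.

```python
# Hypothetical pre-flight helpers for the constraints table above.
ALLOWED_ASPECT_RATIOS = {"9:16", "16:9"}


def fold_exclusions(prompt: str, exclusions: list[str]) -> str:
    """Write exclusions into the prompt itself, since there is no negative-prompt field."""
    base = prompt.rstrip(" .")
    if not exclusions:
        return base + "."
    clauses = ", ".join(f"no {item}" for item in exclusions)
    return f"{base}, {clauses}."


def check_aspect_ratio(aspect_ratio: str) -> str:
    """Reject anything other than the two supported orientations."""
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio {aspect_ratio!r}; use 9:16 or 16:9")
    return aspect_ratio
```

For example, `fold_exclusions("an empty street", ["pedestrians", "traffic"])` yields `"an empty street, no pedestrians, no traffic."`, the in-prompt phrasing the table recommends.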

Warning

Omni Flash has no audio-reference upload. You cannot give it a music bed or a voice sample to sync to. It generates an audio track by default, and your only control is the words in the prompt — so sound design has to be written, not attached.

A template for the first generation

Because 10 seconds holds one beat, the strongest first prompts describe a single continuous moment with every layer specified. Six slots cover almost any shot:

Subject — who or what is on screen, described concretely
Motion — the one action that plays out across the clip
Camera — a single move, not a sequence ("slow push in", "locked-off wide")
Lighting — direction, quality, time of day
Sound design — the audio you want generated, in words
Style — palette, era, film reference, texture

A worked example:

"A ceramic pour-over coffee dripper on a pale oak counter, steam rising as dark coffee streams into the glass carafe below. Slow push in on the drip. Soft morning light from a window camera-left, warm and diffused. Sound: gentle water trickle, distant kitchen ambience, no music. Muted editorial palette, shallow depth of field, shot on a fast prime lens."

Notice the exclusions live inside the sentence ("no music"), the camera is one move, and the sound is spelled out. That's the whole discipline.
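The six slots are regular enough to assemble mechanically. The following minimal Python sketch rebuilds the worked example above from its slots. The `ShotPrompt` class and its fixed sentence order are our own illustration, not part of any Google SDK. Slot text is written without a trailing period.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """Hypothetical six-slot container for a first-generation prompt."""
    subject: str
    motion: str
    camera: str
    lighting: str
    sound: str
    style: str

    def render(self) -> str:
        # Refuse to render a brief with an empty slot: every layer should be specified.
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError(f"empty slots: {', '.join(missing)}")
        return (
            f"{self.subject}, {self.motion}. {self.camera}. {self.lighting}. "
            f"Sound: {self.sound}. {self.style}."
        )


coffee = ShotPrompt(
    subject="A ceramic pour-over coffee dripper on a pale oak counter",
    motion="steam rising as dark coffee streams into the glass carafe below",
    camera="Slow push in on the drip",
    lighting="Soft morning light from a window camera-left, warm and diffused",
    sound="gentle water trickle, distant kitchen ambience, no music",
    style="Muted editorial palette, shallow depth of field, shot on a fast prime lens",
)
```

`coffee.render()` returns the worked-example prompt word for word. A missing slot raises an error instead of producing a thinner brief.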

Conversational editing: the vocabulary that lands

Once you have a base clip, edits are where Omni Flash pulls ahead of generate-and-discard workflows. Keep each instruction to one intent, and lean on a consistent verb vocabulary the model reads cleanly:

Relight — "make it golden hour", "add a cool rim light from behind"
Replace — "swap the coffee dripper for a French press"
Restyle — "make it feel like 1970s film stock"
Recolor — "change the mug to matte black"
Re-time — "slow the pour down", "let the steam linger longer"

Two rules keep the thread coherent. One change per turn — the model preserves what you don't mention, so a single-note edit is both more predictable and easier to undo by re-prompting. And build on the prior turn's language — reuse the nouns you established ("the mug", "the pour") so the model anchors to the same elements rather than re-inferring the scene.
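The one-change-per-turn rule can be roughly enforced before an edit is sent. The following Python sketch is a crude keyword heuristic. The cue lists are assumptions chosen to match the example verbs above, not anything Omni Flash exposes, and real instructions would need richer matching.

```python
# Illustrative keyword cues per edit verb (assumed, not model-defined).
INTENT_CUES = {
    "relight": ("golden hour", "light"),
    "replace": ("swap", "replace"),
    "restyle": ("film stock", "feel like", "style"),
    "recolor": ("color", "colour", "matte", "black", "white"),
    "re-time": ("slow", "speed up", "linger"),
}


def detect_intents(instruction: str) -> list[str]:
    """Return every edit category whose cue appears in the instruction."""
    text = instruction.lower()
    return [name for name, cues in INTENT_CUES.items() if any(cue in text for cue in cues)]


def check_single_intent(instruction: str) -> list[str]:
    """Raise if an instruction appears to bundle more than one change."""
    intents = detect_intents(instruction)
    if len(intents) > 1:
        raise ValueError(f"one change per turn; found {intents}")
    return intents
```

"Make the lighting golden hour and swap the sedan for a pickup" trips both the relight and replace cues and is rejected. Split it into two turns instead.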

Tip

The three-edit chain is a budget, not a suggestion. Plan the base prompt so it needs the fewest follow-ups — a strong first generation leaves your edit turns for genuine creative changes, not for fixing things the first prompt could have specified.

Working around the current limits

A few limits aren't prompt-solvable, and it's worth prompting with them in mind rather than fighting them:

10-second cap. There's no scene extension in the API, so don't write prompts that imply a longer arc. Design one beat that stands alone.
Character consistency across scene changes is an acknowledged weak point. If likeness matters, keep edits within the same scene rather than asking the model to relocate a character to a new setting.
Video references over 3 seconds aren't fully processed. Keep any reference clip short and to the point.
No multi-video referencing and no voice editing — both are unsupported, so plan those steps into a separate tool rather than the prompt.
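These limits can be checked before you write the prompt at all. The following minimal Python sketch covers the three numeric limits in the list: the 10-second cap, the 3-second reference threshold and the single video reference. The `preflight` function and its inputs are hypothetical planning aids and have no connection to the API.

```python
# Limits taken from the list above.
CLIP_CAP_SECONDS = 10.0
FULL_REFERENCE_SECONDS = 3.0


def preflight(reference_clip_seconds: list[float], planned_beat_seconds: float) -> list[str]:
    """Collect warnings for a shot plan that runs into Omni Flash's current limits."""
    warnings: list[str] = []
    if planned_beat_seconds > CLIP_CAP_SECONDS:
        warnings.append(f"beat runs {planned_beat_seconds:g}s; the cap is 10s with no extension")
    if len(reference_clip_seconds) > 1:
        warnings.append("multi-video referencing is unsupported; pick one reference")
    for i, length in enumerate(reference_clip_seconds, start=1):
        if length > FULL_REFERENCE_SECONDS:
            warnings.append(f"reference {i} is {length:g}s; clips over 3s aren't fully processed")
    return warnings
```

An empty list means the plan fits. Each warning points to a step that belongs in a different tool or a tighter beat.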

None of these are disqualifying for a fast, short-form iteration tool. They just mean Omni Flash rewards prompts scoped to what it does well: one tight beat, refined conversationally.

What transfers to OmniArt today

Omni Flash isn't in OmniArt's workspace yet, but almost every habit above transfers to the video models that are — because the underlying discipline (one clear beat, specificity over keyword soup, sound written into the prompt) is model-agnostic.

Reference-driven generation maps directly to Seedance 2.0, live on OmniArt, which accepts up to nine images, three videos, and three audio files bound to roles with @image1 / @video1 syntax — the "compose from assets" idea, with more inputs than Omni Flash offers.
Cinematic camera language maps to Veo 3.1, which interprets motion verbs like "drift", "glide", and "dolly in" with restraint.
The six-slot template (subject, motion, camera, light, sound, style) is the same skeleton that produces clean results on every video model in the workspace.
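The role-binding idea for Seedance 2.0 can be sketched as a small mapping step. The following Python sketch enforces the stated per-type limits (nine images, three videos, three audio files) and assigns tags in the documented @image1 / @video1 pattern. The @audio1 form is an assumption extrapolated from that pattern. The file names are placeholders, and this does not model how OmniArt actually uploads assets.

```python
# Per-type limits from the text; the "@audioN" tag form is an assumption.
LIMITS = {"image": 9, "video": 3, "audio": 3}


def bind_assets(assets: dict[str, list[str]]) -> dict[str, str]:
    """Map each asset to a role tag such as @image1, enforcing per-type limits."""
    tags: dict[str, str] = {}
    for kind, paths in assets.items():
        if kind not in LIMITS:
            raise ValueError(f"unknown asset type {kind!r}")
        if len(paths) > LIMITS[kind]:
            raise ValueError(f"{len(paths)} {kind} files exceeds the limit of {LIMITS[kind]}")
        for i, path in enumerate(paths, start=1):
            tags[f"@{kind}{i}"] = path
    return tags
```

With `bind_assets({"image": ["dripper.png"], "video": ["pour.mp4"]})` you can write a prompt such as "@image1 on a pale oak counter, pouring as in @video1" while the six-slot structure stays the same.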

Open the video workspace on OmniArt, pick the model that fits the shot, and write the first prompt as one complete beat. When Omni Flash lands, the two-mode workflow above is the part you'll add — the prompt craft is already the same.
