Support Gemini Omni Flash (gemini-omni-flash-preview) via the Interactions API #871

Description

@tombeckenham

Summary

Add support for Gemini Omni Flash (gemini-omni-flash-preview) — Google's new multimodal video-generation model — to the @tanstack/ai-gemini adapter.

Split out from #870 (which now ships Nano Banana 2 Lite only). Omni Flash needs a genuinely new request path, not just a model-meta entry, so it's tracked separately.

Why this is not a Veo/model-meta change

Verified against the live Gemini API (2026-07-01):

  • Omni Flash rejects generateContent:
    400 "This model only supports Interactions API."
  • It is not a predictLongRunning (Veo) model either — the existing video adapter's client.models.generateVideos() path does not apply.
  • The model's advertised supportedGenerationMethods lists generateContent and countTokens, but in practice it only serves the Interactions API.

So Omni cannot simply be added to GEMINI_VIDEO_MODELS to reuse the Veo flow. It needs its own Interactions-based job path.

Verified working flow (Interactions API)

POST /v1beta/interactions
{
  "model": "gemini-omni-flash-preview",
  "input": "<prompt string | structured content>",
  "response_modalities": ["video"],
  "background": true
}
→ { "id": "v1_…", "status": "in_progress", "object": "interaction", "model": … }

GET /v1beta/interactions/{id}     # poll (~24s for a 10s clip)
→ {
    "status": "completed",
    "usage": { "output_tokens_by_modality": [{ "modality": "video", "tokens": 57920 }], … },
    "steps": [
      { "type": "user_input",  "content": [{ "type": "text", "text": "…" }] },
      { "type": "thought", "signature": "…" },
      { "type": "model_output", "content": [
          { "type": "video", "mime_type": "video/mp4", "data": "<base64>" }
      ]}
    ]
  }
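
The create-then-poll flow above could be sketched against the raw REST endpoint roughly as follows. This is a minimal sketch, not the adapter's implementation: generateOmniVideo, PollOptions and FetchLike are hypothetical names, the host and the x-goog-api-key header are assumptions (the usual Gemini API conventions), and only the "in_progress" and "completed" statuses were observed, so any other status simply ends the loop and is returned to the caller.

```typescript
// Minimal shapes inferred from the responses shown above; other fields are omitted.
interface Interaction {
  id: string
  status: string // "in_progress" and "completed" were observed; other values are unknown here
  steps?: Array<{ type: string; content?: Array<Record<string, unknown>> }>
}

// Subset of the Fetch API this sketch needs, so a stub can be injected in unit tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

interface PollOptions {
  apiKey: string
  baseUrl?: string // assumption: defaults to the public Gemini API host
  intervalMs?: number
  maxAttempts?: number
  fetchFn?: FetchLike
}

async function generateOmniVideo(prompt: string, opts: PollOptions): Promise<Interaction> {
  const base = opts.baseUrl ?? 'https://generativelanguage.googleapis.com'
  const doFetch: FetchLike = opts.fetchFn ?? ((url, init) => fetch(url, init))
  // Assumption: API-key auth via the x-goog-api-key header.
  const headers = { 'content-type': 'application/json', 'x-goog-api-key': opts.apiKey }

  // Create the background interaction with the request body shown in the issue.
  const created = await doFetch(`${base}/v1beta/interactions`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      model: 'gemini-omni-flash-preview',
      input: prompt,
      response_modalities: ['video'],
      background: true,
    }),
  })
  if (!created.ok) throw new Error(`create failed: HTTP ${created.status}`)
  let interaction = (await created.json()) as Interaction

  // Poll until the interaction leaves "in_progress" or the attempt budget runs out.
  for (let attempt = 0; interaction.status === 'in_progress'; attempt++) {
    if (attempt >= (opts.maxAttempts ?? 60)) throw new Error('timed out waiting for interaction')
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs ?? 2000))
    const polled = await doFetch(`${base}/v1beta/interactions/${interaction.id}`, { headers })
    if (!polled.ok) throw new Error(`poll failed: HTTP ${polled.status}`)
    interaction = (await polled.json()) as Interaction
  }
  return interaction
}
```

Injecting fetchFn keeps the sketch unit-testable without network access, which matches the plan to mock create/get rather than rely on E2E.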

Key differences from Veo:

  • Output video is returned as inline base64 in steps[].content[] (a model_output step), not a Veo-style file URI.
  • Usage in the response body is reported as output_tokens_by_modality (video tokens), not as per-second figures.
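
To illustrate both differences, here is a minimal sketch of pulling the inline MP4 and the video token count out of a completed interaction. The type and function names (CompletedInteraction, extractVideo, videoOutputTokens) are hypothetical, and the shapes only cover the fields visible in the sample response above.

```typescript
// Shapes limited to the fields shown in the sample response; names are illustrative.
interface ContentPart {
  type: string
  mime_type?: string
  data?: string
  text?: string
}
interface Step {
  type: string
  content?: ContentPart[]
}
interface CompletedInteraction {
  steps: Step[]
  usage?: { output_tokens_by_modality?: Array<{ modality: string; tokens: number }> }
}

// Find the first video part in a model_output step and decode its base64 payload.
function extractVideo(
  interaction: CompletedInteraction,
): { mimeType: string; bytes: Uint8Array } | undefined {
  for (const step of interaction.steps) {
    if (step.type !== 'model_output') continue
    for (const part of step.content ?? []) {
      if (part.type === 'video' && part.data) {
        const binary = atob(part.data) // global in browsers and recent Node versions
        const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0))
        return { mimeType: part.mime_type ?? 'video/mp4', bytes }
      }
    }
  }
  return undefined
}

// Sum the tokens reported for the "video" modality; returns 0 when usage is absent.
function videoOutputTokens(interaction: CompletedInteraction): number {
  return (interaction.usage?.output_tokens_by_modality ?? [])
    .filter((entry) => entry.modality === 'video')
    .reduce((sum, entry) => sum + entry.tokens, 0)
}
```

Returning raw bytes rather than a URL leaves it to the adapter to decide how to surface the clip, since there is no Veo-style file URI to hand back.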

SDK / dependency notes

  • The installed @google/genai@2.10.0 already exposes the Interactions API surface (client.interactions.create/get/cancel, GeminiNextGenInteractions, plus interaction.completed / video.generated webhook events).
  • packages/ai-gemini/package.json currently declares "@google/genai": "^2.8.0". If we build on client.interactions, bump the floor to ^2.10.0 so consumers are guaranteed to have it.
  • Caveat found during verification: the SDK's typed interactions.create() wrapper returned a bare 400 for this shape, while the raw REST call succeeded. The adapter may need to call the REST endpoint directly (or match the SDK param shape exactly). Worth reconciling during implementation.
  • There is already an experimental interactions adapter to model this on: packages/ai-gemini/src/experimental/text-interactions/adapter.ts.

Model facts (from Google docs + live API)

  • Model id: gemini-omni-flash-preview
  • Inputs: text, image, video (audio references, video references longer than 3s, and scene extension are not yet supported in the API)
  • Output: MP4 video with audio
  • Clip length: 10 seconds (fixed today; longer "coming soon")
  • Resolution: 720p
  • Aspect ratios: 16:9 (default), 9:16
  • Pricing: $0.10 per second of video output
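
As a worked example of these facts, a fixed 10-second clip at $0.10 per second comes to $1.00. The sketch below captures the facts as a constant and derives that figure. The object shape and the names OMNI_FLASH_FACTS, isSupportedAspectRatio and estimateClipCostUsd are hypothetical, and the real model-meta.ts entry format may differ.

```typescript
// Hypothetical shape for illustration; not the actual model-meta.ts format.
const OMNI_FLASH_FACTS = {
  id: 'gemini-omni-flash-preview',
  clipSeconds: 10, // fixed today
  resolution: '720p',
  aspectRatios: ['16:9', '9:16'],
  defaultAspectRatio: '16:9',
  usdCentsPerSecond: 10, // $0.10 per second, kept as integer cents to avoid float drift
} as const

type OmniAspectRatio = (typeof OMNI_FLASH_FACTS.aspectRatios)[number]

// Type guard for validating a caller-supplied aspect ratio.
function isSupportedAspectRatio(value: string): value is OmniAspectRatio {
  return (OMNI_FLASH_FACTS.aspectRatios as readonly string[]).includes(value)
}

// Cost in USD for a number of fixed-length clips.
function estimateClipCostUsd(clips = 1): number {
  return (clips * OMNI_FLASH_FACTS.clipSeconds * OMNI_FLASH_FACTS.usdCentsPerSecond) / 100
}
```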

Scope / tasks

  • Decide adapter shape: new interactions-based video job path (create → poll → decode inline mp4) that fits BaseVideoAdapter's job model, or a dedicated experimental adapter.
  • Handle inline base64 video output (vs. the Veo URI download path).
  • Map usage from output_tokens_by_modality.
  • model-meta.ts entry (fixed 10s duration, 720p, 16:9/9:16, $0.10/sec) once wired to the right path.
  • Bump @google/genai floor to ^2.10.0 if using client.interactions.
  • Unit tests (mock the interactions create/get). E2E is blocked upstream on aimock async-video (see existing Veo/Grok video note); rely on unit coverage.
  • Docs + media-generation skill updates.
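
For the adapter-shape decision, one possible bridge between the interaction response and a job-style adapter is sketched below. Everything here is an assumption made for illustration: VideoJobStatus is a hypothetical union (BaseVideoAdapter's real status type may use different names), only "in_progress" and "completed" were observed from the API, and exposing the clip as a data: URL is just one way to stand in for the Veo URI download path.

```typescript
// Hypothetical job status union; the real BaseVideoAdapter type may differ.
type VideoJobStatus = 'processing' | 'completed' | 'failed'

// Map an interaction status onto the hypothetical job status.
function toJobStatus(interactionStatus: string): VideoJobStatus {
  switch (interactionStatus) {
    case 'in_progress':
      return 'processing'
    case 'completed':
      return 'completed'
    default:
      // Assumption: any other status is treated as terminal and unsuccessful.
      return 'failed'
  }
}

// Wrap the inline base64 payload in a data: URL so URL-based consumers can use it.
function toDataUrl(part: { mime_type: string; data: string }): string {
  return `data:${part.mime_type};base64,${part.data}`
}
```

A data: URL avoids a separate download step but keeps the whole clip in memory as a string, so a real implementation may prefer decoded bytes or a Blob.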
