feat(ai-gemini): Gemini Omni Flash video generation via the Interactions API #886
…nteractions API

Add gemini-omni-flash-preview to the Gemini video adapter. Omni only serves the Interactions API (generateContent rejects it with a 400), so the adapter now routes by model: Veo models keep the :predictLongRunning operations flow, while Omni creates a background interaction with response_modalities: ['video'], polls it by id, and returns the inline base64 MP4 as a data: URL (Files-API URI delivery passes through). Usage maps from output_tokens_by_modality, size maps onto response_format.aspect_ratio, and modelOptions.previous_interaction_id chains conversational video edits.

- model-meta: GEMINI_OMNI_FLASH_PREVIEW ($0.10/sec video+audio output) + GEMINI_INTERACTIONS_VIDEO_MODELS
- provider options: GeminiOmniVideoProviderOptions derived from the SDK's CreateModelInteractionParamsNonStreaming; per-model input modalities (Omni accepts image+video parts) and fixed 10s duration
- @google/genai floor bumped to ^2.10.0 for the interactions surface
- 17 new unit tests; new interactions-video E2E feature backed by a dedicated aimock mount (native interactions text handling untouched)
- docs/media/video-generation.md + media-generation skill updates

Verified live against the Gemini API: the background job completed in ~45s and returned a valid MP4 with video-modality usage; the SDK's typed interactions.create works with Step-list input, so no raw REST fallback is needed.

Closes #871

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
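The routing and delivery rules in this commit message can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `INTERACTIONS_VIDEO_MODELS`, `pickVideoFlow`, `VideoOutput`, and `toVideoUrl` are hypothetical names, and only the model id, the two flows, and the `data:` URL shape come from the text above.

```typescript
// Hypothetical stand-in for the PR's GEMINI_INTERACTIONS_VIDEO_MODELS list.
const INTERACTIONS_VIDEO_MODELS: ReadonlyArray<string> = [
  'gemini-omni-flash-preview',
]

type VideoFlow = 'interactions' | 'predictLongRunning'

// Route by model: Omni goes through the Interactions API, while every other
// (Veo) model keeps the :predictLongRunning operations flow.
function pickVideoFlow(model: string): VideoFlow {
  return INTERACTIONS_VIDEO_MODELS.indexOf(model) !== -1
    ? 'interactions'
    : 'predictLongRunning'
}

// Assumed shape for a finished clip: either inline base64 bytes or a URI
// (for example Files-API delivery).
type VideoOutput =
  | { kind: 'inline'; base64: string; mimeType?: string }
  | { kind: 'uri'; uri: string }

// Inline base64 becomes a data: URL; URI delivery passes through unchanged.
function toVideoUrl(output: VideoOutput): string {
  if (output.kind === 'uri') return output.uri
  return `data:${output.mimeType ?? 'video/mp4'};base64,${output.base64}`
}
```

Keeping the routing decision in one small function means a future Interactions-only model would only need a new entry in the list.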
Add gemini-omni-flash-preview (text-to-video + image-to-video) to the ts-react-media example, exercising every Omni input: text prompts, a start image, an attached reference/edit video clip (Omni-only, never sent to other providers), and conversational editing that chains a new prompt onto a completed generation via previous_interaction_id.

Also fixes a latent core type bug this surfaced: generateVideo / getVideoJobStatus constrained adapters as VideoAdapter<string, any, any, any>, leaving the duration generic at its Record<string, number> default. Any adapter with a narrowed per-model duration union (Omni's 10, Veo's 4|6|8) failed assignability under strict function-type contravariance. All video-activity constraints now span all six VideoAdapter generics.

Verified live: Omni edit chaining (previous_interaction_id) against the real Gemini API returned an edited 10s MP4; the example dev server boots and type-checks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The issue's live verification concluded Omni clips were a fixed 10
seconds, but response_format.duration is a real request field — just
undocumented. Verified against the live API: it takes a "<seconds>s"
string, accepts any value in the 3-10s range including fractional
seconds (a 3s request returns a 3.008s MP4 per ffprobe), rejects
out-of-range values with explicit minimum/maximum errors, and defaults
to 10s when omitted.
Omni's duration is now typed number with availableDurations() =
{ kind: 'range', min: 3, max: 10, unit: 'seconds' } and snapDuration
clamping into it; the adapter maps the generateVideo duration option
onto response_format.duration.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
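The duration handling described in this commit could look roughly like the sketch below. The range object mirrors the availableDurations() value quoted above. `OMNI_DURATION_RANGE` and `toResponseFormatDuration` are hypothetical names, and the `snapDuration` body is an assumed implementation of "clamping into" the range, not the adapter's real one.

```typescript
// Range reported for Omni, as quoted in the commit message.
const OMNI_DURATION_RANGE = {
  kind: 'range',
  min: 3,
  max: 10,
  unit: 'seconds',
} as const

// Assumed clamping behaviour: pull any requested value into [min, max].
function snapDuration(seconds: number): number {
  return Math.min(
    OMNI_DURATION_RANGE.max,
    Math.max(OMNI_DURATION_RANGE.min, seconds),
  )
}

// The live API takes a "<seconds>s" string; fractional seconds are allowed.
// Returning undefined leaves the field off so the API default (10s) applies.
function toResponseFormatDuration(seconds?: number): string | undefined {
  return seconds === undefined ? undefined : `${seconds}s`
}
```

For example, `toResponseFormatDuration(snapDuration(12))` yields `'10s'`, while a fractional request such as 3.5 passes through as `'3.5s'`.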
- Reject out-of-range Omni durations at job creation with a clear local error instead of silently passing them to the live API
- Map requires_action interactions to a failed status so polling can't spin until timeout (reachable via previous_interaction_id chaining)
- Surface failed job statuses in the ts-react-media example instead of polling forever on a pending spinner
- Add a compile-time regression test guarding the generateVideo VideoAdapter generic-arity fix, plus unit tests for duration rejection, fractional pass-through, and requires_action mapping
- Fix stale doc/comment claims: Veo 2/3 model lists, "fixed 10s" clips, "clamped" duration wording, and content-block ordering (images, then videos, then text)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
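The first two fixes in this commit can be illustrated with a small sketch. The function names, the job-status union, and every interaction status other than requires_action are assumptions made for illustration; the PR text only confirms that requires_action maps to a failed status and that out-of-range durations are rejected locally.

```typescript
// Assumed job statuses exposed to callers of the video jobs API.
type JobStatus = 'pending' | 'completed' | 'failed'

// Map an interaction status onto a job status. Treating requires_action as
// failed gives polling a terminal state, so it cannot spin until timeout.
function mapInteractionStatus(status: string): JobStatus {
  switch (status) {
    case 'completed':
      return 'completed'
    case 'failed':
    case 'cancelled': // assumed status name
    case 'requires_action':
      return 'failed'
    default:
      // Anything else (for example an in-progress state) keeps polling.
      return 'pending'
  }
}

// Reject out-of-range durations before any request reaches the live API.
function assertOmniDuration(seconds: number): void {
  if (!isFinite(seconds) || seconds < 3 || seconds > 10) {
    throw new RangeError(
      `Omni duration must be between 3 and 10 seconds, got ${seconds}`,
    )
  }
}
```

Failing fast on the client keeps the error message local and avoids spending a request on a value the API would reject anyway.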
Closes #871
Summary
Adds Gemini Omni Flash (gemini-omni-flash-preview), Google's multimodal video-generation model with conversational editing, to @tanstack/ai-gemini. Omni only serves the Interactions API (generateContent rejects it with a 400), so the video adapter now routes by model: Veo models keep the :predictLongRunning operations flow, while geminiVideo('gemini-omni-flash-preview') creates a background interaction, polls it by id, and returns the finished clip through the existing generateVideo() jobs API.

What changed

- packages/ai-gemini: Interactions-based job path: interactions.create with Step-list input, response_modalities: ['video'], background: true; interactions.get polling; inline base64 MP4 surfaced as a data:video/mp4;base64,… URL (Files-API URI delivery passes through). Usage maps from output_tokens_by_modality. size maps onto response_format.aspect_ratio ('16:9' | '9:16') and duration onto response_format.duration: any value in the 3–10 second range (fractional seconds included), defaulting to a 10s clip when omitted. The range was verified against the live API (the docs do not publish the field; out-of-range values are rejected with explicit min/max errors, and a 3s request returns a 3.008s MP4 per ffprobe). Image and video prompt parts are sent as interaction content blocks in order (data sources inline; url sources pass through untouched and are never downloaded). modelOptions.previous_interaction_id chains conversational video edits.
- GEMINI_OMNI_FLASH_PREVIEW ($0.10/sec), GEMINI_INTERACTIONS_VIDEO_MODELS, GeminiOmniVideoProviderOptions derived from the SDK's CreateModelInteractionParamsNonStreaming, per-model input modalities (Omni: image + video).
- @tanstack/ai (patch): generateVideo / getVideoJobStatus constrained adapters as VideoAdapter<string, any, any, any>, which rejected any adapter with a narrowed per-model duration union (Omni's 10, Veo's 4 | 6 | 8) under strict contravariance. Constraints now span all six generics.
- @google/genai floor ^2.8.0 → ^2.10.0 (Interactions API surface).
- examples/ts-react-media: Omni text-to-video + image-to-video entries exercising all inputs: text, start image, attached reference/edit video clip (Omni-only), and an "Edit" box on completed videos that chains previous_interaction_id.
- docs/media/video-generation.md: Omni section (interactions flow, inline data: URLs, conversational editing), plus a media-generation skill update.

Testing

- Live verification against the Gemini API (reported usage {promptTokens: 16, completionTokens: 58728, totalTokens: 59052}) and edit chaining via previous_interaction_id (~60s, edited clip; prior video reported as input tokens). The issue's SDK-vs-REST 400 caveat is resolved: the typed interactions.create() works with Step-list input, so no raw REST fallback is needed.
- interactions-video E2E feature backed by a dedicated aimock mount at /omni-video (aimock's native interactions text handling untouched); all video + stateful-interactions specs pass.
- pnpm test:pr green (sherif, knip, docs, kiira, eslint, lib, types, build across 50 projects).

🤖 Generated with Claude Code
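As a rough illustration of the request described under "What changed", the sketch below assembles an Omni job request from generateVideo-style options. The snake_case field names (response_modalities, background, response_format, previous_interaction_id) come from the description; everything else is an assumption. In particular, `PromptPart`, `OmniRequestSketch`, and `buildOmniRequest` are hypothetical, and the flat `input` array is a simplification of the SDK's Step-list input, whose real shape is not shown in this PR text.

```typescript
// Hypothetical prompt-part shape; the SDK's real content blocks differ.
type PromptPart =
  | { type: 'text'; text: string }
  | { type: 'image'; source: string }
  | { type: 'video'; source: string }

interface OmniRequestSketch {
  model: string
  input: PromptPart[] // simplified stand-in for Step-list input
  response_modalities: ['video']
  background: true
  response_format?: { aspect_ratio?: '16:9' | '9:16'; duration?: string }
  previous_interaction_id?: string
}

interface BuildOptions {
  parts: PromptPart[]
  size?: '16:9' | '9:16'
  duration?: number
  previousInteractionId?: string
}

function buildOmniRequest(opts: BuildOptions): OmniRequestSketch {
  // Content-block order: images, then videos, then text. The index
  // tie-breaker keeps the original order within each group.
  const rank = { image: 0, video: 1, text: 2 }
  const input = opts.parts
    .map((part, index) => ({ part, index }))
    .sort((a, b) => rank[a.part.type] - rank[b.part.type] || a.index - b.index)
    .map((entry) => entry.part)

  const request: OmniRequestSketch = {
    model: 'gemini-omni-flash-preview',
    input,
    response_modalities: ['video'],
    background: true,
  }

  // Only send response_format when the caller set size or duration, so the
  // API defaults (including the 10s clip length) apply otherwise.
  const responseFormat: { aspect_ratio?: '16:9' | '9:16'; duration?: string } = {}
  if (opts.size !== undefined) responseFormat.aspect_ratio = opts.size
  if (opts.duration !== undefined) responseFormat.duration = `${opts.duration}s`
  if (opts.size !== undefined || opts.duration !== undefined) {
    request.response_format = responseFormat
  }

  // Chaining onto a completed interaction turns the request into an edit.
  if (opts.previousInteractionId !== undefined) {
    request.previous_interaction_id = opts.previousInteractionId
  }
  return request
}
```

A conversational edit would then reuse the id of the completed interaction, for instance `buildOmniRequest({ parts: [{ type: 'text', text: 'make it night' }], previousInteractionId: 'example-id' })`, where the id value is a placeholder.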