Commit a95539a

Merge pull request heygen-com#299 from heygen-com/feat/capture-improvements-v2
fix: double-audio scaffold, lint rules, docs guide, Gemini 3.1
2 parents 9ef864d + 274db7a commit a95539a

8 files changed

Lines changed: 542 additions & 53 deletions

docs/docs.json

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@
6969
{
7070
"group": "Guides",
7171
"pages": [
72+
"guides/website-to-video",
7273
"guides/prompting",
7374
"guides/gsap-animation",
7475
"guides/rendering",

docs/guides/website-to-video.mdx

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
1+
---
2+
title: Website to Video
3+
description: "Capture any website and turn it into a production video with a single prompt."
4+
---
5+
6+
Give your AI agent a URL and a creative direction. It captures the site, extracts the brand identity, writes a script and storyboard, generates voiceover, builds animated compositions, and delivers a renderable video.
7+
8+
```
9+
"Create a 20-second product launch video from https://linear.app.
10+
Make it feel like an Apple keynote announcement."
11+
```
12+
13+
## Getting Started
14+
15+
<Steps>
16+
<Step title="Install skills">
17+
Skills teach your AI agent how to capture websites and create HyperFrames compositions. Install once — they persist across sessions.
18+
19+
```bash
20+
npx skills add heygen-com/hyperframes
21+
```
22+
23+
Works with [Claude Code](https://claude.ai/claude-code), [Cursor](https://cursor.sh), [Gemini CLI](https://github.com/google-gemini/gemini-cli), and [Codex CLI](https://github.com/openai/codex).
24+
</Step>
25+
<Step title="Prompt your agent">
26+
Open your agent in any directory and describe the video you want:
27+
28+
```
29+
Create a 25-second product launch video from https://example.com. Bold, cinematic, dark theme energy.
30+
```
31+
32+
The agent loads the skill when it sees a URL and a video request, then runs the full pipeline: capture, design, script, storyboard, voiceover, build, validate.
33+
34+
<Note>
35+
Agents also trigger this skill automatically when they see a URL and a video request.
36+
</Note>
37+
</Step>
38+
<Step title="Preview">
39+
```bash
40+
npx hyperframes preview
41+
```
42+
43+
Opens the video in your browser. Edits reload automatically.
44+
</Step>
45+
<Step title="Render to MP4">
46+
```bash
47+
npx hyperframes render --output my-video.mp4
48+
```
49+
50+
```
51+
✓ Captured 750 frames in 12.4s
52+
✓ Encoded to my-video.mp4 (25.0s, 1920×1080, 6.8MB)
53+
```
54+
</Step>
55+
</Steps>
56+
57+
<Note>
58+
You don't need to run `npx hyperframes capture` manually — the skill instructs the agent to capture as the first step. The capture command is documented [below](#capture-command) for advanced use.
59+
</Note>
60+
61+
## How the Pipeline Works
62+
63+
The skill runs 7 steps. Each produces an artifact that feeds the next:
64+
65+
| Step | Output | What happens |
66+
|------|--------|-------------|
67+
| **Capture** | `captures/<name>/` | Extract screenshots, design tokens, fonts, assets, animations |
68+
| **Design** | `DESIGN.md` | Brand reference — colors, typography, do's and don'ts |
69+
| **Script** | `SCRIPT.md` | Narration text with hook, story, proof, CTA |
70+
| **Storyboard** | `STORYBOARD.md` | Per-beat creative direction — mood, assets, animations, transitions |
71+
| **VO + Timing** | `narration.wav` + `transcript.json` | TTS audio with word-level timestamps |
72+
| **Build** | `compositions/*.html` | Animated HTML compositions, one per beat |
73+
| **Validate** | Snapshot PNGs | Visual verification before delivery |
74+
75+
## Video Types
76+
77+
The prompt determines the format. Include a duration and creative direction:
78+
79+
| Type | Duration | Example |
80+
|------|----------|---------|
81+
| Social ad | 10–15s | _"15-second Instagram reel. Energetic, fast cuts."_ |
82+
| Product launch | 20–30s | _"25-second product launch. Apple keynote energy."_ |
83+
| Product tour | 30–60s | _"45-second tour showing the top 3 features."_ |
84+
| Brand reel | 15–30s | _"20-second brand video. Celebrate the design."_ |
85+
| Feature announcement | 15–25s | _"Feature announcement highlighting the new AI agents."_ |
86+
| Teaser | 8–15s | _"10-second teaser. Super minimal. Just the hook."_ |
87+
88+
<Tip>
89+
Creative direction matters more than format. _"Playful, hand-crafted feel"_ or _"dark, developer-focused, show code"_ shapes the storyboard and drives every visual decision the agent makes.
90+
</Tip>
91+
92+
## Enriching Captures with Gemini Vision
93+
94+
By default, captures describe assets using DOM context — alt text, nearby headings, CSS classes. Add a [Gemini API key](https://aistudio.google.com/apikey) for richer AI-powered descriptions using vision.
95+
96+
Create a `.env` file in your project root:
97+
98+
```bash
99+
echo "GEMINI_API_KEY=your-key-here" > .env
100+
```
101+
102+
<Tabs>
103+
<Tab title="Without Gemini">
104+
```
105+
- hero-bg.png — 582KB, section: "Hero", above fold
106+
```
107+
The agent knows the file exists and where it was on the page, but not what it looks like.
108+
</Tab>
109+
<Tab title="With Gemini">
110+
```
111+
- hero-bg.png — 582KB, A gradient wave in purple and blue sweeps
112+
across a dark background, creating an aurora-like effect.
113+
```
114+
The agent knows what the image actually shows, enabling better creative decisions in the storyboard.
115+
</Tab>
116+
</Tabs>
117+
118+
| Tier | Rate limit | Cost per image |
119+
|------|-----------|----------------|
120+
| Free | 5 RPM | Free |
121+
| Paid | 2,000 RPM | ~$0.001 |
122+
123+
A typical capture with 40 images costs about **$0.04** on the paid tier.
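
The arithmetic behind that estimate can be sketched as follows. This is a rough illustration, not part of the HyperFrames CLI: the function names are hypothetical, the rates are copied from the tier table above, and the free-tier estimate assumes one request per image.

```typescript
// Figures taken from the tier table above; treat them as approximate.
const PAID_COST_PER_IMAGE_USD = 0.001;
const FREE_TIER_RPM = 5;

/** Estimated paid-tier cost in USD, rounded to the nearest tenth of a cent. */
function estimatePaidCostUsd(imageCount: number): number {
  return Math.round(imageCount * PAID_COST_PER_IMAGE_USD * 1000) / 1000;
}

/** Lower bound on free-tier captioning time in minutes (assumes one request per image). */
function estimateFreeTierMinutes(imageCount: number): number {
  return Math.ceil(imageCount / FREE_TIER_RPM);
}

console.log(estimatePaidCostUsd(40)); // 0.04
console.log(estimateFreeTierMinutes(40)); // 8
```

Under these assumptions, the same 40-image capture is free on the free tier but takes at least 8 minutes of captioning time because of the 5 RPM limit.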
124+
125+
## Capture Command
126+
127+
The skill runs capture automatically, but you can run it directly for pre-caching, debugging, or using the data outside of video production.
128+
129+
```bash
130+
npx hyperframes capture https://stripe.com
131+
```
132+
133+
```
134+
◇ Captured Stripe | Financial Infrastructure → captures/stripe-com
135+
136+
Screenshots: 12
137+
Assets: 45
138+
Sections: 15
139+
Fonts: sohne-var
140+
```
141+
142+
| Flag | Default | Description |
143+
|------|---------|-------------|
144+
| `-o, --output` | `captures/<hostname>` | Output directory |
145+
| `--timeout` | `120000` | Page load timeout in ms |
146+
| `--skip-assets` | `false` | Skip downloading images and fonts |
147+
| `--max-screenshots` | `24` | Maximum screenshot count |
148+
| `--json` | `false` | Output structured JSON for programmatic use |
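
When scripting captures, it can help to assemble the argument list from typed options. The sketch below uses only the flags listed in the table above; the `CaptureOptions` interface and `buildCaptureArgs` helper are hypothetical names introduced for illustration and are not exported by the CLI.

```typescript
// Hypothetical option bag mirroring the documented capture flags.
interface CaptureOptions {
  output?: string;
  timeout?: number; // page load timeout in ms
  skipAssets?: boolean;
  maxScreenshots?: number;
  json?: boolean;
}

/** Build the argv for `npx hyperframes capture`, adding only the flags that were set. */
function buildCaptureArgs(url: string, opts: CaptureOptions = {}): string[] {
  const args = ["hyperframes", "capture", url];
  if (opts.output !== undefined) args.push("--output", opts.output);
  if (opts.timeout !== undefined) args.push("--timeout", String(opts.timeout));
  if (opts.skipAssets) args.push("--skip-assets");
  if (opts.maxScreenshots !== undefined) {
    args.push("--max-screenshots", String(opts.maxScreenshots));
  }
  if (opts.json) args.push("--json");
  return args;
}

const argv = buildCaptureArgs("https://example.com", { timeout: 180000, json: true });
console.log(["npx", ...argv].join(" "));
// npx hyperframes capture https://example.com --timeout 180000 --json
```

The resulting array can be passed to a process spawner of your choice. The shape of the `--json` output is not documented here, so inspect it before relying on specific fields.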
149+
150+
### What Gets Captured
151+
152+
| Data | Description |
153+
|------|-------------|
154+
| **Screenshots** | Viewport captures at every scroll depth — dynamic count based on page height |
155+
| **Colors** | Pixel-sampled dominant colors + computed styles, including oklch/lab conversion |
156+
| **Fonts** | CSS font families + downloaded woff2 files |
157+
| **Assets** | Images, SVGs with semantic names, Lottie animations, video previews |
158+
| **Text** | All visible text in DOM order |
159+
| **Animations** | Web Animations API, scroll-triggered animations, WebGL shaders |
160+
| **Sections** | Page structure with headings, types, background colors |
161+
| **CTAs** | Buttons and links detected by class names and text patterns |
162+
163+
## Snapshot Command
164+
165+
Capture key frames from a built video as PNGs — verify compositions without a full render:
166+
167+
```bash
168+
npx hyperframes snapshot my-project --at 2.9,10.4,18.7
169+
```
170+
171+
| Flag | Default | Description |
172+
|------|---------|-------------|
173+
| `--frames` | `5` | Number of evenly-spaced frames |
174+
| `--at` | (none) | Comma-separated timestamps in seconds |
175+
| `--timeout` | `5000` | Ms to wait for runtime to initialize |
176+
177+
## Iterating
178+
179+
You don't need to re-run the full pipeline to make changes:
180+
181+
- **Edit the storyboard:** `STORYBOARD.md` is the creative north star. Change a beat's mood or assets, then ask the agent to rebuild just that beat.
182+
- **Edit a composition:** open `compositions/beat-3-proof.html` directly and tweak animations, colors, or layout.
183+
- **Rebuild one beat:** _"Rebuild beat 2 with more energy. Use the product screenshot as full-bleed background."_
184+
185+
## Troubleshooting
186+
187+
<AccordionGroup>
188+
<Accordion title="Capture times out">
189+
Increase the timeout for sites with Cloudflare or heavy client-side rendering:
190+
191+
```bash
192+
npx hyperframes capture https://example.com --timeout 180000
193+
```
194+
</Accordion>
195+
<Accordion title="Few assets captured">
196+
Sites using frameworks like Framer lazy-load images via IntersectionObserver. The capture scrolls through the page to trigger loading, but very long pages may miss images near the bottom. Adding a Gemini key improves descriptions of captured assets, but doesn't increase the count.
197+
</Accordion>
198+
<Accordion title="Colors look wrong">
199+
The capture uses pixel sampling combined with DOM computed styles. Dark sites should show dark colors in the palette. Check the scroll screenshots in `captures/<name>/screenshots/` to see what the capture actually saw.
200+
</Accordion>
201+
<Accordion title="Agent doesn't find the skill">
202+
Verify skills are installed:
203+
204+
```bash
205+
npx skills add heygen-com/hyperframes
206+
```
207+
208+
Lead your prompt with _"Use the /website-to-hyperframes skill"_ for the most reliable results. Agents also discover it automatically when they see a URL and a video request.
209+
</Accordion>
210+
</AccordionGroup>
211+
212+
## Next Steps
213+
214+
<CardGroup cols={2}>
215+
<Card title="Quickstart" icon="rocket" href="/quickstart">
216+
New to HyperFrames? Start here.
217+
</Card>
218+
<Card title="GSAP Animation" icon="wand-magic-sparkles" href="/guides/gsap-animation">
219+
Animation patterns used in compositions.
220+
</Card>
221+
<Card title="Rendering" icon="film" href="/guides/rendering">
222+
Render to MP4, MOV, or WebM.
223+
</Card>
224+
<Card title="CLI Reference" icon="terminal" href="/packages/cli">
225+
Full command reference.
226+
</Card>
227+
</CardGroup>

docs/packages/cli.mdx

Lines changed: 66 additions & 5 deletions
@@ -14,11 +14,13 @@ npx hyperframes <command>
1414
## When to Use
1515

1616
**Use the CLI when you want to:**
17-
- Create a new composition project from an example
18-
- Preview compositions with live hot reload during development
19-
- Render compositions to MP4 (locally or in Docker)
20-
- Lint compositions for structural issues
21-
- Check your environment for missing dependencies
17+
- Capture a website for video production (`capture`)
18+
- Create a new composition project from an example (`init`)
19+
- Preview compositions with live hot reload (`preview`)
20+
- Render compositions to MP4 locally or in Docker (`render`)
21+
- Lint compositions for structural issues (`lint`)
22+
- Capture key frames as PNG screenshots (`snapshot`)
23+
- Check your environment for missing dependencies (`doctor`)
2224

2325
**Use a different package if you want to:**
2426
- Render programmatically from Node.js code — use the [producer](/packages/producer)
@@ -321,6 +323,39 @@ This is suppressed in CI environments, non-TTY shells, and when `HYPERFRAMES_NO_
321323
<Tip>
322324
Combine `tts` with `transcribe` to generate narration and word-level timestamps for captions in a single workflow: generate the audio with `tts`, then transcribe the output with `transcribe` to get word-level timing.
323325
</Tip>
326+
### `capture`
327+
328+
Capture a website — extract screenshots, design tokens, fonts, assets, and animations for video production:
329+
330+
```bash
331+
npx hyperframes capture https://stripe.com
332+
npx hyperframes capture https://linear.app -o captures/linear
333+
npx hyperframes capture https://example.com --json
334+
```
335+
336+
```
337+
◇ Captured Stripe | Financial Infrastructure → captures/stripe-com
338+
339+
Screenshots: 12
340+
Assets: 45
341+
Sections: 15
342+
Fonts: sohne-var
343+
```
344+
345+
| Flag | Description |
346+
|------|-------------|
347+
| `-o, --output` | Output directory (default: `captures/<hostname>`) |
348+
| `--timeout` | Page load timeout in ms (default: 120000) |
349+
| `--skip-assets` | Skip downloading images and fonts |
350+
| `--max-screenshots` | Maximum screenshot count (default: 24) |
351+
| `--json` | Output structured JSON for programmatic use |
352+
353+
The capture command extracts everything an AI agent needs to understand a website's visual identity: viewport screenshots at every scroll depth, color palette (pixel-sampled + DOM computed), font files, images with semantic names, SVGs, Lottie animations, video previews, WebGL shaders, visible text, and page structure.
354+
355+
Output is a self-contained directory with a `CLAUDE.md` file that any AI agent can read to understand the captured site. Used by the `/website-to-hyperframes` skill as step 1 of the video production pipeline.
356+
357+
Set `GEMINI_API_KEY` in a `.env` file for AI-powered image descriptions via Gemini vision (~$0.001/image). See the [Website to Video](/guides/website-to-video#enriching-captures-with-gemini-vision) guide for details.
358+
324359
</Tab>
325360
<Tab title="Preview">
326361
### `preview`
@@ -375,6 +410,32 @@ This is suppressed in CI environments, non-TTY shells, and when `HYPERFRAMES_NO_
375410
- **Info** (``) — informational notices, shown only with `--verbose`
376411

377412
The linter detects missing attributes, missing adapter libraries (GSAP, Lottie, Three.js), structural problems, and more. See [Common Mistakes](/guides/common-mistakes) for details on each rule.
413+
414+
### `snapshot`
415+
416+
Capture key frames from a composition as PNG screenshots — verify visual output without a full render:
417+
418+
```bash
419+
npx hyperframes snapshot my-project --at 2.9,10.4,18.7
420+
npx hyperframes snapshot my-project --frames 10
421+
```
422+
423+
```
424+
◆ Capturing 3 frames at [2.9s, 10.4s, 18.7s] from my-project
425+
426+
◇ 3 snapshots saved to snapshots/
427+
snapshots/frame-00-at-2.9s.png
428+
snapshots/frame-01-at-10.4s.png
429+
snapshots/frame-02-at-18.7s.png
430+
```
431+
432+
| Flag | Description |
433+
|------|-------------|
434+
| `--frames` | Number of evenly-spaced frames to capture (default: 5) |
435+
| `--at` | Comma-separated timestamps in seconds (e.g., `3.0,10.5,18.0`) |
436+
| `--timeout` | Ms to wait for runtime to initialize (default: 5000) |
437+
438+
The snapshot command bundles the project, serves it locally, launches headless Chrome, seeks to each timestamp, and captures a 1920×1080 PNG. Useful for visual verification during the build step of the [website-to-video](/guides/website-to-video) workflow.
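
To make the two timestamp modes concrete, here is a minimal sketch of how `--at` values might be parsed and how `--frames` might be spread across a composition. All function names are hypothetical. The file-name pattern is inferred from the sample output above, and placing evenly-spaced frames at segment midpoints is an assumption; the CLI's actual spacing may differ.

```typescript
/** Parse a comma-separated `--at` value such as "2.9,10.4,18.7" into seconds. */
function parseTimestamps(at: string): number[] {
  return at
    .split(",")
    .map((part) => Number(part.trim()))
    .filter((seconds) => Number.isFinite(seconds) && seconds >= 0);
}

/** Assumed spacing: one frame at the midpoint of each of `frames` equal segments. */
function evenlySpaced(durationSeconds: number, frames: number): number[] {
  const segment = durationSeconds / frames;
  return Array.from({ length: frames }, (_, i) => (i + 0.5) * segment);
}

/** File name following the pattern seen in the sample output (frame-00-at-2.9s.png). */
function snapshotFileName(index: number, seconds: number): string {
  return `frame-${String(index).padStart(2, "0")}-at-${seconds}s.png`;
}

const stamps = parseTimestamps("2.9,10.4,18.7");
console.log(stamps.map((s, i) => snapshotFileName(i, s)));
console.log(evenlySpaced(25, 5)); // [2.5, 7.5, 12.5, 17.5, 22.5]
```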
378439
</Tab>
379440
<Tab title="Build">
380441
### `render`

packages/cli/src/capture/contentExtractor.ts

Lines changed: 6 additions & 2 deletions
@@ -174,7 +174,11 @@ export async function captionImagesWithGemini(
174174
// Free tier: 5 RPM → batch 5, 12s pause (~$0 but slow)
175175
// Paid tier: 2000 RPM → batch 20, 1s pause (~$0.001/image, fast)
176176
// We try a larger batch first; if rate-limited, fall back to smaller batches.
177-
const model = "gemini-2.5-flash";
177+
// Default is a preview model — update when GA ships.
178+
// Benchmark (49 images, paid tier): 3.1-flash-lite-preview ~507ms/img 131ch avg,
179+
// 2.5-flash-lite ~230ms/img 117ch avg. Preview has richer captions but higher variance.
180+
// Override: HYPERFRAMES_GEMINI_MODEL=gemini-2.5-flash-lite
181+
const model = process.env.HYPERFRAMES_GEMINI_MODEL || "gemini-3.1-flash-lite-preview";
178182
const BATCH_SIZE = 20;
179183
for (let i = 0; i < imageFiles.length; i += BATCH_SIZE) {
180184
const batch = imageFiles.slice(i, i + BATCH_SIZE);
@@ -210,7 +214,7 @@ export async function captionImagesWithGemini(
210214
geminiCaptions[result.value.file] = result.value.caption;
211215
}
212216
}
213-
// Pace requests to stay under free tier rate limits (5 RPM for gemini-2.5-flash)
217+
// Pace requests between batches (paid tier: 2000+ RPM, free tier: rate-limited)
214218
if (i + BATCH_SIZE < imageFiles.length) {
215219
await new Promise((r) => setTimeout(r, 2000)); // 2s pause between batches — paid tier handles 2000 RPM, free tier retries via Promise.allSettled
216220
}
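
The model override and batch pacing shown in this hunk can be illustrated in isolation. The sketch below is not the code from `contentExtractor.ts`: `resolveModel`, `captionInBatches`, and the `captionOne` callback are hypothetical stand-ins, and the environment is passed in as a plain object so the example runs without Node-specific globals. The batch size of 20 and the 2 s pause are the values visible in the diff.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_MODEL = "gemini-3.1-flash-lite-preview";

/** Use HYPERFRAMES_GEMINI_MODEL when set and non-empty, otherwise the preview default. */
function resolveModel(env: Env): string {
  return env["HYPERFRAMES_GEMINI_MODEL"] || DEFAULT_MODEL;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Caption files in fixed-size batches with a pause between batches.
 * `captionOne` is a hypothetical stand-in for the per-image Gemini request.
 */
async function captionInBatches(
  files: string[],
  captionOne: (file: string) => Promise<string>,
  batchSize = 20,
  pauseMs = 2000,
): Promise<Record<string, string>> {
  const captions: Record<string, string> = {};
  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    // allSettled lets one failed request pass without rejecting the whole batch.
    const results = await Promise.allSettled(
      batch.map(async (file) => ({ file, caption: await captionOne(file) })),
    );
    for (const result of results) {
      if (result.status === "fulfilled") {
        captions[result.value.file] = result.value.caption;
      }
    }
    // Pause only when another batch follows.
    if (i + batchSize < files.length) await sleep(pauseMs);
  }
  return captions;
}

console.log(resolveModel({})); // gemini-3.1-flash-lite-preview
console.log(resolveModel({ HYPERFRAMES_GEMINI_MODEL: "gemini-2.5-flash-lite" })); // gemini-2.5-flash-lite
```

In this sketch a rejected request is simply skipped and the file is left without a caption; no retry is attempted.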
