Commit 6ef5297

fix: resolve merge conflict in shader-transitions capture.ts
Merges main's refactored capture (CaptureSceneOptions, forceVisible, stabilizeTransformedBoxShadows, foreignObjectRendering fallback) with our HTML-in-Canvas drawElementImage capture path. The native capture is tried first and falls back to html2canvas on failure.
2 parents 3aae641 + b0fb664 commit 6ef5297

47 files changed

Lines changed: 4587 additions & 559 deletions

docs/guides/remove-background.mdx

Lines changed: 65 additions & 0 deletions
@@ -80,6 +80,71 @@ npx hyperframes remove-background subject.mp4 -o transparent.mov # editi
 npx hyperframes remove-background portrait.jpg -o cutout.png # still image
 ```
 
+## Layer separation: emit the cutout and the background plate together
+
+Pass `--background-output` (alias `-b`) to write a *second* transparent video alongside the cutout. It carries the same source RGB, but its alpha is the *inverse* mask: opaque where the surroundings were, transparent where the subject is. The result is a clean two-layer separation in a single inference pass:
+
+```bash Terminal
+npx hyperframes remove-background subject.mp4 \
+  -o subject.webm \
+  --background-output plate.webm
+```
+
+| Output | Alpha | Use it as |
+| ------ | ----- | --------- |
+| `subject.webm` | Mask (subject opaque) | Foreground layer (top of stack) |
+| `plate.webm` | `255 − mask` (subject region transparent) | Background layer; place anything you want **under the subject's silhouette** between this and `subject.webm` |
+
+Both encoders share the source width, height, and frame rate and your `--quality` preset, so the layers are pixel-aligned. Encode cost roughly doubles; segmentation cost is unchanged.
+
+<Tip>
+**This is a hole-cut plate, not an inpainted clean plate.** The subject region in `plate.webm` is fully transparent, so you have to composite something opaque under it (a graphic, a blurred copy, a different scene) to fill the hole. If you need an actual filled background where the subject was, use a video inpainter (LaMa, ProPainter, RunwayML Inpaint); `remove-background` is not the right tool for that.
+</Tip>
+
+### Hole-cut vs. clean plate: when does the difference matter?
+
+A **hole-cut plate** keeps the original surroundings and makes the subject region transparent. A **clean plate** fills the subject region with reconstructed background, which a separate inpainting model produces. Displayed alone over black, the two look like this:
+
+| | Hole-cut plate (this command) | Clean plate (inpainted) |
+| --- | --- | --- |
+| Subject region | Transparent silhouette | Reconstructed background pixels |
+| What you see alone | A person-shaped hole | An empty room |
+| Cost | One inference pass, one extra ffmpeg encode | A second model (LaMa, ProPainter, E2FGVI) |
+| Tool | `remove-background --background-output` | Outside this CLI |
+
+The deciding question is: **does anything ever need to be visible *through* the subject's silhouette where the subject used to be?**
+
+| Use case | What you need |
+| --- | --- |
+| Text/graphics live *between* the cutout and the plate (the example above) | **Hole-cut**: the graphics fill the hole. |
+| Composite the subject onto an unrelated scene | Neither. Just use `subject.webm`; the plate is irrelevant. |
+| Show "the room without the person" as a real background | **Clean plate**: a hole-cut plate would show a transparent void. |
+| Replace the person with a different subject (re-target) | **Clean plate**: the new subject needs real pixels under it. |
+| VFX rotoscoping / "remove an extra from this take" | **Clean plate**: the canonical inpainting use case. |
+
+If something opaque always covers the silhouette, hole-cut is sufficient and ~1000× cheaper than running an inpainter.
+
+### The two-layer composition pattern
+
+The two-layer pattern is functionally a drop-in for [text-behind-subject](#text-behind-subject-the-recommended-layout) that does not need the original `presenter.mp4` in the project, because the plate replaces it as the bottom layer:
+
+```html
+<!-- z=1 inverse-alpha plate fills everything except the subject's silhouette -->
+<video src="plate.webm" data-start="0" data-duration="6" data-track-index="0" muted playsinline></video>
+
+<!-- z=2 anything you want occluded by the subject lives here -->
+<h1 style="z-index:2; position:absolute; top:50%; left:50%; transform:translate(-50%,-50%);">
+  MAKE IT IN HYPERFRAMES
+</h1>
+
+<!-- z=3 the cutout puts the subject back on top -->
+<div class="cutout-wrap" style="position:absolute;inset:0;z-index:3">
+  <video src="subject.webm" data-start="0" data-duration="6" data-track-index="1" muted playsinline></video>
+</div>
+```
+
+Constraints: the flag requires a video input and `.webm` or `.mov` for both outputs. It's not valid for image inputs (no temporal pairing to do) and won't accept `.png` for the plate.
+
 ## Performance
 
 Real-world numbers from the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654), running u²-net_human_seg on a 4-second 1080p clip:
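
To make the stacking order described in the layer-separation guide concrete, here is a minimal TypeScript sketch of straight-alpha "over" compositing for a single pixel. It is an illustration only: the `Rgba` type and the `over` and `stack` helpers are hypothetical names, not part of the hyperframes CLI, and in practice the browser does this compositing for the stacked `<video>` elements. The sketch only exercises fully saturated mask values (0 and 255).

```typescript
// Hypothetical helper type; nothing here is part of the hyperframes CLI.
// Channels and alpha are bytes (0..255), alpha is straight (not premultiplied).
type Rgba = { r: number; g: number; b: number; a: number };

// Straight-alpha "over": draw `top` above `bottom`.
function over(top: Rgba, bottom: Rgba): Rgba {
  const ta = top.a / 255;
  const ba = bottom.a / 255;
  const outA = ta + ba * (1 - ta);
  if (outA === 0) return { r: 0, g: 0, b: 0, a: 0 };
  const mix = (t: number, u: number): number =>
    Math.round((t * ta + u * ba * (1 - ta)) / outA);
  return {
    r: mix(top.r, bottom.r),
    g: mix(top.g, bottom.g),
    b: mix(top.b, bottom.b),
    a: Math.round(outA * 255),
  };
}

// Bottom-to-top stack for one pixel: plate, then a middle layer, then the cutout.
// Plate and cutout share the source RGB; their alphas are 255 - mask and mask.
function stack(source: [number, number, number], mask: number, middle: Rgba): Rgba {
  const [r, g, b] = source;
  const plate: Rgba = { r, g, b, a: 255 - mask };
  const subject: Rgba = { r, g, b, a: mask };
  return over(subject, over(middle, plate));
}

const whiteText: Rgba = { r: 255, g: 255, b: 255, a: 255 };
const noText: Rgba = { r: 0, g: 0, b: 0, a: 0 };

// Inside the silhouette (mask = 255) the subject hides the text.
const insideSubject = stack([200, 150, 100], 255, whiteText);
// Outside the silhouette (mask = 0) the text sits on top of the plate.
const outsideWithText = stack([10, 20, 30], 0, whiteText);
// Outside the silhouette with no text, the plate shows the source pixel.
const outsideNoText = stack([10, 20, 30], 0, noText);

console.log(insideSubject, outsideWithText, outsideNoText);
```

The middle layer is the only place where new content appears, which is why anything placed there reads as sitting behind the subject.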

docs/packages/cli.mdx

Lines changed: 6 additions & 1 deletion
@@ -356,6 +356,10 @@ This is suppressed in CI environments, non-TTY shells, and when `HYPERFRAMES_NO_
 # Single image → transparent PNG
 npx hyperframes remove-background portrait.jpg -o cutout.png
 
+# Layer separation: cutout AND inverse-alpha background plate in one pass
+npx hyperframes remove-background avatar.mp4 \
+  -o subject.webm --background-output plate.webm
+
 # Force CPU on a machine that has CoreML or CUDA
 npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cpu
 
@@ -366,8 +370,9 @@ This is suppressed in CI environments, non-TTY shells, and when `HYPERFRAMES_NO_
 | Flag | Description |
 |------|-------------|
 | `--output, -o` | Output path. Format inferred from extension: `.webm` (default), `.mov`, `.png` |
+| `--background-output, -b` | Optional second output: inverse-alpha background plate (subject region transparent, surroundings opaque). Same source RGB, complementary mask. Must be `.webm` or `.mov`. Hole-cut, not inpainted: composite something underneath to fill the hole. |
 | `--device` | Execution provider: `auto` (default), `cpu`, `coreml`, `cuda` |
-| `--quality` | WebM encoder preset: `fast` (crf 30, smallest), `balanced` (crf 18, default), `best` (crf 12, near-lossless). Higher quality keeps the cutout's RGB closer to the source mp4, which matters when overlaying the cutout on its own source for text-behind-subject effects. Ignored for `.mov` / `.png`. |
+| `--quality` | WebM encoder preset: `fast` (crf 30, smallest), `balanced` (crf 18, default), `best` (crf 12, near-lossless). Higher quality keeps the cutout's RGB closer to the source mp4, which matters when overlaying the cutout on its own source for text-behind-subject effects. Applies to both `--output` and `--background-output`. Ignored for `.mov` / `.png`. |
 | `--info` | Print detected execution providers and exit (no render) |
 | `--json` | Output result as JSON |
 
packages/cli/src/background-removal/inference.test.ts

Lines changed: 110 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
import { describe, expect, it } from "vitest";
2-
import { MEAN, STD } from "./inference.js";
2+
import { MEAN, STD, applyMask } from "./inference.js";
33

44
// Regression: the u2net_human_seg model was trained with ImageNet
55
// normalization. Drifting away from these exact values changes the input
@@ -16,3 +16,112 @@ describe("background-removal/inference — rembg u2net_human_seg parity", () =>
     expect(STD).toEqual([0.229, 0.224, 0.225]);
   });
 });
+
+// These tests pin the contract that `--background-output` is built on:
+// fg.alpha + bg.alpha === 255 per pixel, and the RGB plane is byte-identical
+// between fg and bg. A future change to the postprocess loop (different mask
+// threshold, premultiplied alpha, gamma-corrected compositing) that breaks
+// either invariant should fail here loudly.
+describe("background-removal/inference — applyMask invariants", () => {
+  function makeRgb(pixels: number): Buffer {
+    // Deterministic but non-trivial RGB so byte equality is meaningful.
+    const buf = Buffer.allocUnsafe(pixels * 3);
+    for (let i = 0; i < pixels; i++) {
+      buf[i * 3] = (i * 7) & 0xff;
+      buf[i * 3 + 1] = (i * 13 + 31) & 0xff;
+      buf[i * 3 + 2] = (i * 19 + 61) & 0xff;
+    }
+    return buf;
+  }
+
+  function makeMask(pixels: number): Buffer {
+    // Hit the saturation endpoints (0, 255) and a few mid-tone values so the
+    // 255-m inversion is exercised across the full byte range.
+    const buf = Buffer.allocUnsafe(pixels);
+    for (let i = 0; i < pixels; i++) buf[i] = (i * 37) & 0xff;
+    return buf;
+  }
+
+  it("dual-output: fg.alpha + bg.alpha === 255 for every pixel", () => {
+    const pixels = 64;
+    const rgb = makeRgb(pixels);
+    const mask = makeMask(pixels);
+    const fg = Buffer.allocUnsafe(pixels * 4);
+    const bg = Buffer.allocUnsafe(pixels * 4);
+
+    const result = applyMask(rgb, mask, fg, bg, pixels);
+
+    expect(result.fg).toBe(fg);
+    expect(result.bg).toBe(bg);
+    for (let i = 0; i < pixels; i++) {
+      const sum = fg[i * 4 + 3]! + bg[i * 4 + 3]!;
+      expect(sum).toBe(255);
+    }
+  });
+
+  it("dual-output: RGB triples are byte-identical between fg and bg", () => {
+    const pixels = 64;
+    const rgb = makeRgb(pixels);
+    const mask = makeMask(pixels);
+    const fg = Buffer.allocUnsafe(pixels * 4);
+    const bg = Buffer.allocUnsafe(pixels * 4);
+
+    applyMask(rgb, mask, fg, bg, pixels);
+
+    for (let i = 0; i < pixels; i++) {
+      expect(fg[i * 4]).toBe(bg[i * 4]);
+      expect(fg[i * 4 + 1]).toBe(bg[i * 4 + 1]);
+      expect(fg[i * 4 + 2]).toBe(bg[i * 4 + 2]);
+      // And both match the source.
+      expect(fg[i * 4]).toBe(rgb[i * 3]);
+      expect(fg[i * 4 + 1]).toBe(rgb[i * 3 + 1]);
+      expect(fg[i * 4 + 2]).toBe(rgb[i * 3 + 2]);
+    }
+  });
+
+  it("dual-output: fg.alpha equals the input mask", () => {
+    const pixels = 32;
+    const rgb = makeRgb(pixels);
+    const mask = makeMask(pixels);
+    const fg = Buffer.allocUnsafe(pixels * 4);
+    const bg = Buffer.allocUnsafe(pixels * 4);
+
+    applyMask(rgb, mask, fg, bg, pixels);
+
+    for (let i = 0; i < pixels; i++) {
+      expect(fg[i * 4 + 3]).toBe(mask[i]);
+    }
+  });
+
+  it("single-output: bg=null returns bg=null and writes only fg", () => {
+    const pixels = 32;
+    const rgb = makeRgb(pixels);
+    const mask = makeMask(pixels);
+    const fg = Buffer.allocUnsafe(pixels * 4);
+
+    const result = applyMask(rgb, mask, fg, null, pixels);
+
+    expect(result.bg).toBeNull();
+    expect(result.fg).toBe(fg);
+    for (let i = 0; i < pixels; i++) {
+      expect(fg[i * 4]).toBe(rgb[i * 3]);
+      expect(fg[i * 4 + 3]).toBe(mask[i]);
+    }
+  });
+
+  it("saturates correctly at mask=0 and mask=255", () => {
+    // mask=0 → fg.alpha=0 (transparent subject), bg.alpha=255 (fully opaque plate)
+    // mask=255 → fg.alpha=255 (fully opaque subject), bg.alpha=0 (transparent plate)
+    const rgb = Buffer.from([10, 20, 30, 40, 50, 60]);
+    const mask = Buffer.from([0, 255]);
+    const fg = Buffer.allocUnsafe(8);
+    const bg = Buffer.allocUnsafe(8);
+
+    applyMask(rgb, mask, fg, bg, 2);
+
+    expect(fg[3]).toBe(0);
+    expect(bg[3]).toBe(255);
+    expect(fg[7]).toBe(255);
+    expect(bg[7]).toBe(0);
+  });
+});

packages/cli/src/background-removal/inference.ts

Lines changed: 82 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -24,10 +24,24 @@ interface OrtModule {
2424
Tensor: typeof Tensor;
2525
}
2626

27+
export interface SessionResult {
28+
/** Subject opaque, background fully transparent. */
29+
fg: Buffer;
30+
/** Inverse-alpha plate: same RGB, alpha is `255 − mask`. Null unless `withBackground` was true. */
31+
bg: Buffer | null;
32+
}
33+
2734
export interface Session {
28-
/** Run inference on one RGB frame, return RGBA bytes (H*W*4). */
29-
process(rgb: Buffer, width: number, height: number): Promise<Buffer>;
30-
/** ORT EP that was actually selected. */
35+
/**
36+
* Both `fg` and `bg` (when requested) are session-owned buffers reused on the
37+
* next call — drain the encoder's stdin before invoking `process` again.
38+
*/
39+
process(
40+
rgb: Buffer,
41+
width: number,
42+
height: number,
43+
withBackground?: boolean,
44+
): Promise<SessionResult>;
3145
provider: string;
3246
close(): Promise<void>;
3347
}
@@ -73,16 +87,15 @@ export async function createSession(options: CreateSessionOptions = {}): Promise
7387
throw new Error("ONNX session is missing input or output bindings");
7488
}
7589

76-
// Pre-allocated per-frame buffers reused across every process() call.
77-
// At 1080p this saves ~9 MB of allocations per frame. rgbaBuf is sized
78-
// lazily on the first call (we don't know W/H until then).
90+
// Reused across calls; sized lazily on first frame. Saves ~9 MB/frame at 1080p.
7991
const inputData = new Float32Array(3 * INPUT_PLANE);
8092
const maskBuf = Buffer.allocUnsafe(INPUT_PLANE);
8193
let rgbaBuf: Buffer | null = null;
94+
let rgbaBgBuf: Buffer | null = null;
8295

8396
return {
8497
provider: providerUsed,
85-
async process(rgb, width, height) {
98+
async process(rgb, width, height, withBackground = false) {
8699
const tensor = await preprocess(sharp, ort, rgb, width, height, inputData);
87100
const outputs = await session.run({ [inputName]: tensor });
88101
const output = outputs[outputName];
@@ -91,7 +104,21 @@ export async function createSession(options: CreateSessionOptions = {}): Promise
91104
if (!rgbaBuf || rgbaBuf.length !== expectedBytes) {
92105
rgbaBuf = Buffer.allocUnsafe(expectedBytes);
93106
}
94-
return await postprocess(sharp, output, rgb, width, height, maskBuf, rgbaBuf);
107+
if (withBackground) {
108+
if (!rgbaBgBuf || rgbaBgBuf.length !== expectedBytes) {
109+
rgbaBgBuf = Buffer.allocUnsafe(expectedBytes);
110+
}
111+
}
112+
return await postprocess(
113+
sharp,
114+
output,
115+
rgb,
116+
width,
117+
height,
118+
maskBuf,
119+
rgbaBuf,
120+
withBackground ? rgbaBgBuf : null,
121+
);
95122
},
96123
async close() {
97124
await session.release();
@@ -141,7 +168,8 @@ async function postprocess(
141168
height: number,
142169
maskBuf: Buffer,
143170
rgbaBuf: Buffer,
144-
): Promise<Buffer> {
171+
rgbaBgBuf: Buffer | null,
172+
): Promise<SessionResult> {
145173
const raw = output.data as Float32Array;
146174

147175
let lo = Infinity;
@@ -172,11 +200,50 @@ async function postprocess(
172200
.raw()
173201
.toBuffer();
174202

175-
for (let i = 0; i < width * height; i++) {
176-
rgbaBuf[i * 4] = rgb[i * 3]!;
177-
rgbaBuf[i * 4 + 1] = rgb[i * 3 + 1]!;
178-
rgbaBuf[i * 4 + 2] = rgb[i * 3 + 2]!;
179-
rgbaBuf[i * 4 + 3] = fullMask[i]!;
203+
return applyMask(rgb, fullMask, rgbaBuf, rgbaBgBuf, width * height);
204+
}
205+
206+
/**
207+
* Composite the RGB source frame with the segmentation mask into one or two
208+
* RGBA buffers. The contract this PR is built on:
209+
* - `fg`'s alpha is the mask, `bg`'s alpha (when provided) is `255 − mask`,
210+
* so `fg.alpha + bg.alpha === 255` for every pixel.
211+
* - RGB triples are byte-identical between `fg` and `bg`.
212+
* - When `bg` is null, only `fg` is touched.
213+
*
214+
* Exported for direct unit testing of the invariants above without spinning
215+
* up an ONNX session.
216+
*/
217+
export function applyMask(
218+
rgb: Buffer,
219+
mask: Buffer,
220+
fg: Buffer,
221+
bg: Buffer | null,
222+
pixels: number,
223+
): SessionResult {
224+
if (bg) {
225+
for (let i = 0; i < pixels; i++) {
226+
const r = rgb[i * 3]!;
227+
const g = rgb[i * 3 + 1]!;
228+
const b = rgb[i * 3 + 2]!;
229+
const m = mask[i]!;
230+
const o = i * 4;
231+
fg[o] = r;
232+
fg[o + 1] = g;
233+
fg[o + 2] = b;
234+
fg[o + 3] = m;
235+
bg[o] = r;
236+
bg[o + 1] = g;
237+
bg[o + 2] = b;
238+
bg[o + 3] = 255 - m;
239+
}
240+
return { fg, bg };
241+
}
242+
for (let i = 0; i < pixels; i++) {
243+
fg[i * 4] = rgb[i * 3]!;
244+
fg[i * 4 + 1] = rgb[i * 3 + 1]!;
245+
fg[i * 4 + 2] = rgb[i * 3 + 2]!;
246+
fg[i * 4 + 3] = mask[i]!;
180247
}
181-
return rgbaBuf;
248+
return { fg, bg: null };
182249
}
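
The `Session.process` doc comment in `inference.ts` warns that `fg` and `bg` are session-owned buffers reused on the next call. The sketch below shows why that matters, using a stand-in processor rather than a real ONNX session. `makeReusingProcessor` and its fill logic are hypothetical and exist only to demonstrate the aliasing; plain `Uint8Array` is used in place of Node's `Buffer` so the example has no dependencies.

```typescript
// Stand-in for a session that reuses one output buffer across calls.
// Hypothetical: this is not the real createSession().
function makeReusingProcessor(bytes: number): (value: number) => Uint8Array {
  const shared = new Uint8Array(bytes); // allocated once, like rgbaBuf
  return (value: number): Uint8Array => {
    shared.fill(value); // every call overwrites the same memory
    return shared;
  };
}

const processFrame = makeReusingProcessor(4);

// Frame 1: keep both the returned reference and an explicit copy.
const first = processFrame(10);
const firstCopy = new Uint8Array(first); // copies the bytes into new memory

// Frame 2 overwrites the shared buffer that `first` still points at.
const second = processFrame(20);

console.log(first === second); // both names refer to the same buffer
console.log(first[0], firstCopy[0]); // only the copy still holds frame 1
```

In the pipeline described by the doc comment, the equivalent safeguard is to finish writing a frame to the encoder before requesting the next one; copying the frame is the alternative when a frame has to outlive the next call.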
