[WIP] Lets figure out performance #2768

Draft
rafaellehmkuhl wants to merge 12 commits into bluerobotics:master from rafaellehmkuhl:lets-figure-out-performance
@rafaellehmkuhl

Member

This branch is being used to investigate performance issues on Cockpit. It may or may not be open for merge in the future.

Add always-on long-task monitoring that coalesces main-thread stalls into
episodes and logs a single summary on recovery (visibility-gated and
rate-limited), so degraded performance is captured in the system logs without
the instrumentation itself becoming a cost.
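As a rough illustration of the coalescing described above, here is a minimal sketch. The class name, the summary shape and both thresholds are assumptions for illustration, not the values used in this branch. Long tasks that arrive within a quiet gap of each other join one episode, and a single summary is logged once the thread has recovered, subject to a visibility gate and a rate limit.

```typescript
// Hypothetical names and thresholds; not taken from the Cockpit code base.
interface LongTask { startTime: number; duration: number }
interface EpisodeSummary {
  start: number
  end: number
  taskCount: number
  totalBlockedMs: number
  longestMs: number
}

class LongTaskEpisodeTracker {
  private current: EpisodeSummary | null = null
  private lastLogAt = -Infinity

  constructor(
    private readonly log: (summary: EpisodeSummary) => void,
    private readonly quietGapMs = 1000, // assumed: gap that ends an episode
    private readonly minLogIntervalMs = 5000 // assumed: rate limit between summaries
  ) {}

  /** Record one long task; tasks closer than quietGapMs join the open episode. */
  record(task: LongTask): void {
    const end = task.startTime + task.duration
    if (this.current && task.startTime - this.current.end > this.quietGapMs) {
      this.flush(task.startTime)
    }
    if (!this.current) {
      this.current = { start: task.startTime, end, taskCount: 0, totalBlockedMs: 0, longestMs: 0 }
    }
    this.current.end = Math.max(this.current.end, end)
    this.current.taskCount += 1
    this.current.totalBlockedMs += task.duration
    this.current.longestMs = Math.max(this.current.longestMs, task.duration)
  }

  /** Call periodically so an open episode is closed once the thread has been quiet. */
  tick(now: number, pageVisible = true): void {
    if (this.current && now - this.current.end > this.quietGapMs) this.flush(now, pageVisible)
  }

  /** Close the episode; log it only when the page is visible and the rate limit allows. */
  private flush(now: number, pageVisible = true): void {
    const summary = this.current
    this.current = null
    if (!summary || !pageVisible) return
    if (now - this.lastLogAt < this.minLogIntervalMs) return
    this.lastLogAt = now
    this.log(summary)
  }
}
```

In a browser, `record` would presumably be fed from a `PerformanceObserver` on `longtask` entries; the tracker itself stays free of browser APIs so the per-task cost is a few arithmetic operations.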

Add opt-in (off by default) deeper instrumentation behind a Development
setting: a no-op-when-disabled instrument() User Timing helper wired at the
MAVLink receive entry, and JS self-profiling capture. Surface recent long
tasks, instrumented-section stats and a profile-capture button in the
Development configuration view.
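A minimal sketch of what a no-op-when-disabled `instrument()` wrapper could look like. The stats map, `setProfilingEnabled` and the injected clock are assumptions for illustration; the helper in this branch is described as a User Timing helper, so a browser version would presumably call `performance.mark`/`performance.measure` where this sketch reads the clock.

```typescript
// Hypothetical sketch: the stats shape and the clock injection are assumptions.
interface SectionStats { count: number; totalMs: number; maxMs: number }

const sectionStats = new Map<string, SectionStats>()
let profilingEnabled = false
let now: () => number = () => Date.now()

/** Toggle instrumentation; optionally swap the clock (useful for tests). */
function setProfilingEnabled(enabled: boolean, clock?: () => number): void {
  profilingEnabled = enabled
  if (clock) now = clock
}

/** Run fn, timing it under `name` only when profiling is enabled. */
function instrument<T>(name: string, fn: () => T): T {
  if (!profilingEnabled) return fn() // disabled path: one boolean check, nothing else
  const start = now()
  try {
    return fn()
  } finally {
    // Recorded even if fn throws, so failing sections still show up in the stats.
    const elapsed = now() - start
    const stats = sectionStats.get(name) ?? { count: 0, totalMs: 0, maxMs: 0 }
    stats.count += 1
    stats.totalMs += elapsed
    stats.maxMs = Math.max(stats.maxMs, elapsed)
    sectionStats.set(name, stats)
  }
}
```

The design point is that the disabled path allocates nothing and touches no timing API, so call sites can stay in hot paths such as the MAVLink receive entry.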

Add a periodic (30s) trend snapshot to the always-on monitoring that captures
windowed framerate health (avg fps, frame-interval stddev, p95/max) alongside
leak indicators (data lake variable and listener counts, DOM node count,
memory). This targets gradual degradation over long sessions - where steady
framerate slowly sinks and oscillates - rather than discrete startup stalls,
by making a long session self-document what is growing as fps falls.
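For reference, the windowed framerate figures can be derived from a list of frame intervals roughly as follows. This is a sketch under assumptions: the function and field names are hypothetical, the percentile uses the nearest-rank method, and the standard deviation is the population one; the branch may compute these differently.

```typescript
// Hypothetical helper; names and the percentile method are assumptions.
interface FrameHealth {
  avgFps: number
  intervalStdDevMs: number
  p95Ms: number
  maxMs: number
}

/** Summarize one window of frame intervals (milliseconds between frames). */
function summarizeFrameIntervals(intervalsMs: number[]): FrameHealth | null {
  const n = intervalsMs.length
  if (n === 0) return null
  const mean = intervalsMs.reduce((sum, v) => sum + v, 0) / n
  const variance = intervalsMs.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n
  const sorted = [...intervalsMs].sort((a, b) => a - b)
  const p95Index = Math.min(n - 1, Math.ceil(0.95 * n) - 1) // nearest-rank percentile
  return {
    avgFps: 1000 / mean,
    intervalStdDevMs: Math.sqrt(variance),
    p95Ms: sorted[p95Index],
    maxMs: sorted[n - 1],
  }
}
```

A sinking `avgFps` together with a growing `intervalStdDevMs` is the "sinks and oscillates" pattern described above; pairing each snapshot with the leak indicators is what lets a long session show which counter grows as the framerate falls.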

Expose cheap data-lake variable/listener counts for the leak indicators and
surface the trend snapshots in the Development configuration view.

The self-profiler was blocked by a missing Document-Policy: js-profiling
header, so captures failed with a policy violation. Inject that header from
the Vite dev server and from Electron (main-frame responses, covering both the
dev URL and the file://-served production build).

Also make availability detection real: the Profiler global can exist while the
API is still policy-blocked, so probe once by constructing a throwaway profiler
and cache the result, instead of a misleading typeof check that left the button
enabled and triggered a violation on every capture attempt.
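The probe-once-and-cache idea can be sketched like this. The constructor is injected so the logic is testable outside a browser; the type names and the `getCtor` parameter are assumptions, and the option names mirror the JS Self-Profiling API's `sampleInterval`/`maxBufferSize`.

```typescript
// Hypothetical shapes; in a browser getCtor would return the global Profiler, if present.
type ProfilerLike = { stop: () => Promise<unknown> }
type ProfilerCtor = new (options: { sampleInterval: number; maxBufferSize: number }) => ProfilerLike

let cachedAvailability: boolean | undefined

/** Probe by constructing a throwaway profiler once; later calls reuse the cached result. */
function isSelfProfilingAvailable(getCtor: () => ProfilerCtor | undefined): boolean {
  if (cachedAvailability !== undefined) return cachedAvailability
  const Ctor = getCtor()
  if (!Ctor) return (cachedAvailability = false) // global missing entirely
  try {
    const probe = new Ctor({ sampleInterval: 10, maxBufferSize: 100 })
    void probe.stop().catch(() => undefined) // discard the throwaway trace
    cachedAvailability = true
  } catch {
    // Constructor exists but is blocked (e.g. by document policy).
    cachedAvailability = false
  }
  return cachedAvailability
}
```

Because the result is cached, a policy-blocked runtime pays for exactly one failed construction instead of one violation per capture attempt.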
…ability

The number of data lake variables is not a leak vector, so remove it from the
trend snapshot, the dev view column and the now-unused accessor; the listener
count remains as the meaningful indicator.

Also log the concrete reason when self-profiling is unavailable (missing
Profiler global, non-secure context, or constructor error) so it's clear why
the capture button is disabled in a given runtime.

The Long Tasks API only names the frame container (almost always "unknown" for
top-level work), so long-task attribution was useless. Add a Long Animation
Frames (LoAF) observer that records the dominant script of each long frame and
back-fills overlapping long tasks (and episode summaries) with a real
function/source attribution. Chromium-only; no-ops where unsupported.
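The back-fill step amounts to an interval-overlap match between recorded long tasks and each long animation frame. A minimal sketch, assuming hypothetical record shapes; the `scripts`, `sourceFunctionName`, `sourceURL` and `duration` fields follow the Long Animation Frames API, while the label format and the "longest script wins" rule are assumptions.

```typescript
// Hypothetical record shapes for illustration.
interface Span { startTime: number; duration: number }
interface LongTaskRecord extends Span { attribution: string }
interface ScriptTiming { sourceFunctionName: string; sourceURL: string; duration: number }
interface LongFrame extends Span { scripts: ScriptTiming[] }

/** Label of the script that ran longest within the frame, or null if none was reported. */
function dominantScript(frame: LongFrame): string | null {
  if (frame.scripts.length === 0) return null
  const top = frame.scripts.reduce((a, b) => (b.duration > a.duration ? b : a))
  return `${top.sourceFunctionName || '(anonymous)'} @ ${top.sourceURL || '(unknown source)'}`
}

/** Give still-unattributed long tasks that overlap the frame its dominant script. */
function backfillAttribution(tasks: LongTaskRecord[], frame: LongFrame): number {
  const label = dominantScript(frame)
  if (label === null) return 0
  const frameEnd = frame.startTime + frame.duration
  let updated = 0
  for (const task of tasks) {
    const taskEnd = task.startTime + task.duration
    const overlaps = task.startTime < frameEnd && frame.startTime < taskEnd
    if (overlaps && task.attribution === 'unknown') {
      task.attribution = label
      updated++
    }
  }
  return updated
}
```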

Add opt-in instrument() probes (no-op when profiling is disabled) to attribute
the telemetry-driven reactive load:
- data-lake fan-out, keyed per variable (datalake-fanout:<id>), to see which
  variables drive the synchronous listener fan-out
- the generic settings-sync deep watcher, keyed per setting (settings-watch:<key>)
- the controller protocol-mapping deep watchers

activeButtonActions runs on every joystick poll (~60Hz) and was deep-comparing
each button's action against the constant modifier set via JSON.stringify,
re-serializing everything per button per poll. Profiling on throttled hardware
showed this as a top cost. Precompute the modifier action id set once and
compare by id instead.
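The change boils down to replacing a per-poll structural comparison with a set lookup. A sketch with made-up action data; the `ProtocolAction` shape and the sample modifiers are assumptions, and the id comparison relies on ids uniquely identifying actions.

```typescript
// Hypothetical action shape and sample data, for illustration only.
interface ProtocolAction { id: string; name: string; protocol: string }

const modifierActions: ProtocolAction[] = [
  { id: 'regular', name: 'Regular', protocol: 'cockpit-modifier-key' },
  { id: 'shift', name: 'Shift', protocol: 'cockpit-modifier-key' },
]

// Before: re-serializes every modifier and the candidate on every call.
function isModifierSlow(action: ProtocolAction): boolean {
  return modifierActions.some((m) => JSON.stringify(m) === JSON.stringify(action))
}

// After: the id set is built once at module load; each check is a hash lookup.
const modifierActionIds: ReadonlySet<string> = new Set(modifierActions.map((a) => a.id))

function isModifier(action: ProtocolAction): boolean {
  return modifierActionIds.has(action.id)
}
```

The two functions agree as long as no non-modifier action reuses a modifier id, which is the assumption that makes the cheaper check safe.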

The HUD widgets (Compass, CompassHUD, VirtualHorizon, Attitude, DepthHUD) tweened
a reactive() render-state object with GSAP and redrew via a watch on it. Every
animation frame therefore paid the full Vue reactivity cost (set/trigger/deep
traverse) plus a watcher fire, when only the canvas redraw is needed. Profiling on
throttled hardware showed this as the dominant main-thread cost (GSAP ticking
through reactive setters, with the deep traverse the top sampled function).

Make the render-state objects plain (non-reactive) and drive the canvas redraw
directly from GSAP's onUpdate, coalesced to one redraw per animation frame
(CompassHUD keeps its existing rAF loop, now started explicitly). Size/option
changes still trigger redraws via a dedicated watch. Same visuals, without the
per-frame reactivity storm.
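The "one redraw per animation frame" coalescing can be sketched as a small scheduler with a pending flag. The function name and the injected `scheduleFrame` parameter are assumptions made so the sketch runs outside a browser; in the widgets it would presumably be `requestAnimationFrame`.

```typescript
// Hypothetical helper; scheduleFrame stands in for requestAnimationFrame.
type ScheduleFrame = (callback: () => void) => void

/** Returns a requestRedraw() that triggers at most one draw per scheduled frame. */
function createRedrawScheduler(draw: () => void, scheduleFrame: ScheduleFrame): () => void {
  let pending = false
  return function requestRedraw(): void {
    if (pending) return // a redraw is already queued for this frame
    pending = true
    scheduleFrame(() => {
      pending = false // cleared before drawing so draw() may request another frame
      draw()
    })
  }
}
```

With a plain (non-reactive) render-state object, each tween can then pass the returned function as its GSAP `onUpdate` callback, for example `gsap.to(renderState, { yaw: 90, onUpdate: requestRedraw })` (where `renderState` and `yaw` are illustrative names), so several tweens ticking in the same frame still produce a single canvas draw.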

@rafaellehmkuhl force-pushed the lets-figure-out-performance branch from 51e1cf1 to 975a308 (June 8, 2026 17:37)

The Stats-for-Nerds overlay (commonly left open) ran a 50Hz update timer pushing
into reactive arrays plus a continuous 60Hz canvas redraw loop, even though the
underlying WebRTC stats only refresh at ~10Hz. Profiling showed this as a real
contributor.

Make the plot/stat buffers plain (non-reactive) since only the canvas draw reads
them, sample at the stats cadence (~10Hz) instead of a separate 50Hz interval,
and coalesce redraws to one animation frame triggered on new data instead of a
constant 60Hz loop.

emitStateEvent and the SDL/gamepad handlers run on every joystick state change
(~50Hz). They re-derived the joystick model by allocating and recompiling two
regexes per call (getVidPid, called twice per event) and read settings on every
event. Hoist the regexes to module constants, cache the model per gamepad id
(it never changes), and cache the disabled-models list via a settings listener.
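A sketch of the hoist-and-cache pattern. The regexes assume a Chromium-style gamepad id string containing `Vendor: xxxx Product: xxxx`; the function name, the `vid:pid` key format and the `'unknown'` fallback are all hypothetical, and the real `getVidPid` may parse ids differently.

```typescript
// Hoisted to module scope: created once instead of on every joystick event.
// The id format these match is an assumption (Chromium-style id strings).
const VENDOR_REGEX = /Vendor:\s*([0-9a-f]{4})/i
const PRODUCT_REGEX = /Product:\s*([0-9a-f]{4})/i

// A gamepad's id never changes, so the derived key can be cached per id.
const vidPidCache = new Map<string, string>()

/** Derive a "vid:pid" key from a gamepad id, parsing each distinct id only once. */
function vidPidFromGamepadId(gamepadId: string): string {
  const cached = vidPidCache.get(gamepadId)
  if (cached !== undefined) return cached
  const vendor = VENDOR_REGEX.exec(gamepadId)?.[1]
  const product = PRODUCT_REGEX.exec(gamepadId)?.[1]
  const key = vendor && product ? `${vendor}:${product}`.toLowerCase() : 'unknown'
  vidPidCache.set(gamepadId, key)
  return key
}
```

After the first event for a given gamepad, every later call is a single map lookup with no regex work and no allocation.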

Move the Vue components that read the data lake via listenDataLakeVariable onto
the useDataLakeVariable composable: the VeryGenericIndicator and EkfStateIndicator
mini-widgets and the Slider/Switch/Dropdown/Dial/Checkbox input elements. The
composable auto-resubscribes when the variable id changes and cleans up on
unmount, removing the manual listener bookkeeping (and fixing a listener leak in
VeryGenericIndicator). Input widgets keep their write path and echo-guard.

This funnels render consumers through one binding so their updates can be coalesced. Plotter
(needs notifyOnTimestampChange / plot-on-constant), IFrame (forwards to external
widgets) and store-based widgets are intentionally left as-is.

Now that render consumers funnel through the composable, batch incoming data
lake updates to at most one ref write per animation frame (latest value wins).
This caps reactive re-renders at display rate regardless of telemetry/message
rate, and collapses many variables updating in the same tick into a single
render pass. The initial value is still applied synchronously so the binding is
never blank on mount, and pending frames are cancelled on resubscribe/unmount.
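The batching behaviour (latest value wins, synchronous initial value, cancellation on teardown) can be sketched independently of Vue. Everything named here is hypothetical: `write` stands in for assigning the composable's ref, and `FrameScheduler` stands in for `requestAnimationFrame`/`cancelAnimationFrame`.

```typescript
// Hypothetical sketch; not the composable's actual implementation.
type FrameHandle = number
interface FrameScheduler {
  request: (callback: () => void) => FrameHandle
  cancel: (handle: FrameHandle) => void
}

function createFrameBatchedBinding<T>(initial: T, write: (value: T) => void, frames: FrameScheduler) {
  let pending: { value: T } | null = null
  let handle: FrameHandle | null = null

  write(initial) // applied synchronously so the binding is never blank on mount

  return {
    /** Queue an update; only the latest value pushed before the next frame is written. */
    push(value: T): void {
      pending = { value }
      if (handle !== null) return // a frame is already scheduled
      handle = frames.request(() => {
        handle = null
        if (pending) {
          write(pending.value)
          pending = null
        }
      })
    },
    /** Cancel any pending frame (on resubscribe or unmount). */
    dispose(): void {
      if (handle !== null) frames.cancel(handle)
      handle = null
      pending = null
    },
  }
}
```

Wrapping the pending value in an object keeps `undefined` and `null` usable as legitimate variable values, since "nothing pending" is represented by the wrapper being absent rather than by the value itself.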