[WIP] Lets figure out performance by rafaellehmkuhl · Pull Request #2768 · bluerobotics/cockpit

rafaellehmkuhl · 2026-06-08T17:02:32Z

This branch is being used to investigate performance issues in Cockpit. It may or may not be opened for merge in the future.

Add always-on long-task monitoring that coalesces main-thread stalls into episodes and logs a single summary on recovery (visibility-gated and rate-limited), so degraded performance is captured in the system logs without the instrumentation itself becoming a cost. Add opt-in (off by default) deeper instrumentation behind a Development setting: a no-op-when-disabled instrument() User Timing helper wired at the MAVLink receive entry, and JS self-profiling capture. Surface recent long tasks, instrumented-section stats and a profile-capture button in the Development configuration view.
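The episode coalescing described above could look roughly like the sketch below. It is not the code from this branch: EpisodeTracker, quietMs and the summary fields are hypothetical names, recovery is assumed to mean "no long task for quietMs", and the visibility gating and rate limiting are left out.

```typescript
// Hypothetical long-task shape, mirroring PerformanceEntry's startTime/duration (ms).
interface LongTask {
  startTime: number;
  duration: number;
}

interface EpisodeSummary {
  start: number;
  end: number;
  taskCount: number;
  totalBlockedMs: number;
  worstTaskMs: number;
}

class EpisodeTracker {
  private current: EpisodeSummary | null = null;
  private readonly quietMs: number;
  private readonly onRecovered: (summary: EpisodeSummary) => void;

  // quietMs: assumed gap without long tasks after which the stall counts as recovered.
  constructor(quietMs: number, onRecovered: (summary: EpisodeSummary) => void) {
    this.quietMs = quietMs;
    this.onRecovered = onRecovered;
  }

  // Feed one long task; a task within quietMs of the previous one extends the episode.
  record(task: LongTask): void {
    this.closeIfQuiet(task.startTime);
    const taskEnd = task.startTime + task.duration;
    let episode = this.current;
    if (episode === null) {
      episode = { start: task.startTime, end: taskEnd, taskCount: 0, totalBlockedMs: 0, worstTaskMs: 0 };
      this.current = episode;
    }
    episode.end = Math.max(episode.end, taskEnd);
    episode.taskCount += 1;
    episode.totalBlockedMs += task.duration;
    episode.worstTaskMs = Math.max(episode.worstTaskMs, task.duration);
  }

  // Call periodically so that one summary is emitted once the main thread has recovered.
  closeIfQuiet(now: number): void {
    const episode = this.current;
    if (episode !== null && now - episode.end >= this.quietMs) {
      this.current = null;
      this.onRecovered(episode);
    }
  }
}
```

In a browser, record() would be fed from a PerformanceObserver on "longtask" entries and closeIfQuiet() from a low-frequency timer, so a burst of stalls produces one log line instead of one per task.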

Add a periodic (30s) trend snapshot to the always-on monitoring that captures windowed framerate health (avg fps, frame-interval stddev, p95/max) alongside leak indicators (data lake variable and listener counts, DOM node count, memory). This targets gradual degradation over long sessions - where steady framerate slowly sinks and oscillates - rather than discrete startup stalls, by making a long session self-document what is growing as fps falls. Expose cheap data-lake variable/listener counts for the leak indicators and surface the trend snapshots in the Development configuration view.
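As an illustration of the windowed framerate numbers, the sketch below reduces one window of frame intervals to avg fps, stddev, p95 and max. The function and field names are hypothetical, avg fps is assumed to be 1000 divided by the mean interval, and p95 uses the nearest-rank method; the branch may compute these differently.

```typescript
interface FrameHealth {
  avgFps: number;
  intervalStdDevMs: number;
  p95IntervalMs: number;
  maxIntervalMs: number;
}

// intervalsMs: time between consecutive animation frames within one snapshot window.
function summarizeFrameIntervals(intervalsMs: readonly number[]): FrameHealth | null {
  const n = intervalsMs.length;
  if (n === 0) return null; // nothing rendered in this window (e.g. tab hidden)

  const mean = intervalsMs.reduce((sum, x) => sum + x, 0) / n;
  // Population variance: the window is the whole data set, not a sample of it.
  const variance = intervalsMs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;

  const sorted = [...intervalsMs].sort((a, b) => a - b);
  // Nearest-rank percentile: smallest value with at least 95% of the data at or below it.
  const p95Index = Math.min(n - 1, Math.ceil(0.95 * n) - 1);

  return {
    avgFps: 1000 / mean,
    intervalStdDevMs: Math.sqrt(variance),
    p95IntervalMs: sorted[p95Index],
    maxIntervalMs: sorted[n - 1],
  };
}
```

A rising stddev with a flat average is the "oscillating" case; a falling average alongside a growing listener or DOM node count points at a leak.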

The self-profiler was blocked by a missing Document-Policy: js-profiling header, so captures failed with a policy violation. Inject that header from the Vite dev server and from Electron (main-frame responses, covering both the dev URL and the file://-served production build). Also make availability detection real: the Profiler global can exist while the API is still policy-blocked, so probe once by constructing a throwaway profiler and cache the result, instead of a misleading typeof check that left the button enabled and triggered a violation on every capture attempt.
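A probe-once-and-cache check in that spirit might look like the sketch below. createProfilerProbe, ProbeScope and the reason strings are hypothetical; the scope is injected so the sketch does not depend on browser globals, and the constructor options are the sampleInterval/maxBufferSize pair from the JS Self-Profiling API.

```typescript
interface ProfilerLike {
  stop(): Promise<unknown>;
}
type ProfilerCtor = new (options: { sampleInterval: number; maxBufferSize: number }) => ProfilerLike;

// The subset of the global scope the probe needs (hypothetical helper type).
interface ProbeScope {
  Profiler?: ProfilerCtor;
  isSecureContext?: boolean;
}

type Availability = { available: true } | { available: false; reason: string };

function runProbe(scope: ProbeScope): Availability {
  const Ctor = scope.Profiler;
  if (Ctor === undefined) return { available: false, reason: "missing Profiler global" };
  if (scope.isSecureContext === false) return { available: false, reason: "non-secure context" };
  try {
    // The global existing is not enough: construction throws when the document policy blocks it.
    const throwaway = new Ctor({ sampleInterval: 10, maxBufferSize: 100 });
    void throwaway.stop().catch(() => undefined); // discard the probe's trace
    return { available: true };
  } catch (error) {
    return { available: false, reason: `constructor error: ${String(error)}` };
  }
}

// Returns a function that probes on first call and serves the cached result afterwards.
function createProfilerProbe(scope: ProbeScope): () => Availability {
  let cached: Availability | null = null;
  return () => {
    if (cached === null) cached = runProbe(scope);
    return cached;
  };
}
```

In the app the scope would be the window itself (for example globalThis cast to ProbeScope), and the cached reason is what gets logged when the capture button is disabled.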

…ability The number of data lake variables is not a leak vector, so remove it from the trend snapshot, the dev view column and the now-unused accessor; the listener count remains as the meaningful indicator. Also log the concrete reason when self-profiling is unavailable (missing Profiler global, non-secure context, or constructor error) so it's clear why the capture button is disabled in a given runtime.

The Long Tasks API only names the frame container (almost always "unknown" for top-level work), so long-task attribution was useless. Add a Long Animation Frames (LoAF) observer that records the dominant script of each long frame and back-fills overlapping long tasks (and episode summaries) with a real function/source attribution. Chromium-only; no-ops where unsupported.
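The back-fill step can be pictured as an interval-overlap match, sketched below under stated assumptions: the record shapes and function names are hypothetical, the dominant script is taken to be the one with the largest duration, and each long task adopts the attribution of the long frame it overlaps most. How the branch actually chooses the dominant script is not shown in this description.

```typescript
interface TimedEntry {
  startTime: number; // ms
  duration: number; // ms
}
interface ScriptSample {
  name: string; // e.g. "functionName@sourceURL"
  duration: number;
}
interface LongTaskRecord extends TimedEntry {
  attribution: string; // "unknown" until back-filled
}
interface LongFrameRecord extends TimedEntry {
  scripts: ScriptSample[];
}

// Assumed meaning of "dominant": the script that ran longest inside the frame.
function dominantScript(scripts: readonly ScriptSample[]): string | null {
  let best: ScriptSample | null = null;
  for (const script of scripts) {
    if (best === null || script.duration > best.duration) best = script;
  }
  return best === null ? null : best.name;
}

function overlapMs(a: TimedEntry, b: TimedEntry): number {
  const start = Math.max(a.startTime, b.startTime);
  const end = Math.min(a.startTime + a.duration, b.startTime + b.duration);
  return Math.max(0, end - start);
}

// Returns copies of the tasks, attributed to the most-overlapping frame's dominant script.
function backfillAttribution(tasks: readonly LongTaskRecord[], frames: readonly LongFrameRecord[]): LongTaskRecord[] {
  return tasks.map((task) => {
    let bestName: string | null = null;
    let bestOverlap = 0;
    for (const frame of frames) {
      const overlap = overlapMs(task, frame);
      const name = dominantScript(frame.scripts);
      if (name !== null && overlap > bestOverlap) {
        bestOverlap = overlap;
        bestName = name;
      }
    }
    return bestName === null ? task : { ...task, attribution: bestName };
  });
}
```

Tasks with no overlapping frame keep their original attribution, which matches the no-op behaviour on engines without LoAF support.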

Add opt-in instrument() probes (no-op when profiling is disabled) to attribute the telemetry-driven reactive load:
- data-lake fan-out, keyed per variable (datalake-fanout:<id>), to see which variables drive the synchronous listener fan-out
- the generic settings-sync deep watcher, keyed per setting (settings-watch:<key>)
- the controller protocol-mapping deep watchers
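For readers unfamiliar with the pattern, a no-op-when-disabled timing wrapper could be sketched as follows. This is an assumption about the shape of instrument(), not its actual signature: createInstrument and TimingSink are hypothetical, and the sink is injected so the sketch runs outside a browser (the global performance object satisfies the same two methods).

```typescript
// Minimal slice of the User Timing API that the helper needs.
interface TimingSink {
  now(): number;
  measure(name: string, options: { start: number; end: number }): unknown;
}

function createInstrument(sink: TimingSink, isEnabled: () => boolean) {
  // Wraps a synchronous section; when profiling is off it only calls fn.
  return function instrument<T>(name: string, fn: () => T): T {
    if (!isEnabled()) return fn();
    const start = sink.now();
    try {
      return fn();
    } finally {
      // Recorded even if fn throws, so failing sections still show up in the stats.
      sink.measure(name, { start, end: sink.now() });
    }
  };
}

// Usage sketch for a per-variable fan-out key (listeners and value are hypothetical):
//   const instrument = createInstrument(performance, () => profilingEnabled)
//   instrument(`datalake-fanout:${id}`, () => listeners.forEach((notify) => notify(value)))
```

Keying the measure name per variable or per setting is what lets the stats view rank which ids dominate the fan-out.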

activeButtonActions runs on every joystick poll (~60Hz) and was deep-comparing each button's action against the constant modifier set via JSON.stringify, re-serializing everything per button per poll. Profiling on throttled hardware showed this as a top cost. Precompute the modifier action id set once and compare by id instead.
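The change amounts to replacing a per-poll deep comparison with a set lookup. The sketch below shows both shapes side by side; the ButtonAction type, the two modifier entries and the function names are hypothetical stand-ins, and it assumes action ids are unique.

```typescript
interface ButtonAction {
  id: string;
  name: string;
}

// Hypothetical constant modifier set; the real entries live in Cockpit's joystick code.
const MODIFIER_ACTIONS: readonly ButtonAction[] = [
  { id: "modifier_regular", name: "Regular" },
  { id: "modifier_shift", name: "Shift" },
];

// Built once at module load instead of re-serializing on every poll.
const MODIFIER_ACTION_IDS: ReadonlySet<string> = new Set(MODIFIER_ACTIONS.map((action) => action.id));

// Old shape, for comparison: JSON.stringify runs for every modifier, per button, per poll.
function isModifierByDeepCompare(action: ButtonAction): boolean {
  return MODIFIER_ACTIONS.some((modifier) => JSON.stringify(modifier) === JSON.stringify(action));
}

// New shape: a constant-time id lookup with no allocation.
function isModifierById(action: ButtonAction): boolean {
  return MODIFIER_ACTION_IDS.has(action.id);
}
```

The two only agree as long as an id fully identifies an action, which is the assumption the optimization rests on.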

The HUD widgets (Compass, CompassHUD, VirtualHorizon, Attitude, DepthHUD) tweened a reactive() render-state object with GSAP and redrew via a watch on it. Every animation frame therefore paid the full Vue reactivity cost (set/trigger/deep traverse) plus a watcher fire, when only the canvas redraw is needed. Profiling on throttled hardware showed this as the dominant main-thread cost (GSAP ticking through reactive setters, with the deep traverse the top sampled function). Make the render-state objects plain (non-reactive) and drive the canvas redraw directly from GSAP's onUpdate, coalesced to one redraw per animation frame (CompassHUD keeps its existing rAF loop, now started explicitly). Size/option changes still trigger redraws via a dedicated watch. Same visuals, without the per-frame reactivity storm.
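Coalescing to one redraw per animation frame can be done with a pending flag, as in the sketch below. createCoalescedRedraw and FrameScheduler are hypothetical names, and the scheduler is injected so the sketch does not depend on requestAnimationFrame or GSAP being present.

```typescript
// Schedules a callback for the next animation frame.
// In the browser: (callback) => requestAnimationFrame(callback)
type FrameScheduler = (callback: () => void) => void;

// Returns a function that can be called any number of times per frame
// but triggers at most one draw() per scheduled frame.
function createCoalescedRedraw(draw: () => void, schedule: FrameScheduler): () => void {
  let pending = false;
  return () => {
    if (pending) return; // a redraw is already queued for this frame
    pending = true;
    schedule(() => {
      pending = false;
      draw();
    });
  };
}

// Usage sketch with a plain (non-reactive) render-state object, assuming GSAP:
//   const renderState = { heading: 0 }
//   const requestRedraw = createCoalescedRedraw(() => drawCanvas(renderState), (cb) => requestAnimationFrame(cb))
//   gsap.to(renderState, { heading: 90, duration: 0.5, onUpdate: requestRedraw })
```

Because renderState is a plain object, GSAP's per-tick writes no longer pass through Vue's setters, and several tweens updating in the same tick still produce a single canvas draw.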

The Stats-for-Nerds overlay (commonly left open) ran a 50Hz update timer pushing into reactive arrays plus a continuous 60Hz canvas redraw loop, even though the underlying WebRTC stats only refresh at ~10Hz. Profiling showed this as a real contributor. Make the plot/stat buffers plain (non-reactive) since only the canvas draw reads them, sample at the stats cadence (~10Hz) instead of a separate 50Hz interval, and coalesce redraws to one animation frame triggered on new data instead of a constant 60Hz loop.

emitStateEvent and the SDL/gamepad handlers run on every joystick state change (~50Hz). They re-derived the joystick model by allocating and recompiling two regexes per call (getVidPid, called twice per event) and read settings on every event. Hoist the regexes to module constants, cache the model per gamepad id (it never changes), and cache the disabled-models list via a settings listener.
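A sketch of the hoist-and-cache idea follows. It is illustrative only: the two id formats are the ones Chromium and Firefox commonly report for Gamepad.id, and the regexes, the KNOWN_MODELS table and modelFor are assumptions rather than Cockpit's actual getVidPid implementation.

```typescript
// Compiled once at module load instead of on every joystick event.
// Assumed formats: "... (STANDARD GAMEPAD Vendor: 045e Product: 028e)" and "045e-028e-Name".
const CHROMIUM_VID_PID = /Vendor:\s*([0-9a-f]{4})\s+Product:\s*([0-9a-f]{4})/i;
const FIREFOX_VID_PID = /^([0-9a-f]{4})-([0-9a-f]{4})-/i;

function getVidPid(gamepadId: string): string | null {
  const match = CHROMIUM_VID_PID.exec(gamepadId) ?? FIREFOX_VID_PID.exec(gamepadId);
  return match === null ? null : `${match[1]}:${match[2]}`.toLowerCase();
}

// Hypothetical lookup table from vendor:product to a model name.
const KNOWN_MODELS = new Map<string, string>([["045e:028e", "Xbox 360"]]);

// A gamepad id never changes for a connected device, so the model is derived once per id.
const modelCache = new Map<string, string>();

function modelFor(gamepadId: string): string {
  const cached = modelCache.get(gamepadId);
  if (cached !== undefined) return cached;
  const vidPid = getVidPid(gamepadId);
  const model = vidPid === null ? "unknown" : (KNOWN_MODELS.get(vidPid) ?? "unknown");
  modelCache.set(gamepadId, model);
  return model;
}
```

After the first event for a given pad, every later event costs one Map lookup instead of two regex compilations and matches.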

Move the Vue components that read the data lake via listenDataLakeVariable onto the useDataLakeVariable composable: the VeryGenericIndicator and EkfStateIndicator mini-widgets and the Slider/Switch/Dropdown/Dial/Checkbox input elements. The composable auto-resubscribes when the variable id changes and cleans up on unmount, removing the manual listener bookkeeping (and fixing a listener leak in VeryGenericIndicator). Input widgets keep their write path and echo-guard. This funnels render consumers through one binding so it can be coalesced. Plotter (needs notifyOnTimestampChange / plot-on-constant), IFrame (forwards to external widgets) and store-based widgets are intentionally left as-is.

Now that render consumers funnel through the composable, batch incoming data lake updates to at most one ref write per animation frame (latest value wins). This caps reactive re-renders at display rate regardless of telemetry/message rate, and collapses many variables updating in the same tick into a single render pass. The initial value is still applied synchronously so the binding is never blank on mount, and pending frames are cancelled on resubscribe/unmount.
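The batching rule (synchronous initial value, then at most one write per frame with the latest value winning, cancellable on teardown) is sketched below without Vue. createFrameBatchedBinding, RequestFrame and the write callback are hypothetical; in the composable the write would presumably be an assignment to a ref.

```typescript
type CancelFrame = () => void;
// In the browser: (callback) => { const handle = requestAnimationFrame(callback); return () => cancelAnimationFrame(handle) }
type RequestFrame = (callback: () => void) => CancelFrame;

function createFrameBatchedBinding<T>(initial: T, write: (value: T) => void, requestFrame: RequestFrame) {
  // The initial value is applied immediately so the binding is never blank on mount.
  write(initial);

  let latest: T = initial;
  let cancelPending: CancelFrame | null = null;

  return {
    // Called for every incoming update, at any rate.
    push(value: T): void {
      latest = value; // later updates overwrite earlier ones within the same frame
      if (cancelPending !== null) return; // a flush is already scheduled
      cancelPending = requestFrame(() => {
        cancelPending = null;
        write(latest);
      });
    },
    // Called on resubscribe/unmount so no write lands after teardown.
    dispose(): void {
      if (cancelPending !== null) {
        cancelPending();
        cancelPending = null;
      }
    },
  };
}
```

With this shape a 500 Hz variable and a 5 Hz variable both cost at most one reactive write per displayed frame.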

rafaellehmkuhl added 8 commits June 8, 2026 13:58

rafaellehmkuhl force-pushed the lets-figure-out-performance branch from 51e1cf1 to 975a308 Compare June 8, 2026 17:37

rafaellehmkuhl added 4 commits June 8, 2026 14:56

rafaellehmkuhl wants to merge 12 commits into bluerobotics:master from rafaellehmkuhl:lets-figure-out-performance
