Skip to content

[Bug]: OTEL "Dynamo bytecode transform" span reports wildly incorrect duration (tens of thousands of days)Β #46193

@nicklasfrahm

Description

@nicklasfrahm

πŸ› Describe the bug

When OpenTelemetry tracing is enabled (--otlp-traces-endpoint), the Dynamo bytecode transform span emitted from torch.compile instrumentation reports a wildly incorrect duration β€” tens of thousands of days instead of the actual compile time (seconds).

Image

Example span observed in a trace backend:

Dynamo bytecode transform
Service: llm
Duration: 20622d 20h
Start Time: 0ΞΌs (01:51:02.838)
Kind: internal
Status: unset
Library Name: vllm.tracing.otel

Span attributes
dynamo.time_seconds   17.16094771400094

The dynamo.time_seconds attribute (17.16s) is correct, but the span's own start/end timestamps used for the rendered Duration are off by ~56 years (~20622 days), which matches the gap between the Unix epoch and "now".

Root cause

In vllm/compilation/backends.py (around line 1145-1156):

from .monitor import torch_compile_start_time

dynamo_time = time.perf_counter() - torch_compile_start_time
...
start_time = int(torch_compile_start_time * 1e9)
attributes = {"dynamo.time_seconds": dynamo_time}
instrument_manual("Dynamo bytecode transform", start_time, None, attributes)

torch_compile_start_time is set in vllm/compilation/monitor.py:

torch_compile_start_time: float = 0.0
...
def ...():
    global torch_compile_start_time
    torch_compile_start_time = time.perf_counter()

time.perf_counter() returns a value from a monotonic clock with an arbitrary, implementation-defined reference point (e.g. system boot time on Linux). It is only meaningful for measuring elapsed intervals via subtraction β€” it is not wall-clock/Unix-epoch time.

However, instrument_manual forwards start_time directly into OpenTelemetry's tracer.start_span(start_time=...) (see vllm/tracing/otel.py::manual_instrument_otel), which expects nanoseconds since the Unix epoch:

span_kwargs: dict[str, Any] = {
    "name": span_name,
    "context": ctx,
    "start_time": start_time,
}
...
span = tracer.start_span(**span_kwargs)
...
span.end()  # end_time is None here, so OTel uses the real wall-clock "now"

Because end_time is None, span.end() stamps the end time using the real wall-clock time.time_ns(). So the span ends up with:

  • start_time β‰ˆ perf_counter() value (small, interpreted as if it were close to the 1970 epoch)
  • end_time β‰ˆ actual wall-clock now

The resulting duration is therefore roughly "now minus 1970", i.e. tens of thousands of days, regardless of how long Dynamo actually took to compile.

Suggested fix

Anchor the span timestamps to wall-clock time instead of the monotonic clock, e.g. capture both a monotonic start (for the elapsed delta) and a wall-clock reference in vllm/compilation/monitor.py, and compute the OTel start_time from wall-clock time at the call site in vllm/compilation/backends.py, for example:

# monitor.py
torch_compile_start_time: float = 0.0       # time.perf_counter()
torch_compile_start_time_wall: float = 0.0  # time.time()
# backends.py
dynamo_time = time.perf_counter() - torch_compile_start_time
start_time_wall = torch_compile_start_time_wall  # seconds, epoch-relative
start_time = int(start_time_wall * 1e9)
end_time = int((start_time_wall + dynamo_time) * 1e9)
instrument_manual("Dynamo bytecode transform", start_time, end_time, attributes)

This would also let other callers of instrument_manual avoid the same pitfall if they pass a time.perf_counter()-derived value by mistake.

Impact

Any OTel trace viewer (Jaeger, Tempo, etc.) shows the Dynamo bytecode transform span with a nonsensical multi-decade duration, which is confusing/misleading for anyone using tracing to debug compile-time overhead, and can visually break trace waterfalls/timelines in the UI (span extends far outside the rest of the trace).

Environment

Found via source inspection on vllm at commit 3b03a2cf4772838da622d81315941bb41bcc03ff (main, 2026-06-11). The bug is not environment-specific β€” it reproduces whenever --otlp-traces-endpoint is set and a torch.compile-triggered Dynamo bytecode transform occurs.

Before submitting a new issue...

  • I searched for relevant existing issues (no existing report found for this specific span/duration bug).

Root cause and analysis above were investigated with AI assistance (Claude); the underlying source locations and reasoning were reviewed by me before filing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions