CUPTI UVM Activity API diagnostic for GB10 / DGX Spark.
CUPTI (CUDA Profiling Tools Interface) is NVIDIA's C-based interface for building profiling and tracing tools. Nsight Systems uses CUPTI internally via the Activity API to collect UVM traces.
This tool calls the CUPTI Activity API directly — bypassing Nsight's toolchain — to determine whether CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER events are supported at the API level on GB10.
When Nsight UVM profiling is unsupported on a platform, this tool helps determine whether the limitation exists at the CUPTI Activity API level or above it.
Nsight Systems UVM profiling confirmed unsupported on GB10: https://forums.developer.nvidia.com/t/nsight-systems-unified-memory-trace-support-for-gb10-sm121/357848
The UVM driver must be active before CUPTI can subscribe to UVM events, so the tool loads UVM first and only then enables CUPTI. This is the required initialization order:
- cudaMallocManaged — loads UVM driver
- CPU touch — activates UVM
- GPU touch — activates UVM on the GPU side
- cuptiActivityRegisterCallbacks — registers buffer callbacks
- cuptiActivityEnable(UNIFIED_MEMORY_COUNTER) — enables UVM events
- CPU + GPU access with CUPTI active — should trigger events
- Reports CUPTI return codes and event counts
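
The sequence above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the helper names (`touch`, `report`, `cpu_gpu_touch`), the buffer size, and the allocation size are all made up for the example. It also assumes a call to `cuptiActivityConfigureUnifiedMemoryCounter` before enabling the activity kind, as NVIDIA's CUPTI unified memory sample does. That step is not in the list above, and the config field names may differ between CUPTI versions.

```cuda
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cupti.h>

static size_t g_uvm_records = 0;  // UNIFIED_MEMORY_COUNTER records seen so far

__global__ void touch(int* p, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;
}

// CUPTI asks for an empty buffer that it will fill with activity records.
static void CUPTIAPI bufferRequested(uint8_t** buffer, size_t* size, size_t* maxNumRecords) {
    *size = 1 << 20;                                   // illustrative size
    *buffer = static_cast<uint8_t*>(malloc(*size));
    *maxNumRecords = 0;                                // 0 = as many as fit
}

// CUPTI hands back a filled buffer; count the UVM counter records in it.
static void CUPTIAPI bufferCompleted(CUcontext, uint32_t, uint8_t* buffer,
                                     size_t, size_t validSize) {
    CUpti_Activity* rec = nullptr;
    while (cuptiActivityGetNextRecord(buffer, validSize, &rec) == CUPTI_SUCCESS)
        if (rec->kind == CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER) ++g_uvm_records;
    free(buffer);
}

// Hypothetical helper: print the return code of one CUPTI call.
static void report(const char* what, CUptiResult r) {
    const char* s = nullptr;
    cuptiGetResultString(r, &s);
    printf("%-44s -> %d (%s)\n", what, (int)r, s ? s : "?");
}

// Hypothetical helper: one CPU pass followed by one GPU pass over the buffer.
static void cpu_gpu_touch(int* p, size_t n) {
    for (size_t i = 0; i < n; ++i) p[i] = 1;             // CPU touch
    touch<<<(unsigned)((n + 255) / 256), 256>>>(p, n);   // GPU touch
    cudaDeviceSynchronize();
}

int main() {
    const size_t n = 1 << 22;
    int* p = nullptr;
    if (cudaMallocManaged(&p, n * sizeof(int)) != cudaSuccess) return 1;  // loads UVM
    cpu_gpu_touch(p, n);  // UVM is active on both sides before CUPTI is enabled

    report("cuptiActivityRegisterCallbacks",
           cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted));

    // Assumption: counter configuration modeled on NVIDIA's CUPTI sample.
    CUpti_ActivityUnifiedMemoryCounterConfig cfg[2] = {};
    cfg[0].kind = CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD;
    cfg[1].kind = CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH;
    for (auto& c : cfg) {
        c.scope = CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_SINGLE_DEVICE;
        c.deviceId = 0;
        c.enable = 1;
    }
    report("cuptiActivityConfigureUnifiedMemoryCounter",
           cuptiActivityConfigureUnifiedMemoryCounter(cfg, 2));
    report("cuptiActivityEnable(UNIFIED_MEMORY_COUNTER)",
           cuptiActivityEnable(CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER));

    cpu_gpu_touch(p, n);  // access with CUPTI active: should trigger events
    report("cuptiActivityFlushAll", cuptiActivityFlushAll(0));
    printf("UVM counter records received: %zu\n", g_uvm_records);

    cudaFree(p);
    return 0;
}
```

Printing every return code, not just the final count, is deliberate: a failed configure or enable call and a successful enable that delivers zero records point to different layers of the stack.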
| Platform | Expected |
|---|---|
| Discrete PCIe (Pascal through Ampere) | UVM events received |
| GB10 hardware-coherent UMA | Under investigation |
GB10 systems with both CUDA 13.0 and 13.1 installed: use 13.0. CUDA 13.1 has a known event timing issue on GB10.
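
To confirm which toolkit a binary was actually built against, the integer reported by `cudaRuntimeGetVersion` can be decoded. CUDA encodes versions as `1000 * major + 10 * minor`, so 13.0 is `13000` and 13.1 is `13010`. The helper names below are hypothetical and the sketch only covers the decoding step, not the runtime call itself.

```cpp
// Decoded CUDA version, e.g. {13, 0} for CUDA 13.0.
struct CudaVersion {
    int major_ver;
    int minor_ver;
};

// CUDA encodes versions as 1000 * major + 10 * minor (13.0 -> 13000).
// The encoded value would typically come from cudaRuntimeGetVersion().
constexpr CudaVersion decode_cuda_version(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical check reflecting the note above: prefer 13.0 on GB10.
constexpr bool is_preferred_for_gb10(int encoded) {
    const CudaVersion v = decode_cuda_version(encoded);
    return v.major_ver == 13 && v.minor_ver == 0;
}
```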
# Default
nvcc -O2 -std=c++17 cupti_uma_probe.cu -o cupti_uma_probe \
-lcudart -lcupti
# If cupti.h not found, specify include path explicitly
nvcc -O2 -std=c++17 cupti_uma_probe.cu -o cupti_uma_probe \
-lcudart -lcupti \
-I$CUDA_HOME/extras/CUPTI/include

./cupti_uma_probe # human-readable output + JSON log
./cupti_uma_probe --json # JSON only

If you have a GB10 / DGX Spark, share your JSON output via GitHub Issues. This tool is actively collecting GB10 results: https://github.com/parallelArchitect/cupti-uma-probe/issues

nvidia-uma-fault-probe (PTX-based latency, bandwidth, coherence): https://github.com/parallelArchitect/nvidia-uma-fault-probe
sparkview (continuous system health monitor): https://github.com/parallelArchitect/sparkview
GB10 hardware baseline findings: https://forums.developer.nvidia.com/t/gb10-hardware-baseline-first-direct-measurements-and-findings/367851
parallelArchitect — Human-directed GPU engineering with AI assistance.
MIT