- `EMAHook._build_averaged_model` override seam, so a caller that owns model sharding can supply a pre-built `AveragedModel` instead of the default deepcopy — enabling EMA on `fully_shard` (FSDP2) / DTensor models. Default behaviour unchanged.
- Checkpointable training hooks. Hooks such as EMA can now save restart state with strategy checkpoints, so resumed training keeps averaged weights instead of starting them over.
- Training strategy checkpoint restart support, including a periodic checkpoint hook for step- or epoch-based saves and restart loading with models, optimizers, schedulers, runtime counters, and restart-safe device placement.
- PhysicsNeMo-compatible atomic datapipes with `MultiDataset` composition, multidataset-aware sampling policies, and fused batch loading that preserves the Zarr reader's coalesced I/O path.
- First-class validation on `TrainingStrategy`. Set a `ValidationConfig` on `strategy.validation_config` and validation runs automatically at the configured step or epoch cadence, plus one final pass at end-of-training; the latest summary is stored on `strategy.last_validation`. Mechanics live in a public, context-managed `ValidationLoop` that can also be run standalone outside training. An `inference_model` slot lets EMA (or SWA / a distillation teacher) publish averaged weights for validation to read. A new `AFTER_VALIDATION` hook stage fires immediately after each pass so loggers can read the live summary. For per-batch logging, pass a `batch_callback` (any object matching the `BatchValidationCallback` protocol) on the config; it is invoked once per validation batch with the batch, predictions, and per-batch loss.
- Metric-driven learning-rate schedulers. `ReduceLROnPlateau` is now supported via `OptimizerConfig.scheduler_metric_adapter` (a summary-dict key string or a callable). Time-based schedulers step every optimizer step as before; metric-driven schedulers step only at validation checkpoints, where the validation summary supplies the metric.
- User-specified transforms - `Dataset` accepts a `transforms=` kwarg (per-sample `(AtomicData, metadata) -> (AtomicData, metadata)`) and `DataLoader` accepts a `batch_transforms=` kwarg (per-batch `Batch -> Batch`). Both default to `None` (backward compatible). New `nvalchemi.data.transforms` subpackage exposes a polymorphic `Compose` utility plus `SampleTransform` and `BatchTransform` type aliases, re-exported from `nvalchemi.data`. Per-sample transforms run after device transfer on both sync and prefetch paths; per-batch transforms run on the consumer thread after `Batch.from_data_list`. Transform failures are wrapped in `RuntimeError` with a `transform[<i>]` breadcrumb and `__cause__` preserved.
- UMA (fairchem-core) wrapper — new `UMAWrapper` exposes UMA (Universal Models for Atoms) foundation models (`uma-s-1p1`, `uma-s-1p2`, `uma-m-1p1`) through the `BaseModelMixin` interface, ready for any dynamics engine or standalone inference. UMA is multi-task; the wrapper is pinned to one head at construction (OMol, OMat, OC20, ODAC, OMC). Input conversion is tensor-native (no ASE round trip); energy is the differentiable primitive with forces and (for periodic tasks) stress from autograd. Install via the new `uma` optional extra (`pip install 'nvalchemi-toolkit[uma]'`), which is declared conflicting with the `mace` and `cu12`/`cu13` extras (incompatible `e3nn`/`torch` pins) and resolves into its own environment. `from_checkpoint` forwards fairchem's `inference_settings` (including `"turbo"` for `torch.compile`). See the `examples/advanced/09_uma_nve.py` NVE/NVT/NPT walkthrough.
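
The failure wrapping described for user-specified transforms can be sketched in plain Python. This is not the `nvalchemi.data.transforms.Compose` implementation: `compose_sketch` and the two toy transforms are hypothetical stand-ins, and a sample is modelled as a `(data, metadata)` pair of dicts rather than `AtomicData`.

```python
from typing import Any, Callable

Sample = tuple[dict[str, Any], dict[str, Any]]
SampleFn = Callable[[dict[str, Any], dict[str, Any]], Sample]


def compose_sketch(*transforms: SampleFn) -> SampleFn:
    """Chain per-sample transforms; wrap failures with a transform[<i>] breadcrumb."""

    def run(data: dict[str, Any], metadata: dict[str, Any]) -> Sample:
        for i, fn in enumerate(transforms):
            try:
                data, metadata = fn(data, metadata)
            except Exception as exc:
                # `from exc` keeps the original error reachable via __cause__.
                raise RuntimeError(f"transform[{i}] failed: {exc!r}") from exc
        return data, metadata

    return run


def center(data, metadata):
    # Toy transform: shift 1-D positions so their mean is zero.
    xs = data["positions"]
    mean = sum(xs) / len(xs)
    return {**data, "positions": [x - mean for x in xs]}, metadata


def require_energy(data, metadata):
    # Toy transform that fails when a field is missing.
    if "energy" not in data:
        raise KeyError("energy")
    return data, metadata


pipeline = compose_sketch(center, require_energy)
ok_data, _ = pipeline({"positions": [1.0, 3.0], "energy": -1.0}, {})

try:
    pipeline({"positions": [1.0, 3.0]}, {})
except RuntimeError as err:
    failure = err  # message names transform[1]; __cause__ is the KeyError
```

The index in the breadcrumb identifies which transform in the chain failed, while `__cause__` keeps the original traceback available for debugging.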
- Zarr dataloader custom fields — validated `Dataset` batch paths now preserve reader field-level metadata so custom atom-, edge-, and system-level tensors survive batching like the `skip_validation` path.
- EMA checkpointing now restores averaged tensors to the corresponding live model tensor devices, publishes restored EMA weights during SETUP before validation, and supports callable reconstruction specs for model wrappers that must rebuild from factory methods, including MACE checkpoints with cuEquivariance enabled.
- NVT Nosé-Hoover velocity collapse (#104) — reset the NHC `total_scale` scratch accumulator to the multiplicative identity on each chain update, preventing persistent state from zeroing or compounding velocity rescaling.
- MTK NPT barostat runaway (#89, #90) — four bugs in `nvalchemi/dynamics/integrators/npt.py` (with matching fixes in `nph.py`) that combined to drive unbounded cell-volume drift in long NPT runs. Cross-validated against ASE `MTKNPT` / `IsotropicMTKNPT` and TorchSim `npt_nose_hoover_isotropic`. Isotropic users will see their barostat mass `W` shrink by 3× (now matches canonical MTK).
- Ewald / PME energies buffer leak (#82) — in-place `scatter_add_` of gradient-carrying `per_atom_energies` chained each forward's Warp backward tape onto `_energies_buf`, causing linear per-step slowdown and unbounded GPU memory growth. `detach_()` the buffer after each forward.
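
The failure mode behind the NHC `total_scale` fix (#104) can be illustrated without the toolkit. The sketch below is an assumption-laden toy, not the actual kernel: the function names, the dict used as a scratch buffer, and the scale factors are all hypothetical. It only shows why a multiplicative accumulator has to start from 1.0 on every chain update.

```python
def chain_update_buggy(scratch: dict, factors: list[float]) -> float:
    # Bug pattern: the accumulator keeps whatever the previous call left in it.
    for f in factors:
        scratch["total_scale"] *= f
    return scratch["total_scale"]


def chain_update_fixed(scratch: dict, factors: list[float]) -> float:
    # Fix pattern: start every chain update from the multiplicative identity.
    scratch["total_scale"] = 1.0
    for f in factors:
        scratch["total_scale"] *= f
    return scratch["total_scale"]


factors = [0.5, 0.5]  # illustrative per-sub-step velocity scale factors

buggy = {"total_scale": 1.0}
buggy_scales = [chain_update_buggy(buggy, factors) for _ in range(3)]

fixed = {"total_scale": 1.0}
fixed_scales = [chain_update_fixed(fixed, factors) for _ in range(3)]

# A scratch buffer that happens to hold 0.0 wipes out the velocities entirely.
zeroed_scale = chain_update_buggy({"total_scale": 0.0}, factors)
```

With persistent state the scale compounds from one update to the next (0.25, 0.0625, 0.015625 here), while the reset version returns 0.25 every time.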
- `cells_inv` argument on `_cell_kinetic_energy`. Cell kinetic energy is computed directly from the strain rate `ε̇` and no longer needs the cell inverse. The argument is retained for backwards compatibility (a `DeprecationWarning` is emitted when passed) and will be removed in a future release.
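
A retained-but-ignored argument of this kind is commonly handled with a sentinel default, so that the warning fires only when the caller actually passes a value. The sketch below is a hypothetical stand-in, not the toolkit's `_cell_kinetic_energy`: the function name, the signature, and the formula `0.5 * W * sum(ε̇_ij²)` are assumptions made for illustration.

```python
import warnings

_UNSET = object()  # sentinel so an explicit None still counts as "passed"


def cell_kinetic_energy_sketch(strain_rate, cell_mass, cells_inv=_UNSET):
    """Hypothetical stand-in: 0.5 * W * sum(eps_dot_ij ** 2), no cell inverse needed."""
    if cells_inv is not _UNSET:
        warnings.warn(
            "cells_inv is deprecated and ignored; it will be removed in a future release.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this function
        )
    return 0.5 * cell_mass * sum(x * x for row in strain_rate for x in row)


eps_dot = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ke_new = cell_kinetic_energy_sketch(eps_dot, 2.0)
    ke_old = cell_kinetic_energy_sketch(eps_dot, 2.0, cells_inv=None)
```

Both calls return the same value; only the second one records a `DeprecationWarning`.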
- Dataset-level explicit batch reads now use `load_batches(...)`. The raw `read_many(...)` API remains on readers, where storage backends can optimize ordered I/O, but `Dataset.read_many(...)` and `Dataset.get_batch(...)` have been removed to keep the public Dataset API focused on sample access, batch materialization, and prefetching.
- Split hook context state into `HookContext`, `DynamicsContext`, and `TrainContext` so each workflow exposes only the fields it owns. Dynamics-specific state such as `step_count`, `converged_mask`, and `global_rank` now lives on `DynamicsContext`, while training state lives on `TrainContext`. Existing hooks that used `HookContext` for dynamics-only fields should update their annotations to `DynamicsContext`.
- Standardized public `stress` outputs on tensile-positive Cauchy stress (`sigma = -W / V`) while keeping low-level virials defined as negative strain derivatives.
- Removed `EvaluateHook` in favor of first-class validation on `TrainingStrategy`. Validation is no longer a registered hook. Migrate by moving the hook's arguments onto a `ValidationConfig`:

  ```python
  # Before
  strategy.register_hook(
      EvaluateHook(validation_data=val_data, every_n_epochs=1)
  )

  # After
  strategy.validation_config = ValidationConfig(
      validation_data=val_data, every_n_epochs=1
  )
  ```

  Validation then runs automatically during `strategy.run(...)` at the configured cadence and once at end-of-training. The `EvaluationSink` / `EvaluationZarrSink` output classes were removed; replace summary logging with an `AFTER_VALIDATION` hook and per-batch logging with a `ValidationConfig(batch_callback=...)`.
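
For the per-batch logging path, a callback object could look roughly like the sketch below. The call signature `(batch, predictions, loss)` is inferred from the description of what the callback receives and is an assumption; `BatchCallbackSketch`, `LossRecorder`, and `run_validation_sketch` are hypothetical names, and the loop is a stand-in for a validation pass rather than the toolkit's `ValidationLoop`.

```python
from typing import Any, Protocol


class BatchCallbackSketch(Protocol):
    # Assumed call signature; the real BatchValidationCallback protocol may differ.
    def __call__(self, batch: Any, predictions: Any, loss: float) -> None: ...


class LossRecorder:
    """Collects per-batch losses so a summary can be logged afterwards."""

    def __init__(self) -> None:
        self.losses: list[float] = []

    def __call__(self, batch: Any, predictions: Any, loss: float) -> None:
        self.losses.append(float(loss))

    @property
    def mean(self) -> float:
        return sum(self.losses) / len(self.losses)


def run_validation_sketch(batches, predict, loss_fn, callback: BatchCallbackSketch):
    # Stand-in for a validation pass: one callback invocation per batch.
    for batch in batches:
        predictions = predict(batch)
        callback(batch, predictions, loss_fn(predictions, batch))


recorder = LossRecorder()
run_validation_sketch(
    batches=[1.0, 2.0, 3.0],                 # toy "batches" are plain floats
    predict=lambda x: 2.0 * x,               # toy model
    loss_fn=lambda pred, target: abs(pred - target),
    callback=recorder,
)
```

Because the protocol is structural, any callable object with a matching `__call__` would qualify; no base class is needed.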
Initial public-beta release of NVIDIA ALCHEMI Toolkit, a GPU-first Python framework for AI-driven atomic simulation workflows.
- AtomicData — Pydantic-backed graph representation of atomic systems (positions, atomic numbers, masses, node/edge properties) with factory constructors `from_atoms()` (ASE) and `from_structure()` (pymatgen).
- Batch — GPU-resident graph batch with `MultiLevelStorage` backend supporting node-, edge-, and system-level tensors. Lazy `batch_idx` / `batch_ptr`, `index_select`, `append`, and `from_data_list` for efficient batching.
- Zarr I/O — `AtomicDataZarrWriter` and `AtomicDataZarrReader` with configurable Zstd compression, chunking, and sharding for high-throughput trajectory storage.
- Dataset & DataLoader — CUDA-stream prefetching, async I/O, and drop-in `DataLoader` replacement yielding `Batch` objects.
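
The `batch_idx` / `batch_ptr` pair mentioned for `Batch` is assumed here to follow the usual graph-batching convention (per-atom graph index plus CSR-style offsets); the changelog does not spell out the exact definitions. A minimal sketch with plain lists standing in for tensors, using hypothetical helper names:

```python
from itertools import accumulate


def batch_ptr_from_counts(num_atoms: list[int]) -> list[int]:
    # CSR-style offsets: atoms of graph g occupy the range [ptr[g], ptr[g + 1]).
    return [0, *accumulate(num_atoms)]


def batch_idx_from_ptr(ptr: list[int]) -> list[int]:
    # Graph index for every atom, expanded from the offsets.
    return [g for g in range(len(ptr) - 1) for _ in range(ptr[g + 1] - ptr[g])]


# Three systems with 3, 1, and 2 atoms concatenated into one batch.
ptr = batch_ptr_from_counts([3, 1, 2])
idx = batch_idx_from_ptr(ptr)
```

Either array can be derived from the other, which is one reason such indices can be computed lazily rather than stored up front.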
All wrappers implement `BaseModelMixin` with a unified `ModelConfig` for capability declaration and runtime control.
- DemoModelWrapper — Lightweight test/demo model (point-cloud energy + autograd forces).
- MACEWrapper — MACE equivariant neural network; supports foundation checkpoints; COO neighbor format; conservative forces via autograd.
- AIMNet2Wrapper — AIMNet2 atom-in-molecule network; energy, forces, charges, stress; MATRIX neighbor format; NSE auto-detection.
- LennardJonesModelWrapper — Warp-accelerated single-species LJ with analytical forces and optional virial stress.
- EwaldModelWrapper — Real + reciprocal space Ewald summation for periodic charged systems; k-vector caching; hybrid analytical forces.
- PMEModelWrapper — Particle Mesh Ewald (FFT-based, O(N log N)) for large periodic systems.
- DFTD3ModelWrapper — DFT-D3(BJ) dispersion correction with auto-downloaded reference parameters and cutoff smoothing.
- PipelineModelWrapper — Compose multiple models into groups with independent derivative strategies (autograd vs. analytical).
- BaseDynamics — Abstract base orchestrating model evaluation, integrator updates, hook dispatch, convergence detection, and inflight batching.
- 9 hook insertion points per step (`DynamicsStage` enum): `BEFORE_STEP`, `BEFORE_PRE_UPDATE`, `AFTER_PRE_UPDATE`, `BEFORE_COMPUTE`, `AFTER_COMPUTE`, `BEFORE_POST_UPDATE`, `AFTER_POST_UPDATE`, `AFTER_STEP`, `ON_CONVERGE`.
- ConvergenceHook — Flexible convergence criteria with `from_fmax()` convenience constructor and per-system masking.
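
How hooks might be dispatched across the nine stages can be sketched as follows. Only the stage names come from the list above; `StageSketch`, `HookRegistrySketch`, `step_sketch`, and the assumption that `ON_CONVERGE` fires after `AFTER_STEP` are hypothetical and not taken from `BaseDynamics`.

```python
from collections import defaultdict
from enum import Enum, auto


class StageSketch(Enum):
    BEFORE_STEP = auto()
    BEFORE_PRE_UPDATE = auto()
    AFTER_PRE_UPDATE = auto()
    BEFORE_COMPUTE = auto()
    AFTER_COMPUTE = auto()
    BEFORE_POST_UPDATE = auto()
    AFTER_POST_UPDATE = auto()
    AFTER_STEP = auto()
    ON_CONVERGE = auto()


class HookRegistrySketch:
    def __init__(self) -> None:
        self._hooks = defaultdict(list)

    def register(self, stage: StageSketch, fn) -> None:
        self._hooks[stage].append(fn)

    def fire(self, stage: StageSketch, ctx) -> None:
        for fn in self._hooks[stage]:
            fn(ctx)


def step_sketch(registry: HookRegistrySketch, ctx, converged: bool) -> None:
    S = StageSketch
    registry.fire(S.BEFORE_STEP, ctx)
    registry.fire(S.BEFORE_PRE_UPDATE, ctx)
    # ... integrator pre-update would run here ...
    registry.fire(S.AFTER_PRE_UPDATE, ctx)
    registry.fire(S.BEFORE_COMPUTE, ctx)
    # ... model forward pass would run here ...
    registry.fire(S.AFTER_COMPUTE, ctx)
    registry.fire(S.BEFORE_POST_UPDATE, ctx)
    # ... integrator post-update would run here ...
    registry.fire(S.AFTER_POST_UPDATE, ctx)
    registry.fire(S.AFTER_STEP, ctx)
    if converged:  # assumed placement of the convergence stage
        registry.fire(S.ON_CONVERGE, ctx)


trace: list[str] = []
registry = HookRegistrySketch()
for stage in StageSketch:
    # Bind `stage` via a default argument so each hook records its own stage.
    registry.register(stage, lambda ctx, s=stage: trace.append(s.name))

step_sketch(registry, ctx={}, converged=True)
```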
- NVE — Velocity Verlet; symplectic, time-reversible, energy-conserving.
- NVTLangevin — BAOAB Langevin dynamics with Ornstein-Uhlenbeck thermostat for canonical sampling.
- NVTNoseHoover — Nosé-Hoover chain thermostat with Yoshida-Suzuki factorization; deterministic and ergodic.
- NPT — Martyna-Tobias-Klein isothermal-isobaric with dual Nosé-Hoover chains (particle + cell DOFs).
- NPH — MTK isenthalpic-isobaric without thermostat.
- FIRE — Fast Inertial Relaxation Engine with adaptive timestep.
- FIREVariableCell — FIRE with NPH-like variable-cell propagation.
- FIRE2 — Improved FIRE (Shuang et al. 2020) with better restart conditions and modified velocity mixing.
- FIRE2VariableCell — FIRE2 with variable-cell structural relaxation.
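
As a reminder of what the NVE entry refers to, the textbook kick-drift-kick form of velocity Verlet is sketched below for a single 1-D particle. This is a generic illustration, not the toolkit's Warp kernel; the harmonic force, unit mass, and step count are arbitrary choices.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one 1-D particle with kick-drift-kick velocity Verlet."""
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass  # first half kick
        x += dt * v               # drift
        f = force(x)              # force at the new position
        v += 0.5 * dt * f / mass  # second half kick
    return x, v


k, m = 1.0, 1.0


def harmonic_force(x):
    return -k * x


def total_energy(x, v):
    return 0.5 * m * v * v + 0.5 * k * x * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, harmonic_force, m, dt=0.01, n_steps=10_000)
drift = abs(total_energy(x1, v1) - total_energy(x0, v0))
```

For this oscillator the energy error stays bounded at order `dt**2` instead of growing with the number of steps, which is the practical meaning of "symplectic" in the NVE entry.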
Dynamics hooks (`nvalchemi.dynamics.hooks`):

- `LoggingHook` — Per-graph scalar statistics with thread-pooled I/O and optional CUDA stream prefetch.
- `NaNDetectorHook` — Immediate NaN/Inf detection in forces and energy.
- `MaxForceClampHook` — Clamps force magnitudes to prevent numerical explosions.
- `EnergyDriftMonitorHook` — Cumulative energy drift tracking with configurable thresholds (absolute and per-atom-per-step).
- `FreezeAtomsHook` — Freezes selected atoms by category during MD.
- `SnapshotHook` — Periodic full-state snapshots to a `DataSink`.
- `ConvergedSnapshotHook` — Snapshot on convergence.
- `ProfilerHook` — Per-stage wall-clock profiling with NVTX annotations and CSV output.
- `AlignCellHook` — Upper-triangular cell alignment for variable-cell optimization.
General hooks (`nvalchemi.hooks`):

- `NeighborListHook` — On-the-fly neighbor list construction/refresh with Verlet skin buffer; MATRIX and COO formats.
- `WrapPeriodicHook` — GPU-accelerated PBC wrapping via Warp kernel.
- `BiasedPotentialHook` — External bias potentials for enhanced sampling (umbrella sampling, metadynamics, etc.).
- FusedStage (`+` operator) — Compose dynamics stages on a single GPU with shared forward pass and masked updates per sub-stage.
- DistributedPipeline (`|` operator) — Distribute stages across GPU ranks with blocking inter-rank communication.
- SizeAwareSampler — Bin-packing inflight batching that respects `max_atoms`, `max_edges`, and `max_batch_size` constraints.
- Data sinks — `HostMemory` (CPU), `GPUBuffer` (device), `ZarrData` (persistent disk) for capturing pipeline outputs.
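
The idea of bin-packing under the three limits named for `SizeAwareSampler` can be sketched with a first-fit-decreasing heuristic. This is one common heuristic chosen for illustration, not necessarily the algorithm the sampler uses; `Bin` and `pack_first_fit` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    indices: list[int] = field(default_factory=list)
    atoms: int = 0
    edges: int = 0


def pack_first_fit(sizes, max_atoms, max_edges, max_batch_size):
    """sizes: (num_atoms, num_edges) per system. Returns one index list per batch."""
    bins: list[Bin] = []
    # Placing the largest systems first tends to leave fewer half-empty bins.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i][0], reverse=True)
    for i in order:
        n_atoms, n_edges = sizes[i]
        if n_atoms > max_atoms or n_edges > max_edges:
            raise ValueError(f"system {i} exceeds the per-batch limits on its own")
        for b in bins:
            fits = (
                b.atoms + n_atoms <= max_atoms
                and b.edges + n_edges <= max_edges
                and len(b.indices) < max_batch_size
            )
            if fits:
                break
        else:
            # No existing bin can take this system: open a new one.
            b = Bin()
            bins.append(b)
        b.indices.append(i)
        b.atoms += n_atoms
        b.edges += n_edges
    return [b.indices for b in bins]


sizes = [(60, 400), (30, 200), (30, 150), (10, 40), (10, 40)]
batches = pack_first_fit(sizes, max_atoms=100, max_edges=700, max_batch_size=3)
# batches == [[0, 1, 3], [2, 4]]
```

In this example the first batch is closed by the atom limit (100 atoms, three systems), so the remaining systems go to a second batch.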
All low-level kernels are built on `nvalchemi-toolkit-ops` via NVIDIA Warp:
- Velocity Verlet position/velocity updates
- BAOAB Langevin half-steps
- Nosé-Hoover chain integration
- MTK barostat (NPT/NPH) cell and position propagation
- FIRE/FIRE2 coordinate and cell steps
- Kinetic energy and velocity initialization
- Neighbor list rebuild with Verlet skin
- Cell alignment to upper-triangular form
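
For the Verlet-skin rebuild in the kernel list above, a frequently used trigger is "rebuild once any atom has moved more than half the skin since the last build". The sketch below assumes that criterion; the changelog does not state which test the kernel applies, and `needs_rebuild` is a hypothetical helper.

```python
import math


def needs_rebuild(positions, reference, skin: float) -> bool:
    """True once any atom has moved more than skin / 2 since the last list build."""
    limit = 0.5 * skin
    return any(math.dist(p, r) > limit for p, r in zip(positions, reference))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]    # positions at the last build
small_move = [(0.1, 0.0, 0.0), (1.0, 0.0, 0.0)]   # 0.1 displacement
large_move = [(0.3, 0.0, 0.0), (1.0, 0.0, 0.0)]   # 0.3 displacement

keep_list = needs_rebuild(small_move, reference, skin=0.5)
rebuild_list = needs_rebuild(large_move, reference, skin=0.5)
```

The half-skin bound is conservative: two atoms moving towards each other can close their separation by at most the full skin before a rebuild is forced.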
- 20 worked examples across four tiers (basic, intermediate, advanced, distributed) covering data structures, optimization, MD ensembles, Zarr I/O, inflight batching, custom hooks, model composition, Ewald electrostatics, and multi-GPU pipelines.
- 7 Claude Code agent skills (`.claude/skills/`) for guided workflows: model wrapping, data structures, data storage, dynamics API, dynamics hooks, dynamics implementation, and engineering scoping.
- `OptionalDependency` guards for graceful degradation when MACE, AIMNet2, ASE, or pymatgen are not installed.
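
The general pattern behind an optional-dependency guard can be sketched with the standard library alone. `OptionalDependencySketch` is a hypothetical class written for this note and does not reflect the toolkit's `OptionalDependency` API; the module and extra names in the usage lines are placeholders.

```python
import importlib
import importlib.util


class OptionalDependencySketch:
    """Hypothetical guard: probe availability cheaply, fail clearly on use."""

    def __init__(self, module_name: str, extra: str) -> None:
        self.module_name = module_name
        self.extra = extra

    @property
    def available(self) -> bool:
        # find_spec returns None for a missing top-level module without importing it.
        return importlib.util.find_spec(self.module_name) is not None

    def require(self):
        if not self.available:
            raise ImportError(
                f"{self.module_name!r} is not installed; "
                f"install the [{self.extra}] extra to enable this feature."
            )
        return importlib.import_module(self.module_name)


json_dep = OptionalDependencySketch("json", extra="stdlib")
missing = OptionalDependencySketch("definitely_not_installed_pkg", extra="mace")

encoded = json_dep.require().dumps([1])
```

Deferring the import to `require()` lets the rest of the package load normally and surfaces the install hint only when the missing feature is actually used.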
- Python 3.11–3.13
- PyTorch >= 2.8
- `nvalchemi-toolkit-ops[torch]` >= 0.3.1
- Optional: `[mace]`, `[aimnet]`, `[ase]`, `[pymatgen]` extras