|
1 | | -# AGENTS.md — NVIDIA ALCHEMI toolkit |
| 1 | +# AGENTS.md - NVIDIA ALCHEMI Toolkit |
2 | 2 |
|
3 | | -> Guidelines for AI coding agents operating in this repository. |
| 3 | +Guidelines for AI coding agents operating in this repository. |
4 | 4 |
|
5 | 5 | ## Project Overview |
6 | 6 |
|
7 | | -`nvalchemi` is an NVIDIA deep-learning framework for atomic simulations. Python 3.12+, |
8 | | -built with PyTorch, Pydantic, and jaxtyping. Package manager is `uv`, build backend is |
9 | | -`hatchling`. |
| 7 | +`nvalchemi-toolkit` provides the `nvalchemi` Python package: a GPU-first |
| 8 | +framework for AI atomic simulation workflows. It covers graph-structured atomic |
| 9 | +data, model wrappers for machine-learned interatomic potentials, batched |
| 10 | +dynamics, hooks/reporting, and training/finetuning workflows. |
10 | 11 |
|
11 | | -## Build & Run Commands |
| 12 | +- Python support: `>=3.11,<3.14`; CI and setup examples use Python 3.12. |
| 13 | +- Package manager: `uv`; build backend: `hatchling`. |
| 14 | +- Core dependencies: PyTorch, Pydantic v2, jaxtyping, TensorDict, Zarr, Rich, |
| 15 | + PhysicsNeMo, and `nvalchemi-toolkit-ops`. |
| 16 | +- The project is in public beta. Public PRs may not be accepted immediately, but |
| 17 | + bug reports, feature requests, and scoped implementation discussions are welcome. |
12 | 18 |
|
13 | | -```bash |
14 | | -# Install all dependencies |
15 | | -make install # or: uv sync --all-extras |
| 19 | +## Repository Practices |
| 20 | + |
| 21 | +- Read `CONTRIBUTING.md` and the docs under `docs/userguide/about/` before broad |
| 22 | + changes; keep work tightly scoped. |
| 23 | +- Use DCO sign-off for commits: `git commit -s -m "fix: describe change"`. |
| 24 | +- Prefer Conventional Commits-style messages unless maintainers request otherwise. |
| 25 | +- Install and run pre-commit hooks for development. PRs that skip pre-commit are |
| 26 | + not expected to be reviewed. |
| 27 | +- The PR template expects a short description, testing notes, changelog updates, |
| 28 | + docstring/docs updates where applicable, and the relevant type-of-change box. |
16 | 29 |
|
17 | | -# Lint (ruff, pyupgrade, whitespace, debug statements) |
18 | | -make lint |
| 30 | +## CUDA And Environment Setup |
19 | 31 |
|
20 | | -# Docstring coverage check |
21 | | -make interrogate |
| 32 | +First check CUDA availability: |
22 | 33 |
|
23 | | -# License header check |
24 | | -make license |
| 34 | +```bash |
| 35 | +nvidia-smi |
25 | 36 | ``` |
26 | 37 |
|
27 | | -### Testing |
| 38 | +- If `nvidia-smi` is missing or reports no usable device, use default `uv` |
| 39 | + commands without CUDA extras where possible. |
| 40 | +- If it reports CUDA 12.x, pass `--extra cu12` to `uv` commands and |
| 41 | + `CUDA_EXTRA=cu12` to `make` targets. |
| 42 | +- If it reports CUDA 13.x, `cu13` is the default Makefile extra; explicit |
| 43 | + commands can still use `--extra cu13` or `CUDA_EXTRA=cu13`. |
| 44 | +- Do not use `uv sync --all-extras`: the CUDA variants and some model extras are |
| 45 | + mutually exclusive. |
28 | 46 |
|
29 | | -This project uses [pytest-testmon](https://testmon.org) to skip tests unaffected by |
30 | | -recent code changes. A `.testmondata` database tracks which tests depend on which source |
31 | | -files; only tests whose dependencies changed are re-run. |
| 47 | +Common setup commands: |
32 | 48 |
|
33 | 49 | ```bash |
34 | | -# --- Local development --- |
| 50 | +# Default development environment; Makefile currently defaults CUDA_EXTRA=cu13. |
| 51 | +make install |
35 | 52 |
|
36 | | -make test # Run only affected tests (fast, requires .testmondata) |
37 | | -make test-all # Run ALL tests and rebuild .testmondata |
38 | | -make pytest # Run ALL tests with coverage (no testmon) |
| 53 | +# CUDA 12 development environment. |
| 54 | +make install CUDA_EXTRA=cu12 |
39 | 55 |
|
40 | | -# --- CI targets (not intended for local use) --- |
| 56 | +# Add CUDA-aligned optional extras, for example MACE. |
| 57 | +make install CUDA_EXTRA=cu12 OPTIONAL_EXTRAS=mace |
41 | 58 |
|
42 | | -make testmon-coverage # Run with testmon + coverage; used by CI workflows |
| 59 | +# Direct uv equivalents. |
| 60 | +uv sync --extra cu13 |
| 61 | +uv sync --extra cu12 --extra mace |
43 | 62 |
|
44 | | -# --- Targeting specific tests --- |
| 63 | +# Include documentation dependencies when needed. |
| 64 | +uv sync --extra cu13 --group docs |
| 65 | +``` |
45 | 66 |
|
46 | | -# Run a SINGLE test file |
47 | | -uv run pytest test/data/test_data_mixin.py |
| 67 | +Optional extras include `aimnet`, `ase`, `cu12`, `cu13`, `mace`, `pymatgen`, |
| 68 | +`tensorboard`, and `uma`. `uma` conflicts with the CUDA/MACE stack and should be |
| 69 | +resolved in its own environment, as CI does with: |
48 | 70 |
|
49 | | -# Run a SINGLE test case |
50 | | -uv run pytest test/data/test_data_mixin.py::TestMoveObjToDevice::test_move_tensor_to_device |
| 71 | +```bash |
| 72 | +UV_PROJECT_ENVIRONMENT=.venv-uma uv sync --extra uma --extra ase |
| 73 | +``` |
51 | 74 |
|
52 | | -# Run a single test by keyword match |
53 | | -uv run pytest -k "test_move_tensor" test/ |
| 75 | +## Build, Lint, Test |
54 | 76 |
|
55 | | -# Run a specific test module (via Makefile target) |
56 | | -make pytest-target TARGET=test/data/test_data_mixin.py |
| 77 | +Use Makefile targets when possible because they keep `uv run` aligned with the |
| 78 | +selected CUDA extra. |
57 | 79 |
|
58 | | -# Specialized test suites |
59 | | -make pytest-data # data, md, training, models, neighborlist |
60 | | -make pytest-models # models only |
61 | | -make pytest-dynamics # dynamics, md, autobatch, optim |
62 | | -make pytest-al # active learning |
63 | | -make pytest-utils # utils, common, help |
| 80 | +```bash |
| 81 | +make lint # whitespace, debug, ruff check/format |
| 82 | +make lint-fix # ruff check/format auto-fix path |
| 83 | +make format # ruff format plus ruff check --fix |
| 84 | +make interrogate # docstring coverage |
| 85 | +make license # SPDX/license header validation |
| 86 | +make docs # build Sphinx docs |
| 87 | +make build # build package artifacts |
| 88 | +``` |
| 89 | + |
| 90 | +Testing uses `pytest-testmon` for affected-test selection. A `.testmondata` |
| 91 | +database is populated by full runs and reused by fast selective runs. |
| 92 | + |
| 93 | +```bash |
| 94 | +make test # affected tests with testmon --testmon-nocollect |
| 95 | +make test-all # all tests and rebuild testmon database |
| 96 | +make pytest # all tests with coverage, no testmon |
| 97 | +make testmon-coverage # CI-style testmon plus coverage |
| 98 | + |
| 99 | +# Narrow tests with Makefile pass-through. |
| 100 | +make test PYTEST_ARGS="test/data/test_atomic_data.py" |
| 101 | +make pytest PYTEST_ARGS="-k test_move_tensor test/" |
| 102 | + |
| 103 | +# Direct uv commands must include the active CUDA extra. |
| 104 | +uv run --extra cu13 pytest test/models/test_lj_model.py |
| 105 | +uv run --extra cu12 pytest \ |
| 106 | + test/data/test_data_mixin.py::TestMoveObjToDevice::test_move_tensor_to_device |
64 | 107 | ``` |
65 | 108 |
|
66 | | -**Typical local workflow:** run `make test-all` once to build the testmon database, |
67 | | -then use `make test` for fast iteration. The database persists across runs in |
68 | | -`.testmondata` (git-ignored). |
69 | | - |
70 | | -**Coverage:** the coverage threshold (75%) is configured in `pyproject.toml` |
71 | | -(`[tool.coverage.report] fail_under`). Branch coverage is disabled for testmon |
72 | | -compatibility. |
73 | | - |
74 | | -## Code Style |
75 | | - |
76 | | -### Formatting & Linting |
77 | | - |
78 | | -- **Formatters**: `ruff-format` (via pre-commit hooks). |
79 | | -- **Linter**: `ruff` with rules: `E` (pycodestyle), `F` (pyflakes), `S` (bandit), |
80 | | - `I` (isort), `PERF` (performance). Only `I` rules are auto-fixable. |
81 | | -- **Ignored globally**: `E501` (line length), `S311` (random generators), |
82 | | - `F722` and `F821` (break jaxtyping annotations). |
83 | | -- **Per-file overrides**: `F401` ignored in `__init__.py` and `docs/*.py`; |
84 | | - `S101` (assert) ignored in `test/*.py`; `E402` ignored in `examples/*.py`. |
85 | | -- **pyupgrade**: targets `--py310-plus`. |
86 | | -- **Docstrings**: `interrogate` enforces 95% coverage (excludes tests, init, magic, |
87 | | - private, semiprivate, property decorators, nested functions/classes). |
88 | | -- **Markdown**: `markdownlint` runs in pre-commit (MD024 disabled). |
89 | | - |
90 | | -### License Header |
91 | | - |
92 | | -Every `.py` file MUST start with this exact SPDX header (see `test/_license/header.txt`). |
93 | | - |
94 | | -The pre-commit hook (`test/_license/header_check.py`) validates this on every commit. |
95 | | - |
96 | | -### Imports |
97 | | - |
98 | | -- Always use `from __future__ import annotations` at the top of source files. |
99 | | -- Import order is enforced by ruff/isort: stdlib, third-party, local (`nvalchemi`). |
100 | | -- Use `TYPE_CHECKING` blocks for imports only needed at type-check time. |
101 | | -- Unused imports are allowed only in `__init__.py` files (F401 suppressed). |
102 | | - |
103 | | -### Type Annotations |
104 | | - |
105 | | -- All functions and methods MUST have type annotations. |
106 | | -- Use `jaxtyping` for tensor shape annotations (e.g., `Float[torch.Tensor, "V 3"]`). |
107 | | -- Shape dimension aliases are defined in `nvalchemi/_typing.py`: |
108 | | - `B` (batch), `V` (nodes), `E` (edges), `H` (hidden), `C` (centroids), `M` (ensemble). |
109 | | -- Use semantic type aliases from `_typing.py` (e.g., `NodePositions`, `Forces`, `Energy`). |
110 | | -- Use `Annotated[type, Field(...)]` for Pydantic model fields with descriptions. |
111 | | -- Use `typing.Protocol` for structural typing / interfaces. |
112 | | -- Use `TypeAlias` for type alias declarations. |
113 | | - |
114 | | -### Naming Conventions |
115 | | - |
116 | | -- **Classes**: `PascalCase` — `AtomicData`, `ModelConfig`, `BaseModelMixin`. |
117 | | -- **Functions/methods**: `snake_case` — `compute_embeddings`, `adapt_input`. |
118 | | -- **Private**: prefix with `_` — `_adapt_input`, `_verify_request`, `_typing.py`. |
119 | | -- **Type aliases**: `PascalCase` — `NodePositions`, `GraphEmbeddings`, `ModelOutputs`. |
120 | | -- **Constants/module-level**: `UPPER_SNAKE_CASE` or `PascalCase` for type vars. |
121 | | -- **TypeVars**: single uppercase letter or short name — `T`, `F`, `C`. |
122 | | -- **Test classes**: `Test` prefix — `TestMoveObjToDevice`, `TestDataMixin`. |
123 | | -- **Test methods**: `test_` prefix with descriptive snake_case — `test_move_tensor_to_device`. |
124 | | - |
125 | | -### Docstrings |
126 | | - |
127 | | -- NumPy-style docstrings are required (enforced at 95% coverage). |
128 | | -- Must include `Parameters`, `Returns`, `Raises` sections as applicable. |
129 | | -- Class docstrings should include `Attributes` section. |
130 | | -- Use `Examples` section with doctestable code where appropriate. |
131 | | - |
132 | | -### Error Handling |
133 | | - |
134 | | -- Use `ValueError`, `KeyError`, `TypeError` for validation with descriptive messages. |
135 | | -- Use `NotImplementedError` for abstract/unimplemented methods. |
136 | | -- Use `warnings.warn(..., UserWarning)` for capability mismatches (not hard errors). |
137 | | -- Use `raise RuntimeError(...)` for internal consistency violations. |
138 | | -- Custom errors: `OptionalDependencyError(ImportError)` for missing optional deps. |
139 | | - |
140 | | -### Pydantic Patterns |
141 | | - |
142 | | -- Data structures inherit from `pydantic.BaseModel` (often mixed with custom mixins). |
143 | | -- Use `@model_validator(mode="after")` for cross-field consistency checks. |
144 | | -- Use `PlainSerializer` for custom tensor serialization. |
145 | | -- Use `ConfigDict(extra="allow")` when models need extensibility. |
146 | | -- Use `model_config = {"arbitrary_types_allowed": True}` for torch.Tensor fields. |
147 | | -- Document fields with `Annotated[..., Field(description=...)]`. |
148 | | - |
149 | | -### Testing Patterns |
150 | | - |
151 | | -- Framework: `pytest` with `pytest-timeout`, `pytest-asyncio`, `hypothesis`. |
152 | | -- Test files mirror source structure under `test/`. |
153 | | -- Group related tests in classes with `Test` prefix. |
154 | | -- Use `setup_method` for per-test fixtures within test classes. |
155 | | -- Use `unittest.mock.Mock`, `patch`, `patch.object` for mocking. |
156 | | -- Markers: `@pytest.mark.slow`, `@pytest.mark.cli`. |
157 | | -- Deselect slow tests with `-m 'not slow'`. |
158 | | -- `asyncio_mode = "auto"` — async tests run automatically. |
159 | | -- Test verbosity: `-vv -r xfXs` (show extra info on xfailed/xpassed/skipped). |
160 | | -- When possible, use `Demo*` classes (e.g., `DemoModelWrapper`, `DemoDynamics`) |
161 | | - to compose example and unit test workflows instead of bespoke classes. |
162 | | - |
163 | | -### Architecture Notes |
164 | | - |
165 | | -- `nvalchemi/_typing.py`: Central type definitions — always import types from here. |
166 | | -- `nvalchemi/data/atomic_data.py`: Core `AtomicData` structure (Pydantic + DataMixin). |
167 | | -- `nvalchemi/data/batch.py`: `Batch` — batched disjoint graph (like torch_geometric). |
168 | | -- `nvalchemi/models/base.py`: `BaseModelMixin` — abstract interface for ML potentials. |
169 | | - **Note:** `nvalchemi/models/__init__.py` has broken imports (aimnet2, mace); import |
170 | | - `BaseModelMixin` directly from `nvalchemi.models.base` or under `TYPE_CHECKING`. |
171 | | -- `nvalchemi/_imports.py`: Optional dependency management with decorator pattern. |
172 | | -- `nvalchemi/_utils.py`: Context managers for device/dtype/env management. |
173 | | -- `nvalchemi/dynamics/`: Dynamics simulation framework. Inheritance: |
174 | | - `_CommunicationMixin` → `BaseDynamics` → `FusedStage` / `DemoDynamics`. |
175 | | - Hook system via `DynamicsStage` + `Hook` protocol. Data sinks: `GPUBuffer`, |
176 | | - `HostMemory`, `ZarrData`. Orchestration: `DistributedPipeline`. |
177 | | - |
178 | | -### Key Dependencies |
179 | | - |
180 | | -`torch` (>=2.5.1), `pydantic` (>=2.11.7), `jaxtyping` (>=0.3.2), `loguru`, |
181 | | -`plum-dispatch`, `dm-tree`, `nvtx`, `numpy`, `periodictable`, `tensordict` (>=0.11), |
182 | | -`zarr` (>=3). Optional: `nvidia-physicsnemo` (training extra), `ase` (>=3.27). |
| 109 | +Coverage is configured in `pyproject.toml` with `fail_under = 75`, branch |
| 110 | +coverage disabled, and `nvalchemi.coverage.xml` as the XML output. Interrogate |
| 111 | +docstring coverage requires 95%. |
| 112 | + |
| 113 | +## Tooling And Style |
| 114 | + |
| 115 | +- Ruff lint rules: `E`, `F`, `S`, `I`, and `PERF`; only import sorting (`I`) is |
| 116 | + marked auto-fixable in `pyproject.toml`. |
| 117 | +- Ruff ignores: `E501`, `S311`, `F722`, and `F821`. |
| 118 | +- Per-file ignores: `F401` in `__init__.py` and `docs/*.py`; `E402` and `S101` |
| 119 | + in `examples/*.py`; `S101` in `test/*.py`. |
| 120 | +- Pre-commit also runs large-file checks, trailing-whitespace, end-of-file fixer, |
| 121 | + YAML checks, debug-statements, Ruff, interrogate, markdownlint with `MD024` |
| 122 | + disabled, and the local license hook. |
| 123 | +- Every `.py` file must start with the exact SPDX header in |
| 124 | + `test/_license/header.txt`. |
| 125 | +- New source files should use `from __future__ import annotations`. |
| 126 | +- Keep imports ordered by Ruff/isort: standard library, third-party, local |
| 127 | + `nvalchemi`. |
| 128 | +- Use `TYPE_CHECKING` for type-only imports and optional-heavy imports. |
| 129 | +- Examples in the `examples` folder should follow `sphinx-gallery` style:
| 130 | + no interactivity, and distributed examples should be skippable with the
| 131 | + `NVALCHEMI_SPHINX_BUILD` flag (see `docs/conf.py`).
| 132 | + |
| 133 | +## Coding Conventions |
| 134 | + |
| 135 | +- All public functions and methods should be type annotated and documented with |
| 136 | + NumPy-style docstrings. |
| 137 | +- Use jaxtyping and semantic aliases from `nvalchemi/_typing.py` for tensor shape |
| 138 | + and domain types. |
| 139 | +- Use Pydantic v2 patterns: `Annotated[..., Field(description=...)]`, |
| 140 | + `@model_validator(mode="after")`, `ConfigDict`, and serializers where |
| 141 | + appropriate. |
| 142 | +- Prefer `typing.Protocol` for structural interfaces and `TypeAlias` for named |
| 143 | + aliases. |
| 144 | +- Keep errors precise: `ValueError`, `KeyError`, or `TypeError` for validation; |
| 145 | + `NotImplementedError` for abstract/unimplemented behavior; `RuntimeError` for |
| 146 | + internal consistency failures; `warnings.warn(..., UserWarning)` for capability |
| 147 | + mismatches. |
| 148 | +- Guard optional integrations with `nvalchemi._optional.OptionalDependency` and |
| 149 | + raise `OptionalDependencyError` through that mechanism. |
| 150 | +- Do not add private helper functions that only wrap a single obvious call unless |
| 151 | + the wrapper removes real complexity or matches an existing local pattern. |
| 152 | +- Add short comments where they explain intent or non-obvious constraints; avoid |
| 153 | + comments that restate the code. |
| 154 | + |
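A minimal sketch of the `typing.Protocol` and error-handling conventions listed above. `SupportsEnergy`, `HarmonicWell`, and `evaluate` are hypothetical names invented for illustration and are not part of `nvalchemi`; plain Python lists stand in for tensors so the snippet runs without PyTorch.

```python
import warnings
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsEnergy(Protocol):
    """Hypothetical structural interface: anything with `compute_energy`."""

    def compute_energy(self, positions: list[float]) -> float: ...


class HarmonicWell:
    """Hypothetical toy model; satisfies SupportsEnergy without inheriting it."""

    supports_forces = False

    def compute_energy(self, positions: list[float]) -> float:
        return 0.5 * sum(x * x for x in positions)


def evaluate(
    model: SupportsEnergy, positions: list[float], want_forces: bool = False
) -> float:
    """Return the model energy, validating inputs first."""
    # TypeError / ValueError for validation failures.
    if not isinstance(model, SupportsEnergy):
        raise TypeError(
            f"model must provide compute_energy(), got {type(model).__name__}"
        )
    if not positions:
        raise ValueError("positions must contain at least one coordinate")
    # Capability mismatch: warn instead of raising a hard error.
    if want_forces and not getattr(model, "supports_forces", False):
        warnings.warn(
            "model does not support forces; returning energy only",
            UserWarning,
            stacklevel=2,
        )
    return model.compute_energy(positions)
```

Because the protocol is structural, `HarmonicWell` needs no base class; `evaluate(HarmonicWell(), [1.0, 2.0])` returns `2.5`.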
| 155 | +## Tests |
| 156 | + |
| 157 | +- Test files mirror package areas under `test/`: `data`, `dynamics`, `hooks`, |
| 158 | + `models`, and `training`. |
| 159 | +- Test classes use `Test*`; test methods use descriptive `test_*` names. |
| 160 | +- Use `setup_method` for per-test fixtures within test classes when local
| 161 | + tests already follow that pattern.
| 162 | +- Use `unittest.mock.Mock`, `patch`, and `patch.object` for mocking. |
| 163 | +- Mark slow tests with `@pytest.mark.slow`; deselect with `-m 'not slow'`. |
| 164 | +- CLI tests use `@pytest.mark.cli`. |
| 165 | +- `asyncio_mode = "auto"` is enabled. |
| 166 | +- Prefer existing demo/test utilities such as `DemoModelWrapper`, `DemoDynamics`, |
| 167 | + and local `conftest.py` fixtures over bespoke scaffolding. |
| 168 | +- Add or update regression tests for behavior changes, especially model adapters, |
| 169 | + dynamics hooks, data serialization, training specs, and optional-dependency |
| 170 | + paths. |
| 171 | + |
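The test conventions above might look like the following sketch. `EnergyReporter` is a hypothetical unit under test, not a class from this repository; a real test would target `nvalchemi` code and prefer the existing demo utilities and `conftest.py` fixtures.

```python
from typing import Any
from unittest.mock import Mock, patch


class EnergyReporter:
    """Hypothetical unit under test: formats the energy returned by a model."""

    def __init__(self, model: Any) -> None:
        self.model = model

    def report(self, positions: list[float]) -> str:
        return f"energy={self.model.compute_energy(positions):.3f}"


class TestEnergyReporter:
    def setup_method(self) -> None:
        # pytest calls setup_method before each test method in the class.
        self.model = Mock()
        self.model.compute_energy.return_value = 1.5
        self.reporter = EnergyReporter(self.model)

    def test_report_formats_energy(self) -> None:
        assert self.reporter.report([0.0, 1.0]) == "energy=1.500"
        self.model.compute_energy.assert_called_once_with([0.0, 1.0])

    def test_report_uses_patched_method(self) -> None:
        # patch.object swaps the attribute only inside the with-block.
        with patch.object(
            self.model, "compute_energy", return_value=2.0
        ) as patched:
            assert self.reporter.report([0.0]) == "energy=2.000"
        patched.assert_called_once_with([0.0])
```

A slow variant of such a test would additionally carry `@pytest.mark.slow` so it can be deselected with `-m 'not slow'`.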
| 172 | +## Architecture Notes |
| 173 | + |
| 174 | +- `nvalchemi/_typing.py`: central shape aliases and domain type aliases. |
| 175 | +- `nvalchemi/_optional.py`: optional dependency registry and clean error path. |
| 176 | +- `nvalchemi/_serialization.py`: tensor/model serialization helpers. |
| 177 | +- `nvalchemi/data/`: `AtomicData`, `Batch`, data mixins, Zarr/level storage, |
| 178 | + datapipes, samplers, and transforms. |
| 179 | +- `nvalchemi/models/`: `BaseModelMixin`, demo/LJ/DFTD3/Ewald/PME models, |
| 180 | + optional AIMNet2/MACE/UMA wrappers, neighbor filters, and composable pipelines. |
| 181 | +- `nvalchemi/dynamics/`: base dynamics, demo dynamics, integrators, optimizers, |
| 182 | + sampler, sinks, hooks, and low-level ops. |
| 183 | +- `nvalchemi/hooks/`: shared hook protocol/registry/context plus reporting, |
| 184 | + periodic, neighbor-list, profiling, and timing hooks. |
| 185 | +- `nvalchemi/training/`: CLI, strategy/spec validation, runtime, distributed |
| 186 | + helpers, finetuning, checkpoints, losses, optimizers, and training hooks. |
| 187 | +- `nvalchemi/distributed.py`: distributed utilities used by training and |
| 188 | + multi-stage workflows. |
| 189 | + |
| 190 | +Import from concrete modules when optional exports might pull unavailable extras. |
| 191 | +For example, prefer `from nvalchemi.models.base import BaseModelMixin` in code |
| 192 | +that should not import optional model backends. |
| 193 | + |
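One way to follow this guidance, sketched under assumptions: `describe_model` is a hypothetical helper, and the import sits under `TYPE_CHECKING` so it is never executed at runtime. The quoted annotation keeps the snippet standalone; in repository source files, `from __future__ import annotations` makes the quotes unnecessary.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers, so no model backend is imported at runtime.
    from nvalchemi.models.base import BaseModelMixin


def describe_model(model: "BaseModelMixin") -> str:
    """Return the class name of a model (hypothetical helper for illustration)."""
    return type(model).__name__
```

The function still type-checks against `BaseModelMixin`, but importing the module that defines it does not require `nvalchemi` or any optional extra to be installed.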
| 194 | +## Documentation And Agent Skills |
| 195 | + |
| 196 | +- User docs live in `docs/userguide/`; API docs live in `docs/modules/`; examples |
| 197 | + live in `examples/`. |
| 198 | +- Project conventions, including virial/stress/pressure signs, are documented in |
| 199 | + `docs/userguide/about/conventions.md`. |
| 200 | +- Agent-facing API skills live in `.claude/skills/`. Check the relevant |
| 201 | + `SKILL.md` before nontrivial work in data structures/storage, dynamics, |
| 202 | + hooks, model wrapping, training, finetuning, reporting, losses, or Zarr |
| 203 | + performance. |
| 204 | +- When docs, examples, or public APIs change, update related docs and consider
| 205 | + updating `CHANGELOG.md`, because the PR template asks for it.