crates/
├── arborium/ ← main crate with inventory
├── tree-sitter/ ← fork to avoid upstream build errors
└── tree-sitter-highlight/ ← fork to avoid upstream build errors
- These are always in the repo (not generated) and are published every release.
- The tree-sitter forks track upstream but include fixes so CI builds reliably across targets.
Everything is generated once, then a single tag publishes all crates + npm packages.
# Pick a version and generate everything (core + all groups)
# This runs tree-sitter-cli for all grammars, but results are cached
# by tree-sitter-cli version + grammar.js hash, so reruns are cheap.
cargo xtask gen --version 0.3.0xtask gen also records the current release version in a small metadata file.
CI will regenerate from source of truth as well, but running this locally lets
you review changes before tagging.
git commit -am "Release v0.3.0"
git tag v0.3.0
git push origin main --tagsThe v0.3.0 tag triggers the CI workflow in .github/workflows/ci.yml which:
- Regenerates all grammars and crates with
arborium-xtask gen --version 0.3.0 - Builds and tests the
arboriumcrate - Builds all WASM plugins into
dist/plugins - Publishes:
- All grammar crates and core crates to crates.io via
cargo xtask publish crates - All npm packages (per-language +
@arborium/arborium) viacargo xtask publish npm -o dist/plugins
- All grammar crates and core crates to crates.io via
- ~98 grammar crates organized into ~10 hand-picked animal groups
- Core crates (
arborium,arborium-collection,tree-sitter-*) always published - Each group publishes independently from
langs/group-{animal}/viacargo ws publish --publish-as-is - We use
cargo ws publishinstead ofcargo publish --workspacebecause the latter is still brittle with partial publishes;cargo ws publishcan resume cleanly. - Retry-safe: crates.io warns and skips already-published versions
- All grammars with
generate-plugin: trueinlangs/group-{animal}/{lang}/def/arborium.kdl - Built via
cargo xtask buildforwasm32-unknown-unknownfrom the same group directory - Published as per-language packages under the
@arboriumscope, e.g.@arborium/rust,@arborium/javascript, etc. - Published together with crates.io in the same per-group CI job for version sync
- A single tag (
vX.Y.Z) publishes all crates.io + npm artifacts together. - Core crates (
arborium,tree-sitter-*, etc.) and allarborium-{lang}crates publish once per release.
Cargo handles already-published versions gracefully - it warns and continues:
warning: crate arborium-rust@0.3.0 already exists on crates.io
So retrying a publish job is safe; already-published crates are skipped.
npm is not graceful - it hard-fails with EPUBLISHCONFLICT:
npm ERR! code EPUBLISHCONFLICT
npm ERR! Cannot publish over existing version
xtask publish (per group) must:
- Check if version exists before publishing
- Distinguish
EPUBLISHCONFLICT(skip, continue) from real errors (fail) - Handle retries without re-publishing successes
The CI workflow builds all plugins into dist/plugins first, then one job
calls xtask publish npm -o dist/plugins to publish every npm package in one
go.
xtask publish npm reads the canonical version from version.json (written by
xtask gen --version) and enforces:
- Refuses to publish if the version is a dev one like
0.0.0-dev - Refuses to publish if any npm package version does not match
version.json
This means you must run cargo xtask gen --version X.Y.Z before publishing
to npm, otherwise the command will fail instead of silently pushing the wrong
version (e.g., 0.0.0-dev) to the registry.
The source of truth for each language lives under its group and language,
inside a def/ directory. Generated artifacts live next to it as crate/
(static Rust crate) and npm/ (WASM plugin package).
langs/
├── group-acorn/ (Web languages)
│ ├── rust/
│ │ ├── def/ ← LANGUAGE DEFINITIONS (committed)
│ │ │ ├── arborium.kdl ← SOURCE OF TRUTH
│ │ │ ├── grammar/
│ │ │ │ ├── grammar.js ← tree-sitter grammar
│ │ │ │ └── scanner.c ← custom scanner (if any)
│ │ │ ├── queries/
│ │ │ │ └── highlights.scm ← highlight queries
│ │ │ └── samples/ ← test samples
│ │ ├── crate/ ← Static linking crate (generated)
│ │ └── npm/ ← WASM plugin package (generated)
│ ├── javascript/
│ ├── html/
│ └── [other web languages...]
├── group-birch/ (Systems: C, C++, Rust, Go, Zig, etc.)
├── group-cedar/ (JVM: Java, Scala, Kotlin, Clojure)
├── group-fern/ (Functional: Haskell, OCaml, Elixir, etc.)
├── group-hazel/ (Scripting: Python, Ruby, Bash, Lua, etc.)
├── group-maple/ (Config/data: TOML, YAML, JSON, SQL, etc.)
├── group-moss/ (Scientific: R, Julia, MATLAB, GLSL, etc.)
├── group-pine/ (Modern: Swift, Dart, Zig, etc.)
├── group-sage/ (Legacy/enterprise: C#, VB, Elisp, etc.)
└── group-willow/ (Markup: Markdown, HTML templates, etc.)
langs/group-{animal}/{lang}/crate/
├── Cargo.toml ← GENERATED by xtask gen
├── build.rs ← GENERATED by xtask gen
├── src/lib.rs ← GENERATED by xtask gen
└── grammar/
└── src/ ← GENERATED by xtask gen (tree-sitter generate)
├── parser.c
├── grammar.json
└── ...
langs/group-{animal}/{lang}/npm/
├── Cargo.toml ← GENERATED for cargo xtask build
├── src/
│ └── bindings.rs ← GENERATED bindings
└── package.json ← GENERATED npm package
These crates don't have arborium.kdl and are fully hand-written:
arborium(main crate)arborium-test-harnessarborium-sysrootarborium-hostarborium-wirearborium-plugin-runtime
-
Updates core crate versions:
arborium/Cargo.tomlversion = "X.Y.Z"arborium-collection/Cargo.tomlversion = "X.Y.Z"
-
Generates group workspace files:
langs/group-{animal}/Cargo.tomlwith member crates and version "X.Y.Z"
-
Generates grammar crate files from definitions:
- Reads
langs/group-{animal}/{lang}/def/arborium.kdland friends - Writes
langs/group-{animal}/{lang}/crate/Cargo.tomlwith version "X.Y.Z" - Writes
build.rswith correct C compilation setup - Writes
src/lib.rswith language exports - Runs tree-sitter generate into
grammar/src/*
- Reads
-
Generates WASM plugin packages:
- Writes
langs/group-{animal}/{lang}/npm/Cargo.tomlfor plugin builds - Writes
langs/group-{animal}/{lang}/npm/package.jsonfor npm publishing - Writes
langs/group-{animal}/{lang}/npm/src/bindings.rsgenerated bindings
- Writes
When called without --version, uses 0.0.0-dev (fine for local dev since path deps ignore versions).
# Edit arborium.kdl, grammar.js, queries, etc.
# Regenerate (uses 0.0.0-dev version, doesn't matter locally)
cargo xtask gen
# Build and test
cargo build
cargo testSee Release Flow above for the full tag + CI story.
| Registry | Package | Count |
|---|---|---|
| crates.io | arborium (core with inventory) |
1 |
| crates.io | arborium-collection (feature-gated) |
1 |
| crates.io | arborium-{lang} (static crates) |
98 |
| crates.io | arborium-test-harness |
1 |
| crates.io | arborium-sysroot |
1 |
| crates.io | tree-sitter-patched-arborium |
1 |
| crates.io | tree-sitter-highlight-patched-arborium |
1 |
| npmjs.com | @arborium/arborium (bundle) |
1 |
| npmjs.com | @arborium/{lang} (per-language WASM plugins) |
98 |
These projects use arborium for syntax highlighting:
| Integration | Description | Status |
|---|---|---|
| dodeca | Static site generator (website, GitHub) | In use |
| docs.rs | Via --html-in-header for TOML/shell highlighting in rustdoc |
Documented |
- Core Rust library with 69+ language grammars
- Theme system with 15+ bundled themes (Catppuccin, Dracula, Tokyo Night, etc.)
- HTML rendering with compact custom elements (
<a-k>vs<span class="keyword">) - ANSI output for terminal applications
- Browser IIFE drop-in script (auto-highlights code blocks)
- Browser ESM module for bundlers
- WASM compilation target (
wasm32-unknown-unknown) - docs.rs integration via
--html-in-header - IIFE skips already-highlighted blocks (docs.rs compatibility)
- WASM plugin system
- Plugin runtime crate (
arborium-plugin-runtime) - Host runtime crate (
arborium-host) - Injection dependency resolution in browser
- Dynamic grammar loading from bytes (currently loads from CDN)
- Plugin runtime crate (
- Tree-named grammar groups - 10 groups finalized (acorn, birch, cedar, etc.)
- API stability review - Documented above in "API Stability" section
- Language injection in browser - HTML→JS→SQL nesting works
- Visual regression tests - Playwright integration with CI
- Standalone CLI tool for highlighting files
- Plugin discovery/registry mechanism (dynamic grammar loading)
- Performance benchmarks and optimization pass
- Incremental parsing with
apply-editsupport - Sample files for all ~98 grammars (see
docs/samples_todo.md)
| Crate | Stability | Notes |
|---|---|---|
arborium |
Stable | Main entry point, semver guarantees |
arborium-highlight |
Stable | Core highlighting traits and types |
arborium-theme |
Stable | Theme definitions and builtins |
arborium-{lang} |
Stable | Per-language grammar crates |
arborium-wire |
Internal | Plugin protocol, may change |
arborium-plugin-runtime |
Internal | Plugin internals, may change |
arborium-host |
Internal | WASM host, may change |
arborium-sysroot |
Internal | WASM build support, may change |
- Update generate caching to tree-sitter-cli output only
- Standardize wasm-opt settings to -Oz