This repo (mesh-llm) contains mesh-llm — a Rust binary that pools GPUs over QUIC for distributed LLM inference using llama.cpp.
| Doc | What it covers |
|---|---|
| README.md | Usage, install, CLI flags, examples |
| CONTRIBUTING.md | Build from source, dev workflow, UI dev |
| RELEASE.md | Release process (build, bundle, tag, GitHub release) |
| ROADMAP.md | Future directions |
| PLAN.md | Historical design notes and benchmarks |
| mesh-llm/TODO.md | Current work items and backlog |
| mesh-llm/README.md | Rust crate overview and file map |
| mesh-llm/docs/DESIGN.md | Architecture, protocols, features |
| mesh-llm/docs/TESTING.md | Test playbook, scenarios, remote deploy |
| mesh-llm/docs/MoE_PLAN.md | MoE expert sharding design |
| mesh-llm/docs/MoE_DEPLOY_DESIGN.md | MoE auto-deploy UX |
| mesh-llm/docs/MoE_SPLIT_REPORT.md | MoE splitting validation results |
| fly/README.md | Fly.io deployment (console + API apps) |
| relay/README.md | Self-hosted iroh relay on Fly |

Always use `just`. Never build manually.

    just build    # llama.cpp fork + mesh-llm + UI
    just bundle   # portable tarball
    just stop     # kill mesh/rpc/llama processes
    just test     # quick inference test against :9337
    just auto     # build + stop + start with --auto
    just ui-dev   # vite dev server with HMR

See CONTRIBUTING.md for full dev workflow.
- `mesh-llm/src/`: Rust source
- `mesh-llm/ui/`: React web console (shadcn/ui patterns, see https://ui.shadcn.com/llms.txt)
- `mesh-llm/docs/`: Design and testing docs
- `fly/`: Fly.io deployment (console + API client apps)
- `relay/`: Self-hosted iroh relay
- `evals/`: Benchmarking and evaluation scripts

The crate root should stay minimal.
- Keep `mesh-llm/src/lib.rs` and `mesh-llm/src/main.rs` as the only root `.rs` files unless there is a strong reason otherwise.
- New code should go into an existing domain directory when possible.
Use semantic ownership for module placement.
- `mesh-llm/src/cli/`: Clap types, command parsing, command dispatch, and user-facing command handlers.
- `mesh-llm/src/runtime/`: top-level process orchestration and startup/runtime coordination.
- `mesh-llm/src/network/`: request routing, proxying, tunneling, relay/discovery networking, request-affinity logic, and endpoint rewrite support.
- `mesh-llm/src/inference/`: model-serving logic, election, launch, pipeline, and MoE behavior.
- `mesh-llm/src/system/`: machine-local environment and platform concerns such as hardware detection, benchmarking, self-update, and local system integration.
- `mesh-llm/src/models/`: model catalog, resolution, downloads, local model storage, and model metadata.
- `mesh-llm/src/mesh/`: peer membership, gossip, identity, peer state, and mesh node behavior.
- `mesh-llm/src/plugin/`: plugin host, plugin runtime, transport, config, and MCP bridge support.
- `mesh-llm/src/api/`: management API surface and route handling.
- `mesh-llm/src/protocol/`: wire protocol types, encoding/decoding, and conversions.
CLI ownership rule.
- All command handlers belong under `mesh-llm/src/cli/`, usually `mesh-llm/src/cli/commands/`.
- Domain modules should not own Clap parsing or top-level command dispatch.
- Domain modules may expose reusable functions that CLI handlers call.
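
The split between CLI handlers and domain modules might look like the following. This is a minimal single-file sketch that uses inline modules in place of the real files under `mesh-llm/src/models/` and `mesh-llm/src/cli/commands/`. The function names `resolve_model_path` and `handle_model_path` and the `<model-dir>` placeholder are hypothetical, and a plain string slice stands in for the Clap types the real handlers would receive.

```rust
// Sketch only: inline modules stand in for files under mesh-llm/src/.
// All function names here are hypothetical, not taken from the crate.

mod models {
    /// Domain logic: reusable, and unaware of any CLI parsing.
    pub fn resolve_model_path(name: &str) -> String {
        format!("<model-dir>/{name}")
    }
}

mod cli {
    pub mod commands {
        use crate::models;

        /// CLI handler: owns argument handling and user-facing errors,
        /// then delegates the real work to the domain module.
        pub fn handle_model_path(args: &[&str]) -> Result<String, String> {
            let name = args.first().ok_or("usage: model-path <name>")?;
            Ok(models::resolve_model_path(name))
        }
    }
}

fn main() {
    match cli::commands::handle_model_path(&["example"]) {
        Ok(path) => println!("{path}"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

The domain function can be reused by other callers (for example the management API) because it takes plain values rather than parsed command structs.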
Do not introduce generic buckets.
- Avoid directories or modules named `app`, `utils`, `misc`, `common`, or similar catch-alls.
- Name modules after the responsibility they own.
Keep shared code honest.
- If code is only used by one subsystem, keep it inside that subsystem.
- Only move code to a shared module when it is truly cross-domain.
- Do not create shared helpers prematurely.
Prefer semantic grouping over symmetry.
- Do not create one directory per file just for visual symmetry.
- A single `foo.rs` file is already a Rust module; use a directory only when `foo` has meaningful substructure.
Minimize crate-root re-exports.
- Root re-exports are acceptable as temporary compatibility shims during refactors.
- New code should prefer importing from the owning module directly.
- Remove transitional re-exports once call sites have been updated.
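
A transitional re-export and its replacement could be sketched like this. The function `pick_host` and its placement in a `network::affinity` module are hypothetical, chosen only to show the pattern; inline modules stand in for separate files.

```rust
// Sketch only: `pick_host` and this module layout are hypothetical.

mod network {
    pub mod affinity {
        /// The owning module: the function is defined here.
        pub fn pick_host(request_id: u64, hosts: usize) -> usize {
            (request_id % hosts as u64) as usize
        }
    }
}

// Temporary compatibility shim at the crate root, kept only while
// older call sites are migrated. Remove it once they are updated.
pub use network::affinity::pick_host;

fn main() {
    // Old call site: still compiles through the root re-export.
    let via_shim = pick_host(7, 3);
    // New code: names the owning module directly.
    let direct = network::affinity::pick_host(7, 3);
    assert_eq!(via_shim, direct);
    println!("host index {direct}");
}
```

Once no call site uses the root path, deleting the `pub use` line is a one-line change and the compiler confirms nothing still depends on it.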
When to split a file.
- Split a file when it contains multiple separable responsibilities, when navigation becomes difficult, or when tests naturally cluster by concern.
- Do not split purely to reduce line count if the code still represents one coherent object or subsystem.
Naming rule.
- File and module names should describe responsibility, not implementation detail.
- Prefer names like `affinity`, `discovery`, `transport`, `maintenance`, `warnings`.
- Avoid vague names like `helpers`, `stuff`, `logic`, or `manager` unless the abstraction is genuinely that broad.
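
As an illustration, the naming rules above could be expressed as a small review helper. This is a hypothetical sketch, not a tool that exists in the repo, and it only flags names for a human to reconsider, since a name like `manager` is acceptable when the abstraction is genuinely that broad.

```rust
// Sketch only: a hypothetical review helper, not part of the crate.

/// Names called out in the rules above as catch-alls or too vague.
const DISCOURAGED: &[&str] = &[
    "app", "utils", "misc", "common", "helpers", "stuff", "logic", "manager",
];

/// Returns true when a module name deserves a second look.
fn is_discouraged_module_name(name: &str) -> bool {
    DISCOURAGED.contains(&name)
}

fn main() {
    for name in ["affinity", "utils", "discovery", "helpers"] {
        let verdict = if is_discouraged_module_name(name) {
            "reconsider"
        } else {
            "ok"
        };
        println!("{name}: {verdict}");
    }
}
```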
Current structure notes.
- Request-affinity code belongs with networking/routing behavior, not `system/`.
- Plugin MCP support belongs inside `mesh-llm/src/plugin/`, not as a separate root module.
- Model command handlers belong in `mesh-llm/src/cli/commands/`; `mesh-llm/src/models/` should stay domain-focused.

- `mesh-llm/src/main.rs`: CLI args, orchestration: `run_auto()`, `run_idle()`, `run_passive()`
- `mesh-llm/src/mesh.rs`: `Node` struct, gossip, mesh_id, peer management
- `mesh-llm/src/election.rs`: Host election, tensor split calculation
- `mesh-llm/src/proxy.rs`: HTTP proxy: request parsing, model routing, response helpers
- `mesh-llm/src/api.rs`: Management API (:3131): `/api/status`, `/api/events`, `/api/discover`, `/api/join`
- `mesh-llm/src/nostr.rs`: Nostr discovery, `score_mesh()`, `smart_auto()`
- `mesh-llm/src/download.rs`: Model catalog (`MODEL_CATALOG`), HuggingFace downloads
- `mesh-llm/src/moe.rs`: MoE detection, expert rankings, split orchestration
- `mesh-llm/src/launch.rs`: llama-server/rpc-server process management

When iterating on the plugin protocol, always consider protocol compatibility.
- If a protocol change may be breaking, explicitly ask the developer whether the change is intended to be breaking.
- If the change is not intended to be breaking, the previous version of the plugin protocol must continue to be supported.
- Do not silently ship plugin protocol changes that strand older plugins or hosts without confirming that outcome is acceptable.
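
One way to keep the previous protocol version working is to negotiate the highest version both sides support. The sketch below illustrates that idea only; the real plugin protocol's versioning scheme is not described in this document, so the version numbers and the names `SUPPORTED_VERSIONS` and `negotiate` are assumptions.

```rust
// Sketch only: the real plugin protocol's versioning scheme is not
// described here. Version numbers and names below are hypothetical.

/// Versions this host can speak. Keeping the previous version in the
/// list is what makes a change non-breaking for older plugins.
const SUPPORTED_VERSIONS: &[u32] = &[1, 2];

/// Pick the highest version both sides support, if any.
fn negotiate(plugin_versions: &[u32]) -> Option<u32> {
    SUPPORTED_VERSIONS
        .iter()
        .copied()
        .filter(|v| plugin_versions.contains(v))
        .max()
}

fn main() {
    // An older plugin that only speaks version 1 still connects.
    assert_eq!(negotiate(&[1]), Some(1));
    // A newer plugin gets the newest shared version.
    assert_eq!(negotiate(&[1, 2]), Some(2));
    // No overlap means the change was breaking for that plugin.
    assert_eq!(negotiate(&[3]), None);
}
```

Dropping `1` from the supported list in this sketch is exactly the kind of change that should be confirmed with the developer first, because every version-1-only plugin would then fail to negotiate.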
For changes in mesh-llm/ui/, use components and compose interfaces consistently with shadcn/ui patterns. Prefer extending existing primitives in ui/src/components/ui/ over ad-hoc markup.
Read mesh-llm/docs/TESTING.md before running tests. It has all test scenarios, remote deploy instructions, and cleanup commands.
Before committing Rust changes, format only the changed Rust files from the repo root, for example with cargo fmt --all -- path/to/file.rs, and include those formatting changes in the commit.
Do not leave Rust compiler warnings behind in code you touched.
- Fix or remove unused code, dead code, and other warnings introduced or surfaced by your change before committing.
- Do not silence warnings with `#[allow(...)]` unless there is a clear reason and the developer has asked for that tradeoff.
Pull request titles and descriptions should be user-focused by default.
- Title PRs around the user-visible change or capability, not the implementation detail.
- Start the description with what the user can now do, see, or understand after the change.
- Keep architectural refactors, internal state reshaping, and code-organization notes out of the opening summary unless they directly change user behavior.
- If there are important architectural changes, add a separate `## Architecture` section.
- If there are protocol or compatibility implications, add a separate `## Protocol` section that clearly calls out compatibility, migration, or breaking-change impact.
- If the PR changes CLI behavior or touches user-facing CLI flows, include example commands and representative output in the PR description.
- If the PR changes the UI, include at least one screenshot in the PR description.
- Validation and screenshots should stay separate from the user-facing summary.

    just bundle
    # scp bundle to remote, tar xzf, codesign -s - the three binaries

Stop all processes:

    pkill -f mesh-llm; pkill -f rpc-server; pkill -f llama-server

Every deploy to test machines MUST follow this checklist.

- Bump VERSION in `main.rs` so you can verify the running binary is new code.
- `just build && just bundle`
- Kill ALL processes on ALL nodes: `pkill -9 -f mesh-llm; pkill -9 -f llama-server; pkill -9 -f rpc-server`
- Verify clean: `ps -eo pid,args | grep -E 'mesh-llm|llama-server|rpc-server' | grep -v grep` must be empty.
- Deploy bundle: scp + tar + codesign on remote nodes.
- Verify version: `mesh-llm --version` on every node.
- Verify exactly 1 mesh-llm process per node.
- Verify child processes (at most 1 rpc-server + 1 llama-server per mesh-llm).
- `curl -s http://localhost:3131/api/status` returns valid JSON on every node.
- Check `/api/status` peers for new version string.
- Verify expected peer count.
- Test inference through every model in `/v1/models`.
- Test `/v1/` passthrough on port 3131.
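
The process-count checks in the list above can be made mechanical. The following sketch parses text in the shape produced by `ps -eo pid,args` and applies the "exactly 1 mesh-llm, at most 1 rpc-server and 1 llama-server" rule. The type and function names are hypothetical, the install path in the sample is made up, and matching on the executable's file name is a simplification.

```rust
// Sketch only: parses `ps -eo pid,args`-style text. Names are
// hypothetical and the matching rule is deliberately simple.

#[derive(Debug, Default, PartialEq)]
struct ProcessCounts {
    mesh_llm: usize,
    rpc_server: usize,
    llama_server: usize,
}

fn count_processes(ps_output: &str) -> ProcessCounts {
    let mut counts = ProcessCounts::default();
    for line in ps_output.lines() {
        // Drop the leading PID column and keep the command line.
        let args = line.trim_start().split_once(' ').map_or("", |(_, rest)| rest);
        // Look only at the executable, so arguments that merely
        // mention a process name (such as a log path) are not counted.
        let exe = args.split_whitespace().next().unwrap_or("");
        let name = exe.rsplit('/').next().unwrap_or(exe);
        match name {
            "mesh-llm" => counts.mesh_llm += 1,
            "rpc-server" => counts.rpc_server += 1,
            "llama-server" => counts.llama_server += 1,
            _ => {}
        }
    }
    counts
}

/// Exactly one mesh-llm, at most one rpc-server and one llama-server.
fn node_is_healthy(c: &ProcessCounts) -> bool {
    c.mesh_llm == 1 && c.rpc_server <= 1 && c.llama_server <= 1
}

fn main() {
    // Hypothetical sample; real input would come from `ps` on the node.
    let sample = "  101 /opt/mesh/mesh-llm --auto\n  102 /opt/mesh/llama-server\n";
    let counts = count_processes(sample);
    println!("{counts:?} healthy={}", node_is_healthy(&counts));
}
```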
If llama-server fails to start (stuck at "⏳ Starting llama-server..."), check its log file. Rust's std::env::temp_dir() on macOS points to the per-user temp dir, not /tmp:

    cat "$(python3 -c 'import tempfile; print(tempfile.gettempdir())')/mesh-llm-llama-server.log"

Typical path: `/var/folders/XX/.../T/mesh-llm-llama-server.log`. rpc-server logs are in the same directory as `mesh-llm-rpc-{port}.log`.
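
The same paths can be computed from Rust with `std::env::temp_dir()`, which is the call named above. The helper function names and the port used in `main` are hypothetical; only the file name patterns come from this document.

```rust
use std::env;
use std::path::PathBuf;

/// Expected llama-server log location (helper name is hypothetical).
fn llama_server_log_path() -> PathBuf {
    env::temp_dir().join("mesh-llm-llama-server.log")
}

/// rpc-server logs sit in the same directory, one file per port.
fn rpc_server_log_path(port: u16) -> PathBuf {
    env::temp_dir().join(format!("mesh-llm-rpc-{port}.log"))
}

fn main() {
    println!("{}", llama_server_log_path().display());
    // The port here is an arbitrary example value.
    println!("{}", rpc_server_log_path(50052).display());
}
```

On macOS this prints a path under the per-user temp directory rather than `/tmp`, which is why looking in `/tmp` finds nothing.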
- nohup over SSH doesn't stick: use `bash -c "nohup ... & disown"`, verify process survives disconnect.
- Duplicate processes: always kill-verify-start.
- codesign changes the hash: don't compare local vs codesigned remote.
See RELEASE.md for the full process.
Current release flow:
- Build and verify locally: `just build`, then `just bundle`.
- Release from a clean local `main` branch with `just release v0.X.Y`. This bumps the version, refreshes `Cargo.lock` without upgrading dependencies, commits as `v0.X.Y: release`, pushes `main`, and then pushes only the new release tag.
- Pushing a `v*` tag triggers `.github/workflows/release.yml`, which builds the release artifacts on Linux CPU, Linux CUDA, and macOS and creates the GitHub release automatically.
Test machine IPs, SSH details, and passwords are in ~/Documents/private-note.txt (outside the repo). Never commit credentials to any tracked file.
- No `api_key_token` feature: explicitly rejected, removed in v0.26.0.
- No credentials in tracked files: IPs, passwords, SSH commands belong in `~/Documents/private-note.txt` only.