Agent-based automation for a global integration team. This is the reference
implementation skeleton for the architecture described in docs/.
Shape: agents reason, Ansible mutates, Temporal orchestrates, skills are
the unit of change. See docs/agent-automation-implementation.md for the
full design.
docs/                  # Full design docs + vision
skills/                # Versioned runbooks (mutation, diagnostic, composite)
  duplicate-matching-lines-with-suffix/   (mutation archetype)
  top-process-check/                      (diagnostic archetype)
  guarded-change/                         (composite archetype)
agents/                # L3 orchestration agents
  incident-responder/                     (canonical template)
src/
  control_plane/       # Client + Temporal workflow stubs
  agent_runtime/       # Agent SDK runtime + AuditHooks stub
tests/                 # Skill folder linter + unit tests
scripts/               # bootstrap, validate_skills, run_skill_locally
.vscode/               # Editor config (interpreter, extensions, launch)
.github/workflows/     # CI: skill lint + signature
docker-compose.yml     # Local Temporal + Postgres + MinIO
Makefile               # Common commands
pyproject.toml         # Python deps + tool config
- Python 3.11+
- Docker + Docker Compose
- VSCode with the Python + YAML + Ansible extensions
  (see .vscode/extensions.json; VSCode will prompt to install them)
./scripts/bootstrap.sh

This creates a virtualenv in .venv/, installs dependencies, and prints
next steps. If you prefer to do it by hand:
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env

docker compose up -d

This starts:
- Temporal on localhost:7233 (+ Web UI on http://localhost:8233)
- Postgres on localhost:5432 (audit store)
- MinIO on localhost:9000 (S3-compatible blob store for snapshots)
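
As a rough illustration of what a reachability check for the services listed above could look like, here is a minimal Python sketch. It only tests whether each TCP port accepts a connection; it is not what `make status` actually runs, and the names `LOCAL_SERVICES`, `port_is_open`, and `check_services` are hypothetical.

```python
import socket

# Host/port pairs taken from the service list above.
# The dictionary name and keys are illustrative, not part of the repo.
LOCAL_SERVICES = {
    "temporal": ("localhost", 7233),
    "postgres": ("localhost", 5432),
    "minio": ("localhost", 9000),
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def check_services(services: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {
        name: port_is_open(host, port)
        for name, (host, port) in services.items()
    }
```

Calling `check_services(LOCAL_SERVICES)` after `docker compose up -d` would return a dict of booleans keyed by service name. An open port only shows that something is listening, not that the service is healthy.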
Verify:
make status

make lint-skills

The skill linter validates every folder under skills/:
- SKILL.md present and non-empty
- metadata.yaml parses and matches the schema for its operation_type
- inputs.schema.json is valid JSON Schema
- Mutation skills have apply.yml + rollback.yml + verify.sh
- Diagnostic skills have apply.yml + verify.sh + inverse: null
- Composite skills have compose.yaml + verify.sh + no playbooks
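
The file-presence part of the rules above can be sketched in a few lines of Python. This is an illustration, not the code in scripts/validate_skills.py. It assumes `operation_type` takes the values "mutation", "diagnostic", and "composite", and that "no playbooks" means no apply.yml or rollback.yml. The metadata and JSON Schema checks (including `inverse: null`) are left out because they need a YAML parser. The function name `check_skill_files` is hypothetical.

```python
from pathlib import Path

# Required files per archetype, transcribed from the rules above.
# The operation_type strings are an assumption about metadata.yaml values.
REQUIRED = {
    "mutation": {"apply.yml", "rollback.yml", "verify.sh"},
    "diagnostic": {"apply.yml", "verify.sh"},
    "composite": {"compose.yaml", "verify.sh"},
}

# Assumption: "no playbooks" for composite skills means neither of these files.
FORBIDDEN = {
    "mutation": set(),
    "diagnostic": set(),
    "composite": {"apply.yml", "rollback.yml"},
}


def check_skill_files(skill_dir: Path, operation_type: str) -> list[str]:
    """Return a list of problems; an empty list means the folder passes."""
    if operation_type not in REQUIRED:
        return [f"unknown operation_type: {operation_type!r}"]

    problems = []

    # SKILL.md must exist and contain something other than whitespace.
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file() or not skill_md.read_text(encoding="utf-8").strip():
        problems.append("SKILL.md missing or empty")

    for name in sorted(REQUIRED[operation_type]):
        if not (skill_dir / name).is_file():
            problems.append(f"missing {name}")

    for name in sorted(FORBIDDEN[operation_type]):
        if (skill_dir / name).exists():
            problems.append(f"{name} not allowed for {operation_type} skills")

    return problems
```

Returning a list of problems instead of raising on the first one lets a linter report everything wrong with a folder in a single run.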
make test

code .

VSCode will:
- Pick up the .venv/ Python interpreter automatically.
- Prompt to install the recommended extensions.
- Offer launch configurations for:
  - Running the skill linter
  - Running a skill locally via scripts/run_skill_locally.py
  - Starting the Temporal worker (once you wire it up)
- Copy an existing skill folder that matches your archetype:
  - Mutation → copy skills/duplicate-matching-lines-with-suffix/
  - Diagnostic → copy skills/top-process-check/
  - Composite → copy skills/guarded-change/
- Rename the folder to your skill id.
- Edit SKILL.md (human intent), metadata.yaml (safety envelope), inputs.schema.json (input contract).
- Replace apply.yml / rollback.yml / verify.sh with your actual logic.
- Run make lint-skills; all checks must pass.
- Add tests under tests/skills/<your_skill_id>/.
- Open a PR. CI runs the linter + Molecule tests + signs the skill on merge to main.
See docs/agent-automation-implementation.md §3 for the full skill
contract and §10 for worked examples.
- Copy agents/incident-responder/.
- Edit metadata.yaml: set allowed_tools, disallowed_tool_classes, allowed_skill_invocation_types, model pin, reasoning caps.
- Write system_prompt.md.
- Implement the tool adapters in tools.py; keep them read-only or plan-producing. Never add shell or file-write tools.
- Implement the driver workflow in workflow.py.
- Register the agent with the runtime in src/agent_runtime/registry.py.
See docs/agent-automation-implementation.md §11 for the L3 design.
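
To make the role of `allowed_tools` concrete, here is a minimal sketch of how a loader might turn that list into the only tools an agent can see. It does not describe the actual code in src/agent_runtime/. The `AgentMetadata` class, the `load_tools` function, the registry shape, and the exact forbidden tool names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Tool names this sketch refuses outright. The exact identifiers
# (especially "ssh") are assumptions for illustration.
FORBIDDEN_TOOLS = frozenset({"shell", "file_write", "ssh"})


@dataclass(frozen=True)
class AgentMetadata:
    """Hypothetical in-memory form of an agent's metadata.yaml."""

    agent_id: str
    allowed_tools: frozenset[str]


def load_tools(
    meta: AgentMetadata,
    registry: dict[str, Callable[..., object]],
) -> dict[str, Callable[..., object]]:
    """Return only the tools the agent declared, rejecting forbidden ones."""
    forbidden = meta.allowed_tools & FORBIDDEN_TOOLS
    if forbidden:
        raise PermissionError(
            f"{meta.agent_id} declares forbidden tools: {sorted(forbidden)}"
        )

    unknown = meta.allowed_tools - set(registry)
    if unknown:
        raise KeyError(f"{meta.agent_id} declares unknown tools: {sorted(unknown)}")

    # Anything not declared in allowed_tools is never handed to the agent.
    return {name: registry[name] for name in sorted(meta.allowed_tools)}
```

The point of the design is that the agent never receives a handle to a tool it did not declare, so the restriction holds at load time and does not depend on the prompt.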
This platform is only safe if these hold. Review any PR that might break one with extra scrutiny.
- Agents cannot mutate. Tool surfaces are declarative; the loader enforces allowed_tools. No agent has shell, file_write, or SSH.
- Every mutation is reversible. Mutation skills ship rollback.yml; apply.yml snapshots pre-state before any change.
- Every action is auditable. Every tool call, workflow activity, and mutation writes a hash-chained event to Postgres; snapshots go to S3/MinIO.
- The system can stop itself. kill_switch_group in skill and agent metadata; Temporal workers honor the paused flag.
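
The "hash-chained event" idea in the auditability invariant can be illustrated with a short sketch: each event stores the hash of the previous one, so editing any earlier record invalidates every hash after it. This is not the platform's actual audit schema. The use of SHA-256, the field names, the `GENESIS` value, and the in-memory list standing in for the Postgres table are all assumptions.

```python
import hashlib
import json

# Assumed previous-hash value for the first event in a chain.
GENESIS = "0" * 64


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], payload: dict) -> dict:
    """Append an event linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    event = {
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": event_hash(prev_hash, payload),
    }
    chain.append(event)
    return event


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash in order; any edited event breaks the chain."""
    prev_hash = GENESIS
    for event in chain:
        if event["prev_hash"] != prev_hash:
            return False
        if event["hash"] != event_hash(prev_hash, event["payload"]):
            return False
        prev_hash = event["hash"]
    return True
```

Serializing the payload with sorted keys keeps the hash stable regardless of dict insertion order, which matters if events are re-read from storage before verification.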
In this order:
- docs/agent-automation-implementation.md: overall architecture.
- skills/top-process-check/: simplest complete skill (diagnostic).
- skills/duplicate-matching-lines-with-suffix/: mutation with rollback.
- skills/guarded-change/compose.yaml: composite step graph.
- agents/incident-responder/: L3 agent folder.
- src/control_plane/client.py: the API surface skills and agents call.
- scripts/validate_skills.py: reference implementation of the skill contract (useful as executable documentation).
This is a starter repo. The intent is: the skills, agents, design, and
folder discipline are real and correct; the runtime glue (src/) is
scaffolding. Filling in the runtime is Phase 1 of the build plan in
docs/agent-automation-implementation.md §7.