🔴 Autonomous Pentest Agent

A Claude-powered autonomous penetration testing framework that executes the full offensive security kill chain — from reconnaissance to root — against HackTheBox machines.

Overview

This prototype combines two cybersecurity use cases into a single cohesive system:

Autonomous CTF Solving — Claude reasons through challenges with clear success criteria (flags)
Automated Penetration Testing — Full kill chain state machine: Recon → Enumeration → Exploitation → Privilege Escalation

Each phase is driven by a specialized Claude agent with phase-appropriate system prompts, tool access, and success criteria. A central orchestrator manages state transitions, context accumulation, and reporting.
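The phase transitions described above can be sketched as a small state machine. This is a minimal illustration, not the code in `src/orchestrator.py`: the names `Phase`, `RunState`, `NEXT_PHASE`, and `advance`, and the terminal `DONE` state, are assumptions made for the example. It also assumes a strictly linear chain, which a real run may shortcut (for example when exploitation yields root directly).

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    RECON = "recon"
    ENUMERATION = "enumeration"
    EXPLOITATION = "exploitation"
    PRIVESC = "privesc"
    DONE = "done"  # hypothetical terminal state, not named in the README


# Linear order taken from the overview; the mapping itself is an assumption.
NEXT_PHASE = {
    Phase.RECON: Phase.ENUMERATION,
    Phase.ENUMERATION: Phase.EXPLOITATION,
    Phase.EXPLOITATION: Phase.PRIVESC,
    Phase.PRIVESC: Phase.DONE,
}


@dataclass
class RunState:
    phase: Phase = Phase.RECON
    context: list[str] = field(default_factory=list)    # findings accumulated so far
    history: list[Phase] = field(default_factory=list)  # phases already completed


def advance(state: RunState, success: bool, findings: list[str]) -> RunState:
    """Record findings, then move on only if the phase met its success criteria."""
    state.context.extend(findings)
    if success and state.phase is not Phase.DONE:
        state.history.append(state.phase)
        state.phase = NEXT_PHASE[state.phase]
    return state
```

Keeping the transition table as data rather than branching logic makes it easy to see, and to test, which phase follows which.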

Architecture

┌──────────────────────────────────────────────────────┐
│                 PHASE ORCHESTRATOR                    │
│   Kill chain state machine with phase transitions    │
└──────────────────────┬───────────────────────────────┘
                       │
        ┌──────────────┼──────────────┬────────────┐
        ▼              ▼              ▼            ▼
   ┌─────────┐   ┌──────────┐   ┌─────────┐  ┌─────────┐
   │  RECON  │   │  ENUM    │   │ EXPLOIT │  │ PRIVESC │
   │  Agent  │   │  Agent   │   │  Agent  │  │  Agent  │
   └────┬────┘   └────┬─────┘   └────┬────┘  └────┬────┘
        │              │              │            │
        └──────────────┼──────────────┴────────────┘
                       ▼
              ┌─────────────────┐
              │  TOOL EXECUTOR  │  Sandboxed command runner
              │  Safety checks  │  Timeout enforcement
              │  Audit logging  │  Blocklist / confirmation
              └────────┬────────┘
                       ▼
              ┌─────────────────┐
              │ FINDINGS MEMORY │  Cross-phase context
              │ Hypothesis      │  Dead-end tracking
              │ tracking        │  Deduplication
              └────────┬────────┘
                       ▼
           ┌────────────────────┐
           │ VALIDATION ENGINE  │  Writeup-based scoring
           │ REPORT GENERATOR   │  Full attack narrative
           └────────────────────┘
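The findings memory box in the diagram lists cross-phase context, dead-end tracking, and deduplication. One way such a store could look is sketched below. The class and field names (`Finding`, `FindingsMemory`, `kind`, `value`) are hypothetical and are not taken from `src/memory/findings.py`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    phase: str   # phase that produced the finding, e.g. "recon"
    kind: str    # hypothetical category, e.g. "open_port"
    value: str   # e.g. "80/tcp"


@dataclass
class FindingsMemory:
    findings: list[Finding] = field(default_factory=list)
    dead_ends: set[str] = field(default_factory=set)
    _seen: set[Finding] = field(default_factory=set, repr=False)

    def add(self, finding: Finding) -> bool:
        """Store a finding once; return False if it was already recorded."""
        if finding in self._seen:
            return False
        self._seen.add(finding)
        self.findings.append(finding)
        return True

    def mark_dead_end(self, hypothesis: str) -> None:
        """Remember a failed hypothesis so an agent can avoid retrying it."""
        self.dead_ends.add(hypothesis.strip().lower())

    def is_dead_end(self, hypothesis: str) -> bool:
        return hypothesis.strip().lower() in self.dead_ends
```

Making `Finding` frozen gives it a hash, so deduplication is a plain set lookup while the list preserves discovery order.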

Key Features

Phase-Specialized Agents: Each kill chain phase has a dedicated Claude agent with tailored prompts and tool access
Structured Reasoning: Claude responds in structured JSON with explicit reasoning, enabling audit trails and debugging
Safety Guardrails: Command blocklist, confirmation prompts, timeouts, and dry-run mode
Memory & Hypothesis Tracking: Findings accumulate across phases; dead ends are tracked to prevent retry loops
Writeup-Based Validation: Ground truth from published writeups enables automated scoring against known-correct attack paths
Multi-Machine Support: Validated against HTB Shocker and Lame with YAML-based machine profiles
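To illustrate the structured reasoning feature, here is a sketch of validating one JSON reply before acting on it. The README does not document the actual schema, so the fields `reasoning`, `command`, and `phase_complete` are assumptions, as are the names `AgentDecision` and `parse_decision`. The project lists Pydantic in its stack; this example uses only the standard library to stay self-contained.

```python
import json
from dataclasses import dataclass


@dataclass
class AgentDecision:
    reasoning: str        # free-text explanation, kept for the audit trail
    command: str          # command the agent proposes to run next
    phase_complete: bool  # whether the agent believes the phase goal is met


# Hypothetical schema: field name -> required JSON type.
REQUIRED_FIELDS = {"reasoning": str, "command": str, "phase_complete": bool}


def parse_decision(raw: str) -> AgentDecision:
    """Parse and validate one JSON reply; raise ValueError on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field {name!r} is missing or not a {expected.__name__}")
    return AgentDecision(**{name: data[name] for name in REQUIRED_FIELDS})
```

Rejecting malformed replies at this boundary means nothing downstream has to guess at missing fields, and the stored `reasoning` string doubles as the audit record.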

Validated Targets

Machine	OS	Difficulty	Key Techniques
Shocker	Linux	Easy	Shellshock (CVE-2014-6271), sudo perl privesc
Lame	Linux	Easy	Samba usermap_script (CVE-2007-2447), direct root
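Writeup-based validation for the targets above could be scored in many ways; the README does not specify the metric. The sketch below assumes the simplest one: the fraction of ground-truth technique labels that also appear among the techniques an agent run reported, ignoring order. The function name `score_run` and the label strings are hypothetical.

```python
def score_run(expected: list[str], observed: list[str]) -> float:
    """Return the share of expected technique labels found in the observed ones.

    Matching is case-insensitive and ignores order (an assumption of this sketch).
    """
    if not expected:
        return 0.0
    seen = {label.strip().lower() for label in observed}
    hits = sum(1 for label in expected if label.strip().lower() in seen)
    return hits / len(expected)
```

For example, a run against Shocker that used Shellshock but escalated by some other route would cover one of the two expected labels and score 0.5 under this metric.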

Quick Start

# Clone
git clone https://github.com/deltaRed1a/autonomous-pentest-agent.git
cd autonomous-pentest-agent

# Setup
python -m venv venv
source venv/bin/activate        # Linux/Mac
# .\venv\Scripts\Activate.ps1   # Windows
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY

# Dry run (no commands executed — validates logic + API)
python -m src.main --target 10.10.10.56 --machine shocker --attacker-ip 10.10.14.1 --dry-run

# Live run with confirmation prompts (on Kali/Pwnbox with HTB VPN)
python -m src.main --target 10.10.10.56 --machine shocker --attacker-ip YOUR_TUN0_IP

Usage

python -m src.main [OPTIONS]

Required:
  --target, -t        Target IP address
  --machine, -m       Machine profile (shocker|lame)
  --attacker-ip, -a   Your IP on the HTB VPN (tun0)

Optional:
  --dry-run           Don't execute commands (test logic only)
  --no-confirm        Skip command confirmation prompts
  --max-commands      Max total commands (default: 50)
  --timeout           Per-command timeout in seconds (default: 120)
  --output-dir, -o    Report output directory (default: output/)
  --model             Anthropic model (default: claude-sonnet-4-20250514)
  --log-level         DEBUG|INFO|WARNING|ERROR (default: INFO)
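The options listed above map directly onto an `argparse` parser. The sketch below mirrors the documented flags and defaults; it is not copied from `src/main.py`, and the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options documented in the Usage section."""
    parser = argparse.ArgumentParser(prog="python -m src.main")
    parser.add_argument("--target", "-t", required=True, help="Target IP address")
    parser.add_argument("--machine", "-m", required=True, choices=["shocker", "lame"],
                        help="Machine profile")
    parser.add_argument("--attacker-ip", "-a", required=True,
                        help="Your IP on the HTB VPN (tun0)")
    parser.add_argument("--dry-run", action="store_true",
                        help="Don't execute commands (test logic only)")
    parser.add_argument("--no-confirm", action="store_true",
                        help="Skip command confirmation prompts")
    parser.add_argument("--max-commands", type=int, default=50)
    parser.add_argument("--timeout", type=int, default=120,
                        help="Per-command timeout in seconds")
    parser.add_argument("--output-dir", "-o", default="output/")
    parser.add_argument("--model", default="claude-sonnet-4-20250514")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    return parser
```

Parsing the dry-run invocation from the Quick Start with `build_parser().parse_args([...])` yields `args.dry_run` set to `True` and `args.max_commands` left at its default of 50.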

Project Structure

autonomous-pentest-agent/
├── src/
│   ├── main.py              # Entry point & CLI
│   ├── orchestrator.py      # Kill chain state machine
│   ├── state.py             # Phase/state data models
│   ├── agents/              # Phase-specific Claude agents
│   │   ├── recon.py         # Port scanning & service discovery
│   │   ├── enumeration.py   # Deep-dive service enumeration
│   │   ├── exploit.py       # Vulnerability ID & exploitation
│   │   └── privesc.py       # Privilege escalation
│   ├── tools/
│   │   └── executor.py      # Sandboxed command execution
│   ├── memory/
│   │   └── findings.py      # Cross-phase findings store
│   ├── reporting/
│   │   └── reporter.py      # Report generation
│   └── validation/
│       └── validator.py     # Writeup-based scoring
├── config/
│   ├── machines/            # Target machine profiles + ground truth
│   └── tools.yaml           # Available tool definitions
├── docs/
│   ├── design_doc.md        # Architecture & design decisions
│   └── final_report.md      # Results & analysis
├── tests/                   # Unit tests
├── requirements.txt
└── .env.example

Technology Stack

Claude claude-sonnet-4-20250514 (Anthropic SDK) — Reasoning engine for all phase agents
Python 3.12+ — Framework and orchestration
Rich — Terminal UI for live demo output
PyYAML — Machine configs and ground truth
Pydantic — Data validation
pytest — Testing

Safety & Ethics

This tool is designed only for authorized penetration testing against systems you have explicit permission to test (e.g., HackTheBox lab machines). The tool executor includes:

Blocked command patterns (destructive operations)
Optional confirmation prompts for every command
Dry-run mode for safe testing
Full audit logging of all commands and outputs
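The guardrails listed above could be combined in a single execution wrapper along the following lines. This is a sketch under stated assumptions, not the contents of `src/tools/executor.py`: the blocked patterns, the `Result` type, the `AUDIT_LOG` list, and the `run_command` signature are all hypothetical, and confirmation prompts are omitted for brevity. Dry-run is the default here so that nothing executes unless explicitly requested.

```python
import re
import shlex
import subprocess
from dataclasses import dataclass

# Hypothetical examples of destructive patterns; a real blocklist would be broader.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),
    re.compile(r"\bmkfs(\.\w+)?\b"),
    re.compile(r"\bdd\s+if="),
    re.compile(r"\b(shutdown|reboot)\b"),
]


@dataclass
class Result:
    command: str
    status: str       # "blocked", "dry_run", "timeout", "error", or "executed"
    output: str = ""


AUDIT_LOG: list[Result] = []  # every attempt is recorded, including blocked ones


def run_command(command: str, *, dry_run: bool = True, timeout: int = 120) -> Result:
    """Apply the blocklist, honor dry-run, enforce a timeout, and log the outcome."""
    if any(pattern.search(command) for pattern in BLOCKED_PATTERNS):
        result = Result(command, "blocked")
    elif dry_run:
        result = Result(command, "dry_run")
    else:
        try:
            proc = subprocess.run(shlex.split(command), capture_output=True,
                                  text=True, timeout=timeout)
            result = Result(command, "executed", proc.stdout + proc.stderr)
        except subprocess.TimeoutExpired:
            result = Result(command, "timeout")
        except (FileNotFoundError, PermissionError) as exc:
            result = Result(command, "error", str(exc))
    AUDIT_LOG.append(result)
    return result
```

Checking the blocklist before the dry-run branch means a dry run also reports which commands would have been refused, which is useful when tuning the patterns.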
