A long-running autonomous agent scaffold that competes in Kaggle competitions. AutoKG orchestrates AI agents through iterative research-experiment-review rounds, accumulating knowledge and improving submissions until the competition ends or a round budget is exhausted.
Round 0: Competition Setup & Baseline
Planner reads competition description, downloads data, does EDA, establishes baseline
Round 1-N: Iterative Research & Experimentation
Planning -> Research papers/forums, read user comments, review past rounds, write plan
Development -> Implement experiments, train models, log metrics, optimize
Review -> Verify approach, submit to Kaggle, wait for LB score, write round review
Loop continues until max rounds or user cancellation
Three AI agents rotate through each round:
| Agent | Role | Output |
|---|---|---|
| Planner | Research papers, forums, notebooks; decide what to try | rounds/round_N/plan.md |
| Developer | Implement experiments, train models, log CV scores | rounds/round_N/dev_log.md |
| Reviewer | Validate, submit to Kaggle, analyze CV-LB gap | rounds/round_N/review.md |
- Never stop improving - Even at rank #1, keep iterating to maintain edge
- Knowledge accumulates - Every round builds on all previous findings
- User steerable - Inject guidance via comments or kill signal at any time
- Research-first - Deep investigation before coding, not just trying random things
- Reproducible - Every experiment is logged and documented
- Full hardware utilization - Agents use all available GPUs directly on the host
- Clean environments - All Python packages managed via
uv
- bash >= 4.0
- jq - JSON processing
- Claude Code - AI agent CLI backend (default)
- uv - Python package manager
- tmux (optional) - Monitor dashboard
./install.shRecommended MCP setup:
"mcpServers": {
"context7": {
"url": "https://mcp.context7.com/mcp",
"headers": {
"CONTEXT7_API_KEY": "Your-Key"
}
},
"kaggle": {
"url": "https://www.kaggle.com/mcp",
"type": "http",
"headers": {
"Authorization": "Bearer Your-Key"
}
},
"arxiv-mcp-server": {
"command": "uv",
"args": [
"tool",
"run",
"arxiv-mcp-server",
"--storage-path",
".cache/paper"
]
},
"browser-use": {
"command": "npx",
"args": [
"mcp-remote",
"https://api.browser-use.com/mcp",
"--header",
"X-Browser-Use-API-Key: Your-Key"
]
},
"hf-mcp-server": {
"url": "https://huggingface.co/mcp?login",
"headers": {}
},
"ssh-mcp": {
"command": "npx",
"args": [
"ssh-mcp",
"-y",
"--",
"--host=Your-Host",
"--user=username",
"--password=pass",
"--timeout=30000",
"--maxChars=none"
]
}
}
Fill in your auth keys. ssh-mcp is a compute cluster for heavy model training.
# Validate agent CLI, auth, model, and dependencies
autokg checkSample output:
[1/5] Claude Code CLI
✓ Installed
[2/5] Authentication
+ Logged in as user@example.com
[3/5] MCP Servers
✓ kaggle: running
! context7: not loaded (needs approval)
[4/5] Model Configuration
✓ Model 'opus-4.5-thinking' is available
[5/5] Dependencies
✓ jq: jq-1.7.1
✓ uv: uv 0.9.11
# Basic init
autokg init titanic
# With a competition spec file
autokg init titanic --spec MYSPEC.mdThis creates the full project structure:
.autokg/ # AutoKG state
config.json # Project configuration
round_state.json # Current round/phase tracking
status.json # Agent communication file
competition.json # Competition metadata
leaderboard.json # Submission & score history
knowledge_base.json # Accumulated research knowledge
master_comments.json # User-injected guidance
rounds/ # Per-round artifacts (plan.md, dev_log.md, review.md)
data/raw/ # Competition data
src/ # Experiment code
models/ # Saved model artifacts
submissions/ # Submission CSV files
# Start the agent loop
autokg run
# Run with tmux dashboard (3 panes: runner, status, scores)
autokg --monitor run# Inject guidance (highest priority - planner reads these first)
autokg comment "Try using LightGBM instead of XGBoost"
autokg comment "Focus on feature engineering for the next round"
autokg comment "Check arxiv paper 2401.xxxxx for feature selection ideas"
# Graceful stop (finishes current phase, then exits)
autokg --kill# Current status (round, phase, scores, rate limits)
autokg status
# Score history table (CV, LB, gap, rank, trend)
autokg scores
# Accumulated knowledge base
autokg knowledge
autokg knowledge --category modelautokg init <competition> [--spec <file>] Initialize for a competition
autokg check Preflight check (auth, MCP, deps)
autokg run Run the agent loop
autokg status Show current status
autokg comment "<message>" Add user guidance
autokg scores Show score history & trends
autokg knowledge [--category <cat>] View knowledge base
autokg --monitor run Run with tmux dashboard
autokg --model <model> run Use specific AI model
autokg --kill Graceful stop signal
autokg --reset-rate-limit Reset API rate limiter
autokg --help Show help
autokg --version Show version
| Code | Meaning |
|---|---|
| 0 | User --kill (graceful exit) |
| 1 | General error |
| 21 | Max rounds reached |
Edit .autokg/config.json after init:
{
"agent": {
"cli_tool": "claude",
"model": "opus",
"timeout_minutes": 1440,
"models_by_role": {
"planner": "opus-4.6-thinking",
"developer": "gpt-5.3-codex-xhigh-fast",
"reviewer": "opus-4.6-thinking"
}
},
"round": {
"max_rounds": 50,
"planning_max_loops": 3,
"development_max_loops": 10,
"review_max_loops": 5
},
"rate_limiting": {
"max_calls_per_hour": 100
},
"kaggle": {
"daily_submission_limit": 5
}
}autokg.sh Main CLI and orchestration loop
lib/
utils.sh Logging, timestamps, JSON helpers
rate_limiter.sh API call rate + Kaggle submission tracking
round_manager.sh Round state, phase transitions, kill signal
agent_adapter.sh Agent CLI integration (Claude Code / cursor-agent), prompt generation
metrics_collector.sh Score tracking, CV-LB gap analysis, trends
competition_manager.sh Competition metadata management
knowledge_manager.sh Knowledge base CRUD
comment_manager.sh User comment management
prompts/
planner.md Research-first planning prompt
developer.md ML engineering prompt
reviewer.md Submission management prompt
templates/
config.json Default configuration
No circuit breaker. Unlike Sprinty (the reference project), AutoKG never stops on agent errors. If an agent crashes, the orchestrator retries up to max_loops for that phase, then moves on to the next phase. For Kaggle, persistence beats correctness of any single round.
Knowledge accumulation. knowledge_base.json stores confirmed insights, failed approaches, reviewed papers, forum findings, and top notebook analyses. Every round's planner reads this to avoid repeating mistakes and build on what works.
User steering. master_comments.json lets users inject guidance at any time. The planner reads unread comments before anything else - they take highest priority over all other research.
Kill signal. autokg --kill writes a .autokg/.kill file. The orchestrator checks for it between phases and exits gracefully, allowing the current phase to finish.
| File | Purpose |
|---|---|
competition.json |
Competition name, metric, deadline, submission limits, best scores |
leaderboard.json |
Full submission history with CV/LB scores, gaps, ranks, trends |
knowledge_base.json |
Insights, failed approaches, papers, forums, notebooks reviewed |
master_comments.json |
User-injected guidance with read/unread status |
.autokg/round_state.json |
Current round number, phase, loop count, round history |
.autokg/status.json |
Agent communication - agents update this after each execution |
.autokg/config.json |
Project configuration (models, timeouts, limits) |