Commit f61190b

Initial commit: LLM inference benchmarking tool for OpenAI-compatible providers
0 parents  commit f61190b

47 files changed

Lines changed: 4041 additions & 0 deletions

.env.example

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# API Keys for inference providers
# General Compute is served by SambaNova's cloud.
SAMBANOVA_API_KEY=your_sambanova_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Optional Configuration Overrides
# DEFAULT_ITERATIONS=50
# RESULTS_DIR=./results
# CONFIG_FILE=config/config.yaml

.github/workflows/ci.yml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
name: CI

on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install
        run: |
          python -m pip install --upgrade pip
          python -m pip install -e ".[dev]"

      - name: Test
        run: pytest

      - name: Lint
        run: ruff check src tests

      - name: Type check
        run: mypy src

.gitignore

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Environment variables
.env
.env.*
!.env.example

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Virtual environments
venv/
env/
ENV/
.venv

# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store

# Testing
.pytest_cache/
.ruff_cache/
.coverage
htmlcov/
.tox/

# Results and output
results/*
!results/.gitkeep
*.csv
*.html
*.xls
*.xlsx
*.xlsm
!src/benchmarking/reporting/templates/*.html

# Logs
*.log

# Type checking
.mypy_cache/
.dmypy.json
dmypy.json

CONTRIBUTING.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# Contributing

Thanks for improving GC Benchmarking.

## Development Setup

```bash
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```

Create `.env` from `.env.example` only when you need live provider calls. Unit
tests should not require provider credentials.

## Checks

Run these before opening a pull request:

```bash
pytest
ruff check src tests
mypy src
```

Format changed Python files with:

```bash
black src tests
```

## Pull Request Guidance

- Keep generated benchmark output out of commits.
- Do not include API keys, provider account identifiers, or private benchmark
  data.
- Add or update tests for changes to metrics, config loading, CLI behavior, or
  report generation.
- When changing benchmark methodology, document the tradeoff in `README.md`.

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 General Compute

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
# GC Benchmarking

LLM inference benchmarking for OpenAI-compatible providers. The tool runs the
same logical model across every enabled provider that has a configured model ID,
then reports time to first token, end-to-end latency, output throughput, token
counts, retry attempts, and error rate.

The default configuration compares General Compute, served through SambaNova's
cloud API, with OpenRouter for the same model families. Add or disable providers
in `config/config.yaml` without changing code.

## Features

- Same-model provider comparisons
- Provider interleaving within each iteration to reduce time-window bias
- Warm-up requests that are discarded from metrics
- Prompt variation to reduce provider-side cache effects
- Streaming TTFT measurement, including reasoning-token streams
- Incremental raw CSV writes so interrupted runs keep completed samples
- CSV, HTML, and static-site JSON report generation
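
As a rough illustration of the interleaving and warm-up features listed above, the sketch below builds a request schedule in which every iteration visits each provider once. `build_schedule` and its parameters are hypothetical names, and rotating the starting provider between iterations is an assumption of this sketch rather than confirmed tool behavior.

```python
from typing import Iterator, List, Tuple


def build_schedule(
    providers: List[str], iterations: int, warmup: int = 1
) -> Iterator[Tuple[int, str, bool]]:
    """Yield (iteration, provider, is_warmup) in interleaved order.

    Hypothetical helper: each iteration visits every provider once, so all
    providers are sampled in the same time window. Rotating the starting
    provider is an extra assumption of this sketch.
    """
    if not providers:
        return
    for i in range(warmup + iterations):
        is_warmup = i < warmup
        offset = i % len(providers)
        for provider in providers[offset:] + providers[:offset]:
            # Warm-up rounds get negative iteration numbers.
            yield i - warmup, provider, is_warmup


# Keep only measured samples; warm-up requests are discarded from metrics.
measured = [
    (iteration, provider)
    for iteration, provider, warm in build_schedule(
        ["general_compute", "openrouter"], iterations=2
    )
    if not warm
]
```

Running providers back to back inside one iteration, instead of finishing one provider before starting the next, means a slow period on the network or at a shared backend affects all providers' samples roughly equally.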

## Installation

Use Python 3.10 or newer.

```bash
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```

Or run the local setup script:

```bash
./setup.sh
```

## Configuration

Create a local `.env` from the example and add provider keys:

```bash
cp .env.example .env
```

Required by the default config:

```bash
SAMBANOVA_API_KEY=your_sambanova_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here
```

`.env` and benchmark outputs are intentionally ignored by Git. Do not commit
real API keys or generated result files.

By default, the CLI loads `config/config.yaml` from the current working
directory when present. Otherwise, it falls back to the packaged default config.
Set `CONFIG_FILE=/path/to/config.yaml` to use an explicit file.
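
The lookup order described above can be sketched as follows. This is a minimal illustration, not the CLI's actual code: `resolve_config_path` and the `packaged_default` argument are hypothetical, and the sketch assumes an explicit `CONFIG_FILE` takes precedence over the working-directory file.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    env: Mapping[str, str], cwd: Path, packaged_default: Path
) -> Path:
    """Pick the config file to load (hypothetical helper).

    Assumed precedence: CONFIG_FILE, then ./config/config.yaml,
    then the default config shipped with the package.
    """
    explicit: Optional[str] = env.get("CONFIG_FILE")
    if explicit:
        return Path(explicit).expanduser()
    local = cwd / "config" / "config.yaml"
    if local.is_file():
        return local
    return packaged_default
```

A caller would typically pass `os.environ` and `Path.cwd()`; taking them as arguments keeps the function easy to unit test without touching the real environment.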

## Usage

List configured providers, models, and workloads:

```bash
benchmark providers
benchmark models
benchmark workloads
```

Run a quick connectivity test:

```bash
benchmark test --provider general_compute --model gpt-oss-120b --workload ctx_256 --iterations 1
```

Run a benchmark:

```bash
benchmark run --providers general_compute,openrouter --models gpt-oss-120b --workloads ctx_256,ctx_1k --iterations 5
```

Run all enabled providers, models, and workloads:

```bash
benchmark run --iterations 50
```

Regenerate reports for an existing session:

```bash
benchmark report <session-id>
```

List local sessions:

```bash
benchmark list-sessions
```

## Workloads

The default workloads are context-size sweeps:

- `ctx_256`: 256 input tokens
- `ctx_1k`: 1,024 input tokens
- `ctx_4k`: 4,096 input tokens
- `ctx_16k`: 16,384 input tokens
- `ctx_64k`: 65,536 input tokens
- `ctx_128k`: 131,072 input tokens

Token counts are approximate because prompts are generated with `tiktoken`
`cl100k_base`, not each model provider's native tokenizer.
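
To make the sizing idea concrete, here is a hedged sketch of building a prompt with a target token count under a given tokenizer. `build_prompt` and the filler text are hypothetical. The tokenizer is passed in as `encode`/`decode` callables so the example runs offline with a toy character-level tokenizer; with `tiktoken` you could pass the `encode` and `decode` methods of `tiktoken.get_encoding("cl100k_base")` instead.

```python
from typing import Callable, List


def build_prompt(
    target_tokens: int,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    filler: str = "The quick brown fox jumps over the lazy dog. ",
) -> str:
    """Repeat filler text, then cut it to target_tokens under `encode`.

    Hypothetical helper. A provider's own tokenizer may count the
    returned text differently, which is why sizes are approximate.
    """
    if target_tokens <= 0:
        return ""
    per_filler = len(encode(filler))
    if per_filler == 0:
        raise ValueError("filler must encode to at least one token")
    repeats = target_tokens // per_filler + 1
    text = filler * repeats
    tokens = encode(text)
    # Merges across repetition boundaries can shrink the count, so top up.
    while len(tokens) < target_tokens:
        text += filler * repeats
        tokens = encode(text)
    return decode(tokens[:target_tokens])


# Toy stand-in tokenizer (one token per character), for illustration only.
def toy_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


def toy_decode(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)


prompt = build_prompt(256, toy_encode, toy_decode)
```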

## Outputs

Results are written under `results/`:

- `session_<id>_raw.csv`: one row per request
- `session_<id>_summary.csv`: aggregate statistics by model, provider, and workload
- `session_<id>_report.html`: general HTML charts and tables
- `session_<id>_provider_performance.html`: provider performance charts
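
The relationship between the raw and summary files can be pictured with the sketch below, which groups per-request rows by model, provider, and workload. The column names (`model`, `provider`, `workload`, `ttft_s`, `success`) and the `summarize` helper are assumptions made for illustration; check the header of an actual `session_<id>_raw.csv` before relying on them.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]


def summarize(raw_csv: str) -> Dict[Key, Dict[str, float]]:
    """Aggregate raw rows per (model, provider, workload).

    Hypothetical helper; column names are assumed, not taken from the tool.
    """
    groups: Dict[Key, List[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        groups[(row["model"], row["provider"], row["workload"])].append(row)

    summary: Dict[Key, Dict[str, float]] = {}
    for key, rows in groups.items():
        # Only successful requests contribute latency samples.
        ok = [float(r["ttft_s"]) for r in rows if r["success"] == "true"]
        summary[key] = {
            "requests": len(rows),
            "error_rate": 1 - len(ok) / len(rows),
            "ttft_median_s": statistics.median(ok) if ok else float("nan"),
        }
    return summary


# Synthetic rows for illustration only; these are not real measurements.
sample = """model,provider,workload,ttft_s,success
m,p,ctx_256,0.2,true
m,p,ctx_256,0.4,true
m,p,ctx_256,,false
"""
stats = summarize(sample)[("m", "p", "ctx_256")]
```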

HTML reports load Plotly from the public CDN. Use the CSV outputs if you need a
fully offline artifact.

## Static Site Export

Export a completed session as pre-aggregated JSON for a static site:

```bash
benchmark publish <session-id> --site-path ../my-site --label "June benchmark"
```

This writes files under `../my-site/public/benchmarks/`:

- `manifest.json`
- `<session-id>.json`
- `<session-id>_raw.csv` unless `--no-copy-raw` is passed

Remove a published session:

```bash
benchmark unpublish <session-id> --site-path ../my-site
```

## Methodology Notes

Comparisons are meaningful only within the same logical model. OpenRouter is an
aggregator, so its latency can include routing overhead and can vary by selected
backend. Review provider routing settings in `config/config.yaml` before
publishing benchmark claims.

The tool measures output throughput after TTFT, so decode speed is separated
from queueing and prompt-processing overhead. Retries are limited to transient
errors; failed attempts and backoff sleeps do not inflate successful-attempt
latency metrics.
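
The separation of TTFT from decode throughput described above can be sketched like this. `time_stream` and `StreamTiming` are hypothetical names, the stream is modeled as an iterable of per-chunk token counts, and excluding the first chunk's tokens from the throughput numerator is an assumption of this sketch, not a statement about the tool's exact formula.

```python
import time
from typing import Callable, Iterable, NamedTuple, Optional


class StreamTiming(NamedTuple):
    ttft_s: float
    e2e_s: float
    output_tokens: int
    decode_tokens_per_s: float


def time_stream(
    chunks: Iterable[int],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Time a token stream; each item is the token count of one chunk.

    Throughput covers only the window after the first chunk arrives, so
    queueing and prompt processing (which show up in TTFT) are excluded.
    """
    start = clock()
    first: Optional[float] = None
    tokens = 0
    tokens_after_first = 0
    for n in chunks:
        now = clock()
        if first is None:
            first = now  # first chunk marks time to first token
        else:
            tokens_after_first += n
        tokens += n
    end = clock()
    if first is None:
        raise ValueError("stream produced no chunks")
    window = end - first
    rate = tokens_after_first / window if window > 0 else 0.0
    return StreamTiming(first - start, end - start, tokens, rate)
```

Injecting `clock` makes the timing logic testable with a fake clock. In a retrying client, one way to keep backoff out of the numbers is to call a function like this once per attempt and record only the result of the attempt that succeeds.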

## Development

```bash
pytest
ruff check src tests
mypy src
```

Format code:

```bash
black src tests
```

## Security

Please do not open public issues with secrets, API keys, private benchmark data,
or unpublished provider credentials. See `SECURITY.md` for reporting guidance.

## License

MIT. See `LICENSE`.

SECURITY.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Security Policy

## Reporting

Please report suspected vulnerabilities privately to the maintainers instead of
opening a public issue.

Include:

- A concise description of the issue
- Steps to reproduce or a proof of concept
- Affected version or commit, when known
- Any known impact on API keys, benchmark data, or generated reports

## Secret Handling

Never commit `.env`, provider API keys, account identifiers, private result
files, or unpublished benchmark data. If a credential is committed or shared,
rotate it with the provider immediately.

## Scope

Security-sensitive areas include:

- API key loading and error handling
- Generated HTML reports
- Static-site export files
- CSV parsing and report regeneration from local session files
