OpenGUI-RL does not redistribute benchmark datasets, raw screenshots, cached benchmark records, or model checkpoints. Users should obtain each dataset from its official source and follow its license and terms.
| Benchmark | Role in the project | Public release status |
|---|---|---|
| Mind2Web | Main training and split-wise generalization benchmark for candidate-aware GUI grounding. | Not redistributed. Users must obtain access from the official source. |
| ScreenSpot-v2 | Primary held-out GUI grounding benchmark for point-native and dual-path transfer inference. | Not redistributed. Configs can load from the upstream Hugging Face dataset when available. |
| VisualWebBench | Supplementary transfer benchmark for protocol-sensitive transfer analysis. | Not redistributed. Configs can load from the upstream dataset when available. |
| ScreenSpot-Pro | Mentioned in the proposal as an optional stress test. | Not completed in the final reported scope. |
The repository keeps only placeholder directories under data/:
data/
├── raw/ # user-provided original benchmark payloads
├── interim/ # temporary conversion products
├── processed/ # model-ready local records and screenshots
└── manifests/ # lightweight local manifests
Large files in these directories are ignored by git. The tracked .gitkeep files only preserve the directory structure.
A GUI grounding example is represented conceptually as:
{
"sample_id": "example_0001",
"screenshot_path": "path/to/screenshot.png",
"instruction": "Click the Submit button.",
"candidates": [
{
"candidate_slot": 0,
"text": "Submit",
"bbox": [120, 340, 220, 380],
"source": "ocr_or_dom"
}
],
"target": {
"candidate_slot": 0,
"click_point": [170, 360],
"bbox": [120, 340, 220, 380],
"action_type": "click"
}
}See data_examples/sample_gui_grounding_record.jsonl for a tiny synthetic example that documents the expected interface without using benchmark data.
OCR and DOM are external structured observations, not separate trained encoders inside the model. The candidate-aware setting serializes these cues as compact text / metadata context for the VLM policy:
screenshot + instruction + OCR/DOM candidate list -> Qwen2.5-VL + LoRA -> structured GUI action
This distinction matters for transfer: candidate-aware grounding works when the benchmark exposes semantically meaningful candidates, but it does not transfer unchanged to anonymous option-box protocols.
The public repository includes:
- code and configs,
- public documentation and summary artifacts,
- small summary figures and tables,
- synthetic examples,
- documentation for expected dataset interfaces.
The public repository excludes:
- raw benchmark screenshots,
- processed benchmark records,
- cached benchmark prediction dumps,
- model checkpoints and LoRA weights,
- local training logs and experiment caches.