---
name: cluster-corpus-by-theme
description: "Performs axial-coding-style thematic clustering over the substacker corpus of published posts to surface candidate sections. Uses Braun & Clarke's six-phase thematic analysis: familiarization, initial coding, searching for themes, reviewing themes, defining and naming themes, producing the report. Reads full bodies, not titles. Use when re-opening the section question. Trigger keywords: cluster, theme, axial coding, thematic analysis, candidate sections."
---
Per Curator run:
- [ ] Step 1: Read every post in corpus/published/** end-to-end (not just titles)
- [ ] Step 2: Extract 3-5 codes per post (concepts, methods, domains)
- [ ] Step 3: Group codes across posts by semantic similarity (axial)
- [ ] Step 4: Validate clusters — split or merge where needed
- [ ] Step 5: Report candidate clusters with membership, cohesion, outliers
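
Part of Steps 2 and 3 can be bootstrapped mechanically before any judgment calls. The sketch below is a minimal first pass, assuming the codes from Step 2 are already held in a dict mapping post slug to a list of codes. The `codes_by_post` shape, the function name, and the sample slugs are hypothetical and not part of this skill. It only groups posts that share an identical code string after normalization; merging near-synonyms by semantic similarity still has to be done by reading.

```python
from collections import defaultdict


def index_posts_by_code(codes_by_post):
    """Invert {slug: [codes]} into {code: sorted slugs}.

    Codes are lowercased and stripped so trivial spelling variants collapse.
    """
    posts_by_code = defaultdict(set)
    for slug, codes in codes_by_post.items():
        for code in codes:
            posts_by_code[code.strip().lower()].add(slug)
    return {code: sorted(slugs) for code, slugs in posts_by_code.items()}


# Hypothetical slugs and codes, for illustration only.
example = {
    "post-a": ["Prediction Markets", "calibration"],
    "post-b": ["prediction markets", "bankroll"],
    "post-c": ["calibration "],
}
index = index_posts_by_code(example)
```

Codes that map to three or more posts are natural starting points for Step 3; codes attached to a single post usually point at outliers.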

Output format:

    cluster_1:
      candidate_handle: "kalshi-log"
      posts: [list of slugs]
      cohesion: high | medium | low
      centroid_codes: [top 5 codes]
      outlier_posts: [weakly-attached members]
    rejected_clusters: [clusters with <3 posts]

- Cohesion high: ≥5 posts, shared centroid, clear register.
- Cohesion medium: 3-4 posts or mixed register.
- Cohesion low: cluster exists but coherence is weak.
- Reject any cluster with <3 posts (below the 3-post floor for real sections).
- Read full post bodies. Titles are marketing; bodies are the beat.
- Do not force every post into a cluster. Outliers are legitimate.
- If >30% of corpus doesn't cluster coherently (all cohesion low), emit "corpus too heterogeneous" → Curator abandons section proposals this run.
- Include rejected clusters in output; they feed recommend-prune and "watch" candidates.
- Single-threaded. Don't race on the corpus.
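
The cohesion thresholds and the 30% heterogeneity rule above can be expressed as a small check. This is a sketch under stated assumptions, not the Curator's actual logic: the function names and the boolean inputs `shared_centroid` and `clear_register` are hypothetical, "coherence is weak" is read as "no shared centroid", and a post counts as clustering coherently when it belongs to at least one cluster graded high or medium.

```python
def grade_cohesion(post_count, shared_centroid, clear_register):
    """Map the cohesion criteria onto one label per cluster."""
    if post_count < 3:
        return "rejected"  # below the 3-post floor
    if not shared_centroid:
        return "low"  # assumption: weak coherence means no shared centroid
    if post_count >= 5 and clear_register:
        return "high"
    return "medium"  # 3-4 posts, or a mixed register


def corpus_too_heterogeneous(all_slugs, clusters, threshold=0.30):
    """Return True when more than `threshold` of posts sit in no high/medium cluster.

    `clusters` is a list of dicts with "posts" (list of slugs) and "cohesion" keys.
    """
    corpus = set(all_slugs)
    if not corpus:
        return False
    coherent = set()
    for cluster in clusters:
        if cluster["cohesion"] in ("high", "medium"):
            coherent.update(cluster["posts"])
    uncovered = len(corpus - coherent)
    return uncovered / len(corpus) > threshold
```

When `corpus_too_heterogeneous` returns True, the run would emit "corpus too heterogeneous" instead of section proposals. The comparison is strict, so a corpus with exactly 30% uncovered posts still proceeds.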