fix: strengthen structural noise extraction prompt #836
AliceLJY
left a comment
Looked at the diff: it adds 5 structural noise rules to the "do not store as memory" list in buildExtractionPrompt:
- Raw conversation carryover (quoted/attributed transcript of ≥3 lines)
- System/runtime artifacts ("System:", compaction notices, model-switch traces, tool-call transcripts, raw JSON)
- Fragment blobs (mixed filename shards / code snippets / metadata fields / partial sentences)
- Atomic memory shape (each memory must be a single durable fact/preference/decision/entity/event/case/pattern)
- Length/distillation gate (content over 200 chars that looks like raw conversation is first compressed into one factual statement; if it cannot be compressed, skip it)
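The new test/extraction-prompt-structural-noise.test.mjs uses assert.match to verify that the prompt contains the keywords for these 5 rules. It is a prompt-content regression test, not a behavioral e2e test, but it is reasonable as a regression anchor for extraction prompt changes.
package.json also adds the new test to the test script, so it runs in CI.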
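A minimal sketch of what a prompt-content regression check of this kind can look like. The `buildExtractionPrompt` body below is a hypothetical stand-in with paraphrased rule wording, not the project's actual prompt, and the PR's test uses `assert.match` where this sketch uses plain `RegExp` checks to stay import-free:

```javascript
// Hypothetical stand-in for the project's buildExtractionPrompt.
// The rule wording is paraphrased from the review, not copied from the PR.
function buildExtractionPrompt() {
  return [
    "Do NOT store as memory:",
    "- Raw conversation carryover: quoted or attributed transcript of 3+ lines",
    "- System/runtime artifacts: \"System:\" lines, compaction notices, raw JSON",
    "- Fragment blobs: mixed filename shards, code snippets, partial sentences",
    "- Atomic memory shape: each memory is one durable fact, preference, or decision",
    "- Length/distillation gate: compress long raw excerpts to one factual statement, else skip",
  ].join("\n");
}

// One keyword pattern per rule; the check fails if any rule is dropped from the prompt.
const REQUIRED_RULES = [
  /raw conversation carryover/i,
  /system\/runtime artifacts/i,
  /fragment blobs/i,
  /atomic memory shape/i,
  /length\/distillation gate/i,
];

// Returns the patterns that the given prompt text does not contain.
function missingRules(prompt) {
  return REQUIRED_RULES.filter((pattern) => !pattern.test(prompt));
}

const missing = missingRules(buildExtractionPrompt());
if (missing.length > 0) {
  throw new Error(`extraction prompt is missing rules: ${missing.join(", ")}`);
}
```

A check like this only guards the prompt text against accidental edits; it says nothing about how the LLM behaves when it receives the prompt.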
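Behaviorally, these 5 rules make the LLM more conservative during extraction: long transcripts, raw logs, and fragment blobs are more likely to be skipped than compressed into hollow memories. This is consistent with mlp's recent direction on governing noise extraction (#693 validation / #786 structural drift).
LGTM. Planning to approve and then assign to @rwmjhb.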
rwmjhb
left a comment
PR #836 Review: fix: strengthen structural noise extraction prompt
Verdict: APPROVE | 6 rounds completed | Value: 55% | Size: SMALL | Author: TurboTheTurtle
Value Assessment
Problem: The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.
| Dimension | Assessment |
|---|---|
| Value Score | 55% |
| Value Verdict | review |
| Issue Linked | true |
| Project Aligned | true |
| Duplicate | false |
| AI Slop Score | 0/6 |
| User Impact | medium |
| Urgency | medium |
Open Questions:
- Issue #127 has no labels or assignment in the provided context, so maintainer acknowledgment of the issue itself cannot be confirmed even though the PR has an approving review.
- Does the project expect this issue to be solved only at prompt level, or should a deterministic write-time gate in src/tools.ts and index.ts follow?
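To make the second open question concrete, here is a rough sketch of what a deterministic write-time gate could check. Everything in it is an assumption for illustration: the function names, the regular expressions, and the `[tool-call]` marker are hypothetical and do not come from src/tools.ts or index.ts.

```javascript
// Hypothetical write-time gate; names, patterns, and markers are illustrative only.

// A line counts as transcript-like if it is quoted (">") or attributed ("Alice: ...").
const SPEAKER_LINE = /^\s*(>|[A-Za-z][\w .-]{0,30}:\s)/;
// Runtime markers: "System:" prefixes and an assumed "[tool-call]" trace tag.
const RUNTIME_MARKER = /^\s*(System:|\[tool[-_ ]call\])/im;

// True when the whole text parses as a JSON object or array.
function isRawJson(text) {
  const trimmed = text.trim();
  if (!/^[\[{]/.test(trimmed)) return false;
  try {
    JSON.parse(trimmed);
    return true;
  } catch {
    return false;
  }
}

// Returns a short reason string when the text looks like structural noise, else null.
function structuralNoiseReason(text) {
  if (RUNTIME_MARKER.test(text)) return "system-artifact";
  if (isRawJson(text)) return "raw-json";
  const transcriptLines = text.split("\n").filter((line) => SPEAKER_LINE.test(line));
  if (transcriptLines.length >= 3) return "raw-transcript";
  return null;
}
```

Unlike the prompt rules, a gate of this shape would reject the same input every time, at the cost of being cruder than what an LLM can judge from context.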
Summary
The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.
Evaluation Signals
| Signal | Value |
|---|---|
| Blockers | 0 |
| Warnings | 0 |
| PR Size | SMALL |
| Verdict Floor | approve |
| Risk Level | normal |
| Value Model | codex |
| Primary Model | codex |
| Adversarial Model | claude |
Nice to Have
- F1: Structural-noise fix only covers the LLM extraction prompt
- MR1: New test provides zero behavioral coverage — it only asserts the prompt contains the strings the PR added
- MR2: Vague '~200 character' distillation heuristic risks dropping legitimate long-form memories
- MR3: dist/ artifact was hand-edited and the build was not run to verify it matches source
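The concern in MR2 is easier to see with the length rule written out. The sketch below assumes the rule is a conjunction (longer than 200 characters and transcript-like), as described in the first review comment; the function names and the attribution pattern are hypothetical, not taken from the PR.

```javascript
// Threshold taken from the review's ">200 chars" description of the rule.
const MAX_UNDISTILLED_CHARS = 200;

// Assumed heuristic: three or more "Name: text" lines means raw conversation.
function looksLikeRawConversation(text) {
  const attributed = text
    .split("\n")
    .filter((line) => /^\s*[A-Za-z][\w .-]{0,30}:\s/.test(line));
  return attributed.length >= 3;
}

// Length alone never rejects a memory; it only triggers the transcript check.
function lengthGate(text) {
  if (text.length <= MAX_UNDISTILLED_CHARS) return "keep";
  return looksLikeRawConversation(text) ? "distill-or-skip" : "keep";
}
```

Under this reading a long but well-formed memory passes, and only long transcript-like text is sent back for distillation. Whether the LLM applies the prompt rule as a strict conjunction is exactly what MR2 questions.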
Recommended Action
Ready to merge.
Reviewed at 2026-05-31T04:52:29Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude
PR #836 Review: fix: strengthen structural noise extraction prompt
Verdict: RESOLVE-CONFLICTS-FIRST | Author: TurboTheTurtle | Merge state: DIRTY
Problem Statement (R1)
The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.
Why This Stopped Here
GitHub reports
Recommended Action
Author should:
Reviewed at 2026-05-31T07:40:35Z | R0+R1 gate | Conflict gate
…e-prompt # Conflicts: # package.json
Resolved the merge conflict by merging current
Checked locally:
rwmjhb
left a comment
PR #836 Review: fix: strengthen structural noise extraction prompt
Verdict: APPROVE | 6 rounds completed | Value: 55% | Size: SMALL | Author: TurboTheTurtle
Value Assessment
Problem: The PR addresses structural memory contamination during extraction, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being skipped or distilled into atomic facts.
| Dimension | Assessment |
|---|---|
| Value Score | 55% |
| Value Verdict | review |
| Issue Linked | true |
| Project Aligned | true |
| Duplicate | false |
| AI Slop Score | 0/6 |
| User Impact | medium |
| Urgency | medium |
Open Questions:
- Issue #127 has no labels or assignment in the provided context, so maintainer acknowledgment of the issue itself cannot be confirmed from issue metadata.
- Should this issue be considered resolved by prompt hardening alone, or should a deterministic persistence-time validation gate follow?
Summary
The PR addresses structural memory contamination during extraction, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being skipped or distilled into atomic facts.
Evaluation Signals
| Signal | Value |
|---|---|
| Blockers | 0 |
| Warnings | 0 |
| PR Size | SMALL |
| Verdict Floor | approve |
| Risk Level | normal |
| Value Model | codex |
| Primary Model | codex |
| Adversarial Model | claude |
Nice to Have
- F1: Structural-noise rules only affect the LLM extraction prompt
- MR1: New test validates the TS source, not the committed dist artifact that runtime actually loads
Recommended Action
Ready to merge.
Reviewed at 2026-06-01T06:08:48Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude
Summary
Validation
node test/extraction-prompt-structural-noise.test.mjs
node test/smart-extractor-branches.mjs
Fixes #127.