
fix: strengthen structural noise extraction prompt #836

Merged
rwmjhb merged 2 commits into CortexReach:master from TurboTheTurtle:fix/structural-noise-prompt on Jun 1, 2026


@TurboTheTurtle

Contributor

Summary

  • Strengthen the extraction prompt with structural noise rejection and distillation rules.
  • Instruct the extractor to reject or distill raw transcript carryover, runtime artifacts, fragment blobs, and long unprocessed excerpts.
  • Add prompt regression coverage and include it in the default test script.

Validation

  • node test/extraction-prompt-structural-noise.test.mjs
  • node test/smart-extractor-branches.mjs

Fixes #127.

@AliceLJY left a comment

Collaborator


I looked at the diff: it adds 5 structural noise rules to the "do not store as memory" list in buildExtractionPrompt:

  • Raw conversation carryover (quoted/attributed transcript of 3 or more lines)
  • System/runtime artifacts ("System:", compaction notices, model-switch traces, tool-call transcripts, raw JSON)
  • Fragment blobs (mixed filename shards / code snippets / metadata fields / partial sentences)
  • Atomic memory shape (each memory must be a single durable fact/preference/decision/entity/event/case/pattern)
  • Length/distillation gate (anything over 200 chars that looks like raw conversation is first compressed into one factual statement; if it cannot be compressed, it is skipped)
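To make the shape of the change concrete, here is a minimal sketch of how five such rules could be appended to a "do not store" list. The constant `STRUCTURAL_NOISE_RULES`, the function `renderDoNotStoreSection`, the example exclusion, and the rule wording are all hypothetical paraphrases of the list above; the actual text and structure of buildExtractionPrompt are not shown in this thread.

```typescript
// Hypothetical paraphrase of the five rules listed above; the real wording in
// buildExtractionPrompt may differ.
const STRUCTURAL_NOISE_RULES: readonly string[] = [
  "Raw conversation carryover: quoted or attributed transcript of 3 or more lines.",
  'System/runtime artifacts: "System:" lines, compaction notices, model-switch traces, tool-call transcripts, raw JSON.',
  "Fragment blobs: mixed filename shards, code snippets, metadata fields, partial sentences.",
  "Atomic memory shape: each memory must be one durable fact, preference, decision, entity, event, case, or pattern.",
  "Length/distillation gate: over ~200 characters and reads like raw conversation? Distill into one factual statement, or skip.",
];

// Appends the noise rules after any existing exclusions and renders one bullet per rule.
function renderDoNotStoreSection(existing: readonly string[]): string {
  const rules = [...existing, ...STRUCTURAL_NOISE_RULES];
  return ["Do NOT store the following as memories:", ...rules.map((rule) => `- ${rule}`)].join("\n");
}

// Example: one hypothetical pre-existing exclusion followed by the five new rules.
console.log(renderDoNotStoreSection(["Greetings and small talk."]));
```

Keeping the rules in a single list makes it easy for a regression test to check each one by keyword.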

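The new test/extraction-prompt-structural-noise.test.mjs uses assert.match to verify that the prompt contains the keywords for these 5 rules. It is a prompt-content regression test, not a behavioral e2e test, but it is a reasonable regression anchor for changes to the extraction prompt.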
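A prompt-content regression test of this kind might look roughly like the sketch below. `buildPromptStub` is a hypothetical stand-in (a real test would import the project's prompt builder, whose signature is not shown here), and the keyword patterns are illustrative rather than the ones the PR's test actually uses. Passing only proves the keywords are present, not that the model obeys them.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical stand-in for the project's prompt builder; a real test would
// import buildExtractionPrompt instead.
function buildPromptStub(): string {
  return [
    "Extract durable memories from the conversation.",
    "Do NOT store: raw conversation carryover (3+ quoted lines);",
    'system/runtime artifacts such as "System:" lines or raw JSON;',
    "fragment blobs; anything that is not one atomic fact;",
    "text over ~200 characters that cannot be distilled into one factual statement.",
  ].join("\n");
}

// One keyword anchor per rule: this checks prompt content, not model behavior.
const RULE_PATTERNS = {
  transcriptCarryover: /raw conversation carryover/i,
  runtimeArtifacts: /system\/runtime artifacts/i,
  fragmentBlobs: /fragment blobs/i,
  atomicShape: /atomic/i,
  distillationGate: /200 characters/i,
};

const prompt = buildPromptStub();
for (const [rule, pattern] of Object.entries(RULE_PATTERNS)) {
  // assert.match throws an AssertionError with this message if the keyword is absent.
  assert.match(prompt, pattern, `prompt is missing the ${rule} rule`);
}
console.log(`all ${Object.keys(RULE_PATTERNS).length} structural-noise rules present`);
```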

package.json also adds the new test to the test script, so it runs in CI.

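Behaviorally, these 5 rules make the LLM more conservative at the extraction stage: long transcripts, raw logs, and fragment blobs are more likely to be skipped rather than compressed into hollow memories. This is consistent with the direction mlp has recently taken on governing noise extraction (#693 validation / #786 structural drift).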

LGTM. I'll approve and then assign it to @rwmjhb.

@rwmjhb left a comment

Collaborator


PR #836 Review: fix: strengthen structural noise extraction prompt

Verdict: APPROVE | 6 rounds completed | Value: 55% | Size: SMALL | Author: TurboTheTurtle

Value Assessment

Problem: The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.

Dimension | Assessment
Value Score | 55%
Value Verdict | review
Issue Linked | true
Project Aligned | true
Duplicate | false
AI Slop Score | 0/6
User Impact | medium
Urgency | medium

Open Questions:

  • Issue #127 has no labels or assignment in the provided context, so maintainer acknowledgment of the issue itself cannot be confirmed even though the PR has an approving review.
  • Does the project expect this issue to be solved only at prompt level, or should a deterministic write-time gate in src/tools.ts and index.ts follow?
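For the second open question, a deterministic write-time gate could reject the most mechanical noise classes before anything is persisted. The sketch below is an assumption, not existing code: `structuralNoiseReason`, the regexes, and the reason labels are hypothetical, and where such a check would hook into src/tools.ts or index.ts is not established in this thread. Heuristics like these are cheaper and more predictable than prompt rules, but they have false positives and negatives of their own.

```typescript
// Hypothetical reasons a candidate memory might be rejected before persistence.
type NoiseReason = "runtime-artifact" | "raw-json" | "transcript-carryover";

// Lines that look like speaker-attributed transcript, e.g. "User: ..." or "> quoted".
const SPEAKER_LINE = /^\s*(?:>|(?:user|assistant|human|ai)\s*:)/i;
// Leading markers that suggest a system/runtime trace rather than a durable fact.
const RUNTIME_PREFIX = /^\s*(?:system:|\[compaction\]|model switched|tool_call)/i;

// Returns why a candidate looks like structural noise, or null if it passes.
function structuralNoiseReason(text: string): NoiseReason | null {
  if (RUNTIME_PREFIX.test(text)) return "runtime-artifact";

  const trimmed = text.trim();
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      JSON.parse(trimmed);
      return "raw-json";
    } catch {
      // Not valid JSON; fall through to the remaining checks.
    }
  }

  // Mirror the prompt rule: 3 or more attributed/quoted lines count as carryover.
  const speakerLines = text.split("\n").filter((line) => SPEAKER_LINE.test(line)).length;
  if (speakerLines >= 3) return "transcript-carryover";

  return null;
}
```

A caller could log the returned reason and drop the write, leaving fragment-blob and distillation judgments to the LLM prompt.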

Summary

The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.

Evaluation Signals

Signal | Value
Blockers | 0
Warnings | 0
PR Size | SMALL
Verdict Floor | approve
Risk Level | normal
Value Model | codex
Primary Model | codex
Adversarial Model | claude

Nice to Have

  • F1: Structural-noise fix only covers the LLM extraction prompt
  • MR1: New test provides zero behavioral coverage; it only asserts that the prompt contains the strings the PR added
  • MR2: Vague '~200 character' distillation heuristic risks dropping legitimate long-form memories
  • MR3: dist/ artifact was hand-edited and the build was not run to verify it matches source
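MR2's concern is that a bare length threshold would discard long but legitimate memories. One way to reduce that risk, sketched here under assumptions, is to require both conditions the rule describes: over the limit and conversational in shape. The `lengthGate` function, the 200-character default, and the "at least two speaker-tagged or quoted lines making up at least half of the text" test are hypothetical choices for illustration.

```typescript
type GateDecision = "keep" | "distill-or-skip";

// Hypothetical default taken from the "~200 characters" wording of the prompt rule.
const LENGTH_LIMIT = 200;
// A line counts as conversational if it starts with a quote marker or a "Speaker:" tag.
const CONVERSATIONAL_LINE = /^\s*(?:>|[A-Za-z]+\s*:)/;

// Length alone never triggers distillation: long text must also look like raw
// conversation (at least two conversational lines that make up at least half of
// the non-empty lines), so a long but already-distilled fact is kept.
function lengthGate(text: string, limit: number = LENGTH_LIMIT): GateDecision {
  if (text.length <= limit) return "keep";
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  const conversational = lines.filter((line) => CONVERSATIONAL_LINE.test(line)).length;
  const looksLikeTranscript = conversational >= 2 && conversational * 2 >= lines.length;
  return looksLikeTranscript ? "distill-or-skip" : "keep";
}
```

Note that "distill-or-skip" only flags a candidate; producing the one-sentence distillation would still be the extractor's job.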

Recommended Action

Ready to merge.


Reviewed at 2026-05-31T04:52:29Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude

@rwmjhb commented May 31, 2026

Collaborator

PR #836 Review: fix: strengthen structural noise extraction prompt

Verdict: RESOLVE-CONFLICTS-FIRST | Author: TurboTheTurtle | Merge state: DIRTY

The pipeline short-circuited at the conflict gate after R0 verification. Deep review is deferred until the branch rebases cleanly onto the base.

Problem Statement (R1)

The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.

Why This Stopped Here

GitHub reports mergeable=CONFLICTING (merge_state_status=DIRTY) for this PR. Reviewing the diff now would:

  1. Give feedback against a branch the author must rewrite anyway
  2. Produce findings that may be invalidated by the conflict resolution
  3. Waste review cycles on code that cannot be merged as-is

Recommended Action

Author should:

  1. Rebase this branch onto the latest base (or merge the base into this branch)
  2. Resolve all merge conflicts
  3. Push the rebased branch; the re-review will be picked up automatically

Reviewed at 2026-05-31T07:40:35Z | R0+R1 gate | Conflict gate

@TurboTheTurtle

Contributor Author

Resolved the merge conflict by merging current master into the branch. The only manual conflict was the package.json test script; I kept both the structural-noise prompt regression test from this PR and the newer i18n memory trigger test from master.

Checked locally:

node test/extraction-prompt-structural-noise.test.mjs
node test/i18n-memory-triggers.test.mjs
git diff --check

@rwmjhb left a comment

Collaborator


PR #836 Review: fix: strengthen structural noise extraction prompt

Verdict: APPROVE | 6 rounds completed | Value: 55% | Size: SMALL | Author: TurboTheTurtle

Value Assessment

Problem: The PR addresses structural memory contamination during extraction, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being skipped or distilled into atomic facts.

Dimension | Assessment
Value Score | 55%
Value Verdict | review
Issue Linked | true
Project Aligned | true
Duplicate | false
AI Slop Score | 0/6
User Impact | medium
Urgency | medium

Open Questions:

  • Issue #127 has no labels or assignment in the provided context, so maintainer acknowledgment of the issue itself cannot be confirmed from issue metadata.
  • Should this issue be considered resolved by prompt hardening alone, or should a deterministic persistence-time validation gate follow?

Summary

The PR addresses structural memory contamination during extraction, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being skipped or distilled into atomic facts.

Evaluation Signals

Signal | Value
Blockers | 0
Warnings | 0
PR Size | SMALL
Verdict Floor | approve
Risk Level | normal
Value Model | codex
Primary Model | codex
Adversarial Model | claude

Nice to Have

  • F1: Structural-noise rules only affect the LLM extraction prompt
  • MR1: New test validates the TS source, not the committed dist artifact that the runtime actually loads
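MR1 in this round points out that the test reads the TS source while the runtime loads the committed dist artifact. A hedged sketch of a check that covers both: the helper below reports which rule keywords each artifact is missing, so a test can fail when dist drifts from source. The patterns and `missingRules` are hypothetical; in a real test the strings would come from reading the source file and the built file with `readFileSync`, and those paths are not given in this thread.

```typescript
// Hypothetical keyword anchors for the five rules; the PR's test uses its own.
const RULE_PATTERNS: readonly RegExp[] = [
  /raw conversation carryover/i,
  /system\/runtime artifacts/i,
  /fragment blobs/i,
  /atomic/i,
  /200 char/i,
];

// For each labeled artifact (e.g. "src", "dist"), lists the patterns it fails to
// match, so one assertion can check that source and build stay in sync.
function missingRules(artifacts: Record<string, string>): Record<string, string[]> {
  const report: Record<string, string[]> = {};
  for (const [label, contents] of Object.entries(artifacts)) {
    report[label] = RULE_PATTERNS.filter((pattern) => !pattern.test(contents)).map((pattern) => pattern.source);
  }
  return report;
}
```

A test would then assert that every entry in the report is empty, which also catches the hand-edited dist concern raised as MR3 in the earlier review.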

Recommended Action

Ready to merge.


Reviewed at 2026-06-01T06:08:48Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude

@rwmjhb merged commit ae7c83a into CortexReach:master Jun 1, 2026
8 checks passed
@TurboTheTurtle deleted the fix/structural-noise-prompt branch June 1, 2026 07:38

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Noise filter misses structural memory contamination at write time (System traces / raw blobs / fragments)

3 participants