fix: strengthen structural noise extraction prompt by TurboTheTurtle · Pull Request #836 · CortexReach/memory-lancedb-pro

TurboTheTurtle · 2026-05-27T19:57:54Z

Summary

Strengthen the extraction prompt with structural noise rejection and distillation rules.
Instruct the extractor to reject or distill raw transcript carryover, runtime artifacts, fragment blobs, and long unprocessed excerpts.
Add prompt regression coverage and include it in the default test script.

Validation

node test/extraction-prompt-structural-noise.test.mjs
node test/smart-extractor-branches.mjs

Fixes #127.

AliceLJY

I looked through the diff. It adds five structural-noise rules to the "do not store as memory" list in buildExtractionPrompt:

Raw conversation carryover (quoted/attributed transcript of ≥3 lines)
System/runtime artifacts ("System:", compaction notices, model-switch traces, tool-call transcripts, raw JSON)
Fragment blobs (mixed filename shards / code snippets / metadata fields / partial sentences)
Atomic memory shape (each memory must be a single durable fact/preference/decision/entity/event/case/pattern)
Length/distillation gate (content over 200 chars that looks like raw conversation is first compressed into a single factual statement; if it cannot be compressed, it is skipped)
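
The length/distillation gate in the last rule is an instruction to the LLM, not code in this PR. As a rough illustration of the decision it describes, here is a minimal JavaScript sketch. The names `applyDistillationGate`, `looksLikeRawConversation`, and the caller-supplied `distill` function are hypothetical, and the speaker-label heuristic is an assumption; only the 200-character threshold comes from the rule itself.

```javascript
// Hypothetical illustration of the length/distillation rule; not code from the PR.
const MAX_RAW_CHARS = 200; // threshold quoted in the rule above

// Assumed heuristic: three or more lines that start with a "Speaker:" label.
function looksLikeRawConversation(text) {
  const attributed = text
    .split("\n")
    .filter((line) => /^\s*[A-Za-z][\w .-]{0,30}:\s+\S/.test(line));
  return attributed.length >= 3;
}

// `distill` is a caller-supplied function that returns a single factual
// sentence, or null when no durable fact can be extracted.
function applyDistillationGate(candidate, distill) {
  if (candidate.length <= MAX_RAW_CHARS || !looksLikeRawConversation(candidate)) {
    return { action: "keep", text: candidate };
  }
  const distilled = distill(candidate);
  return distilled
    ? { action: "distill", text: distilled }
    : { action: "skip", text: null };
}
```

In this sketch a long memory with no speaker labels is kept unchanged, so length alone never triggers the gate.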

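The new test/extraction-prompt-structural-noise.test.mjs uses assert.match to verify that the prompt contains the keywords for these five rules. It is a prompt-content regression test, not a behavioral e2e test, but it is a reasonable regression anchor for changes to the extraction prompt.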
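
For readers unfamiliar with this kind of test, a prompt-content regression check can be sketched roughly as follows. The stub prompt text and the patterns are invented for illustration and are not the wording used in the PR. The real test calls assert.match on the output of buildExtractionPrompt, whereas this sketch uses plain `RegExp` checks to stay dependency-free.

```javascript
// Hypothetical stand-in for the real prompt builder; the wording is invented.
function buildPromptStub() {
  return [
    "Do NOT store as memory:",
    "- Raw conversation carryover (3+ quoted or attributed transcript lines)",
    "- System/runtime artifacts (compaction notices, tool-call transcripts, raw JSON)",
    "- Fragment blobs (filename shards, partial sentences)",
    "- Anything that is not one atomic, durable fact",
    "- Text over 200 characters that cannot be distilled into one factual statement",
  ].join("\n");
}

// One pattern per rule, so a missing rule is reported by name.
const REQUIRED_RULES = {
  carryover: /raw conversation carryover/i,
  runtimeArtifacts: /system\/runtime artifacts/i,
  fragmentBlobs: /fragment blobs/i,
  atomicShape: /atomic/i,
  distillationGate: /200 characters/i,
};

// Returns the names of the rules whose keywords are absent from the prompt.
function missingRules(prompt) {
  return Object.entries(REQUIRED_RULES)
    .filter(([, pattern]) => !pattern.test(prompt))
    .map(([name]) => name);
}

const missing = missingRules(buildPromptStub());
if (missing.length > 0) {
  throw new Error(`prompt is missing rules: ${missing.join(", ")}`);
}
```

A check like this only catches accidental removal or rewording of the rules; it says nothing about how the model behaves when given the prompt.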

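package.json also adds the new test to the test script, so it will run in CI.

Behaviorally, these five rules make the LLM more conservative at extraction time: long transcripts, raw logs, and fragment blobs are more likely to be skipped than compressed into hollow memories. This is consistent with mlp's recent direction on noise extraction (#693 validation / #786 structural drift).

LGTM. I will approve and then assign to @rwmjhb.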

rwmjhb

PR #836 Review: fix: strengthen structural noise extraction prompt

Verdict: APPROVE | 6 rounds completed | Value: 55% | Size: SMALL | Author: TurboTheTurtle

Value Assessment

Problem: The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.

Dimension	Assessment
Value Score	55%
Value Verdict	review
Issue Linked	true
Project Aligned	true
Duplicate	false
AI Slop Score	0/6
User Impact	medium
Urgency	medium

Open Questions:

Issue #127 has no labels or assignment in the provided context, so maintainer acknowledgment of the issue itself cannot be confirmed even though the PR has an approving review.
Does the project expect this issue to be solved only at prompt level, or should a deterministic write-time gate in src/tools.ts and index.ts follow?
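
To make the second open question concrete, a deterministic write-time gate could take roughly the following shape. This is only a sketch under stated assumptions: `detectStructuralNoise`, `shouldPersist`, the speaker labels, and the category names are all hypothetical, and nothing here reflects existing code in src/tools.ts or index.ts.

```javascript
// Hypothetical write-time check; names, labels, and thresholds are assumptions.
function detectStructuralNoise(text) {
  const lines = text.split("\n").map((line) => line.trim()).filter(Boolean);

  // Runtime artifacts: any line that begins with a "System:" marker.
  if (lines.some((line) => /^System:/i.test(line))) {
    return "runtime-artifact";
  }

  // Raw JSON: the whole payload parses as an object or array.
  const trimmed = text.trim();
  if (/^[\[{]/.test(trimmed)) {
    try {
      JSON.parse(trimmed);
      return "raw-json";
    } catch {
      // Not valid JSON; fall through to the remaining checks.
    }
  }

  // Transcript carryover: three or more speaker-attributed lines.
  const attributed = lines.filter((line) =>
    /^(user|assistant|human|ai):\s/i.test(line)
  );
  if (attributed.length >= 3) {
    return "transcript-carryover";
  }

  return null; // no structural noise detected
}

// Only persist a candidate memory when no noise category matched.
function shouldPersist(text) {
  return detectStructuralNoise(text) === null;
}
```

Unlike the prompt rules, a gate of this kind would behave the same on every run, at the cost of being limited to patterns that can be matched syntactically.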

Summary

The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.

Evaluation Signals

Signal	Value
Blockers	0
Warnings	0
PR Size	SMALL
Verdict Floor	approve
Risk Level	normal
Value Model	codex
Primary Model	codex
Adversarial Model	claude

Nice to Have

F1: Structural-noise fix only covers the LLM extraction prompt
MR1: New test provides zero behavioral coverage — it only asserts the prompt contains the strings the PR added
MR2: Vague '~200 character' distillation heuristic risks dropping legitimate long-form memories
MR3: dist/ artifact was hand-edited and the build was not run to verify it matches source

Recommended Action

Ready to merge.

Reviewed at 2026-05-31T04:52:29Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude

rwmjhb · 2026-05-31T07:40:49Z

PR #836 Review: fix: strengthen structural noise extraction prompt

Verdict: RESOLVE-CONFLICTS-FIRST | Author: TurboTheTurtle | Merge state: DIRTY

Pipeline short-circuited at the conflict gate after R0 verification. Deep review deferred until the branch rebases cleanly onto the base.

Problem Statement (R1)

The PR addresses structural memory contamination at extraction time, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being rejected or compressed into atomic facts.

Why This Stopped Here

GitHub reports mergeable=CONFLICTING (merge_state_status=DIRTY) for this PR. Reviewing the diff now would:

Give feedback against a branch the author must rewrite anyway
Produce findings that may be invalidated by the conflict resolution
Waste review cycles on code that cannot be merged as-is

Recommended Action

Author should:

Rebase this branch onto the latest base (or merge the base into this branch)
Resolve all merge conflicts
Push the rebased branch — the re-review will be picked up automatically

Reviewed at 2026-05-31T07:40:35Z | R0+R1 gate | Conflict gate

TurboTheTurtle · 2026-05-31T18:11:36Z

Resolved the merge conflict by merging current master into the branch. The only manual conflict was the package.json test script; I kept both the structural-noise prompt regression test from this PR and the newer i18n memory trigger test from master.

Checked locally:

node test/extraction-prompt-structural-noise.test.mjs
node test/i18n-memory-triggers.test.mjs
git diff --check

rwmjhb

PR #836 Review: fix: strengthen structural noise extraction prompt

Verdict: APPROVE | 6 rounds completed | Value: 55% | Size: SMALL | Author: TurboTheTurtle

Value Assessment

Problem: The PR addresses structural memory contamination during extraction, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being skipped or distilled into atomic facts.

Dimension	Assessment
Value Score	55%
Value Verdict	review
Issue Linked	true
Project Aligned	true
Duplicate	false
AI Slop Score	0/6
User Impact	medium
Urgency	medium

Open Questions:

Issue #127 has no labels or assignment in the provided context, so maintainer acknowledgment of the issue itself cannot be confirmed from issue metadata.
Should this issue be considered resolved by prompt hardening alone, or should a deterministic persistence-time validation gate follow?

Summary

The PR addresses structural memory contamination during extraction, where raw transcripts, system/runtime traces, fragment blobs, and long undistilled excerpts can be stored as memories instead of being skipped or distilled into atomic facts.

Evaluation Signals

Signal	Value
Blockers	0
Warnings	0
PR Size	SMALL
Verdict Floor	approve
Risk Level	normal
Value Model	codex
Primary Model	codex
Adversarial Model	claude

Nice to Have

F1: Structural-noise rules only affect the LLM extraction prompt
MR1: New test validates the TS source, not the committed dist artifact that runtime actually loads

Recommended Action

Ready to merge.

Reviewed at 2026-06-01T06:08:48Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude

strengthen structural noise extraction prompt

0193025

AliceLJY approved these changes May 28, 2026

AliceLJY assigned rwmjhb May 28, 2026

rwmjhb approved these changes May 31, 2026

Merge remote-tracking branch 'origin/master' into fix/structural-nois…

d6a4564

…e-prompt # Conflicts: # package.json

rwmjhb approved these changes Jun 1, 2026

rwmjhb merged commit ae7c83a into CortexReach:master Jun 1, 2026
8 checks passed

TurboTheTurtle deleted the fix/structural-noise-prompt branch June 1, 2026 07:38

rwmjhb merged 2 commits into CortexReach:master from TurboTheTurtle:fix/structural-noise-prompt

3 participants
