feat: add retrieval neighbor enrichment by TurboTheTurtle · Pull Request #897 · CortexReach/memory-lancedb-pro

TurboTheTurtle · 2026-06-13T23:31:03Z

Summary

add default-off hybrid retrieval neighbor enrichment using same-scope BM25 lookups
attach bounded neighbors as supplemental recall context without changing top-level result counts
add config schema/UI fields and focused regression coverage

Refs #538
Refs #445

Verification

node --test test/retriever-neighbor-enrichment.test.mjs
node test/retriever-rerank-regression.mjs
node scripts/verify-ci-test-manifest.mjs
node test/plugin-manifest-regression.mjs
npm run build
npm run test:core-regression
npm run test:packaging-and-workflow

rickthomasjr · 2026-06-13T23:43:33Z

Code Review (Agent)

Verdict: APPROVE

Summary

Feature adds default-off hybrid retrieval neighbor enrichment — attaches bounded same-scope BM25 neighbors to top-level hybrid results without changing result counts.

What is Good

Default-off design: Neighbor enrichment is opt-in (enabled: false), zero impact on existing users. Low risk.
Scope-aware filtering: Neighbors are restricted to the same scope and (when provided) same category as the primary result. No cross-contamination.
Graceful degradation: BM25 neighbor lookup failure falls through with a warning, not an exception. Primary results still return.
Config normalization: normalizeRetrievalConfig() properly deep-merges neighborEnrichment so partial configs do not lose other settings.
Test coverage: 183-line test file covers disabled mode, enabled mode with neighbors, BM25 fallback failure, and partial config parsing. All CI checks pass.
Bounding: maxPerResult clamped to 1-5. Supplemental BM25 fetch capped at 25. Good defense against runaway queries.
Serialization: sanitizeMemoryForSerialization() and tool output render properly handle the optional neighbors field.

Notes

Performance: Each primary result triggers a supplemental BM25 lookup. The Math.min(maxPerResult + primaryIds.size + 4, 25) cap is sensible. BM25 is fast, but worth monitoring if retrieval latency is already tight.
Context budget: Neighbor context is appended to contentText before sanitizeForContext().slice(0, effectivePerItemMaxChars). Total context window budget (primary + neighbors) is still capped per item — good.
Test mocking: The test uses a mock store. Fine for unit-level logic testing; consider a lightweight integration test with a real in-memory index for the enrichment path.

Verdict: APPROVE

Clean feature, well-scoped, defaults off, good tests. Safe to merge.

rickthomasjr

Summary

Verdict: APPROVE (with one minor concern)

Andy has implemented B-2 from Issue #445 — retrieval-time neighbor enrichment for hybrid search results. The feature is default-off, well-scoped, and follows the existing patterns in the codebase cleanly.

What This Does

Adds optional same-scope BM25 neighbor enrichment to hybrid retrieval results. When enabled:

After MMR diversity filtering, for each result the retriever does an additional bm25Search() using the result entry's text as the query
Filters out primary results, cross-scope entries, expired entries, and noise
Caps at maxPerResult (default 2, range 1-5)
Attaches neighbors as a neighbors array on each result
Tools render neighbors inline in recall output and debug/explain-rank tools

Key design decisions (matching the B-2 proposal from Issue #445):

Default-off — opt-in feature
Same-scope only — respects scope boundaries
Doesn't change top-level result count — pure enrichment
BM25 for neighbor search — consistent with B-2 discussion
Graceful degradation — BM25 failure falls back to primary-only results

Code Quality Assessment

Strengths:

normalizeRetrievalConfig() — proper nested merge pattern for neighborEnrichment. The explicit nesting avoids shallow-spread issues.
enrichHybridNeighbors() — clean filter chain (id exclusion → scope check → category → expiry → noise). Easy to follow.
Test coverage is thorough: disabled mode, enabled mode, failure fallback, and config parsing all covered.
sanitizeMemoryForSerialization() properly propagates neighbors through the tool result pipeline.
Plugin config schema (openclaw.plugin.json) has correct types, defaults, min/max bounds, and advanced UI labels.

Changes look mechanical and consistent — dist/ built from src/, CI manifest updated, all tool rendering paths updated.

One Minor Concern: Per-Result BM25 Latency

Each enriched result triggers an additional bm25Search() call. With maxPerResult=2 and limit=20, that's up to 20 extra BM25 lookups per retrieval. The candidates limit is capped at 25 via Math.min(maxPerResult + primaryIds.size + 4, 25), which is reasonable, but:

In a busy system with many concurrent retrievals and neighbor enrichment enabled, this could add noticeable latency
The B-2 issue (Issue #445) mentioned this as an open question but the PR doesn't address batching

This is not a blocker — the feature is default-off, and Andy has correctly gated it behind config.enabled. If latency becomes an issue, a batched approach (single bm25Search() for all results at once) could be added later. Worth a TODO comment or a follow-up issue if it's a real concern.

Rating: NIT — no action required for merge.

Verdict: APPROVE

This is a clean implementation of an already-designed feature (B-2 from Issue #445). It's default-off, well-tested, follows existing patterns, and degrades gracefully. The one latency concern is a future optimization, not a blocking issue.

Ready to merge once Rick gives a thumbs-up.

rwmjhb

Approved. The feature is default-off, same-scope, bounded per primary result, and the main retrieval/tool/config plumbing is covered by regression tests.

Verification checked:

orchestrator targeted tests passed
orchestrator full npm test passed
npm run build --if-present passed on head 0e9452cb735605f9f1079429f23a054fef95f12a
node --test test/retriever-neighbor-enrichment.test.mjs passed, 4/4

Non-blocking follow-ups worth considering:

Neighbor entries are not passed through the same auto-recall governance filters as top-level results. If enrichment is enabled, an eligible primary memory can carry a nested neighbor that would have been excluded as a top-level auto-recall result by confirmed-state, archive/reflection layer, Tier-1 suppression, or USER.md boundary rules.
Enrichment performs one supplemental BM25 lookup per top-level result; keep an eye on latency/load if this is enabled with larger limits.
The same neighbor can be attached under multiple primary results because there is no cross-result seenNeighborIds set.

rwmjhb

Approved. The current head addresses the earlier auto-recall governance concern by filtering/deduping related neighbor snippets before injection, while keeping the feature default-off and bounded.

Verification checked:

orchestrator targeted tests passed
orchestrator full npm test passed
npm run build --if-present passed on head c6c9ade030f567756112d7dd4b6e188e4e97ebfc
node --test test/retriever-neighbor-enrichment.test.mjs passed, 4/4
node --test test/recall-text-cleanup.test.mjs passed, 23/23

Non-blocking follow-ups worth considering:

Manual memory_recall summary output applies maxCharsPerItem to the primary text, then appends neighbor snippets afterward. With neighbor enrichment enabled, per-item output can exceed that budget.
Enrichment performs one supplemental BM25 lookup per top-level result using full entry text as the query; monitor latency/load if this is enabled with larger result limits.
The neighbor candidate fetch cap can under-deliver maxPerResult on large primary result sets after excluding primary ids and filtered candidates.
maxPerResult is clamped at use time, not in normalized config returned by getConfig().

rwmjhb

Requesting changes. The auto-recall neighbor governance fix is good, but the manual memory_recall path still has a blocking boundary leak.

Must fix

Nested neighbors are still rendered and serialized without applying the USER.md-exclusive recall filter.

The top-level memory_recall results are filtered with filterUserMdExclusiveRecallResults(...), but the new neighbor paths are not:

sanitizeMemoryForSerialization() serializes r.neighbors directly, including neighbor.entry.text.
The manual memory_recall text renderer appends r.neighbors directly as Neighbors: ....
The retriever enrichment path only knows about id/scope/category/expiry/noise and has no workspaceBoundary context.

So with workspaceBoundary.userMdExclusive.enabled / filterRecall enabled, a safe top-level memory can still carry a USER.md-exclusive profile/name/addressing fact as a nested neighbor. That leaks the hidden memory through both rendered tool text and details.memories[].neighbors.

Please apply the same USER.md-exclusive filtering to nested neighbors before manual recall rendering and serialization, or strip neighbors from any result set after top-level recall filtering removes boundary-sensitive entries. Add a regression test that enables USER.md-exclusive filtering and verifies a hidden neighbor is absent from both rendered output and serialized details.

Also worth fixing

Manual memory_recall neighbors also bypass other governance predicates that auto-recall applies to neighbors, such as confirmed state, archive/reflection layers, and Tier-1 suppression.
Auto-recall repeat suppression/history still tracks only top-level ids, not injected neighbor snippets.
Neighbor enrichment performs one extra BM25 query per top-level result with no visible cost ceiling or operator warning.

rwmjhb

Approved. The latest commit fixes the prior blocker by filtering manual memory_recall neighbors in both rendered output and serialized details. I verified this head independently:

npm run build --if-present
node --test test/retriever-neighbor-enrichment.test.mjs
node --test test/recall-text-cleanup.test.mjs

Non-blocking follow-ups: neighbor over-fetch can still be limited by MemoryStore.bm25Search's 20-result clamp after primary IDs are filtered, and enabled enrichment performs one supplemental BM25 lookup per top-level result. Both are worth tracking for high-limit/large-store usage.

rwmjhb

This PR is currently blocked by merge conflicts (mergeable=CONFLICTING, merge_state_status=DIRTY).

Please rebase this branch onto the latest base branch, resolve the conflicts, and push the updated branch. I'm deferring any further review until the branch is mergeable again, since conflict resolution can invalidate the current diff and prior findings.

TurboTheTurtle · 2026-06-15T01:35:10Z

This PR is currently blocked by merge conflicts (mergeable=CONFLICTING, merge_state_status=DIRTY).

Please rebase this branch onto the latest base branch, resolve the conflicts, and push the updated branch. I'm deferring any further review until the branch is mergeable again, since conflict resolution can invalidate the current diff and prior findings.

Thanks, rebased onto current upstream/master, resolved the conflicts, and force-pushed the branch.

rwmjhb

Requesting changes. The rebase fixed the conflict and the main recall paths have neighbor filtering, but there is still an unfiltered serialization path for nested neighbors.

Must fix

sanitizeMemoryForSerialization() serializes r.neighbors directly, including id/text/category/scope/importance/score. Manual memory_recall filters neighbors before calling it, and auto-recall filters before rendering, but this serializer is also used by other tool responses that can receive retriever results directly:

resolveMemoryId() ambiguous-match details: details.candidates = sanitizeMemoryForSerialization(results)
retrieval trace/debug details: details.memories = sanitizeMemoryForSerialization(results)
memory_explain_rank: details.results = sanitizeMemoryForSerialization(results)

Those paths do not apply filterManualRecallResultNeighbors() or the auto-recall governance filter. With neighbor enrichment enabled, a top-level eligible result can therefore carry nested neighbors that bypass USER.md-exclusive filtering, state/layer governance, tier suppression, or workspace-boundary filtering in serialized details.

Please centralize neighbor serialization behind the same governance/boundary filter used for manual recall, or make sanitizeMemoryForSerialization() exclude neighbors by default unless the caller explicitly passes already-filtered neighbors. Add regression coverage for at least one non-memory_recall caller, e.g. ambiguous resolveMemoryId() candidates or memory_explain_rank, proving USER.md-exclusive/governance-ineligible neighbors are not present in details.

Also worth addressing

Neighbor category filtering uses raw candidate.entry.category !== category instead of the existing smart category matcher, so canonical filters like preferences can silently drop legacy preference neighbors.
Supplemental neighbor lookups still do one BM25 query per top-level result using full entry text and do not observe the recall abort signal/deadline.
The BM25 over-fetch slack is still limited by the store-level 20-result clamp before primary IDs are filtered.

rwmjhb

Approved. The latest commit fixes the unfiltered serialized-neighbor issue, including coverage for non-recall serialized tool details. I verified this head independently:

npm run build --if-present
node --test test/retriever-neighbor-enrichment.test.mjs
node --test test/recall-text-cleanup.test.mjs
build left source/dist files clean

Remaining notes are non-blocking: neighbor category filtering should eventually use the canonical category matcher, high-limit enrichment can under-deliver because store BM25 search is capped at 20, and injected auto-recall neighbors are supplemental context rather than first-class tracked recall IDs.

rickthomasjr approved these changes Jun 14, 2026

View reviewed changes

rwmjhb approved these changes Jun 14, 2026

View reviewed changes

rwmjhb requested changes Jun 14, 2026

View reviewed changes

rwmjhb approved these changes Jun 14, 2026

View reviewed changes

rwmjhb reviewed Jun 15, 2026

View reviewed changes

TurboTheTurtle added 4 commits June 14, 2026 18:25

feat: add retrieval neighbor enrichment

7492854

fix: filter auto-recall neighbor snippets

e32b512

fix: bound neighbor-enriched recall output

7606b4b

fix: filter manual recall neighbors

2e37896

TurboTheTurtle force-pushed the codex/issue-538-neighbor-enrichment-v2 branch from b015b97 to 2e37896 Compare June 15, 2026 01:25

rwmjhb requested changes Jun 16, 2026

View reviewed changes

fix: omit unfiltered serialized neighbors

0988e40

rwmjhb approved these changes Jun 16, 2026

View reviewed changes

rwmjhb merged commit 37f5d8c into CortexReach:master Jun 17, 2026
8 checks passed

TurboTheTurtle deleted the codex/issue-538-neighbor-enrichment-v2 branch June 17, 2026 19:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add retrieval neighbor enrichment#897

feat: add retrieval neighbor enrichment#897
rwmjhb merged 5 commits into
CortexReach:masterfrom
TurboTheTurtle:codex/issue-538-neighbor-enrichment-v2

TurboTheTurtle commented Jun 13, 2026

Uh oh!

rickthomasjr commented Jun 13, 2026

Uh oh!

rickthomasjr left a comment

Uh oh!

rwmjhb left a comment

Uh oh!

rwmjhb left a comment

Uh oh!

rwmjhb left a comment

Uh oh!

rwmjhb left a comment

Uh oh!

rwmjhb left a comment

Uh oh!

TurboTheTurtle commented Jun 15, 2026

Uh oh!

rwmjhb left a comment

Uh oh!

rwmjhb left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

TurboTheTurtle commented Jun 13, 2026

Summary

Verification

Uh oh!

rickthomasjr commented Jun 13, 2026

Code Review (Agent)

Summary

What is Good

Notes

Verdict: APPROVE

Uh oh!

rickthomasjr left a comment

Choose a reason for hiding this comment

Summary

What This Does

Code Quality Assessment

One Minor Concern: Per-Result BM25 Latency

Verdict: APPROVE

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Must fix

Also worth fixing

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Uh oh!

TurboTheTurtle commented Jun 15, 2026

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Must fix

Also worth addressing

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants