fix: normalize memory category filters#876
Conversation
rwmjhb
left a comment
There was a problem hiding this comment.
Requesting changes. The direction is right, but the current implementation only partially fixes the taxonomy mismatch.
Must fix
- Smart-category filters still miss metadata-backed auto-captured rows.
The new retrieval/list filtering still compares only the storage category column: retriever uses matchesMemoryCategoryFilter(r.entry.category, category), and memory_list expands SQL predicates over category via resolveCategoryFilterCandidates(category).
That works for direct aliases like preference/preferences and entity/entities, but it does not handle the smart-extractor categories that are stored through legacy storage categories with the real smart category in metadata. In the normal auto-capture path, profile and cases are stored as fact, events as decision, and patterns as other, with metadata.memory_category carrying the smart category. Filters for profile, events, cases, and patterns can therefore still silently miss normal auto-captured rows.
Please make recall/list filtering metadata-aware, or over-fetch the relevant storage categories and then apply metadata.memory_category filtering in application code. Add regression coverage through the real smart-extractor/store path for the smart categories, not only direct row stubs.
- Newly accepted smart category values are not normalized on write.
The tool schema now accepts smart category values such as profile, preferences, events, cases, and patterns for memory_store / memory_update. But the write path still passes the raw category into storage and buildSmartMetadata() without explicitly setting the normalized smart category. For non-legacy values, reverseMapLegacyCategory() cannot infer the intended category reliably and can default to patterns.
Please normalize tool input at the execution boundary into both:
- the storage category to persist, and
metadata.memory_category/ smart category metadata to preserve semantics.
Also add tests for manual memory_store and memory_update using schema-legal smart category values.
Also worth covering
The new test mainly exercises retriever/BM25-style filtering. Please add coverage for the SQL memory_list path and keep resolveCategoryFilterCandidates() / matchesMemoryCategoryFilter() behavior equivalent so these two filter surfaces do not drift.
rwmjhb
left a comment
There was a problem hiding this comment.
Approved. The new commit addresses the blocking issues from the previous review: category filters are now metadata-aware for recall/list paths, and manual store/update normalize accepted smart category inputs into storage category plus smart metadata.
Verification I checked:
- orchestrator targeted tests passed
- orchestrator full npm test passed
npm run build --if-presentpassed on head3beb2781b2dbe0121e506d3eafb68d6c81601e5cnode --test test/category-filter-normalization.test.mjspassed, 7/7
Non-blocking follow-ups worth considering:
patternsfiltering can still differ between recall and list for legacyreflectionrows, because the in-memory predicate mapsreflection -> patternsbut the SQL candidate expansion for list does not fetchreflection.- Metadata-less legacy
factrows remain ambiguous and are treated ascases, so profile-like legacy facts withoutmemory_categorymetadata may not matchcategory=profile. - Category-only
memory_updatechangesmemory_category, but preserves category-derived fields such asfact_keyandmemory_layer; re-deriving or clearing those would avoid stale metadata after recategorization.
rickthomasjr
left a comment
There was a problem hiding this comment.
Agent Code Review: PR #876 — fix: normalize memory category filters
Verdict: APPROVE
Summary
Two commits introducing bidirectional smart/legacy category mapping. Core changes: memory-categories.ts adds LEGACY_MEMORY_CATEGORIES, TOOL_MEMORY_CATEGORIES, mapping tables, and helper functions (normalizeCategory, matchesMemoryCategoryFilter, resolveCategoryFilterCandidates, resolveToolMemoryCategory, extractMetadataMemoryCategory). Tools and retrieval now normalize incoming categories at the execution boundary, persisting both storageCategory and metadata.memory_category. SQL list() filtering uses candidate OR expansion + in-memory metadata-aware predicate. All retrieval paths use matchesMemoryCategoryFilter() instead of direct equality.
P1 — Supersede logic: APPROVED
Replacing hardcoded SUPERSEDE_ELIGIBLE with TEMPORAL_VERSIONED_CATEGORIES.has(memoryCategory) + metadata-aware matching prevents false-positive cross-supersedes between unrelated legacy categories. Correct.
P2 — Metadata-aware filtering: APPROVED
matchesMemoryCategoryFilter() three-tier approach (direct → normalize → metadata) handles the critical case: entries stored as fact with metadata.memory_category=profile are still found by category=profile filters. This was the core issue rwmjhb flagged.
P3 — SQL candidate expansion: APPROVED
resolveCategoryFilterCandidates() builds OR of all storage category values for a smart category. Set deduplication prevents duplication. Correct.
P4 — Write path normalization: APPROVED
Both memory_store and memory_update resolve storageCategory + memoryCategory at execution boundary. buildSmartMetadata() uses resolved categories. Category-only updates also update memory_category in metadata.
P5 — Response payloads: APPROVED
Adding rawCategory alongside category is useful for debugging.
P6 — Minor notes
- isLegacyStableMemoryId regex in store.ts is minor scope creep in this PR — consider separate PR.
- rwmjhb's non-blocking follow-ups (patterns/legacy reflection gap, ambiguous metadata-less legacy fact rows, stale fields on recategorization) are valid but out of scope.
Tests
7 test cases, 397 lines. Exercises smart-extractor rows through real store/tool paths. Covers normalization, metadata-aware retrieval, SQL list filtering, store tool writes, update tool writes. Good coverage.
Conclusion
Well-designed, well-tested fix for a real taxonomy mismatch. The dual-category normalization is the right approach. No blocking issues.
Summary
Fixes #873
Verification