Skip to content

fix: normalize memory category filters#876

Open
TurboTheTurtle wants to merge 2 commits into
CortexReach:masterfrom
TurboTheTurtle:codex/issue-873-category-filter
Open

fix: normalize memory category filters#876
TurboTheTurtle wants to merge 2 commits into
CortexReach:masterfrom
TurboTheTurtle:codex/issue-873-category-filter

Conversation

@TurboTheTurtle

Copy link
Copy Markdown
Contributor

Summary

  • expose both smart-extractor and legacy memory categories in tool schemas
  • normalize category filters so legacy singular filters match smart category rows
  • apply category candidate matching to memory_list and add CI regression coverage

Fixes #873

Verification

  • node --test test/category-filter-normalization.test.mjs
  • npm run build
  • npm run test:packaging-and-workflow
  • npm run test:core-regression

@rwmjhb rwmjhb left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes. The direction is right, but the current implementation only partially fixes the taxonomy mismatch.

Must fix

  1. Smart-category filters still miss metadata-backed auto-captured rows.

The new retrieval/list filtering still compares only the storage category column: retriever uses matchesMemoryCategoryFilter(r.entry.category, category), and memory_list expands SQL predicates over category via resolveCategoryFilterCandidates(category).

That works for direct aliases like preference/preferences and entity/entities, but it does not handle the smart-extractor categories that are stored through legacy storage categories with the real smart category in metadata. In the normal auto-capture path, profile and cases are stored as fact, events as decision, and patterns as other, with metadata.memory_category carrying the smart category. Filters for profile, events, cases, and patterns can therefore still silently miss normal auto-captured rows.

Please make recall/list filtering metadata-aware, or over-fetch the relevant storage categories and then apply metadata.memory_category filtering in application code. Add regression coverage through the real smart-extractor/store path for the smart categories, not only direct row stubs.

  1. Newly accepted smart category values are not normalized on write.

The tool schema now accepts smart category values such as profile, preferences, events, cases, and patterns for memory_store / memory_update. But the write path still passes the raw category into storage and buildSmartMetadata() without explicitly setting the normalized smart category. For non-legacy values, reverseMapLegacyCategory() cannot infer the intended category reliably and can default to patterns.

Please normalize tool input at the execution boundary into both:

  • the storage category to persist, and
  • metadata.memory_category / smart category metadata to preserve semantics.

Also add tests for manual memory_store and memory_update using schema-legal smart category values.

Also worth covering

The new test mainly exercises retriever/BM25-style filtering. Please add coverage for the SQL memory_list path and keep resolveCategoryFilterCandidates() / matchesMemoryCategoryFilter() behavior equivalent so these two filter surfaces do not drift.

@rwmjhb rwmjhb left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. The new commit addresses the blocking issues from the previous review: category filters are now metadata-aware for recall/list paths, and manual store/update normalize accepted smart category inputs into storage category plus smart metadata.

Verification I checked:

  • orchestrator targeted tests passed
  • orchestrator full npm test passed
  • npm run build --if-present passed on head 3beb2781b2dbe0121e506d3eafb68d6c81601e5c
  • node --test test/category-filter-normalization.test.mjs passed, 7/7

Non-blocking follow-ups worth considering:

  • patterns filtering can still differ between recall and list for legacy reflection rows, because the in-memory predicate maps reflection -> patterns but the SQL candidate expansion for list does not fetch reflection.
  • Metadata-less legacy fact rows remain ambiguous and are treated as cases, so profile-like legacy facts without memory_category metadata may not match category=profile.
  • Category-only memory_update changes memory_category, but preserves category-derived fields such as fact_key and memory_layer; re-deriving or clearing those would avoid stale metadata after recategorization.

@rickthomasjr rickthomasjr left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent Code Review: PR #876 — fix: normalize memory category filters

Verdict: APPROVE

Summary

Two commits introducing bidirectional smart/legacy category mapping. Core changes: memory-categories.ts adds LEGACY_MEMORY_CATEGORIES, TOOL_MEMORY_CATEGORIES, mapping tables, and helper functions (normalizeCategory, matchesMemoryCategoryFilter, resolveCategoryFilterCandidates, resolveToolMemoryCategory, extractMetadataMemoryCategory). Tools and retrieval now normalize incoming categories at the execution boundary, persisting both storageCategory and metadata.memory_category. SQL list() filtering uses candidate OR expansion + in-memory metadata-aware predicate. All retrieval paths use matchesMemoryCategoryFilter() instead of direct equality.

P1 — Supersede logic: APPROVED

Replacing hardcoded SUPERSEDE_ELIGIBLE with TEMPORAL_VERSIONED_CATEGORIES.has(memoryCategory) + metadata-aware matching prevents false-positive cross-supersedes between unrelated legacy categories. Correct.

P2 — Metadata-aware filtering: APPROVED

matchesMemoryCategoryFilter() three-tier approach (direct → normalize → metadata) handles the critical case: entries stored as fact with metadata.memory_category=profile are still found by category=profile filters. This was the core issue rwmjhb flagged.

P3 — SQL candidate expansion: APPROVED

resolveCategoryFilterCandidates() builds OR of all storage category values for a smart category. Set deduplication prevents duplication. Correct.

P4 — Write path normalization: APPROVED

Both memory_store and memory_update resolve storageCategory + memoryCategory at execution boundary. buildSmartMetadata() uses resolved categories. Category-only updates also update memory_category in metadata.

P5 — Response payloads: APPROVED

Adding rawCategory alongside category is useful for debugging.

P6 — Minor notes

  • isLegacyStableMemoryId regex in store.ts is minor scope creep in this PR — consider separate PR.
  • rwmjhb's non-blocking follow-ups (patterns/legacy reflection gap, ambiguous metadata-less legacy fact rows, stale fields on recategorization) are valid but out of scope.

Tests

7 test cases, 397 lines. Exercises smart-extractor rows through real store/tool paths. Covers normalization, metadata-aware retrieval, SQL list filtering, store tool writes, update tool writes. Good coverage.

Conclusion

Well-designed, well-tested fix for a real taxonomy mismatch. The dual-category normalization is the right approach. No blocking issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Recall category filter silently empties results: tool schema enum in tools.ts disagrees with the write-side taxonomy in memory-categories.ts

3 participants