Architecture

Overview

iris-vector-graph is a knowledge graph engine built on InterSystems IRIS. All data lives in IRIS globals and SQL tables. All graph analytics and search run as pure ObjectScript with $vectorop SIMD. Python provides the API layer and build-time tooling (K-means for PLAID).

System Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        Client Layer                              │
├──────────────────────────────────────────────────────────────────┤
│  Neo4j Browser   │  neo4j Python    │  LangChain   │  curl/HTTP │
│  (/browser/)     │  Driver (bolt)   │  Neo4jGraph   │  /api/*    │
├──────────────────────────────────────────────────────────────────┤
│                      Protocol Layer                              │
├──────────────────────────────────────────────────────────────────┤
│  Bolt 5.4 WS     │  Bolt 5.4 TCP    │  HTTP API                 │
│  (port 8000)     │  (port 7687)     │  (port 8000)              │
│  bolt_server.py  │  bolt_server.py  │  cypher_api.py            │
├──────────────────────────────────────────────────────────────────┤
│                      Execution Layer                             │
├──────────────────────────────────────────────────────────────────┤
│  BM25Index.cls   │  VecIndex.cls    │  PLAIDSearch.cls           │
│  (BM25 lexical)  │  (RP-tree ANN)   │  (multi-vector)            │
│                  │                  │                             │
│  PageRank.cls    │  Algorithms.cls  │  Subgraph.cls              │
│  Traversal.cls   │  GraphIndex.cls  │  Cypher translator         │
│  (BFS/^KG build) │  (^NKG int idx)  │  (parser → SQL)            │
├──────────────────────────────────────────────────────────────────┤
│                       Storage Layer                              │
├──────────────────────────────────────────────────────────────────┤
│  ^KG         │  ^BM25Idx    │  ^VecIdx     │  ^PLAID   │  ^NKG  │
│  (graph)     │  (BM25 idx)  │  (RP-tree)   │  (PLAID)  │  (int) │
│              │              │              │           │         │
│  Graph_KG.*  │              │  HNSW VECTOR │  fhir_    │         │
│  (SQL tables) │             │  (SQL index) │  bridges  │         │
├──────────────────────────────────────────────────────────────────┤
│                  InterSystems IRIS 2024.1+                       │
└──────────────────────────────────────────────────────────────────┘

Global Structures

^KG — Knowledge Graph

^KG("out", source, predicate, target) = weight
^KG("in", target, predicate, source) = weight
^KG("tout", ts, source, predicate, target) = weight   — temporal outbound
^KG("tin",  ts, target, predicate, source) = weight   — temporal inbound
^KG("bucket", bucket_key, source) = count             — pre-aggregated 5-min bucket
^KG("tagg", bucket, source, predicate, key) = value   — COUNT/SUM/AVG/MIN/MAX/HLL
^KG("edgeprop", ts, s, p, o, key) = value             — rich edge attributes

Used by: PageRank, WCC, CDLP, PPR, Subgraph, BFS, TemporalIndex.
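As a rough illustration of the layout above, the sketch below models ^KG as nested Python dicts and writes each edge in both directions, plus the temporal and bucket entries when a timestamp is given. The function names, the use of epoch seconds for `ts`, and the choice of the window start as the bucket key are assumptions for illustration, not the engine's actual conventions.

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # 5-minute buckets, as in the layout above


def nested():
    """Auto-vivifying dict, loosely imitating sparse global subscripts."""
    return defaultdict(nested)


KG = nested()  # in-memory stand-in for the ^KG global


def insert_edge(s, p, o, weight=1.0, ts=None):
    """Write one edge under "out" and "in" so either endpoint can be scanned."""
    KG["out"][s][p][o] = weight
    KG["in"][o][p][s] = weight
    if ts is not None:
        KG["tout"][ts][s][p][o] = weight
        KG["tin"][ts][o][p][s] = weight
        # Assumption: the bucket key is the start of the 5-minute window.
        bucket = ts - ts % BUCKET_SECONDS
        KG["bucket"][bucket][s] = KG["bucket"][bucket].get(s, 0) + 1


def neighbors_out(s):
    """List (predicate, target) pairs reachable from s, in sorted order."""
    return sorted((p, o) for p, targets in KG["out"][s].items() for o in targets)
```

Storing the edge twice trades write cost for symmetric read cost: inbound and outbound scans both become a walk over one subtree.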

^BM25Idx — BM25 Lexical Search

^BM25Idx(name, "cfg", "N")           — integer: document count
^BM25Idx(name, "cfg", "avgdl")       — float: average document length
^BM25Idx(name, "cfg", "k1")          — float: BM25 k1 parameter
^BM25Idx(name, "cfg", "b")           — float: BM25 b parameter
^BM25Idx(name, "cfg", "vocab_size")  — integer: distinct token count
^BM25Idx(name, "idf",  term)         — float: Robertson IDF
^BM25Idx(name, "tf",   term, docId)  — integer: term frequency  ← term-first!
^BM25Idx(name, "len",  docId)        — integer: document token count

Term-first "tf" subscript order enables O(postings) posting-list traversal via $Order(^BM25Idx(name,"tf",term,"")).
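To make the term-first layout concrete, here is a minimal Python sketch that mirrors the ^BM25Idx subscripts in a dict and scores a query by walking one posting list per query term. The function names, the default `k1`/`b` values, and the non-negative IDF variant `ln(1 + (N - df + 0.5) / (df + 0.5))` are assumptions; the ObjectScript implementation may differ in all three.

```python
import math
from collections import defaultdict


def build_index(docs, k1=1.2, b=0.75):
    """docs: {doc_id: [token, ...]} (non-empty) -> dict shaped like ^BM25Idx(name, ...)."""
    idx = {"cfg": {"k1": k1, "b": b}, "idf": {}, "tf": defaultdict(dict), "len": {}}
    for doc_id, tokens in docs.items():
        idx["len"][doc_id] = len(tokens)
        for tok in tokens:
            # Term-first: the posting list for a term is one contiguous subtree.
            idx["tf"][tok][doc_id] = idx["tf"][tok].get(doc_id, 0) + 1
    n = len(docs)
    idx["cfg"]["N"] = n
    idx["cfg"]["avgdl"] = sum(idx["len"].values()) / n
    idx["cfg"]["vocab_size"] = len(idx["tf"])
    for tok, postings in idx["tf"].items():
        df = len(postings)
        idx["idf"][tok] = math.log(1 + (n - df + 0.5) / (df + 0.5))  # assumed variant
    return idx


def search(idx, query_tokens, k=10):
    """Score only documents that appear in the posting lists of the query terms."""
    cfg = idx["cfg"]
    scores = defaultdict(float)
    for tok in query_tokens:
        for doc_id, tf in idx["tf"].get(tok, {}).items():
            norm = 1 - cfg["b"] + cfg["b"] * idx["len"][doc_id] / cfg["avgdl"]
            scores[doc_id] += idx["idf"][tok] * tf * (cfg["k1"] + 1) / (tf + cfg["k1"] * norm)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

The inner loop over `idx["tf"][tok]` plays the role of the `$Order` walk: work is proportional to the postings touched, not to the document count.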

^NKG — Integer-Encoded Graph (Arno Acceleration)

^NKG("$NI", stringId) = integerIdx       — node string→int
^NKG("$ND", integerIdx) = stringId       — node int→string
^NKG(-1, sIdx, -(pIdx+1), oIdx) = weight — out-edges
^NKG(-2, oIdx, -(pIdx+1), sIdx) = weight — in-edges
^NKG(-3, sIdx) = degree
^NKG("$meta", "nodeCount"|"edgeCount"|"version") = value
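The encoding above can be sketched in Python as follows. Node strings are interned to integers, and a predicate index `pIdx` is stored as `-(pIdx+1)`, which maps 0, 1, 2 to -1, -2, -3 so the predicate subscript is always negative. The `PRED` table, zero-based node indices, tuple keys, and counting out-edges as the degree are assumptions made for this illustration.

```python
NKG = {"$NI": {}, "$ND": {}, -1: {}, -2: {}, -3: {},
       "$meta": {"nodeCount": 0, "edgeCount": 0}}
PRED = {}  # predicate string -> pIdx (hypothetical; not part of the layout above)


def intern_node(node_id):
    """Return the integer index for node_id, assigning the next free one if new."""
    idx = NKG["$NI"].get(node_id)
    if idx is None:
        idx = NKG["$meta"]["nodeCount"]
        NKG["$NI"][node_id] = idx
        NKG["$ND"][idx] = node_id
        NKG["$meta"]["nodeCount"] += 1
    return idx


def add_edge(s, p, o, weight=1.0):
    """Store the edge under -1 (out) and -2 (in), keyed by integers only."""
    s_idx, o_idx = intern_node(s), intern_node(o)
    p_key = -(PRED.setdefault(p, len(PRED)) + 1)  # pIdx 0 -> -1, 1 -> -2, ...
    is_new = (s_idx, p_key, o_idx) not in NKG[-1]
    NKG[-1][(s_idx, p_key, o_idx)] = weight
    NKG[-2][(o_idx, p_key, s_idx)] = weight
    if is_new:
        NKG[-3][s_idx] = NKG[-3].get(s_idx, 0) + 1  # assumed: out-degree
        NKG["$meta"]["edgeCount"] += 1
```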

^VecIdx — VecIndex RP-Tree

^VecIdx(name, "cfg", "dim"|"metric"|"numTrees"|"leafSize") = config
^VecIdx(name, "vec", docId) = $vector
^VecIdx(name, "tree", treeId, nodeId, "plane") = $vector
^VecIdx(name, "tree", treeId, nodeId, "leaf", docId) = ""
^VecIdx(name, "meta", "count") = N
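For readers unfamiliar with random-projection trees, the following Python sketch builds a small forest whose nodes hold either a "plane" or a "leaf", echoing the subscripts above, then answers a query by descending each tree and ranking the union of the reached leaves exactly. Child numbering (`2n+1`, `2n+2`), splitting on the sign of the dot product, the depth cap, and cosine as the metric are all assumptions; the layout above does not say how VecIndex addresses children or chooses planes.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Assumes non-zero vectors.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def _split(index, tree, node_id, ids, rng, depth=0):
    """Recursively partition ids with a random hyperplane through the origin."""
    vecs = index["vec"]
    plane = [rng.gauss(0, 1) for _ in range(index["cfg"]["dim"])]
    left = [i for i in ids if dot(vecs[i], plane) < 0]
    right = [i for i in ids if dot(vecs[i], plane) >= 0]
    if len(ids) <= index["cfg"]["leafSize"] or depth >= 16 or not left or not right:
        tree[node_id] = {"leaf": list(ids)}
        return
    tree[node_id] = {"plane": plane}
    _split(index, tree, 2 * node_id + 1, left, rng, depth + 1)
    _split(index, tree, 2 * node_id + 2, right, rng, depth + 1)


def build(vectors, num_trees=4, leaf_size=2, seed=0):
    """vectors: {doc_id: [float, ...]} -> dict shaped like ^VecIdx(name, ...)."""
    dim = len(next(iter(vectors.values())))
    index = {"cfg": {"dim": dim, "metric": "cosine", "numTrees": num_trees, "leafSize": leaf_size},
             "vec": dict(vectors), "tree": {}, "meta": {"count": len(vectors)}}
    rng = random.Random(seed)
    for t in range(num_trees):
        index["tree"][t] = {}
        _split(index, index["tree"][t], 0, list(vectors), rng)
    return index


def search(index, query, k=5):
    """Collect one leaf per tree, then rank the candidates by exact cosine."""
    candidates = set()
    for tree in index["tree"].values():
        node = 0
        while "plane" in tree[node]:
            node = 2 * node + (1 if dot(query, tree[node]["plane"]) < 0 else 2)
        candidates.update(tree[node]["leaf"])
    ranked = sorted(candidates, key=lambda d: (-cosine(query, index["vec"][d]), d))
    return [(d, cosine(query, index["vec"][d])) for d in ranked[:k]]
```

Using several trees raises recall: a neighbor separated from the query by one tree's plane may still share a leaf with it in another tree.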

^PLAID — Multi-Vector Retrieval

^PLAID(name, "centroid", k) = $vector
^PLAID(name, "docPacked", docId) = $ListBuild   — packed token $vectors
^PLAID(name, "docCentroid", centroidId, docId) = ""
^PLAID(name, "meta", "nCentroids"|"nDocs"|"dim"|"totalTokens") = value
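The retrieval flow implied by this layout can be sketched in Python: each document token is assigned to its nearest centroid to populate a centroid-to-document inverted index, a query probes the closest centroids to collect candidates, and candidates are ranked by a MaxSim sum over query tokens. The function names, the `nprobe` parameter, and dot product as the similarity are assumptions; centroids are taken as given here, since K-means runs in the Python build-time tooling.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build(centroids, doc_tokens):
    """centroids: [vector]; doc_tokens: {doc_id: [token vector, ...]}."""
    plaid = {
        "centroid": dict(enumerate(centroids)),
        "docPacked": {d: list(toks) for d, toks in doc_tokens.items()},
        "docCentroid": {},  # centroidId -> set of docIds
    }
    for doc_id, tokens in doc_tokens.items():
        for tok in tokens:
            nearest = max(plaid["centroid"], key=lambda c: dot(plaid["centroid"][c], tok))
            plaid["docCentroid"].setdefault(nearest, set()).add(doc_id)
    return plaid


def search(plaid, query_tokens, k=10, nprobe=1):
    """Probe the nprobe closest centroids per query token, then rank by MaxSim."""
    candidates = set()
    for q in query_tokens:
        ranked = sorted(plaid["centroid"], key=lambda c: -dot(plaid["centroid"][c], q))
        for c in ranked[:nprobe]:
            candidates |= plaid["docCentroid"].get(c, set())
    scored = []
    for d in candidates:
        # MaxSim: for each query token, take its best-matching document token.
        maxsim = sum(max(dot(q, t) for t in plaid["docPacked"][d]) for q in query_tokens)
        scored.append((d, maxsim))
    return sorted(scored, key=lambda ds: (-ds[1], ds[0]))[:k]
```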

ObjectScript Classes

All classes live in the Graph.KG package and are pure ObjectScript plus $vectorop, with no Language = python methods.

| Class | Purpose | Key Methods |
|---|---|---|
| BM25Index | Okapi BM25 lexical search | Build, Search, Insert, Drop, Info, SearchProc (kg_BM25) |
| VecIndex | RP-tree ANN vector search | Create, Search, SearchJSON, SearchMultiJSON, InsertJSON, InsertBatchJSON, Build, Drop |
| IVFIndex | IVFFlat vector search (k-means quantized) | Build, AddBatch, FinalizeIndex, Search, Drop, Info, SearchProc (kg_IVF) |
| PLAIDSearch | PLAID multi-vector retrieval | StoreCentroids, StoreDocTokens, BuildInvertedIndex, Search, Insert, Info, Drop |
| TemporalIndex | Time-indexed edge store | InsertEdge, BulkInsert, QueryWindow, QueryWindowInbound, GetAggregate, GetBucketGroups, GetDistinctCount, PurgeBefore |
| PageRank | Personalized + Global PageRank | RunJson, PageRankGlobalJson |
| Algorithms | Graph analytics | WCCJson, CDLPJson |
| Subgraph | Bounded subgraph extraction | SubgraphJson, PPRGuidedJson |
| Traversal | Graph build + BFS + fast-path traversal | BuildKG, BuildNKG, BFSFastJson, BFSFastJsonSorted, ReadBFSResults, ReadBFSPage, KHopCount, KHopNeighborIds, KHop2Count, KHop2NeighborIds, ShortestPathJson, DijkstraJson |
| EdgeScan | Bulk edge ingestion | BulkIngestEdges, MatchEdges, WriteAdjacency, DeleteAdjacency |
| GraphIndex | Functional index for ^NKG | InternNode, InternLabel, InsertIndex, DeleteIndex, UpdateStructuralHLL, EstimateHLL |
| NKGAccel | Arno/integer-index BFS acceleration | BFSJson, BFSFastCountDistinct, KHopNeighborsSorted, CountDistinctKHop, BuildNKGRust |

Call Context Rule

Methods callable via classMethodValue() (native API bridge from Python) MUST be pure ObjectScript. Language = python methods using iris.gref() only work inside IRIS embedded Python contexts. All IVG ObjectScript classes follow this rule.
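A thin Python wrapper shows what the rule means on the calling side. The sketch assumes the `*Json` methods return a JSON string (their names suggest this, but it is not confirmed here) and that the caller holds a native API object exposing `classMethodValue()`. `call_json` and `FakeIRIS` are hypothetical names; `FakeIRIS` exists only so the example runs without a server, and no real method arguments are implied.

```python
import json


def call_json(irispy, class_name, method_name, *args):
    """Invoke an ObjectScript class method and decode its JSON string result."""
    raw = irispy.classMethodValue(class_name, method_name, *args)
    return json.loads(raw) if raw else None


class FakeIRIS:
    """Stand-in for the native API object; echoes the call as a JSON string."""

    def classMethodValue(self, class_name, method_name, *args):
        return json.dumps({"class": class_name, "method": method_name, "args": list(args)})


# Usage with the stand-in; a real caller would pass the object for a live connection.
result = call_json(FakeIRIS(), "Graph.KG.Algorithms", "WCCJson")
```

Because the call crosses the native API bridge, the target method runs outside any embedded Python context, which is why it has to be pure ObjectScript.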

SQL Schema (Graph_KG)

Graph_KG.nodes          (node_id VARCHAR(256) PK)
Graph_KG.rdf_labels     (s, label — composite PK)
Graph_KG.rdf_props      (s, "key", val — composite PK)
Graph_KG.rdf_edges      (edge_id BIGINT IDENTITY PK, s, p, o_id)
Graph_KG.kg_NodeEmbeddings  (id, emb VECTOR(DOUBLE, 768) — HNSW index)
Graph_KG.fhir_bridges   (fhir_code, kg_node_id — composite PK, bridge_type, confidence)

No SQL table is created for BM25 — all state is in ^BM25Idx globals.

Cypher Translation

The Cypher parser is a hand-written recursive-descent parser that translates openCypher to IRIS SQL:

  • Patterns → JOINs on rdf_edges/rdf_labels/nodes
  • Named paths → JSON concatenation
  • CALL subqueries → CTEs (independent) or scalar subqueries (correlated)
  • ivg procedures → Stage CTEs via SQL stored procedures
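To give a feel for the first rule, the sketch below builds the kind of SQL a single-hop pattern such as `MATCH (a:Drug)-[:TREATS]->(b:Disease) RETURN a, b` might translate to, using the rdf_edges and rdf_labels columns from the schema above. The function name, the aliases, and the example labels are hypothetical, and the real translator's output is likely to differ in shape and detail.

```python
def one_hop_sql(src_label, rel_type, dst_label):
    """Return (sql, params) for (a:src_label)-[:rel_type]->(b:dst_label)."""
    sql = (
        "SELECT e.s AS a, e.o_id AS b "
        "FROM Graph_KG.rdf_edges e "
        "JOIN Graph_KG.rdf_labels la ON la.s = e.s "      # label constraint on a
        "JOIN Graph_KG.rdf_labels lb ON lb.s = e.o_id "   # label constraint on b
        "WHERE e.p = ? AND la.label = ? AND lb.label = ?"
    )
    return sql, [rel_type, src_label, dst_label]


sql, params = one_hop_sql("Drug", "TREATS", "Disease")
```

Each additional hop in a pattern would add another rdf_edges alias joined on the previous hop's o_id, which is why long patterns turn into JOIN chains.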

Supported ivg procedures

| Procedure | SQL Stored Proc | YIELD |
|---|---|---|
| ivg.vector.search | Graph_KG.kg_KNN_VEC | node, score |
| ivg.neighbors | Graph_KG.kg_NEIGHBORS | neighbor |
| ivg.ppr | Graph_KG.kg_PPR | node, score |
| ivg.bm25.search | Graph_KG.kg_BM25 | node, score |

Global Structure

| Global | Purpose |
|---|---|
| ^KG("out", 0, s, p, o) | Knowledge graph: outbound edges |
| ^KG("in", 0, o, p, s) | Knowledge graph: inbound edges |
| ^KG("tout", ts, s, p, o) | Temporal index: outbound, ordered by timestamp |
| ^KG("tin", ts, o, p, s) | Temporal index: inbound, ordered by timestamp |
| ^KG("bucket", bucket, s) | Pre-aggregated edge count per 5-minute bucket |
| ^KG("tagg", bucket, s, p, key) | Pre-aggregated COUNT/SUM/MIN/MAX/HLL per bucket |
| ^KG("edgeprop", ts, s, p, o, key) | Rich edge attributes |
| ^NKG | Integer adjacency index; enables Rust-accelerated graph algorithms |
| ^VecIdx | VecIndex RP-tree ANN |
| ^PLAID | PLAID multi-vector |
| ^BM25Idx | BM25 lexical search index |

SQL Schema (Graph_KG)

| Table | Purpose |
|---|---|
| nodes | Node registry (node_id PK) |
| rdf_edges | Edges (s, p, o_id) |
| rdf_labels | Node labels (s, label) |
| rdf_props | Node properties (s, key, val) |
| kg_NodeEmbeddings | HNSW vector index (id, emb VECTOR) |
| kg_EdgeEmbeddings | Triple embeddings (s, p, o_id, emb VECTOR) |
| fhir_bridges | ICD-10→MeSH clinical code mappings |

ObjectScript Classes

| Class | Key Methods |
|---|---|
| Graph.KG.TemporalIndex | InsertEdge, BulkInsert, QueryWindow, GetVelocity, FindBursts, GetAggregate, GetBucketGroups, GetDistinctCount, Purge |
| Graph.KG.VecIndex | Create, InsertJSON, Build, SearchJSON, SearchMultiJSON, InsertBatchJSON |
| Graph.KG.PLAIDSearch | StoreCentroids, BuildInvertedIndex, Search |
| Graph.KG.PageRank | RunJson, PageRankGlobalJson |
| Graph.KG.Algorithms | WCCJson, CDLPJson |
| Graph.KG.Subgraph | SubgraphJson, PPRGuidedJson |
| Graph.KG.Traversal | BuildKG, BuildNKG, BFSFastJson, ShortestPathJson |
| Graph.KG.NKGAccel | BetweennessGlobal, ClosenessGlobal, EigenvectorGlobal, Load, IsLoaded, WarmAdjCache |
| Graph.KG.BulkLoader | BulkLoad |
| Graph.KG.BM25Index | Build, Search, Insert, Drop |
| Graph.KG.IVFIndex | Build, Search, Drop |
| Graph.KG.EdgeScan | MatchEdges, WriteAdjacency, DeleteAdjacency |

Note: IRIS xDBC protocol 65 does not support ? params inside WITH ... AS (...) CTE bodies. Temporal Cypher uses derived table subqueries instead.