UHSR is a hybrid text retrieval model that combines lexical search (BM25) and semantic search (FAISS or Pinecone) with spectral re-ranking to produce interpretable relevance scores normalized to the [0,1] range.
- Hybrid Search: Combines BM25 with dense embeddings.
- Custom Similarity Metrics: Supports cosine, euclidean, mahalanobis, manhattan, chebyshev, jaccard, and hamming.
- Spectral Re-Ranking: Uses Graph Laplacian & Fiedler vector for robust ranking.
- Interpretable Scores: Final scores are logistic-normalized in [0,1].
- Scalable & Efficient: Built on FAISS (local) and Pinecone (cloud).
- AI-powered Reranking: Integrates Hugging Face Cross-Encoders and OpenAI Rerankers.
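
To make the metric names in the list above concrete, here is a minimal NumPy sketch of what each one computes. It is not UHSR's own implementation (that lives in `uhsr/similarity.py`); the function names are illustrative assumptions, and Jaccard and Hamming are shown on binary vectors, which is one common convention.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def minkowski_distances(a, b):
    """Manhattan (L1), Euclidean (L2) and Chebyshev (L-infinity) distances."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return {
        "manhattan": float(diff.sum()),
        "euclidean": float(np.sqrt((diff ** 2).sum())),
        "chebyshev": float(diff.max()),
    }

def mahalanobis_distance(a, b, cov):
    """Euclidean distance after whitening by the covariance matrix `cov`."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def hamming_distance(a, b):
    """Fraction of positions at which two equal-length vectors differ."""
    return float(np.mean(np.asarray(a) != np.asarray(b)))

def jaccard_similarity(a, b):
    """Intersection over union of the non-zero positions of two vectors."""
    a, b = np.asarray(a).astype(bool), np.asarray(b).astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0
```

Cosine and Jaccard are similarities (higher is closer) while the others are distances (lower is closer), so a retrieval pipeline has to flip or rescale the distances before blending them with other scores.
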
UHSR unifies lexical and semantic retrieval into a single hybrid retrieval pipeline:
| Component | Functionality |
|---|---|
| Lexical Search | BM25 for keyword-based ranking |
| Semantic Search | FAISS (local) or Pinecone (cloud-based) vector search |
| Fusion | Logistic Normalization + Harmonic Fusion for score blending |
| Spectral Re-Ranking | Graph Laplacian + Fiedler vector for centrality-based refinement |
| AI-based Reranking | Hugging Face Cross-Encoder or OpenAI-based rerankers |
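
The fusion row names logistic normalization and harmonic fusion without giving formulas. The sketch below shows one plausible reading: z-score each score list, squash it with a sigmoid, then blend the two lists with a harmonic mean. The function names, the z-scoring step, and the sample scores are assumptions for illustration, not UHSR's API.

```python
import numpy as np

def logistic_normalize(scores):
    """Map raw scores into (0, 1): z-score them, then apply a sigmoid."""
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    if std == 0:
        # All candidates tie, so give each the neutral value 0.5.
        return np.full_like(scores, 0.5)
    z = (scores - scores.mean()) / std
    return 1.0 / (1.0 + np.exp(-z))

def harmonic_fusion(lexical, semantic, eps=1e-12):
    """Element-wise harmonic mean of two normalized score arrays."""
    lexical = np.asarray(lexical, dtype=float)
    semantic = np.asarray(semantic, dtype=float)
    return 2.0 * lexical * semantic / (lexical + semantic + eps)

# Made-up raw scores for three candidate documents.
bm25_scores = np.array([0.2, 7.5, 1.1])       # unbounded lexical scores
cosine_scores = np.array([0.10, 0.82, 0.35])  # semantic similarities

fused = harmonic_fusion(
    logistic_normalize(bm25_scores),
    logistic_normalize(cosine_scores),
)
best = int(np.argmax(fused))  # index of the top fused candidate
```

A harmonic mean is pulled toward the smaller of its two inputs, so under this reading a document ranks highly only when both the lexical and the semantic signal agree.
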
- Multi-Metric Retrieval: cosine, euclidean, mahalanobis, manhattan, chebyshev, jaccard, hamming
- Pinecone Support: cloud-based semantic search
- AI-Powered Reranking: Hugging Face or OpenAI models
- Hybrid Fusion: BM25 + semantic scoring
- Normalized Scores: interpretable [0,1] relevance
- Spectral Graph Ranking: enhances candidate ranking stability
- Scalable: FAISS for fast local retrieval
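
Spectral graph ranking can be sketched as follows: build a similarity graph over the candidate documents, form its Laplacian L = D - W, and take the Fiedler vector (the eigenvector of the second-smallest eigenvalue). How UHSR turns that vector into a score is not described here, so the blending step, the `weight` parameter, and both function names below are illustrative assumptions.

```python
import numpy as np

def fiedler_vector(embeddings):
    """Fiedler vector of the cosine-similarity graph over the candidates."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = np.clip(X @ X.T, 0.0, None)    # non-negative edge weights
    np.fill_diagonal(W, 0.0)           # no self-loops
    L = np.diag(W.sum(axis=1)) - W     # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)     # eigenvalues come back in ascending order
    # Column 0 is the constant vector when the graph is connected;
    # column 1 is the Fiedler vector.
    return eigvecs[:, 1]

def spectral_rerank(base_scores, embeddings, weight=0.3):
    """Blend base scores in [0, 1] with a Fiedler-based graph score (assumed scheme)."""
    base = np.asarray(base_scores, dtype=float)
    f = fiedler_vector(embeddings)
    # An eigenvector's sign is arbitrary: orient it to agree with the base scores.
    if np.dot(f, base - base.mean()) < 0:
        f = -f
    graph_score = 1.0 / (1.0 + np.exp(-f / (f.std() + 1e-12)))
    refined = (1.0 - weight) * base + weight * graph_score
    return np.argsort(-refined), refined

# Two tight clusters of toy 2-D embeddings, with made-up base scores.
toy_embeddings = np.array([[1.0, 0.1], [1.0, 0.2], [0.1, 1.0], [0.2, 1.0]])
toy_scores = np.array([0.9, 0.8, 0.3, 0.2])
order, refined = spectral_rerank(toy_scores, toy_embeddings)
```

The Fiedler vector assigns opposite signs to the two loosely connected groups of candidates, so documents that cluster with the high-scoring results are nudged up and outliers are nudged down.
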

Install the CPU or GPU build:

    pip install uhsr[cpu]
    pip install uhsr[gpu]

Pinecone and OpenAI support use their client packages:

    pip install pinecone-client
    pip install openai

Basic usage with the OpenAI reranker:

    from sentence_transformers import SentenceTransformer
    from uhsr import UHSR
    import numpy as np

    # Sample documents
    documents = [
        "Apple releases new iPhone",
        "Tesla's stock price surges",
        "Google announces AI updates",
        "Amazon introduces drone delivery",
        "Microsoft acquires a gaming company"
    ]

    # Load embedding model
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(documents, normalize_embeddings=True)
    query_embedding = model.encode("Did Tesla's stock price go up?", normalize_embeddings=True)

    # Initialize UHSR with OpenAI Reranker
    retrieval_system = UHSR(
        documents,
        embeddings,
        reranker_type="openai",
        openai_api_key="your-openai-api-key"
    )

    # Retrieve results
    retrieved_docs, scores = retrieval_system.retrieve(
        "Did Tesla's stock price go up?",
        query_embedding,
        top_k=3,
        metric='cosine',
        rerank=True
    )

    for doc, score in zip(retrieved_docs, scores):
        print(f"{doc} (Score: {score:.4f})")

Using Pinecone for semantic search:

    retrieval_system = UHSR(
        documents,
        embeddings,
        use_pinecone=True,
        pinecone_api_key="your_pinecone_api_key"
    )

    retrieved_docs, scores = retrieval_system.retrieve(
        "Did Tesla's stock price go up?",
        query_embedding,
        top_k=3,
        metric='cosine'
    )

The `metric` argument selects the similarity metric:

    retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='cosine')       # Cosine
    retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='euclidean')    # Euclidean
    retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='mahalanobis')  # Mahalanobis
    retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='manhattan')    # Manhattan
    retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='chebyshev')    # Chebyshev
    retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='jaccard')      # Jaccard
    retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='hamming')      # Hamming

Project structure:

    uhsr-retrieval/
    ├── uhsr/
    │   ├── core.py             # Main retrieval logic
    │   ├── bm25.py             # BM25 implementation
    │   ├── faiss_retrieval.py  # FAISS backend
    │   ├── vector_db.py        # Pinecone integration
    │   ├── similarity.py       # Similarity metrics
    │   ├── reranker.py         # AI-based reranking
    │   └── utils.py            # Utility functions
    ├── examples/
    │   └── example.py
    ├── README.md
    ├── setup.py
    └── requirements.txt

Dependencies:

- numpy
- sentence-transformers
- faiss-cpu / faiss-gpu
- pinecone-client
- openai

Run the tests:

    pytest

Learn more about UHSR on Medium.
Try UHSR today & supercharge your search!
