
vedaant00/uhsr


Unified Hyperbolic Spectral Retrieval (UHSR)

UHSR is a hybrid text retrieval model that combines lexical search (BM25) with semantic search (FAISS or Pinecone) and applies spectral re-ranking on top, producing interpretable relevance scores normalized to the [0,1] range.


⚑ Key Highlights

  • βœ… Hybrid Search: Combines BM25 with dense embeddings.
  • πŸ” Custom Similarity Metrics: Supports cosine, euclidean, mahalanobis, manhattan, chebyshev, jaccard, and hamming.
  • 🎯 Spectral Re-Ranking: Uses Graph Laplacian & Fiedler vector for robust ranking.
  • πŸ“ˆ Interpretable Scores: Final scores are logistic-normalized in [0,1].
  • πŸš€ Scalable & Efficient: Built on FAISS (local) and Pinecone (cloud).
  • πŸ€– AI-powered Reranking: Integrates Hugging Face Cross-Encoders and OpenAI Rerankers.
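
To make the lexical half of the hybrid search concrete, here is a minimal sketch of Okapi BM25 scoring using only the standard library. It assumes a whitespace tokenizer, the common defaults k1=1.5 and b=0.75, and the non-negative IDF variant popularized by Lucene; the function names are hypothetical and the package's own bm25.py may differ on all of these points.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


documents = [
    "Apple releases new iPhone",
    "Tesla's stock price surges",
    "Google announces AI updates",
]
scores = bm25_scores("Did Tesla's stock price go up?", documents)
best = scores.index(max(scores))
print(documents[best])
```

Raw BM25 scores are unbounded, which is why a normalization step is needed before they can be blended with embedding similarities.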



🚀 What is UHSR?

UHSR unifies lexical and semantic retrieval into a single hybrid retrieval pipeline:

| Component | Functionality |
|---|---|
| 🔹 Lexical Search | BM25 for keyword-based ranking |
| 🔹 Semantic Search | FAISS (local) or Pinecone (cloud-based) vector search |
| 🔹 Fusion | Logistic Normalization + Harmonic Fusion for score blending |
| 🔹 Spectral Re-Ranking | Graph Laplacian + Fiedler vector for centrality-based refinement |
| 🔹 AI-based Reranking | Hugging Face Cross-Encoder or OpenAI-based rerankers |
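
The Fusion step names logistic normalization and harmonic fusion without spelling them out. The sketch below shows one plausible reading: each raw score is squashed into (0,1) with a logistic function, and the two normalized scores are blended with a harmonic mean. The function names, the midpoint and steepness parameters, and the sample scores are illustrative assumptions, not UHSR's actual implementation.

```python
import math


def logistic(x, midpoint=0.0, steepness=1.0):
    # Maps any real-valued score into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def harmonic_fusion(a, b, eps=1e-12):
    # Harmonic mean of two scores in [0, 1]; it stays low unless both are high.
    return 2.0 * a * b / (a + b + eps)


# Hypothetical raw scores for three candidate documents.
bm25_raw = [7.2, 1.3, 0.0]        # unbounded lexical scores
cosine_raw = [0.83, 0.41, 0.12]   # cosine similarities

# Assumption: centre BM25 on the candidate mean, and cosine on 0.5.
bm25_mid = sum(bm25_raw) / len(bm25_raw)
bm25_norm = [logistic(s, midpoint=bm25_mid) for s in bm25_raw]
cosine_norm = [logistic(s, midpoint=0.5, steepness=10.0) for s in cosine_raw]

fused = [harmonic_fusion(a, b) for a, b in zip(bm25_norm, cosine_norm)]
for score in fused:
    print(f"{score:.4f}")
```

The harmonic mean penalizes disagreement: a document that scores well lexically but poorly semantically (or the reverse) ends up closer to the lower of its two scores than an arithmetic average would place it.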

📌 Features

  • 🔍 Multi-Metric Retrieval: cosine, euclidean, mahalanobis, manhattan, chebyshev, jaccard, hamming
  • 🌐 Pinecone Support: seamless cloud-based semantic search
  • 🤖 AI-Powered Reranking: Hugging Face or OpenAI models
  • 📊 Hybrid Fusion: BM25 + semantic scoring
  • ♾️ Normalized Scores: interpretable [0,1] relevance
  • 📈 Spectral Graph Ranking: enhances candidate ranking stability
  • 🚀 Scalable: FAISS for fast local retrieval
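
Spectral graph ranking rests on the Fiedler vector, the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian L = D - W built over the candidate documents. The sketch below approximates it with shifted power iteration in plain Python, so no linear algebra library is required. The cosine-weighted candidate graph, the helper names, and the toy embeddings are assumptions for illustration; how UHSR folds the vector into its final score is not shown here.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_graph(vectors):
    # Weighted adjacency matrix W: non-negative cosine weights, no self-loops.
    n = len(vectors)
    return [[0.0 if i == j else max(0.0, cosine(vectors[i], vectors[j]))
             for j in range(n)] for i in range(n)]


def fiedler_vector(weights, iterations=500):
    """Approximate the Fiedler vector of L = D - W by shifted power iteration."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # 2 * max degree bounds the largest Laplacian eigenvalue (Gershgorin),
    # so shift*I - L has non-negative eigenvalues in reversed order.
    shift = 2.0 * max(degrees)
    # Deterministic start vector that is orthogonal to the constant vector.
    x = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        y = []
        for i in range(n):
            laplacian_x = degrees[i] * x[i] - sum(weights[i][j] * x[j] for j in range(n))
            y.append(shift * x[i] - laplacian_x)
        # Project out the constant eigenvector (eigenvalue 0 of L).
        mean = sum(y) / n
        y = [value - mean for value in y]
        norm = math.sqrt(sum(value * value for value in y))
        x = [value / norm for value in y]
    return x


# Toy candidate embeddings forming two loose topical groups.
candidates = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.15],
              [0.1, 1.0], [0.2, 1.0], [0.15, 0.9]]
fiedler = fiedler_vector(similarity_graph(candidates))
groups = [value > 0 for value in fiedler]
print(groups)
```

The sign pattern of the Fiedler vector splits the candidates into the two most weakly connected groups, and the magnitudes indicate how firmly each document sits in its group. For larger candidate sets, a dense eigensolver such as NumPy's `numpy.linalg.eigh` is the more conventional choice.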

📦 Installation

1️⃣ Install core package

pip install "uhsr[cpu]"

2️⃣ (Optional) GPU acceleration

pip install "uhsr[gpu]"

3️⃣ (Optional) Pinecone for cloud-based retrieval

pip install pinecone-client

4️⃣ (Optional) OpenAI-based reranking

pip install openai

⚑ Usage Example

from sentence_transformers import SentenceTransformer
from uhsr import UHSR

# Sample documents
documents = [
    "Apple releases new iPhone",
    "Tesla's stock price surges",
    "Google announces AI updates",
    "Amazon introduces drone delivery",
    "Microsoft acquires a gaming company"
]

# Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(documents, normalize_embeddings=True)
query_embedding = model.encode("Did Tesla's stock price go up?", normalize_embeddings=True)

# Initialize UHSR with OpenAI Reranker
retrieval_system = UHSR(
    documents,
    embeddings,
    reranker_type="openai",
    openai_api_key="your-openai-api-key"
)

# Retrieve results
retrieved_docs, scores = retrieval_system.retrieve(
    "Did Tesla's stock price go up?",
    query_embedding,
    top_k=3,
    metric='cosine',
    rerank=True
)

for doc, score in zip(retrieved_docs, scores):
    print(f"{doc} (Score: {score:.4f})")

🌐 Using Pinecone for Scalable Search

retrieval_system = UHSR(
    documents,
    embeddings,
    use_pinecone=True,
    pinecone_api_key="your_pinecone_api_key"
)

retrieved_docs, scores = retrieval_system.retrieve(
    "Did Tesla's stock price go up?",
    query_embedding,
    top_k=3,
    metric='cosine'
)

🎛️ Supported Similarity Metrics

retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='cosine')      # ✅ Cosine
retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='euclidean')   # ✅ Euclidean
retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='mahalanobis') # ✅ Mahalanobis
retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='manhattan')   # ✅ Manhattan
retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='chebyshev')   # ✅ Chebyshev
retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='jaccard')     # ✅ Jaccard
retrieved_docs, scores = retrieval_system.retrieve("query", query_embedding, metric='hamming')     # ✅ Hamming
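
For reference, the metric names above correspond to the following textbook definitions, sketched here in plain Python. These are not taken from uhsr/similarity.py, and the function names are hypothetical. Mahalanobis is omitted because it needs an inverse covariance matrix estimated from the corpus. Jaccard and Hamming are defined on binary or categorical vectors, so dense float embeddings would presumably need to be binarized first.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev_distance(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def hamming_distance(u, v):
    # Fraction of positions at which the two vectors differ.
    return sum(1 for a, b in zip(u, v) if a != b) / len(u)


def jaccard_distance(u, v):
    # For binary vectors: 1 - |intersection| / |union| of the non-zero positions.
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean_distance(p, q), manhattan_distance(p, q), chebyshev_distance(p, q))
```

Only cosine is a similarity (higher means closer); the others are distances (lower means closer), so any ranking code has to invert or negate them before comparing against similarity-style scores.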

📂 Repository Structure

uhsr-retrieval/
├── uhsr/
│   ├── core.py             # Main retrieval logic
│   ├── bm25.py             # BM25 implementation
│   ├── faiss_retrieval.py  # FAISS backend
│   ├── vector_db.py        # Pinecone integration
│   ├── similarity.py       # Similarity metrics
│   ├── reranker.py         # AI-based reranking
│   ├── utils.py            # Utility functions
├── examples/
│   ├── example.py
├── README.md
├── setup.py
├── requirements.txt

🎯 Requirements

  • numpy
  • sentence-transformers
  • faiss-cpu / faiss-gpu
  • pinecone-client
  • openai

🧪 Running Tests

pytest

Learn more about UHSR on Medium.

🚀 Try UHSR today & supercharge your search!
