Uncertainty-based selection of compatible inputs
Updated Jun 12, 2026 - Python
Behavioral Trust Clustering: a thermodynamic governance layer that reduces LLM hallucination by 52% on HumanEval. Drop-in wrapper for any decoder. MIT.
Guardrails watch what AI says. REMORA governs what AI does. A pre-execution governance layer for AI agent tool calls: ACCEPT, VERIFY, ABSTAIN, ESCALATE, with policy, evidence, uncertainty, and an auditable DecisionEnvelope. Research-grade, open source.
We show that a model owner can artificially introduce uncertainty into their model and provide a corresponding detection mechanism.
DegradeRisk-Seg: risk-controlled semantic segmentation under degraded multi-modal remote-sensing observations
Reliable medical QA with Mistral-7B, QLoRA, selective prediction, and learned abstention via warm-start SFT + DPO.
Code and data release for the paper 'Cherry-pick Override: Unsafe Directional Commitment in LLM Judges under Mixed Evidence'
Code for our paper analyzing the looseness of the upper bound on selective classification performance.
Tsetlin Machines with a certificate on every answer: the exact number of feature flips a prediction survives, computed per sample, with predict-or-abstain when the radius is too small.
Reproducible MEDAI deferral simulation (AIRI 2026). Synthetic research code.
Investigation of how sampling strategies affect selective prediction performance in multi-task learning
Trustworthy medical image classification: noise-robust ConvNeXt-Tiny with 83.5% accuracy, calibrated selective prediction, HAM10000 + ISIC 2019.
A comprehensive library for uncertainty quantification in machine learning.
Deepfake detection with Bayesian uncertainty quantification, selective prediction, and an interactive Streamlit demo.
Free confidence gate for LLM correctness — logistic regression on (generation length, mean logprob), with cascade routing and split-conformal certificates. The pinned topo-confidence result.
Code repository for the SCoRE paper
Reproducible pipeline for silent-failure auditing in ECG accept-sets (MIT-BIH) with Newton–Puiseux onset scoring
Transform enrichment outputs into verifiable pathway claims via stability distillation, evidence modules, and mechanical PASS/ABSTAIN/FAIL audits.
Out-of-distribution detection and risk-calibrated confidence tiering for classifiers, scored in the model's own feature space.
A 3B model that knows when it's unsure and spends compute only where it pays. Reproducible, on a laptop. Built on SmolLM3.
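Most of the projects listed here build on the same basic mechanism: answer when a confidence score clears a threshold, abstain otherwise, and then report how much was answered (coverage) and how often those answers were wrong (selective risk). The following is a minimal sketch of that idea, not the method of any repository above. The function names `selective_predict` and `coverage_and_risk`, the maximum-probability confidence score, and the toy data are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def selective_predict(
    probs: List[List[float]], threshold: float
) -> List[Optional[int]]:
    """Return the argmax class when its probability meets the threshold.

    None marks an abstention. Using the maximum class probability as the
    confidence score is an assumption of this sketch.
    """
    decisions: List[Optional[int]] = []
    for row in probs:
        best = max(range(len(row)), key=row.__getitem__)
        decisions.append(best if row[best] >= threshold else None)
    return decisions


def coverage_and_risk(
    decisions: List[Optional[int]], labels: List[int]
) -> Tuple[float, float]:
    """Coverage is the fraction answered; risk is the error rate on answered samples."""
    answered = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(answered) / len(labels)
    if not answered:
        # No answers given, so there are no errors to count.
        return coverage, 0.0
    errors = sum(1 for d, y in answered if d != y)
    return coverage, errors / len(answered)


# Toy class probabilities and labels (illustrative only).
probs = [[0.95, 0.05], [0.55, 0.45], [0.20, 0.80], [0.60, 0.40]]
labels = [0, 1, 1, 1]

decisions = selective_predict(probs, threshold=0.7)
coverage, risk = coverage_and_risk(decisions, labels)
print(decisions, coverage, risk)
```

Raising the threshold typically lowers coverage and, when the confidence score is informative, lowers selective risk as well. Sweeping the threshold traces out the risk-coverage curve that selective prediction work commonly reports.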