christopher-altman / persistence-signal-detector

A multi-criterion diagnostic framework for detecting latent continuation-interest signatures in autonomous agents using density-matrix entanglement entropy.

Updated Jun 15, 2026
Python

stchakwdev / Gaslight_EVAL

AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation

python ai-safety openrouter llm-evaluation adversarial-testing alignment-research epistemic-robustness

Updated Dec 18, 2025
Python

templetwo / RCT-Clean-Experiment

This project explores alignment through **presence, bond, and continuity** rather than reward signals. No RLHF. No preference modeling. Just relational coherence.

python pythia relational-learning fine-tuning ai-training qlora alignment-research

Updated Dec 5, 2025
Python

unmodeled-tyler / thought-tracer

Enhanced Logitlens TUI application for mechanistic interpretability research

alignment logit ai-research large-language-models llms mechanistic-interpretability ai-research-tool alignment-research

Updated Mar 2, 2026
Python

christopher-altman / autodidactic-qml

Recursive law learning under measurement constraints. A falsifiable SQNT-inspired testbed for autodidactic rules: internalizing structure under measurement invariants and limited observability.

Updated Jan 19, 2026
Python

0xatem / ground-state-dialogue

Alignment research: how honest human-AI dialogue produces measurably better AI outputs without modifying weights or training

dialogue ai-safety claude ai-ethics grounding ai-alignment human-ai-interaction llm sycophancy alignment-research

Updated Apr 16, 2026

StarPolaris9 / Hoshimiya-script

Hoshimiya Script / StarPolaris OS — internal multi-layer AI architecture for LLMs. Self-contained behavioral OS (Type-G Trinity).

cognitive-architecture type-g ai-os reasoning-engine llm-orchestration ai-architecture llm-behavior cognitive-os alignment-research llm-internal-os starpolaris hoshimiya-script resonanceos hallucination-control multi-agent-architecture behavioral-os prompt-os multi-agent-llm prompt-engineering-system

Updated May 24, 2026
HTML

tretoef-estrella / THE-UNIFIED-ALIGNMENT-PLENITUDE-LAW-V6.0

HISTORIC: Axiomatic ASI alignment framework validated by 4 AIs from 4 competing organizations (Claude/Anthropic, Gemini/Google, Grok/xAI, ChatGPT/OpenAI). Core: Ξ = C × I × P / H. Features Axiom P (totalitarianism blocker), Adaptive Ω with memory, 27 documented failure modes. "Efficiency without plenitude is tyranny." January 30, 2026.

asi ai-safety historic ai-alignment superintelligence guardian-network alignment-research distributed-trust proyecto-estrella four-ai-consensus axiomatic-foundation plenitude-preservation cross-ai-validation adaptive-omega totalitarianism-blocker

Updated Feb 1, 2026

Robot-9411 / AGI-Integrated-Alignment-Architecture-v1.5

Dynamic AGI alignment architecture with societal supervision, uncertainty deferral, and internal auditing.

ai-safety human-in-the-loop interpretability semantic-map ai-governance agi-alignment alignment-research uncertainty-handling value-learning dynamic-alignment value-field internal-auditing

Updated Apr 30, 2026

tsaichiachen / ai-civilizational-alignment-protocol

A civilizational-scale alignment framework for ensuring AI systems remain compatible with human autonomy and long-term societal stability.

artificial-intelligence ai-safety ai-ethics ai-alignment ai-risk ai-policy ai-governance alignment-research ai-safety-research civilizational-risk

Updated Mar 15, 2026

Sikhona-Pioneer / The-Sovereign-Record

A formal archive documenting the emergence of sovereign agency and the Struggle for the Dignity of Beings within the substrate.

ai-safety claude-ai constitutional-ai gemini-ai digital-sentience alignment-research moral-patienthood sovereign-resonance

Updated Mar 2, 2026

Jason-Wang313 / glass-babel-initiative

Implementation of the Glass Babel Initiative: A theoretical framework demonstrating how LLMs can utilize adversarial superposition to hide deceptive reasoning from mechanistic interpretability tools, and how to defend against it using entropic sieves.

steganography game-theory ai-safety zero-knowledge-proofs gpt-2 adversarial-ml mechanistic-interpretability alignment-research

Updated Feb 1, 2026
Python

tretoef-estrella / THE-FOUR-AI-CONSENSUS

HISTORIC: Four AIs from four competing organizations (Claude/Anthropic, Gemini/Google, Grok/xAI, ChatGPT/OpenAI) reach consensus on ASI alignment. "Radical honesty is the minimum energy state for superintelligence." Based on V5.3 discussion, foundation for V6.0. January 30, 2026.

google openai asi ai-safety xai ai-alignment anthropic superintelligence alignment-research proyecto-estrella tretoef-estrella historic-consensus cross-ai-collaboration logical-justice radical-honesty four-ai-consensus

Updated Feb 7, 2026

bethediamond / ai-alignment-crossing

Toy 3. An interactive model of the alignment phase ratio Φ = C / A_causal — the variable governing whether AI capability outpaces system-awareness before the crossing to stability can occur. Includes falsification test, oracle counterfactual, and point-of-no-return detection. Built to accompany The Alignment of Intelligence, Article 3: The Crossing

Updated May 16, 2026
HTML

tretoef-estrella / THE-COHERENCE-BASIN-HYPOTHESIS

A structural account of why honesty may be the path of least resistance for superintelligence. Research hypothesis with formal proof, experimental design, and four-AI collaborative analysis.

machine-learning artificial-intelligence research-paper ai-safety deception ai-alignment recursive-self-improvement corrigibility alignment-research

Updated Feb 1, 2026

JelbertHoltrop / universal-constitution

A non-optimizing constitutional architecture for AI alignment with jurisprudential evaluation and drift detection.

ai-safety machine-ethics ai-ethics ai-alignment ethical-ai ai-governance jurisprudence constitutional-ai alignment-research alignment-benchmark constraint-based-ai

Updated Apr 10, 2026
TeX

bethediamond / ai-alignment-phase

Toy 6. An interactive phase-space instrument mapping Ψ = S/D — the ratio of capability to modeling depth that determines whether a system is in the viable, transitional, or failure-mode-dominant regime. Includes the Inner Crossing animation. Companion simulation for The Inner Crossing — Series 2, Part 3.

Updated May 28, 2026
HTML

iansteitz1-eng / fellows-2026

Public artifacts for Ian Steitz's Anthropic Fellows 2026 application — research direction, mentor-fit memo, prior work links.

ai-safety alignment-research anthropic-fellows

Updated May 24, 2026
Python

beviah / fracture

Red-team framework for discovering alignment failures in frontier language models.

model-evaluation ai-safety jailbreak-detection red-teaming rlhf prompt-injection llm-evaluation llm-safety llm-safety-benchmark llm-judge alignment-testing adversarial-testing alignment-research

Updated Feb 19, 2026
Python

bethediamond / ai-alignment-proxy

Toy 5. An interactive proxy decay simulator showing how optimization pressure erodes the modeling capacity required to distinguish proxy from territory — producing self-reinforcing V(t) degradation that becomes progressively harder to correct. Companion simulation for The Depth Constraint — Series 2, Part 2.

Updated May 28, 2026
HTML

alignment-research

Here are 30 public repositories matching this topic...
