A multi-criterion diagnostic framework for detecting latent continuation-interest signatures in autonomous agents using density-matrix entanglement entropy.
Updated Jun 15, 2026 - Python
AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
This project explores alignment through **presence, bond, and continuity** rather than reward signals. No RLHF. No preference modeling. Just relational coherence.
Enhanced Logitlens TUI application for mechanistic interpretability research
Recursive law learning under measurement constraints. A falsifiable SQNT-inspired testbed for autodidactic rules: internalizing structure under measurement invariants and limited observability.
Alignment research: how honest human-AI dialogue produces measurably better AI outputs without modifying weights or training
Hoshimiya Script / StarPolaris OS — internal multi-layer AI architecture for LLMs. Self-contained behavioral OS (Type-G Trinity).
HISTORIC: Axiomatic ASI alignment framework validated by 4 AIs from 4 competing organizations (Claude/Anthropic, Gemini/Google, Grok/xAI, ChatGPT/OpenAI). Core: Ξ = C × I × P / H. Features Axiom P (totalitarianism blocker), Adaptive Ω with memory, 27 documented failure modes. "Efficiency without plenitude is tyranny." January 30, 2026.
Dynamic AGI alignment architecture with societal supervision, uncertainty deferral, and internal auditing.
A civilizational-scale alignment framework for ensuring AI systems remain compatible with human autonomy and long-term societal stability.
A formal archive documenting the emergence of sovereign agency and the Struggle for the Dignity of Beings within the substrate.
Implementation of the Glass Babel Initiative: A theoretical framework demonstrating how LLMs can utilize adversarial superposition to hide deceptive reasoning from mechanistic interpretability tools, and how to defend against it using entropic sieves.
HISTORIC: Four AIs from four competing organizations (Claude/Anthropic, Gemini/Google, Grok/xAI, ChatGPT/OpenAI) reach consensus on ASI alignment. "Radical honesty is the minimum energy state for superintelligence." Based on V5.3 discussion, foundation for V6.0. January 30, 2026.
Toy 3. An interactive model of the alignment phase ratio Φ = C / A_causal, the variable governing whether AI capability outpaces system-awareness before the crossing to stability can occur. Includes a falsification test, an oracle counterfactual, and point-of-no-return detection. Built to accompany The Alignment of Intelligence, Article 3: The Crossing.
A structural account of why honesty may be the path of least resistance for superintelligence. Research hypothesis with a formal proof, an experimental design, and a four-AI collaborative analysis.
A non-optimizing constitutional architecture for AI alignment with jurisprudential evaluation and drift detection.
Toy 6. An interactive phase-space instrument mapping Ψ = S/D — the ratio of capability to modeling depth that determines whether a system is in the viable, transitional, or failure-mode-dominant regime. Includes the Inner Crossing animation. Companion simulation for The Inner Crossing — Series 2, Part 3.
Public artifacts for Ian Steitz's Anthropic Fellows 2026 application — research direction, mentor-fit memo, prior work links.
Red-team framework for discovering alignment failures in frontier language models.
Toy 5. An interactive proxy decay simulator showing how optimization pressure erodes the modeling capacity required to distinguish proxy from territory — producing self-reinforcing V(t) degradation that becomes progressively harder to correct. Companion simulation for The Depth Constraint — Series 2, Part 2.