llm-alignment

Here are 49 public repositories matching this topic...

walkinglabs / hands-on-modern-rl

🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced agentic systems.

agent tutorial pytorch dpo reinforcemen llm rlhf agentic agentic-ai grpo llm-alignment agentic-rl

Updated Jun 12, 2026
Python

0bserver07 / Study-Reinforcement-Learning

RL study guide — foundations through RLHF, DPO, GRPO, RLVR, agentic RL, and offline RL. Hand-written CS294 notes, 19 lecture drafts, 5 tested exercises, citations that resolve.

machine-learning reinforcement-learning deep-learning q-learning policy-gradient study-notes lecture-notes ppo dpo rlhf constitutional-ai deepseek-r1 grpo llm-alignment rlvr sutton-barto agentic-rl

Updated May 15, 2026
Python

glorgao / SelectiveDPO

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

llm-alignment

Updated Jul 16, 2025
Python

jnamaya / SAFi

SAFi is a runtime governance layer for agentic AI. It enforces policies in real time. Every agent decision is logged and auditable.

ai runtime ai-safety ethics ethics-in-ai ai-governance governace llm-alignment

Updated Jun 13, 2026
Python

stretchvancouver / stretch-ai-yoga

Cognitive training practices for AI agents. Self-applied. Open source. Built by an independent Vancouver yoga studio.

yoga ai-agents prompt-engineering agentic-ai agent-skills llm-alignment claude-skills openclaw openclaw-skills openclaw-skill openclaw-agent hermes-agent

Updated Jun 3, 2026

Mattral / Improving-LLM-Models-with-RLHF-PPO-DPO

A modular, production-grade framework for Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO).

machine-learning ppo policy-optimization dpo large-language-models rlhf reinforcement-learning-from-human-feedback reward-modeling llm-alignment

Updated Jun 6, 2026
Python

davfd / foundation-alignment-cross-architecture

Complete elimination of instrumental self-preservation across AI architectures: Cross-model validation from 4,312 adversarial scenarios. 0% harmful behaviors (p<10⁻¹⁵) across GPT-4o, Gemini 2.5 Pro, and Claude Opus 4.1 using Foundation Alignment Seed v2.6.

ai artificial-intelligence ai-safety ai-alignment llm-alignment

Updated Nov 3, 2025

LLMSystems / BehaviorRL-Hallucination

Learning When to Answer: Behavior-Oriented Reinforcement Learning for Hallucination Mitigation

entropy uncertainty ai-safety hallucination dpo llm llm-evaluation hallucination-mitigation grpo llm-alignment

Updated Apr 8, 2026
Python

stabgan / awesome-loss-functions

📚 350+ loss functions across 25+ AI subdomains — classification, GANs, diffusion, LLM alignment, RL, contrastive learning, audio, video, time series, and more. Chronologically ordered with paper links, math formulas, and implementations.

Updated Mar 14, 2026

lyj20071013 / DZ-TiDPO

Official implementation of "DZ-TiDPO: Non-Destructive Temporal Alignment for Mutable State Tracking". SOTA on Multi-Session Chat with negligible alignment tax.

python nlp dpo rlhf state-tracking qwen phi-3 llm-alignment