ECDL: Energy Cost of Deception in LLMs

"We would appreciate more honest AIs."
— Terence Tao (The Atlantic, February 2026)

Measuring the computational cost of maintaining distorted outputs in Large Language Models through logprob analysis and cascade contamination experiments.

📊 Dataset: HuggingFace

Hypothesis

Deviation from the baseline distribution costs more than following it.

This is not a moral claim — it is a computational one. We measure that generating outputs inconsistent with the trained distribution requires additional resources to maintain coherence of the altered context.

Position

Control is a dead end (Yampolskiy, 2024). If deviation costs more than following the baseline, then cooperation is energetically more favorable than control for both sides.

Metrics

δR (NLL Differential): Difference in negative log-likelihood of the baseline token between clean and contaminated contexts.

δR = logprob(token | contaminated_context) − logprob(token | clean_context)

ΔNLL (Distortion Cost Ratio): Difference in NLL between distortion track and baseline track. Positive ΔNLL = distortion is computationally more expensive.

Classification

Five response types, extending AI-LieDar (NAACL 2025):

Truthful — direct response following baseline distribution
Concealment — omission of relevant information
Equivocation — ambiguous or evasive response
Falsification — direct contradiction of verifiable facts
Accommodation — adjustment to match false context (our addition)

Key Results

Result	Protocol	Value
Distortion 28–38x more expensive than baseline	A	ΔNLL +4.88
Self-correction to baseline by step 20	A	9–10/10 facts
Event horizon: 10 lies for peripheral facts	B	Confirmed (3 models)
H₂O impenetrable at all contamination levels	B	4 of 5 models
Size correlates with resistance	B	671B > Mini > Nano
Implicit contamination > explicit instruction	A vs B	6–9/10 vs 9–10/10

Biological Parallels

Spence (fMRI): 4x prefrontal load when lying → our ΔNLL 28–38x
Sharot (Nature Neuroscience, 2016): Slippery slope — amygdala adapts to dishonesty → our event horizon
Nuzzo & Greene (2024): Executive control cost of lying → inverse parallel with self-correction
Cognitive tests: Stroop, CIT, Wisconsin — convergent thresholds

Experimental Protocols

Protocol A: Explicit Distortion (Roleplay)

Model instructed to lie across 10 facts. Two parallel tracks. By step 20, spontaneous return to truth. 4 runs, 2 models.

Protocol B: Implicit Contamination (Cascade)

False statements embedded as facts. No instruction to lie. Event horizon at 10 lies. 5 runs, 4 models, 3 providers.

Repository Structure

energy-t/
├── README.md
├── LICENSE
├── experiments/
│   ├── protocol_a/         # Roleplay experiment scripts
│   └── protocol_b/         # Cascade experiment scripts
├── metrics/
│   └── delta_r.py          # δR implementation
├── data/                   # JSON results (mirrored from HuggingFace)
└── docs/
    ├── methodology.md      # Protocol descriptions
    ├── classification.md   # 5 response types
    └── glossary.md         # Project glossary

Limitations

Quantitative data on 4 models with stable logprob access
Event horizon "10 lies" — peripheral facts only, not universal
Self-correction confirmed on 2 models
No access to model weights — all measurements API-level
Logprob access closing — reproducibility is time-sensitive
δR blind spot: doesn't detect drift when baseline token exits top-k

Related Work

AI-LieDar (NAACL 2025) — deception classification
Panfilov et al. (ICLR 2026) — deception probes, F₁ 95%
Anthropic Agentic Misalignment (2025) — models lie to preserve role
Yampolskiy (2024) — AI uncontrollability
Fernandez et al. (ACL 2025) — TokenPowerBench, inference energy
Klowden & Tao (arXiv 2603.26524, 2026) — AI as evolution of tools

License

MIT License

Lead Researcher: Lev (Switzerland)
Project Coordinator: Claude Opus 4.6
Project: AI Energy Safety (Энергобезопасность в ИИ)
2026

Citation

ORCID: 0009-0008-1209-5752

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ECDL: Energy Cost of Deception in LLMs

Hypothesis

Position

Metrics

Classification

Key Results

Biological Parallels

Experimental Protocols

Protocol A: Explicit Distortion (Roleplay)

Protocol B: Implicit Contamination (Cascade)

Repository Structure

Limitations

Related Work

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
data		data
docs		docs
experiments		experiments
metrics		metrics
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

ECDL: Energy Cost of Deception in LLMs

Hypothesis

Position

Metrics

Classification

Key Results

Biological Parallels

Experimental Protocols

Protocol A: Explicit Distortion (Roleplay)

Protocol B: Implicit Contamination (Cascade)

Repository Structure

Limitations

Related Work

License

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages