Empirical measurements and analysis code accompanying the paper:
Cross-Family Convergence of Neural Network Weight Skeletons: An Empirical Study via Weibull Shape Parameter

Tiexin Ding, Independent Researcher (April 2026)
- Paper (Zenodo): https://doi.org/10.5281/zenodo.19652706
- Related: NPM framework paper (Zenodo, 2026)
- Author ORCID: 0000-0003-1950-1625

- The Weibull shape parameter $k$ of the central 80% of the $|W|$ distribution converges to a narrow band $[1.13, 1.19]$ after sufficient training, across 10 models from 5 independent families (Pythia, OLMo, OLMo-2, Qwen, Mistral, LLaMA-3), spanning a 100× parameter range (70M → 8B).
- The convergence is a property of the distribution body; full-100% fits mask the pattern (body–tail ablation on 7 of 10 models).
- NPM-d$k$ (the first derivative of $k$ on the log-step axis) exhibits a three-phase structure: formation → anchor → bifurcation, with timescales aligned to the break-even point (Jastrzebski et al., 2020) and the rewinding point (Frankle et al., 2020). Verified on 4 Pythia scales with dense early-step sampling.
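
As an illustration of what a derivative of $k$ on the log-step axis means in practice, here is a minimal sketch of a finite-difference estimate. It is not the repository's code: the function name `dk_dlogstep`, the use of base-10 logarithms, and the requirement that steps be positive (step 0 must be dropped or offset first) are all assumptions made for this example, and the input values are synthetic.

```python
import numpy as np


def dk_dlogstep(steps, k_values):
    """Finite-difference estimate of dk / d(log10 step).

    Hypothetical helper for illustration; the paper's NPM-dk may use a
    different log base, smoothing, or treatment of step 0.
    """
    steps = np.asarray(steps, dtype=float)
    k_values = np.asarray(k_values, dtype=float)
    if steps.ndim != 1 or steps.shape != k_values.shape or steps.size < 2:
        raise ValueError("steps and k_values must be 1-D arrays of equal length >= 2")
    if np.any(steps <= 0):
        raise ValueError("steps must be positive; drop or offset step 0 before taking logs")
    log_steps = np.log10(steps)
    # np.gradient accepts the sample coordinates, so unevenly spaced
    # checkpoints on the log axis are handled directly.
    return np.gradient(k_values, log_steps)


# Synthetic data only: k falls linearly in log10(step) with slope -0.2.
steps = np.array([1, 10, 100, 1000, 10000])
k_demo = 2.0 - 0.2 * np.log10(steps)
slopes = dk_dlogstep(steps, k_demo)
print(slopes)
```

With real checkpoints, `steps` would be the training steps of the measured revisions and `k_values` the fitted $k$ at each one.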
NPM-K-public/
├── paper/ Paper artifacts
│ ├── paper_1_en_260419_v6.pdf English paper (22 pp)
│ ├── paper_1_cn_260419_v6.pdf Chinese companion version
│ ├── paper_1_en_260419_v6.tex LaTeX source (English)
│ └── paper_1_cn_260419_v6.tex LaTeX source (Chinese, XeLaTeX)
├── scripts/ Analysis code
│ ├── npm_core.py ★ Core canonical module (SKIP + Weibull-k)
│ ├── generate_figs_v7_20260419.py Main figure generator
│ ├── requirements.txt Python dependencies
│ └── README.md Per-script usage
├── MODELS.md HuggingFace IDs, revisions, measured k/ratio/max|W|
├── LICENSE MIT for code; CC-BY 4.0 for paper
└── README.md This file
Measurement data (JSON/CSV) and all 15 figures are available in the Zenodo archive: https://doi.org/10.5281/zenodo.19652706
| Family | Model | Checkpoints used |
|---|---|---|
| Pythia | EleutherAI/pythia-70m | 17 revisions (step 0 → step 143000, dense early) |
| Pythia | EleutherAI/pythia-160m | 7 standard HuggingFace revisions |
| Pythia | EleutherAI/pythia-410m | 7 standard HuggingFace revisions |
| Pythia | EleutherAI/pythia-1b | 7 standard HuggingFace revisions |
| Pythia | EleutherAI/pythia-2.8b | step 40k, 80k, 143k (3 independently-retrievable) |
| OLMo | allenai/OLMo-1B-hf | final |
| OLMo-2 | allenai/OLMo-2-0425-1B | 7 training phases (init, s1-early … s2-end, main) |
| Qwen-2.5 | Qwen/Qwen2.5-1.5B | final |
| Mistral | mistralai/Mistral-7B-v0.3 | final |
| LLaMA-3 | meta-llama/Meta-Llama-3-8B | final |

Exact commit SHAs and config snapshots: see MODELS.md.
The "skeleton" over which the Weibull fit is performed is defined by three explicit rules:

- Include: every `*.weight` tensor with dimension ≥ 2 (QKV projections, MLP up/down, embeddings, output heads).
- Exclude: layer norms, biases, 1-D parameters, rotary / positional caches.
- Fit range: middle 80% of $|W|$ (quantiles 0.10 → 0.90).

The implementation is in scripts/npm_core.py: see is_eligible(), audit_skip(), and weibull_k_loglin(). The canonical SKIP list is SKIP_SKELETON_ONLY (the default CANONICAL_SKIP).
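
To make the fit-range rule concrete, the following is a minimal sketch of one common way to estimate a Weibull shape parameter by log-linearization. It uses the identity $\ln(-\ln(1-F(x))) = k \ln x - k \ln \lambda$ and regresses over the points whose empirical CDF lies in $[0.10, 0.90]$. This is an assumption about the general technique, not a copy of `weibull_k_loglin()`; the name `weibull_k_sketch`, the plotting positions, and the handling of exact zeros are choices made for this example and may differ from the repository.

```python
import numpy as np


def weibull_k_sketch(mags, low=0.10, high=0.90):
    """Estimate the Weibull shape k from the middle of a magnitude sample.

    Illustrative only: fits a straight line to the Weibull plot
    (ln x versus ln(-ln(1 - F))) restricted to quantiles [low, high].
    """
    x = np.sort(np.asarray(mags, dtype=np.float64))
    x = x[x > 0]  # ln(0) is undefined, so exact zeros are dropped here
    n = x.size
    if n < 10:
        raise ValueError("need at least 10 positive magnitudes")
    # Plotting positions strictly inside (0, 1) for the sorted sample.
    F = (np.arange(1, n + 1) - 0.5) / n
    keep = (F >= low) & (F <= high)
    X = np.log(x[keep])
    Y = np.log(-np.log1p(-F[keep]))  # ln(-ln(1 - F))
    slope, _intercept = np.polyfit(X, Y, 1)
    return float(slope)


# Synthetic check: draw from a Weibull with known shape 1.5 and an
# arbitrary scale, then recover the shape from the distribution body.
rng = np.random.default_rng(0)
sample = 0.02 * rng.weibull(1.5, size=200_000)
k_hat = weibull_k_sketch(sample)
print(f"recovered k = {k_hat:.3f}")
```

Restricting the regression to the central quantiles keeps the estimate from being driven by the near-zero and extreme-magnitude tails, which is the motivation the body–tail distinction above points to.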
pip install -r scripts/requirements.txt

from transformers import AutoModelForCausalLM
import torch
import sys; sys.path.insert(0, "scripts")
from npm_core import is_eligible, weibull_k_loglin, CANONICAL_SKIP, audit_skip
model = AutoModelForCausalLM.from_pretrained(
"EleutherAI/pythia-1b", revision="main",
torch_dtype=torch.float16, low_cpu_mem_usage=True,
)
sd = model.state_dict()
# 1. Verify SKIP coverage
audit_skip(sd, skip_list=CANONICAL_SKIP)
# 2. Collect |W| from eligible tensors and fit k on middle 80%
mags = torch.cat([
t.abs().flatten().double()
for name, t in sd.items() if is_eligible(name, CANONICAL_SKIP)
])
k = weibull_k_loglin(mags, low=0.10, high=0.90)
print(f"Pythia-1B terminal k = {k:.4f}")  # expect ~1.1808

Measurement values for all 10 models are tabulated in MODELS.md; the snippet above should reproduce them to within 1e-4 (use subsample=10M, seed=42 for ≥1B models to reproduce exactly).
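
The subsample and seed settings matter because a random subsample only gives the same fit twice if the random draw is fixed. The sketch below shows the general idea with NumPy. The helper `subsample_magnitudes` and its parameters are hypothetical, and because the random generator used by the repository is not stated here, this sketch should not be expected to match the published values digit for digit; use the options in scripts/npm_core.py for that.

```python
import numpy as np


def subsample_magnitudes(mags, n_max=10_000_000, seed=42):
    """Return at most n_max magnitudes, drawn without replacement.

    Hypothetical helper: a fixed seed makes the draw, and therefore any
    statistic computed from it, repeatable across runs.
    """
    mags = np.asarray(mags)
    if mags.size <= n_max:
        return mags  # small enough to use in full
    rng = np.random.default_rng(seed)
    idx = rng.choice(mags.size, size=n_max, replace=False)
    return mags[idx]


# Small synthetic demonstration of determinism.
demo = np.arange(1000, dtype=np.float64)
first = subsample_magnitudes(demo, n_max=100, seed=42)
second = subsample_magnitudes(demo, n_max=100, seed=42)
print(np.array_equal(first, second))
```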
The figure generator (scripts/generate_figs_v7_20260419.py) reads cached measurement JSON files. Download the data bundle from the Zenodo archive and run:
python scripts/generate_figs_v7_20260419.py

See scripts/README.md for full per-script usage and the list of additional (author-maintained, shareable on request) scripts that produced the supplementary figures and tensor-verification results.
If you use NPM-K measurements, the canonical SKIP definition, or the body-vs-tail ablation in your work, please cite:
@misc{ding2026npmk,
author = {Ding, Tiexin},
title = {Cross-Family Convergence of Neural Network Weight Skeletons:
An Empirical Study via Weibull Shape Parameter},
year = {2026},
month = apr,
publisher = {Zenodo},
doi = {10.5281/zenodo.19652706},
url = {https://doi.org/10.5281/zenodo.19652706}
}
@misc{ding2026npm,
author = {Ding, Tiexin},
title = {Neural Percolation Model (NPM): A Framework for Analyzing
Neural Network Weight Skeleton Connectivity via Weibull Shape Parameter},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19209722},
url = {https://doi.org/10.5281/zenodo.19209722}
}

All model weights analyzed in this work are publicly distributed on HuggingFace. Exact revisions for each measurement are recorded in MODELS.md; measurements are reproducible from those revisions.

- Code (scripts/): MIT License
- Paper PDFs and TeX sources (paper/): CC-BY 4.0

The author thanks AllenAI, EleutherAI, the Qwen Team, Mistral AI, and Meta AI for releasing the OLMo, Pythia, Qwen, Mistral, and LLaMA model suites with publicly available weights and training checkpoints. The author further thanks HuggingFace for hosting these models through open and reliable infrastructure.
Tiexin Ding — tiexinding@gmail.com