NPM-K: Cross-Family Convergence of Neural Network Weight Skeletons

Empirical measurements and analysis code accompanying the paper:

Cross-Family Convergence of Neural Network Weight Skeletons: An Empirical Study via Weibull Shape Parameter
Tiexin Ding, Independent Researcher (April 2026)

Paper (Zenodo): https://doi.org/10.5281/zenodo.19652706
Related: NPM framework paper (Zenodo, 2026): https://doi.org/10.5281/zenodo.19209722
Author ORCID: 0000-0003-1950-1625

TL;DR

After sufficient training, the Weibull shape parameter $k$ of the central 80% of the $|W|$ distribution converges to a narrow band $[1.13, 1.19]$. This holds across 10 models from 5 independent families (Pythia, OLMo/OLMo-2, Qwen, Mistral, LLaMA-3), spanning a roughly 100× parameter range (70M → 8B).
The convergence is a property of the distribution body; full-100% fits mask the pattern (body–tail ablation on 7 of 10 models).
NPM-d$k$ (first derivative of $k$ on the log-step axis) exhibits a three-phase structure: formation → anchor → bifurcation, with timescales aligned to the break-even point (Jastrzebski et al., 2020) and the rewinding point (Frankle et al., 2020). Verified on 4 Pythia scales with dense early-step sampling.
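
As a rough illustration of the NPM-d$k$ quantity, the sketch below estimates the derivative of $k$ with respect to $\log_{10}(\text{step})$ as the slope between each checkpoint's two neighbors. The function name `dk_dlogstep`, the choice of base 10, the dropping of step 0, and the sample values are all assumptions made for this example; none of them is taken from the paper or from `scripts/npm_core.py`.

```python
import math

def dk_dlogstep(steps, ks):
    """Estimate dk / d(log10 step) at each interior checkpoint.

    Step 0 has no logarithm, so checkpoints with step <= 0 are dropped.
    That is an assumption of this sketch, not necessarily the paper's choice.
    """
    pts = [(s, math.log10(s), k) for s, k in zip(steps, ks) if s > 0]
    out = []
    for i in range(1, len(pts) - 1):
        _, x0, k0 = pts[i - 1]
        s1, _, _ = pts[i]
        _, x2, k2 = pts[i + 1]
        # Slope of the secant through the two neighboring checkpoints.
        out.append((s1, (k2 - k0) / (x2 - x0)))
    return out

# Synthetic values (not measurements): k falls by 0.25 per decade of steps.
steps = [0, 1, 10, 100, 1000]
ks = [3.0, 2.0, 1.75, 1.5, 1.25]
for step, slope in dk_dlogstep(steps, ks):
    print(f"step {step:>5d}: dk/dlog10(step) = {slope:+.3f}")
```

With these synthetic inputs the slope is -0.25 at both interior checkpoints by construction. On real checkpoint series, the density of early-step sampling determines how finely the formation and anchor phases can be resolved.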

Repository Structure

NPM-K-public/
├── paper/                              Paper artifacts
│   ├── paper_1_en_260419_v6.pdf        English paper (22 pp)
│   ├── paper_1_cn_260419_v6.pdf        Chinese companion version
│   ├── paper_1_en_260419_v6.tex        LaTeX source (English)
│   └── paper_1_cn_260419_v6.tex        LaTeX source (Chinese, XeLaTeX)
├── scripts/                            Analysis code
│   ├── npm_core.py                     ★ Core canonical module (SKIP + Weibull-k)
│   ├── generate_figs_v7_20260419.py    Main figure generator
│   ├── requirements.txt                Python dependencies
│   └── README.md                       Per-script usage
├── MODELS.md                           HuggingFace IDs, revisions, measured k/ratio/max|W|
├── LICENSE                             MIT for code; CC-BY 4.0 for paper
└── README.md                           This file

Measurement data (JSON/CSV) and all 15 figures are available in the Zenodo archive: https://doi.org/10.5281/zenodo.19652706

Models and HuggingFace Revisions

| Family | Model | Checkpoints used |
|---|---|---|
| Pythia | `EleutherAI/pythia-70m` | 17 revisions (step 0 → step 143000, dense early) |
| Pythia | `EleutherAI/pythia-160m` | 7 standard HuggingFace revisions |
| Pythia | `EleutherAI/pythia-410m` | 7 standard HuggingFace revisions |
| Pythia | `EleutherAI/pythia-1b` | 7 standard HuggingFace revisions |
| Pythia | `EleutherAI/pythia-2.8b` | step 40k, 80k, 143k (3 independently-retrievable) |
| OLMo | `allenai/OLMo-1B-hf` | final |
| OLMo-2 | `allenai/OLMo-2-0425-1B` | 7 training phases (init, s1-early … s2-end, main) |
| Qwen-2.5 | `Qwen/Qwen2.5-1.5B` | final |
| Mistral | `mistralai/Mistral-7B-v0.3` | final |
| LLaMA-3 | `meta-llama/Meta-Llama-3-8B` | final |

Exact commit SHAs and config snapshots: see MODELS.md.

Canonical SKIP (Skeleton Definition)

The "skeleton" over which the Weibull fit is performed is defined by three explicit rules:

Include: every *.weight tensor with dimension ≥ 2 — QKV projections, MLP up/down, embeddings, output heads.
Exclude: layer norms, biases, 1-D parameters, rotary / positional caches.
Fit range: middle 80% of $|W|$ (quantiles 0.10 → 0.90).

Implementation in scripts/npm_core.py — see is_eligible(), audit_skip(), and weibull_k_loglin(). Canonical SKIP list: SKIP_SKELETON_ONLY (the default CANONICAL_SKIP).
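
To make the three rules concrete, here is a minimal pure-Python sketch of an eligibility check and a log-linear Weibull fit. It assumes the fit is the usual Weibull-plot regression, where $\ln(-\ln(1 - F))$ is regressed on $\ln|W|$ and the slope is $k$; the name `weibull_k_loglin` suggests this, but the canonical code in `scripts/npm_core.py` may differ in details such as plotting positions or skip-list handling. The names `is_eligible_sketch` and `weibull_k_sketch`, and the tensor names in the comments, are hypothetical.

```python
import math
import random

def is_eligible_sketch(name, shape):
    """Keep *.weight tensors with at least 2 dimensions.

    Biases and 1-D parameters (including layer-norm weights) fail this test.
    The real module also consults a skip list, which this sketch omits.
    """
    return name.endswith(".weight") and len(shape) >= 2

def weibull_k_sketch(mags, low=0.10, high=0.90):
    """Slope of ln(-ln(1 - F)) against ln|W| over quantiles [low, high]."""
    x = sorted(m for m in mags if m > 0)   # log needs strictly positive values
    n = len(x)
    xs, ys = [], []
    for i, v in enumerate(x):
        F = (i + 0.5) / n                  # plotting position inside (0, 1)
        if low <= F <= high:
            xs.append(math.log(v))
            ys.append(math.log(-math.log1p(-F)))
    # Ordinary least-squares slope.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

# Sanity check on synthetic Weibull samples with known shape 1.5 (not model weights).
rng = random.Random(0)
samples = [rng.weibullvariate(1.0, 1.5) for _ in range(50_000)]
k_hat = weibull_k_sketch(samples)
print(f"recovered k = {k_hat:.3f} (true shape 1.5)")
```

Passing `low=0.0, high=1.0` to the same function gives a full-range fit, which is one way to set up a body-versus-tail comparison on the same magnitudes.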

Reproducing Measurements

Install

pip install -r scripts/requirements.txt

Measure terminal $k$ for a single model

from transformers import AutoModelForCausalLM
import torch
import sys; sys.path.insert(0, "scripts")
from npm_core import is_eligible, weibull_k_loglin, CANONICAL_SKIP, audit_skip

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b", revision="main",
    torch_dtype=torch.float16, low_cpu_mem_usage=True,
)
sd = model.state_dict()

# 1. Verify SKIP coverage
audit_skip(sd, skip_list=CANONICAL_SKIP)

# 2. Collect |W| from eligible tensors and fit k on middle 80%
mags = torch.cat([
    t.abs().flatten().double()
    for name, t in sd.items() if is_eligible(name, CANONICAL_SKIP)
])
k = weibull_k_loglin(mags, low=0.10, high=0.90)
print(f"Pythia-1B terminal k = {k:.4f}")   # expect ~1.1808

Measured values for all 10 models are tabulated in MODELS.md. Results should reproduce within 1e-4 of the published values; for models with ≥1B parameters, use subsample=10M and seed=42 to reproduce them exactly.
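
The reason a fixed seed matters is that subsampling makes the fitted $k$ depend on which magnitudes are drawn. The sketch below shows seeded subsampling in plain Python. How `npm_core.py` actually applies `subsample` and `seed` is not shown in this README, so the helper `subsample_sketch` and its signature are hypothetical.

```python
import random

def subsample_sketch(values, size, seed=42):
    """Draw `size` items without replacement using a dedicated seeded RNG.

    A private random.Random instance keeps the draw independent of any
    other use of the global random state.
    """
    values = list(values)
    if size >= len(values):
        return values                      # nothing to subsample
    return random.Random(seed).sample(values, size)

# Two draws with the same seed select the same items.
a = subsample_sketch(range(1000), 100, seed=42)
b = subsample_sketch(range(1000), 100, seed=42)
print(a == b)
```

Exact agreement with the published values additionally requires the same random number generator and the same tensor ordering as the original run, which is why the canonical module should be used for reproduction.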

Regenerate paper figures

The figure generator (scripts/generate_figs_v7_20260419.py) reads cached measurement JSON files. Download the data bundle from the Zenodo archive and run:

python scripts/generate_figs_v7_20260419.py

See scripts/README.md for full per-script usage and the list of additional (author-maintained, shareable on request) scripts that produced the supplementary figures and tensor-verification results.

Citation

If you use NPM-K measurements, the canonical SKIP definition, or the body-vs-tail ablation in your work, please cite:

@misc{ding2026npmk,
  author    = {Ding, Tiexin},
  title     = {Cross-Family Convergence of Neural Network Weight Skeletons:
               An Empirical Study via Weibull Shape Parameter},
  year      = {2026},
  month     = apr,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19652706},
  url       = {https://doi.org/10.5281/zenodo.19652706}
}

@misc{ding2026npm,
  author    = {Ding, Tiexin},
  title     = {Neural Percolation Model (NPM): A Framework for Analyzing
               Neural Network Weight Skeleton Connectivity via Weibull Shape Parameter},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19209722},
  url       = {https://doi.org/10.5281/zenodo.19209722}
}

Data Provenance

All model weights analyzed in this work are publicly distributed on HuggingFace. Exact revisions for each measurement are recorded in MODELS.md; measurements are reproducible from those revisions.

License

Code (scripts/): MIT License
Paper PDFs and TeX sources (paper/): CC-BY 4.0

Acknowledgements

The author thanks AllenAI, EleutherAI, the Qwen Team, Mistral AI, and Meta AI for releasing the OLMo, Pythia, Qwen, Mistral, and LLaMA model suites with publicly available weights and training checkpoints. The author further thanks HuggingFace for hosting these models through open and reliable infrastructure.

Contact

Tiexin Ding — tiexinding@gmail.com
