tiexinding/NPM-K-public

NPM-K: Cross-Family Convergence of Neural Network Weight Skeletons

Empirical measurements and analysis code accompanying the paper:

Cross-Family Convergence of Neural Network Weight Skeletons: An Empirical Study via Weibull Shape Parameter

Tiexin Ding, Independent Researcher (April 2026)


TL;DR

  • The Weibull shape parameter $k$ of the central 80% of the $|W|$ distribution converges to a narrow band $[1.13, 1.19]$ after sufficient training. This holds across 10 models from 5 independent families (Pythia, OLMo/OLMo-2, Qwen, Mistral, LLaMA-3), spanning a roughly 100× parameter range (70M → 8B).
  • The convergence is a property of the distribution body; full-100% fits mask the pattern (body–tail ablation on 7 of 10 models).
  • NPM-d$k$ (first derivative of $k$ on the log-step axis) exhibits a three-phase structure: formation → anchor → bifurcation, with timescales aligned to the break-even point (Jastrzebski et al., 2020) and the rewinding point (Frankle et al., 2020). Verified on 4 Pythia scales with dense early-step sampling.
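
As a rough illustration of the NPM-d$k$ quantity named above, the sketch below takes finite differences of $k$ against the logarithm of the training step. It is not the repository's implementation: the function name `dk_dlogstep`, the choice of base-10 logarithm, and the trajectory values are all assumptions made for the example, and the trajectory is synthetic rather than measured.

```python
import math

def dk_dlogstep(steps, ks):
    """Finite-difference slope of k against log10(step) between consecutive checkpoints.

    Hypothetical helper for illustration; base-10 log is an assumption.
    """
    # Step 0 has no logarithm, so it is dropped from the derivative.
    pairs = [(math.log10(s), k) for s, k in zip(steps, ks) if s > 0]
    mids, slopes = [], []
    for (x0, k0), (x1, k1) in zip(pairs, pairs[1:]):
        mids.append(10 ** ((x0 + x1) / 2))      # geometric midpoint of the two steps
        slopes.append((k1 - k0) / (x1 - x0))    # change in k per decade of steps
    return mids, slopes

# Synthetic trajectory (NOT measured data): k grows linearly in log10(step).
steps = [1, 10, 100, 1000, 10000]
ks = [1.0 + 0.05 * math.log10(s) for s in steps]
mids, slopes = dk_dlogstep(steps, ks)
```

On this synthetic input every slope equals the constant 0.05 built into the trajectory. A real trajectory would instead show the slope changing across the formation, anchor, and bifurcation phases.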

Repository Structure

NPM-K-public/
├── paper/                              Paper artifacts
│   ├── paper_1_en_260419_v6.pdf        English paper (22 pp)
│   ├── paper_1_cn_260419_v6.pdf        Chinese companion version
│   ├── paper_1_en_260419_v6.tex        LaTeX source (English)
│   └── paper_1_cn_260419_v6.tex        LaTeX source (Chinese, XeLaTeX)
├── scripts/                            Analysis code
│   ├── npm_core.py                     ★ Core canonical module (SKIP + Weibull-k)
│   ├── generate_figs_v7_20260419.py    Main figure generator
│   ├── requirements.txt                Python dependencies
│   └── README.md                       Per-script usage
├── MODELS.md                           HuggingFace IDs, revisions, measured k/ratio/max|W|
├── LICENSE                             MIT for code; CC-BY 4.0 for paper
└── README.md                           This file

Measurement data (JSON/CSV) and all 15 figures are available in the Zenodo archive: https://doi.org/10.5281/zenodo.19652706

Models and HuggingFace Revisions

| Family | Model | Checkpoints used |
|---|---|---|
| Pythia | EleutherAI/pythia-70m | 17 revisions (step 0 → step 143000, dense early) |
| Pythia | EleutherAI/pythia-160m | 7 standard HuggingFace revisions |
| Pythia | EleutherAI/pythia-410m | 7 standard HuggingFace revisions |
| Pythia | EleutherAI/pythia-1b | 7 standard HuggingFace revisions |
| Pythia | EleutherAI/pythia-2.8b | step 40k, 80k, 143k (3 independently-retrievable) |
| OLMo | allenai/OLMo-1B-hf | final |
| OLMo-2 | allenai/OLMo-2-0425-1B | 7 training phases (init, s1-early … s2-end, main) |
| Qwen-2.5 | Qwen/Qwen2.5-1.5B | final |
| Mistral | mistralai/Mistral-7B-v0.3 | final |
| LLaMA-3 | meta-llama/Meta-Llama-3-8B | final |

Exact commit SHAs and config snapshots: see MODELS.md.

Canonical SKIP (Skeleton Definition)

The "skeleton" over which the Weibull fit is performed is defined by three explicit rules:

  1. Include: every *.weight tensor with dimension ≥ 2 — QKV projections, MLP up/down, embeddings, output heads.
  2. Exclude: layer norms, biases, 1-D parameters, rotary / positional caches.
  3. Fit range: middle 80% of $|W|$ (quantiles 0.10 → 0.90).
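
The include and exclude rules above can be sketched as a simple predicate on a parameter's name and shape. This is a simplified stand-in, not the `is_eligible()` function in scripts/npm_core.py, whose exact signature and SKIP-list handling are not reproduced here. The name `eligible_sketch` and the toy parameter names and shapes are hypothetical.

```python
def eligible_sketch(name, shape):
    """Hypothetical stand-in for rules 1 and 2; not npm_core.is_eligible().

    Keeps tensors whose name ends in ".weight" and whose shape has at least
    two dimensions. Biases, 1-D norm weights, and rotary buffers fail one of
    the two conditions.
    """
    return name.endswith(".weight") and len(shape) >= 2

# Toy state-dict layout (illustrative names and shapes only).
toy = {
    "layers.0.attention.query_key_value.weight": (6144, 2048),
    "layers.0.attention.query_key_value.bias": (6144,),
    "layers.0.input_layernorm.weight": (2048,),
    "layers.0.attention.rotary_emb.inv_freq": (32,),
    "embed_in.weight": (50304, 2048),
}
kept = [n for n, s in toy.items() if eligible_sketch(n, s)]
```

Here only the QKV projection weight and the embedding matrix survive. The layer-norm weight is dropped because it is 1-D, even though its name ends in `.weight`.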

These rules are implemented in scripts/npm_core.py: see is_eligible(), audit_skip(), and weibull_k_loglin(). The canonical SKIP list is SKIP_SKELETON_ONLY, which is the default value of CANONICAL_SKIP.
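
For readers unfamiliar with fitting a Weibull shape on a restricted quantile range, here is a minimal sketch of one common approach. It assumes the standard linearization of the Weibull CDF, $\ln(-\ln(1 - F(x))) = k \ln x - k \ln \lambda$, and estimates $k$ as the least-squares slope over points whose empirical CDF lies in $[0.10, 0.90]$. Whether `weibull_k_loglin()` works exactly this way is an assumption; the function `weibull_k_sketch` and the synthetic sample below are illustrative only.

```python
import math
import random

def weibull_k_sketch(values, low=0.10, high=0.90):
    """Least-squares slope of ln(-ln(1-F)) vs ln(x) on a quantile window.

    Hypothetical illustration; not the repository's weibull_k_loglin().
    """
    xs = sorted(v for v in values if v > 0)     # ln(x) needs positive magnitudes
    n = len(xs)
    pts = []
    for i, x in enumerate(xs):
        f = (i + 0.5) / n                       # empirical CDF at the i-th order statistic
        if low <= f <= high:
            pts.append((math.log(x), math.log(-math.log(1.0 - f))))
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    return sxy / sxx                            # slope = shape parameter k

# Synthetic check on a known Weibull sample (shape 1.5, scale 1.0), not model weights.
rng = random.Random(0)
sample = [rng.weibullvariate(1.0, 1.5) for _ in range(50_000)]
k_body = weibull_k_sketch(sample)               # central 80%
k_full = weibull_k_sketch(sample, 0.0, 1.0)     # full range, for comparison
```

On a sample that really is Weibull, the body fit and the full-range fit both recover the true shape. The two can only diverge when the tails depart from the Weibull form, which is the situation the body-versus-tail comparison is designed to expose.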

Reproducing Measurements

Install

```shell
pip install -r scripts/requirements.txt
```

Measure terminal $k$ for a single model

```python
from transformers import AutoModelForCausalLM
import torch
import sys; sys.path.insert(0, "scripts")
from npm_core import is_eligible, weibull_k_loglin, CANONICAL_SKIP, audit_skip

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b", revision="main",
    torch_dtype=torch.float16, low_cpu_mem_usage=True,
)
sd = model.state_dict()

# 1. Verify SKIP coverage
audit_skip(sd, skip_list=CANONICAL_SKIP)

# 2. Collect |W| from eligible tensors and fit k on middle 80%
mags = torch.cat([
    t.abs().flatten().double()
    for name, t in sd.items() if is_eligible(name, CANONICAL_SKIP)
])
k = weibull_k_loglin(mags, low=0.10, high=0.90)
print(f"Pythia-1B terminal k = {k:.4f}")   # expect ~1.1808
```

Measured values for all 10 models are tabulated in MODELS.md. Runs should reproduce the published values to within 1e-4; for models with ≥1B parameters, use subsample=10M and seed=42 to reproduce them exactly.
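
The reason a fixed seed matters for exact reproduction can be shown with a small seeded-subsampling sketch. How the repository applies `subsample` and `seed` internally is not documented here, so the helper `subsample_sketch` below is a hypothetical illustration of the general idea rather than the actual mechanism.

```python
import random

def subsample_sketch(values, n, seed=42):
    """Draw n items without replacement using a dedicated, seeded generator.

    Hypothetical helper; the repository's own subsampling may differ.
    """
    if len(values) <= n:
        return list(values)                     # nothing to subsample
    return random.Random(seed).sample(values, n)

population = list(range(1000))                  # stand-in for a flattened |W| array
first = subsample_sketch(population, 10)
second = subsample_sketch(population, 10)       # same seed, same draw
other = subsample_sketch(population, 10, seed=7)
```

Because the generator is created from the seed on every call, `first` and `second` are identical, so any statistic fitted on the subsample is repeatable. A different seed selects a different subset and can shift the fitted value slightly.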

Regenerate paper figures

The figure generator (scripts/generate_figs_v7_20260419.py) reads cached measurement JSON files. Download the data bundle from the Zenodo archive and run:

```shell
python scripts/generate_figs_v7_20260419.py
```

See scripts/README.md for full per-script usage and the list of additional (author-maintained, shareable on request) scripts that produced the supplementary figures and tensor-verification results.

Citation

If you use NPM-K measurements, the canonical SKIP definition, or the body-vs-tail ablation in your work, please cite:

@misc{ding2026npmk,
  author    = {Ding, Tiexin},
  title     = {Cross-Family Convergence of Neural Network Weight Skeletons:
               An Empirical Study via Weibull Shape Parameter},
  year      = {2026},
  month     = apr,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19652706},
  url       = {https://doi.org/10.5281/zenodo.19652706}
}

@misc{ding2026npm,
  author    = {Ding, Tiexin},
  title     = {Neural Percolation Model (NPM): A Framework for Analyzing
               Neural Network Weight Skeleton Connectivity via Weibull Shape Parameter},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19209722},
  url       = {https://doi.org/10.5281/zenodo.19209722}
}

Data Provenance

All model weights analyzed in this work are publicly distributed on HuggingFace. Exact revisions for each measurement are recorded in MODELS.md; measurements are reproducible from those revisions.

License

Code is released under the MIT license; the paper is released under CC-BY 4.0. See LICENSE.

Acknowledgements

The author thanks AllenAI, EleutherAI, the Qwen Team, Mistral AI, and Meta AI for releasing the OLMo, Pythia, Qwen, Mistral, and LLaMA model suites with publicly available weights and training checkpoints. The author further thanks HuggingFace for hosting these models through open and reliable infrastructure.

Contact

Tiexin Ding — tiexinding@gmail.com
