ezr — explainable multi-objective optimization via decision trees, clustering, Naive Bayes, and active learning
ezr [--key=val ...] CMD [args]
ezr --list
ezr --help
ezr is a lightweight toolkit for multi-objective optimization and explainable AI. It summarizes CSV data into Num/Sym columns, builds decision trees that minimize distance to ideal outcomes, clusters rows via k-means or recursive halving, and supports active learning with Naive Bayes or centroid-based acquisition.
ezr is an experiment in "how low can you go?": how little data is needed for effective AI. The code uses active learning to label a small number (say, 50) of informative examples. These build a regression tree that sorts the unlabelled test data. Repeated studies show that, by labelling just the first ~5 examples, the selected row optimizes as well as or better than state-of-the-art optimizers like SMAC (which runs two orders of magnitude slower).
Input is CSV. The header row defines column roles:
[A-Z]*     Numeric          (e.g. "Age")
[a-z]*     Symbolic         (e.g. "job")
[A-Z]*+    Maximize goal    (e.g. "Pay+")
[A-Z]*-    Minimize goal    (e.g. "Cost-")
[a-z]*!    Class label      (e.g. "sick!")
*X         Ignored          (e.g. "idX")
?          Missing value    (in data rows, not header)
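The naming rules above can be sketched as a small classifier over header names. This is only an illustration: `column_role` is a hypothetical helper, not part of ezr, and it assumes the first letter picks the column type while the last character picks the role.

```python
def column_role(name: str) -> tuple[str, str]:
    """Return (kind, role) for one header name (hypothetical helper, not ezr's API)."""
    kind = "Num" if name[0].isupper() else "Sym"  # upper-case first letter means numeric
    last = name[-1]                               # the last character selects the role
    if last == "X":
        role = "ignore"
    elif last == "+":
        role = "maximize"
    elif last == "-":
        role = "minimize"
    elif last == "!":
        role = "class"
    else:
        role = "input"
    return kind, role

header = ["Age", "job", "Pay+", "Cost-", "sick!", "idX"]
roles = {name: column_role(name) for name in header}
for name, (kind, role) in roles.items():
    print(f"{name:6} {kind:4} {role}")
```

The `X` check comes first so that an ignored column is skipped regardless of its type.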
Two files. No package structure, no test scaffolding.
ezr.py    Library. Section banners for each app.
cli.py    CLI dispatch. `eg_<app>` demos + `eg_test_<app>` tests.
ezr.py sections: Types, Col (Num, Sym), Data, Distance,
Bayes, Comparison (pick, picks, extrapolate), Format,
Stats (same, bestRanks, confused), Tree, Cluster,
Classify, Search (sa, ls, de, oneplus1), Acquire,
Textmine (tokenize, stem, tfidf, cnb).
cli.py exposes everything in ezr.py as eg_<name> commands.
Tests are eg_test_<name> and run as plain function calls — no
pytest dependency.
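A minimal sketch of this style of dispatch, assuming commands are plain functions looked up by their `eg_` prefix. The functions `eg_hello`, `eg_test_hello` and `dispatch`, and the `COMMANDS` table, are hypothetical and only show the pattern; cli.py may be organized differently.

```python
def eg_hello(name="world"):
    """Demo command, reachable as `hello`."""
    return f"hello {name}"

def eg_test_hello():
    """Test command, reachable as `test_hello`: a plain function that asserts."""
    assert eg_hello("ezr") == "hello ezr"
    return True

# Map command names to functions by stripping the "eg_" prefix.
COMMANDS = {fn.__name__[3:]: fn for fn in (eg_hello, eg_test_hello)}

def dispatch(argv, commands=COMMANDS):
    """Run the function registered for argv[0], passing the remaining words as arguments."""
    cmd, *args = argv
    if cmd not in commands:
        raise SystemExit(f"unknown command: {cmd}")
    return commands[cmd](*args)

print(dispatch(["hello", "ezr"]))
print(dispatch(["test_hello"]))
```

Because tests are ordinary functions, the same lookup serves demos and tests alike, which is why no test framework is needed.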
To install:
git clone http://github.com/timm/ezr
cd ezr
pip install -e .
Creates the global ezr command. Edits to ezr.py or cli.py
take effect immediately. Python 3.12+. Zero runtime dependencies.
To uninstall:
pip uninstall ezr
To run without installing:
git clone http://github.com/timm/ezr
cd ezr
python3 cli.py --list
To fetch the sample data:
mkdir -p $HOME/gits
git clone http://github.com/timm/moot $HOME/gits/moot
List everything:
ezr --list
Common commands:
ezr classify FILE     Incremental Naive Bayes; print confusion
ezr tree FILE         Grow regression tree; show structure
ezr cluster FILE      kmeans++ + kmeans; one row per cluster
ezr search sa FILE    Simulated annealing
ezr search ls FILE    Local search
ezr search de FILE    Differential evolution
ezr acquire FILE      Active learning; print best labeled rows
ezr textmine FILE     CNB text classification
ezr stats             Demo of same/bestRanks/confused
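To give a feel for what a Naive Bayes classifier such as the one behind `classify` computes, here is a textbook sketch for symbolic columns. It is not ezr's `like`/`likes` code: the functions `train`, `log_like` and `predict` and the toy rows are hypothetical, and the formulas are the standard k-smoothed prior and m-estimate likelihood, with defaults matching the documented `--bayes.k=1` and `--bayes.m=2`.

```python
import math
from collections import Counter, defaultdict

def train(rows):
    """rows: list of (features, label). Returns class counts and per-class value counts."""
    n_class = Counter()
    counts = defaultdict(Counter)  # (label, column index) -> Counter of seen values
    for features, label in rows:
        n_class[label] += 1
        for i, value in enumerate(features):
            counts[(label, i)][value] += 1
    return n_class, counts

def log_like(features, label, n_class, counts, m=2, k=1):
    """Log of prior times per-column likelihoods, smoothed by k (prior) and m (values)."""
    n_all = sum(n_class.values())
    prior = (n_class[label] + k) / (n_all + k * len(n_class))
    total = math.log(prior)
    for i, value in enumerate(features):
        freq = counts[(label, i)][value]  # 0 if this value was never seen with this label
        total += math.log((freq + m * prior) / (n_class[label] + m))
    return total

def predict(features, n_class, counts):
    """Pick the label with the highest smoothed log-likelihood."""
    return max(n_class, key=lambda label: log_like(features, label, n_class, counts))

rows = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
        (("rainy", "mild"), "yes"), (("overcast", "hot"), "yes"),
        (("rainy", "cool"), "yes")]
n_class, counts = train(rows)
print(predict(("sunny", "hot"), n_class, counts))
print(predict(("rainy", "cool"), n_class, counts))
```

The smoothing terms keep an unseen value from driving a likelihood to zero, which matters when only a handful of rows have been labelled.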
Tests (assertions over real data files):
ezr test_core
ezr test_tree
ezr test_cluster
ezr test_search
ezr test_acquire
ezr test_classify
ezr test_textmine
ezr test_stats
ezr test_all Run every test, report pass/fail count
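A run-everything command with a pass/fail count can be sketched as below. This assumes each test is a zero-argument function that signals failure by raising `AssertionError`; `run_tests` and the two sample tests are hypothetical names for illustration, not ezr's implementation.

```python
def eg_test_sums():
    assert sum([1, 2, 3]) == 6

def eg_test_broken():
    assert 1 + 1 == 3, "deliberately failing example"

def run_tests(tests):
    """Call each test; an AssertionError counts as a failure. Returns (passed, failed)."""
    passed = failed = 0
    for test in tests:
        try:
            test()
            passed += 1
        except AssertionError as err:
            failed += 1
            print(f"FAIL {test.__name__}: {err}")
    return passed, failed

passed, failed = run_tests([eg_test_sums, eg_test_broken])
print(f"{passed} passed, {failed} failed")
```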
Flags update the global config namespace `the`. Use --key=value.
Nested keys use dots (e.g. --learn.leaf=3).
--learn.leaf=3          Minimum examples per leaf
--learn.budget=50       Number of rows to evaluate
--learn.check=5         Number of guesses to check
--learn.start=4         Initial number of labels
--p=2                   Distance metric (1=Manhattan, 2=Euclidean)
--bayes.m=2             m-estimate for Naive Bayes
--bayes.k=1             k-estimate (Laplace smoothing)
--few=128               Max unlabelled rows in active learning
--stats.cliffs=0.195    Cliff's Delta threshold
--stats.conf=1.36       KS test confidence coefficient
--stats.eps=0.35        Margin of error multiplier
--textmine.top=100      Top TF-IDF features kept
--textmine.yes=20       Positive warm-start samples
--textmine.no=20        Negative warm-start samples
--textmine.valid=20     Repeats for stats testing
--seed=1                Random number seed
--show.show=30          Tree display width
--show.decimals=2       Decimal places for floats
Flags and commands interleave. Flags apply to all subsequent commands in the same invocation:
ezr --seed=42 --learn.budget=30 acquire auto93.csv
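One way to picture how dotted flags could update a nested config is sketched below. It assumes the config is a tree of `SimpleNamespace` objects; `coerce` and `apply_flags` are hypothetical helpers, and the local `the` is only a stand-in for ezr's real config. As a simplification, the sketch collects the non-flag words rather than running each command as it is reached.

```python
from types import SimpleNamespace

def coerce(text):
    """Turn "42" into 42 and "0.35" into 0.35; leave anything else as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def apply_flags(argv, config):
    """Apply each --key=value to config, in order; return the non-flag words."""
    rest = []
    for word in argv:
        if word.startswith("--") and "=" in word:
            key, value = word[2:].split("=", 1)
            *parents, leaf = key.split(".")
            node = config
            for name in parents:  # walk down dotted keys, e.g. learn.budget
                node = getattr(node, name)
            setattr(node, leaf, coerce(value))
        else:
            rest.append(word)
    return rest

# Stand-in for the global config, holding a few of the documented defaults.
the = SimpleNamespace(seed=1, learn=SimpleNamespace(budget=50, leaf=3))
rest = apply_flags(["--seed=42", "--learn.budget=30", "acquire", "auto93.csv"], the)
print(the.seed, the.learn.budget, rest)
```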
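To use ezr as a library from Python:
from ezr import *
d = Data(csv("auto93.csv"))   # load the CSV into a Data
win = wins(d)
t = treeGrow(d, d.rows)       # grow a tree over all rows
treeShow(t)
# print the five rows with the smallest disty
for r in sorted(d.rows, key=lambda r: disty(d, r))[:5]:
    print(win(r), r)
Sample tree output. D is distance to heaven (lower is better), N is the number of examples in the branch, and Goals shows the centroid: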
$ ezr tree ~/gits/moot/optimize/misc/auto93.csv
D N Goals
==== ===== =====
,0.66 ,( 50), {Acc+=15.51, Lbs-=2888.64, Mpg+=24.60}
Clndrs <= 5 ,0.61 ,( 26), {Acc+=16.43, Lbs-=2204.46, Mpg+=30.38}
| Volume <= 98 ,0.59 ,( 14), {Acc+=17.15, Lbs-=2024.64, Mpg+=33.57}
| | Volume <= 91 ,0.59 ,( 9), {Acc+=17.09, Lbs-=1927.67, Mpg+=35.56}
| | | origin != 3 ,0.58 ,( 4), {Acc+=17.35, Lbs-=1908.00, Mpg+=37.50}
| | | origin == 3 ,0.59 ,( 5), {Acc+=16.88, Lbs-=1943.40, Mpg+=34.00}
| | Volume > 91 ,0.60 ,( 5), {Acc+=17.26, Lbs-=2199.20, Mpg+=30.00}
| Volume > 98 ,0.64 ,( 12), {Acc+=15.58, Lbs-=2414.25, Mpg+=26.67}
Clndrs > 5 ,0.72 ,( 24), {Acc+=14.52, Lbs-=3629.83, Mpg+=18.33}
| origin != 1 ,0.63 ,( 3), {Acc+=14.93, Lbs-=3000.00, Mpg+=26.67}
| origin == 1 ,0.73 ,( 21), {Acc+=14.46, Lbs-=3719.81, Mpg+=17.14}
...
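The D column can be read as a distance between a row's normalized goal values and an ideal point ("heaven"). The sketch below shows one plausible formulation, not ezr's `disty` itself. It assumes each goal is scaled to 0..1, heaven is 1 for a maximized goal and 0 for a minimized one, and the Minkowski sum is divided by the number of goals so the result stays in 0..1. The helper names, goal ranges and rows are all made up for illustration.

```python
def normalize(value, lo, hi):
    """Scale value into 0..1 given the column's observed range."""
    return (value - lo) / (hi - lo + 1e-32)

def distance_to_heaven(row, goals, p=2):
    """goals maps name -> (lo, hi, best), where best is 1 to maximize and 0 to minimize."""
    gaps = [abs(normalize(row[name], lo, hi) - best)
            for name, (lo, hi, best) in goals.items()]
    return (sum(gap ** p for gap in gaps) / len(gaps)) ** (1 / p)

# Illustrative ranges only; these are not taken from auto93.csv.
goals = {"Mpg+": (0, 50, 1), "Lbs-": (1000, 5000, 0)}
ideal = {"Mpg+": 50, "Lbs-": 1000}
middle = {"Mpg+": 25, "Lbs-": 3000}
worst = {"Mpg+": 0, "Lbs-": 5000}
for row in (ideal, middle, worst):
    print(round(distance_to_heaven(row, goals), 2))
```

With `p=2` this is a Euclidean-style distance and with `p=1` a Manhattan-style one, mirroring the documented `--p` flag.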
Key exports (all from ezr.py):
- Data: Data, Num, Sym, Col, Cols, adds, add, sub, clone, mid, spread, mode, entropy, norm
- Distance: distx, disty, nearest, minkowski, aha, wins
- Bayes: like, likes
- Comparison: pick, picks, extrapolate
- Format / IO: csv, o, table, nest, thing, the
- Stats: same, bestRanks, confused
- Tree: Tree, treeGrow, treeCuts, treeSplit, treeLeaf, treeNodes, treeShow, treePlan
- Cluster: kmeans, kpp, half, rhalf, neighbors
- Classify: classify
- Search: oneplus1, sa, ls, de, oracleNearest, last
- Acquire: acquire, warm_start, rebalance, acquireWithBayes, acquireWithCentroid
- Textmine: tmPrepare, tmTokenize, tmNostop, tmStem, tmTfidf, tmData, cnb, cnbLike, cnbLikes, tmRandom, tmActive
ezr/
  ezr.py            Library (all algorithms, section-banner organized)
  cli.py            Dispatcher + eg_* demos + eg_test_* tests
  pyproject.toml    Package config (ezr binary, version, deps)
  README.md         This file
  CHANGELOG.md      Release notes
  LICENSE.md        MIT
  resources/        Text-mining stop-words + suffix lists
  etc/              Build helpers, docs scaffolding (non-runtime)
Tim Menzies timm@ieee.org, 2026. MIT License.
- Repository: http://github.com/timm/ezr
- Sample data: http://github.com/timm/moot