A credit scoring system for Aave V2 wallets. It looks at how a wallet has behaved on-chain (deposits, borrows, repayments, liquidations) and assigns a score from 0 to 1000. A higher score means the wallet is more trustworthy; a lower score means it has shown patterns you would not want to lend to.
The score has two parts:
1. Rule-based score (primary)
This is the main signal and it is fully transparent. Every point added or subtracted has a clear reason:
- Base: 500
- Repayment behavior: -200 to +200 (the most important factor)
- Liquidation history: 0 to -300
- Account age: 0 to +100
- Transaction activity: 0 to +50
- Action diversity: 0 to +50
If a wallet borrowed and paid back more than it borrowed, that is good. If it got liquidated multiple times and never repaid anything, that tanks the score. Simple logic that you can actually defend.
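To make the rule-based part concrete, here is a minimal sketch of how the five components could be combined. Only the ranges come from the list above. The shapes are assumptions: a linear repayment curve that gives -200 at a repay-to-borrow ratio of 0 and +200 at a ratio of 1 or more, -100 per liquidation capped at -300, age saturating at one year, activity saturating at 100 transactions, and diversity measured as the share of the six supported action types a wallet has used. `WalletFeatures` and `rule_based_score` are hypothetical names, not the actual interface of `credit_scorer.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalletFeatures:
    # Hypothetical container; field names mirror the output CSV columns.
    repay_to_borrow_ratio: Optional[float]  # None if the wallet never borrowed
    liquidation_count: int
    account_age_days: float
    total_transactions: int
    distinct_actions: int  # how many of the 6 supported action types were used


def rule_based_score(f: WalletFeatures) -> float:
    score = 500.0  # base

    # Repayment (-200..+200): linear in the ratio, capped at 1.0.
    # Wallets that never borrowed get 0 here (assumption).
    if f.repay_to_borrow_ratio is not None:
        r = max(0.0, min(f.repay_to_borrow_ratio, 1.0))
        score += 400.0 * r - 200.0

    # Liquidations (0..-300): -100 each, capped (assumed step size).
    score -= min(100.0 * f.liquidation_count, 300.0)

    # Account age (0..+100): saturates at one year (assumed).
    score += 100.0 * min(f.account_age_days / 365.0, 1.0)

    # Transaction activity (0..+50): saturates at 100 transactions (assumed).
    score += 50.0 * min(f.total_transactions / 100.0, 1.0)

    # Action diversity (0..+50): fraction of the 6 action types used.
    score += 50.0 * min(f.distinct_actions / 6.0, 1.0)

    # Keep the result on the 0-1000 scale.
    return max(0.0, min(score, 1000.0))
```

Given the ranges listed above, the rule-based part tops out at 900 (500 + 200 + 100 + 50 + 50) and bottoms out at 0 (500 - 200 - 300), whatever curve shapes are used inside each range.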
2. Anomaly detection (secondary)
An IsolationForest runs on the feature matrix and adjusts the score by up to +/-50 points depending on how unusual the wallet looks compared to others. This catches edge cases the rules do not fully cover. It does not drive the score, just nudges it.
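How the IsolationForest output becomes a ±50 nudge is not spelled out here. One simple option, sketched under that assumption, is to take scikit-learn's `IsolationForest.decision_function` output (negative for outliers, positive for inliers), scale it linearly so the most extreme wallet gets the full 50 points, then add it to the rule-based score and clamp to 0-1000. Under this sketch unusual wallets lose points and typical ones gain a little. `anomaly_adjustments` and `final_score` are hypothetical names.

```python
from typing import List, Sequence


def anomaly_adjustments(decision_scores: Sequence[float], max_points: float = 50.0) -> List[float]:
    """Scale anomaly scores linearly into [-max_points, +max_points].

    Expects scores oriented like IsolationForest.decision_function:
    positive means inlier (typical wallet), negative means outlier.
    """
    # The most extreme wallet in either direction receives the full +/-max_points.
    peak = max((abs(s) for s in decision_scores), default=0.0)
    if peak == 0.0:
        # No spread at all: no wallet is more unusual than any other.
        return [0.0] * len(decision_scores)
    return [max_points * s / peak for s in decision_scores]


def final_score(rule_score: float, adjustment: float) -> float:
    """Add the anomaly nudge to the rule-based score and clamp to 0-1000."""
    return max(0.0, min(rule_score + adjustment, 1000.0))
```

In practice the input would come from something like `IsolationForest(random_state=0).fit(X).decision_function(X)`, where `X` is the per-wallet feature matrix. Max-abs scaling is easy to reason about, but one extreme outlier compresses everyone else's adjustment toward zero; a rank-based scaling would avoid that at the cost of ignoring magnitudes.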
| Score | Label |
|---|---|
| 800 - 1000 | Low Risk |
| 600 - 799 | Medium Risk |
| 400 - 599 | High Risk |
| 0 - 399 | Very High Risk |
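Mapping a final score to a label is a threshold lookup. A minimal sketch, assuming fractional scores fall into the band whose lower bound they reach (so 799.5 counts as Medium Risk); `risk_category` is a hypothetical function name that mirrors the output column.

```python
def risk_category(score: float) -> str:
    """Return the risk label for a 0-1000 credit score, using the bands in the table."""
    if score >= 800:
        return "Low Risk"
    if score >= 600:
        return "Medium Risk"
    if score >= 400:
        return "High Risk"
    return "Very High Risk"
```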
```
src/
  data_fetcher.py       # fetch real Aave V2 data or generate synthetic data
  data_processor.py     # clean and standardize raw transactions
  feature_engineer.py   # compute per-wallet features
  model_trainer.py      # train the anomaly detector
  credit_scorer.py      # rule-based scoring + anomaly adjustment
  main.py               # entry point, ties everything together
tests/
  test_feature_engineer.py
  test_credit_scorer.py
requirements.txt
```
```bash
git clone https://github.com/areychana/defi-credit-scoring.git
cd defi-credit-scoring
pip install -r requirements.txt
```

Run it with synthetic data (default, no setup needed):

```bash
python src/main.py
```

Run with your own transaction file:

```bash
python src/main.py --input path/to/transactions.json
```

Run with live Aave V2 data from The Graph:

```bash
python src/main.py --graph-api-key YOUR_KEY
```

Skip the anomaly detector and use rule-based scoring only:

```bash
python src/main.py --no-anomaly
```

Force retrain the anomaly detector:

```bash
python src/main.py --retrain
```

The system expects a JSON file with a list of transactions. Each transaction needs these fields:
```json
[
  {
    "user": "0xabc123...",
    "action": "deposit",
    "amount": "1000.0",
    "timestamp": "2023-06-01T10:00:00Z",
    "token": "USDC"
  }
]
```

Supported action types: deposit, borrow, repay, withdraw, liquidationcall, flashloan
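A minimal loader for this format, sketched with the standard library only. It checks the required fields and action types, converts the string `amount` to a float, and parses the ISO 8601 timestamp (rewriting the trailing `Z` as `+00:00`, since `datetime.fromisoformat` only accepts `Z` from Python 3.11 on). Lowercasing addresses and action names is an assumed normalization step, not necessarily what `data_processor.py` does, and `load_transactions` is a hypothetical name.

```python
import json
from datetime import datetime

SUPPORTED_ACTIONS = {"deposit", "borrow", "repay", "withdraw", "liquidationcall", "flashloan"}
REQUIRED_FIELDS = ("user", "action", "amount", "timestamp", "token")


def load_transactions(text: str) -> list:
    """Parse and validate a JSON array of transactions."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of transactions")

    cleaned = []
    for i, tx in enumerate(records):
        if not isinstance(tx, dict):
            raise ValueError(f"transaction {i} is not an object")
        missing = [k for k in REQUIRED_FIELDS if k not in tx]
        if missing:
            raise ValueError(f"transaction {i} is missing fields: {missing}")

        action = tx["action"].lower()
        if action not in SUPPORTED_ACTIONS:
            raise ValueError(f"transaction {i} has unsupported action {tx['action']!r}")

        cleaned.append({
            "user": tx["user"].lower(),  # addresses are case-insensitive
            "action": action,
            "amount": float(tx["amount"]),  # amounts arrive as strings
            "timestamp": datetime.fromisoformat(tx["timestamp"].replace("Z", "+00:00")),
            "token": tx["token"],
        })
    return cleaned
```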
If you do not have real data, the system will generate realistic synthetic wallets automatically. These cover a range of behavior types so the output is actually meaningful.
Results are saved to results/credit_scores.csv by default. Columns:
- credit_score: final score (0-1000)
- rule_based_score: the rule-based portion before anomaly adjustment
- anomaly_adjustment: how many points the anomaly detector added or removed
- risk_category: Low / Medium / High / Very High Risk
- total_transactions, account_age_days, liquidation_count, repay_to_borrow_ratio: key factors behind the score
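The key-factor columns are per-wallet aggregates over the transaction history. The sketch below shows one way to derive them, assuming transactions have already been parsed into dicts with a float `amount` and a `datetime` `timestamp`. It sums raw amounts across tokens without price conversion and attributes each `liquidationcall` to its `user` field; both are simplifying assumptions, and `wallet_features` is a hypothetical name rather than the interface of `feature_engineer.py`.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional


def wallet_features(txs: List[dict], as_of: Optional[datetime] = None) -> Dict[str, dict]:
    """Aggregate parsed transactions into one feature dict per wallet."""
    by_wallet: Dict[str, List[dict]] = defaultdict(list)
    for tx in txs:
        by_wallet[tx["user"]].append(tx)

    features = {}
    for user, rows in by_wallet.items():
        times = [t["timestamp"] for t in rows]
        # Age runs from the first transaction to as_of, or to the last transaction.
        end = as_of if as_of is not None else max(times)
        # Raw token amounts are summed without price conversion (simplification).
        borrowed = sum(t["amount"] for t in rows if t["action"] == "borrow")
        repaid = sum(t["amount"] for t in rows if t["action"] == "repay")
        features[user] = {
            "total_transactions": len(rows),
            "account_age_days": (end - min(times)).total_seconds() / 86400.0,
            "liquidation_count": sum(1 for t in rows if t["action"] == "liquidationcall"),
            # Undefined for wallets that never borrowed.
            "repay_to_borrow_ratio": repaid / borrowed if borrowed > 0 else None,
            "distinct_actions": len({t["action"] for t in rows}),
        }
    return features
```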
A summary JSON is also written to results/score_analysis.json.
By default, the system tries to fetch from the Aave V2 subgraph on The Graph. If that fails (no API key, endpoint unavailable), it falls back to generating synthetic data.
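The fetch-then-fallback behavior is a small piece of control flow. It is sketched here with the network call and the synthetic generator passed in as hypothetical callables (`fetch_live`, `generate_synthetic`). Giving an explicit key precedence over the `GRAPH_API_KEY` environment variable, and treating an empty response as a failure, are assumptions.

```python
import os
from typing import Callable, List, Optional


def load_data(
    fetch_live: Callable[[str], List[dict]],
    generate_synthetic: Callable[[], List[dict]],
    api_key: Optional[str] = None,
) -> List[dict]:
    """Try the live subgraph first; fall back to synthetic data on any failure."""
    # An explicit key (e.g. from --graph-api-key) wins over the environment variable.
    key = api_key or os.environ.get("GRAPH_API_KEY")
    if not key:
        return generate_synthetic()  # no API key: nothing to fetch with
    try:
        data = fetch_live(key)
    except Exception:
        # Network errors, rejected keys, or an unavailable endpoint all trigger the fallback.
        return generate_synthetic()
    # An empty response is treated as a failure too.
    return data if data else generate_synthetic()
```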
To use live data, get an API key from The Graph and either pass it with --graph-api-key or set it as an environment variable:
```bash
export GRAPH_API_KEY=your_key_here
python src/main.py
```

Run the tests with:

```bash
python -m pytest tests/ -v
```

The earlier version of this project trained XGBoost, Random Forest, and LightGBM on pseudo-labels it generated from the same features. That is circular: the model was just learning to reproduce a formula. The R-squared scores looked good because they measured how well the model replicated its own inputs, not whether it actually predicted creditworthiness.
The current approach uses the heuristics directly as the score, which is more honest. The ML (IsolationForest) is only used for anomaly detection, which is a legitimate unsupervised use case.
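A toy reproduction of the circularity: build a pseudo-label as a deterministic function of one feature, fit a model to it, and measure R-squared. A one-feature least-squares line stands in here for the tree ensembles the earlier version used; the label formula and the `circular_r_squared` helper are hypothetical. The fit recovers the formula almost exactly, so R-squared comes out essentially 1, even though no actual repayment or default outcome was ever observed.

```python
import random
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def circular_r_squared(n: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    # Feature: a repay-to-borrow ratio per synthetic wallet.
    ratios = [rng.uniform(0.0, 1.0) for _ in range(n)]
    # Pseudo-label computed from that same feature: the circular step.
    labels = [500.0 + 400.0 * r - 200.0 for r in ratios]
    slope, intercept = fit_line(ratios, labels)
    preds = [slope * r + intercept for r in ratios]
    return r_squared(labels, preds)
```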
MIT License - see LICENSE file.