DeFi Credit Scoring

A credit scoring system for Aave V2 wallets. It looks at how a wallet has behaved on-chain (deposits, borrows, repayments, liquidations) and assigns a score from 0 to 1000. A higher score means more trustworthy behavior. A lower score means the wallet has shown patterns you would not want to lend to.

How it works

The score has two parts:

1. Rule-based score (primary)

This is the main signal and it is fully transparent. Every point added or subtracted has a clear reason:

Base: 500
Repayment behavior: -200 to +200 (the most important factor)
Liquidation history: 0 to -300
Account age: 0 to +100
Transaction activity: 0 to +50
Action diversity: 0 to +50
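Below is a minimal sketch of how these components could be combined, assuming each component has already been computed for a wallet. The component names, the clamping of each component to its listed range, and the final clip to 0-1000 are all assumptions for illustration. They are not taken from credit_scorer.py.

```python
BASE_SCORE = 500

# (min, max) per component, as listed above; the dictionary keys are hypothetical names
COMPONENT_RANGES = {
    "repayment": (-200, 200),
    "liquidation": (-300, 0),
    "account_age": (0, 100),
    "activity": (0, 50),
    "diversity": (0, 50),
}


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def rule_based_score(components):
    """Add each component (clamped to its documented range) to the base score.

    Missing components count as 0. The final clip to 0-1000 is an assumption.
    """
    total = BASE_SCORE
    for name, (low, high) in COMPONENT_RANGES.items():
        total += clamp(components.get(name, 0), low, high)
    return clamp(total, 0, 1000)


example = {"repayment": 150, "liquidation": -100, "account_age": 80, "activity": 30, "diversity": 40}
print(rule_based_score(example))  # 700
```

If the ranges are applied as listed, the rule-based part alone spans 0 (500 - 200 - 300) to 900 (500 + 200 + 100 + 50 + 50) before any anomaly adjustment.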

If a wallet borrowed and then repaid more than it borrowed, that is a good sign. If it was liquidated multiple times and never repaid anything, the score drops sharply. The logic is simple and easy to defend.
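The two heaviest factors could be computed along the following lines. This is a hypothetical sketch. The linear mapping from repay-to-borrow ratio onto -200..+200, the neutral value for wallets that never borrowed, and the 100-point penalty per liquidation are all illustrative choices, not the project's actual constants.

```python
def repayment_component(total_borrowed, total_repaid):
    """Map the repay-to-borrow ratio linearly onto -200..+200 (hypothetical mapping).

    Ratio 0 gives -200, ratio 0.5 gives 0, and any ratio of 1 or more gives +200.
    """
    if total_borrowed <= 0:
        return 0.0  # assumption: a wallet that never borrowed is neutral here
    ratio = min(total_repaid / total_borrowed, 1.0)
    return -200.0 + 400.0 * ratio


def liquidation_component(liquidation_count, penalty_per_event=100):
    """Subtract a fixed (hypothetical) penalty per liquidation, capped at -300."""
    return -min(liquidation_count * penalty_per_event, 300)
```

Under these assumptions, a wallet that borrowed, never repaid, and was liquidated three times gets repayment_component(1000, 0) + liquidation_component(3) = -500. That takes the 500 base straight to 0.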

2. Anomaly detection (secondary)

An IsolationForest runs on the feature matrix and adjusts the score by up to +/-50 points depending on how unusual the wallet looks compared to others. This catches edge cases the rules do not fully cover. It does not drive the score, just nudges it.
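One way to turn raw anomaly scores into a bounded nudge is to rescale them across all wallets. This sketch assumes raw scores where higher means more normal. That is the convention of scikit-learn's IsolationForest.decision_function, so the input could come from something like IsolationForest().fit(X).decision_function(X). It also assumes that unusual wallets lose points. Both the direction and the min-max rescaling are assumptions, not confirmed details of model_trainer.py or credit_scorer.py.

```python
def anomaly_adjustments(raw_scores, max_points=50.0):
    """Rescale raw anomaly scores to -max_points..+max_points.

    Assumes higher raw score = more normal (IsolationForest.decision_function
    convention), so the most unusual wallet in the batch gets -max_points and
    the most typical gets +max_points.
    """
    if not raw_scores:
        return []
    low, high = min(raw_scores), max(raw_scores)
    if high == low:
        return [0.0 for _ in raw_scores]  # no spread: nobody stands out
    return [-max_points + 2 * max_points * (s - low) / (high - low) for s in raw_scores]
```

Min-max scaling makes the adjustment relative to the batch being scored, which matches "compared to others". The trade-off is that a wallet's adjustment can shift when other wallets are added to the batch.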

Scores at a glance

Score	Label
800 - 1000	Low Risk
600 - 799	Medium Risk
400 - 599	High Risk
0 - 399	Very High Risk
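The table maps onto a simple threshold function. The table only lists integer ranges, so treating each lower bound as inclusive (a score of 799.5 counts as Medium Risk) is an assumption.

```python
def risk_category(score):
    """Return the risk label for a 0-1000 score, using the thresholds from the table."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score out of range: {score}")
    if score >= 800:
        return "Low Risk"
    if score >= 600:
        return "Medium Risk"
    if score >= 400:
        return "High Risk"
    return "Very High Risk"
```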

Project structure

src/
  data_fetcher.py      # fetch real Aave V2 data or generate synthetic
  data_processor.py    # clean and standardize raw transactions
  feature_engineer.py  # compute per-wallet features
  model_trainer.py     # train the anomaly detector
  credit_scorer.py     # rule-based scoring + anomaly adjustment
  main.py              # entry point, ties everything together

tests/
  test_feature_engineer.py
  test_credit_scorer.py

requirements.txt

Getting started

git clone https://github.com/areychana/defi-credit-scoring.git
cd defi-credit-scoring
pip install -r requirements.txt

Run it with synthetic data (the default when no Graph API key is configured, so no setup is needed):

python src/main.py

Run with your own transaction file:

python src/main.py --input path/to/transactions.json

Run with live Aave V2 data from The Graph:

python src/main.py --graph-api-key YOUR_KEY

Skip the anomaly detector and use rule-based scoring only:

python src/main.py --no-anomaly

Force retrain the anomaly detector:

python src/main.py --retrain
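The flags above map naturally onto argparse. This is a hypothetical mirror of the documented command line, not the actual parser in main.py. Note that argparse turns --graph-api-key and --no-anomaly into the attributes graph_api_key and no_anomaly.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the documented flags."""
    parser = argparse.ArgumentParser(description="DeFi credit scoring for Aave V2 wallets")
    parser.add_argument("--input", help="path to a transactions JSON file")
    parser.add_argument("--graph-api-key", help="The Graph API key for live Aave V2 data")
    parser.add_argument("--no-anomaly", action="store_true", help="use rule-based scoring only")
    parser.add_argument("--retrain", action="store_true", help="force retraining of the anomaly detector")
    return parser


args = build_parser().parse_args(["--input", "path/to/transactions.json", "--no-anomaly"])
print(args.input, args.graph_api_key, args.no_anomaly, args.retrain)
# path/to/transactions.json None True False
```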

Input format

The system expects a JSON file with a list of transactions. Each transaction needs:

[
  {
    "user": "0xabc123...",
    "action": "deposit",
    "amount": "1000.0",
    "timestamp": "2023-06-01T10:00:00Z",
    "token": "USDC"
  }
]

Supported action types: deposit, borrow, repay, withdraw, liquidationcall, flashloan
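It helps to reject malformed input before scoring. The following is a minimal validation sketch for the format above. The function name load_transactions and the choice to convert amount from a string to a float at load time are assumptions, not necessarily what data_processor.py does.

```python
import json

REQUIRED_FIELDS = ("user", "action", "amount", "timestamp", "token")
SUPPORTED_ACTIONS = {"deposit", "borrow", "repay", "withdraw", "liquidationcall", "flashloan"}


def load_transactions(text):
    """Parse a JSON list of transactions and check each one against the documented format."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of transactions")
    for i, tx in enumerate(records):
        if not isinstance(tx, dict):
            raise ValueError(f"transaction {i} is not a JSON object")
        missing = [field for field in REQUIRED_FIELDS if field not in tx]
        if missing:
            raise ValueError(f"transaction {i} is missing fields: {missing}")
        if tx["action"] not in SUPPORTED_ACTIONS:
            raise ValueError(f"transaction {i} has unsupported action {tx['action']!r}")
        # amount is a string in the example format; convert it so later sums work
        tx["amount"] = float(tx["amount"])
    return records
```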

If you do not have real data, the system generates realistic synthetic wallets automatically. These cover a range of behavior types, so the output is still meaningful.

Output

Results are saved to results/credit_scores.csv by default. Columns:

credit_score - final score (0-1000)
rule_based_score - the rule-based portion before anomaly adjustment
anomaly_adjustment - how many points the anomaly detector added or removed
risk_category - Low / Medium / High / Very High Risk
total_transactions, account_age_days, liquidation_count, repay_to_borrow_ratio - key factors for the score

A summary JSON is also written to results/score_analysis.json.
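The key factor columns can be derived per wallet from the input transactions. This is a hypothetical sketch built on three assumptions: account age is the span between a wallet's first and last transaction, the user field of a liquidationcall row is the liquidated wallet, and the ratio is left empty (None) for wallets that never borrowed. feature_engineer.py may define any of these differently.

```python
from collections import defaultdict
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp; fromisoformat() rejects a trailing 'Z' before Python 3.11."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def wallet_features(transactions):
    """Group transactions by wallet and compute the key factor columns (hypothetical definitions)."""
    by_wallet = defaultdict(list)
    for tx in transactions:
        by_wallet[tx["user"]].append(tx)

    features = {}
    for user, txs in by_wallet.items():
        times = [parse_ts(tx["timestamp"]) for tx in txs]
        borrowed = sum(float(tx["amount"]) for tx in txs if tx["action"] == "borrow")
        repaid = sum(float(tx["amount"]) for tx in txs if tx["action"] == "repay")
        features[user] = {
            "total_transactions": len(txs),
            # assumption: age = first-to-last transaction span, in whole days
            "account_age_days": (max(times) - min(times)).days,
            # assumption: the "user" of a liquidationcall is the liquidated wallet
            "liquidation_count": sum(1 for tx in txs if tx["action"] == "liquidationcall"),
            # assumption: None when the wallet never borrowed
            "repay_to_borrow_ratio": repaid / borrowed if borrowed > 0 else None,
        }
    return features
```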

Data source

By default, the system tries to fetch from the Aave V2 subgraph on The Graph. If that fails (no API key, endpoint unavailable), it falls back to generating synthetic data.

To use live data, get an API key from The Graph and either pass it with --graph-api-key or set it as an environment variable:

export GRAPH_API_KEY=your_key_here
python src/main.py
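The key lookup and fallback described above could look like the sketch below. Giving --graph-api-key precedence over GRAPH_API_KEY is an assumption. fetch_live and generate_synthetic are hypothetical stand-ins for whatever data_fetcher.py actually provides.

```python
import os


def resolve_graph_api_key(cli_key=None, environ=None):
    """Prefer the --graph-api-key value, then GRAPH_API_KEY (assumed precedence); None if neither is set."""
    environ = os.environ if environ is None else environ
    return cli_key or environ.get("GRAPH_API_KEY") or None


def load_data(api_key, fetch_live, generate_synthetic):
    """Try live Aave V2 data when a key is available; otherwise, or on any failure, use synthetic data.

    fetch_live and generate_synthetic are hypothetical callables.
    """
    if api_key:
        try:
            return fetch_live(api_key)
        except Exception as exc:  # e.g. endpoint unavailable or invalid key
            print(f"Live fetch failed ({exc}); falling back to synthetic data")
    return generate_synthetic()
```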

Running the tests

python -m pytest tests/ -v

Why not just use ML

The earlier version of this project trained XGBoost, Random Forest, and LightGBM on pseudo-labels generated from the same features. That is circular: the models were just learning to reproduce a formula. The R-squared scores looked good because they measured how well the models reproduced the formula behind their own labels, not whether they actually predicted creditworthiness.

The current approach uses the heuristics directly as the score, which is more honest. The ML (IsolationForest) is only used for anomaly detection, which is a legitimate unsupervised use case.

License

MIT License - see LICENSE file.
