areychana/defi-credit-scoring

DeFi Credit Scoring

A credit scoring system for Aave V2 wallets. It looks at how a wallet has behaved on-chain (deposits, borrows, repayments, liquidations) and assigns a score from 0 to 1000. A higher score means more trustworthy behavior; a lower score means the wallet has shown patterns you would not want to lend to.


How it works

The score has two parts:

1. Rule-based score (primary)

This is the main signal and it is fully transparent. Every point added or subtracted has a clear reason:

  • Base: 500
  • Repayment behavior: -200 to +200 (the most important factor)
  • Liquidation history: 0 to -300
  • Account age: 0 to +100
  • Transaction activity: 0 to +50
  • Action diversity: 0 to +50

A wallet that borrowed and then repaid more than it borrowed is rewarded. A wallet that was liquidated multiple times and never repaid anything is heavily penalized. The logic is simple and easy to defend.
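The rules above can be sketched in Python as follows. This is a minimal illustration, not the code in credit_scorer.py: only the base and the per-component ranges come from this README. The WalletFeatures fields and every mapping inside a range (linear repayment ratio, -100 per liquidation, one-year age cap, 100-transaction activity cap) are hypothetical choices made for the example.

```python
from dataclasses import dataclass


# Hypothetical per-wallet features; the real feature_engineer.py may differ.
@dataclass
class WalletFeatures:
    total_borrowed: float
    total_repaid: float
    liquidation_count: int
    account_age_days: float
    total_transactions: int
    distinct_actions: int  # distinct action types used (6 are supported)


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def rule_based_score(f: WalletFeatures) -> float:
    score = 500.0  # base

    # Repayment behavior, -200 to +200. Assumed mapping: repay/borrow ratio
    # 0 -> -200, ratio >= 1 -> +200, linear in between. Wallets that never
    # borrowed get a neutral 0 here.
    if f.total_borrowed > 0:
        ratio = f.total_repaid / f.total_borrowed
        score += clamp(-200 + 400 * ratio, -200, 200)

    # Liquidation history, 0 to -300. Assumed: -100 per liquidation, capped.
    score -= min(100 * f.liquidation_count, 300)

    # Account age, 0 to +100. Assumed: linear up to one year.
    score += 100 * clamp(f.account_age_days / 365, 0, 1)

    # Transaction activity, 0 to +50. Assumed: linear up to 100 transactions.
    score += 50 * clamp(f.total_transactions / 100, 0, 1)

    # Action diversity, 0 to +50, scaled over the six supported action types.
    score += 50 * clamp(f.distinct_actions / 6, 0, 1)

    return score
```

With the documented ranges, the rule-based part alone spans 0 (500 - 200 - 300) to 900 (500 + 200 + 100 + 50 + 50), so the top of the 0-1000 scale is only reachable through the anomaly adjustment.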

2. Anomaly detection (secondary)

An IsolationForest runs on the feature matrix and adjusts the score by up to +/-50 points depending on how unusual the wallet looks compared to others. This catches edge cases the rules do not fully cover. It does not drive the score, just nudges it.
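One way to turn raw IsolationForest output into a bounded nudge is to rescale the scores so the most extreme wallet lands at exactly +/-50 points. The sketch below assumes the scikit-learn convention for IsolationForest.decision_function (higher means more normal, negative means likely outlier), so typical wallets gain points and outliers lose them. Both the direction and the max-abs scaling are assumptions for illustration; model_trainer.py may use a different mapping.

```python
def anomaly_adjustments(raw_scores: list[float], max_points: float = 50.0) -> list[float]:
    """Scale raw anomaly scores to point adjustments in [-max_points, +max_points].

    Assumption: raw_scores follow sklearn's IsolationForest.decision_function
    convention (higher = more normal, negative = likely outlier). In a real
    pipeline they could come from something like:
        model = IsolationForest(random_state=0).fit(X)
        raw_scores = list(model.decision_function(X))
    """
    if not raw_scores:
        return []
    largest = max(abs(s) for s in raw_scores)
    if largest == 0:
        # Every wallet looks equally (un)usual: no adjustment.
        return [0.0 for _ in raw_scores]
    # The most extreme wallet gets exactly +/-max_points; others scale linearly.
    return [max_points * s / largest for s in raw_scores]
```

Scaling by the largest absolute score keeps the adjustment inside the documented +/-50 band regardless of the model's raw score range.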


Scores at a glance

Score         Risk category
800 - 1000    Low Risk
600 - 799     Medium Risk
400 - 599     High Risk
0 - 399       Very High Risk
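The bands translate directly into a lookup. In this sketch, non-integer scores between two bands (for example 799.5) fall into the lower band; that boundary handling is an assumption, since the table only lists integer ranges.

```python
def risk_category(score: float) -> str:
    """Map a final 0-1000 credit score to the risk labels in the table."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score out of range: {score}")
    if score >= 800:
        return "Low Risk"
    if score >= 600:
        return "Medium Risk"
    if score >= 400:
        return "High Risk"
    return "Very High Risk"
```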

Project structure

src/
  data_fetcher.py      # fetch real Aave V2 data or generate synthetic data
  data_processor.py    # clean and standardize raw transactions
  feature_engineer.py  # compute per-wallet features
  model_trainer.py     # train the anomaly detector
  credit_scorer.py     # rule-based scoring + anomaly adjustment
  main.py              # entry point, ties everything together

tests/
  test_feature_engineer.py
  test_credit_scorer.py

requirements.txt

Getting started

git clone https://github.com/areychana/defi-credit-scoring.git
cd defi-credit-scoring
pip install -r requirements.txt

Run it with synthetic data (default, no setup needed):

python src/main.py

Run with your own transaction file:

python src/main.py --input path/to/transactions.json

Run with live Aave V2 data from The Graph:

python src/main.py --graph-api-key YOUR_KEY

Skip the anomaly detector and use rule-based scoring only:

python src/main.py --no-anomaly

Force retrain the anomaly detector:

python src/main.py --retrain
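The four flags above map naturally onto Python's standard argparse module. This is a hypothetical reconstruction of the command-line interface based only on the documented flags; the actual parser in main.py may define help text, defaults, or extra options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of main.py's CLI from the documented flags.
    parser = argparse.ArgumentParser(description="Credit scoring for Aave V2 wallets")
    parser.add_argument("--input", help="path to a JSON transaction file")
    parser.add_argument("--graph-api-key", help="API key for The Graph (live Aave V2 data)")
    parser.add_argument("--no-anomaly", action="store_true",
                        help="skip the anomaly detector, use rule-based scoring only")
    parser.add_argument("--retrain", action="store_true",
                        help="force retraining of the anomaly detector")
    return parser
```

argparse converts dashes to underscores, so the parsed values are available as args.input, args.graph_api_key, args.no_anomaly, and args.retrain.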

Input format

The system expects a JSON file containing a list of transactions. Each transaction needs these fields:

[
  {
    "user": "0xabc123...",
    "action": "deposit",
    "amount": "1000.0",
    "timestamp": "2023-06-01T10:00:00Z",
    "token": "USDC"
  }
]

Supported action types: deposit, borrow, repay, withdraw, liquidationcall, flashloan
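A loader for this format mostly needs to check the required fields, reject unknown actions, and convert the string amount and ISO timestamp into usable types. The sketch below is illustrative, not the project's data_processor.py. Lowercasing addresses and action names is an assumption made so that "0xABC" and "0xabc" count as the same wallet.

```python
import json
from datetime import datetime
from typing import Any

SUPPORTED_ACTIONS = {"deposit", "borrow", "repay", "withdraw", "liquidationcall", "flashloan"}
REQUIRED_FIELDS = ("user", "action", "amount", "timestamp", "token")


def parse_transactions(text: str) -> list[dict[str, Any]]:
    """Parse and validate a JSON list of transactions in the documented format."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of transactions")
    parsed = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"transaction {i} is missing fields {missing}")
        action = rec["action"].lower()  # assumption: action names are case-insensitive
        if action not in SUPPORTED_ACTIONS:
            raise ValueError(f"transaction {i} has unsupported action {rec['action']!r}")
        parsed.append({
            "user": rec["user"].lower(),  # assumption: normalize address case
            "action": action,
            "amount": float(rec["amount"]),  # amounts arrive as strings
            # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on
            "timestamp": datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00")),
            "token": rec["token"],
        })
    return parsed
```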

If you do not have real data, the system generates realistic synthetic wallets automatically. They cover a range of behavior types, so the output is still meaningful.
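To give a sense of how behavior types can drive synthetic data, here is a small generator that emits transactions in the input format above. The three archetypes, the amount range, the token list, and the timing are all hypothetical; data_fetcher.py's actual generator is not described in this README. A seeded random.Random keeps the output reproducible.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical behavior archetypes: the action sequence each wallet repeats.
ARCHETYPES = {
    "reliable_borrower": ["deposit", "borrow", "repay"],
    "defaulter": ["deposit", "borrow", "liquidationcall"],
    "passive_saver": ["deposit", "withdraw"],
}


def synthetic_transactions(n_wallets: int, seed: int = 0) -> list[dict]:
    """Generate synthetic transactions in the documented JSON input format."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    start = datetime(2023, 1, 1, tzinfo=timezone.utc)
    txs = []
    for w in range(n_wallets):
        user = f"0x{w:040x}"  # fake 20-byte address
        pattern = rng.choice(list(ARCHETYPES.values()))
        ts = start + timedelta(days=rng.randint(0, 180))
        for _ in range(rng.randint(1, 5)):  # repeat the pattern 1-5 times
            for action in pattern:
                ts += timedelta(hours=rng.randint(1, 72))
                txs.append({
                    "user": user,
                    "action": action,
                    "amount": f"{rng.uniform(10, 5000):.2f}",
                    "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "token": rng.choice(["USDC", "DAI", "WETH"]),
                })
    return txs
```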


Output

Results are saved to results/credit_scores.csv by default. Columns:

  • credit_score - final score (0-1000)
  • rule_based_score - the rule-based portion before anomaly adjustment
  • anomaly_adjustment - how many points the anomaly detector added or removed
  • risk_category - Low / Medium / High / Very High Risk
  • total_transactions, account_age_days, liquidation_count, repay_to_borrow_ratio - key factors for the score

A summary JSON is also written to results/score_analysis.json.
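For a quick look at the CSV without pandas, the standard csv module is enough. This sketch relies only on the column names listed above; the summary fields it computes are illustrative and are not the contents of score_analysis.json.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize_scores(lines: Iterable[str]) -> dict:
    """Summarize rows of credit_scores.csv (any iterable of CSV lines or an open file)."""
    rows = list(csv.DictReader(lines))
    scores = [float(r["credit_score"]) for r in rows]
    return {
        "wallets": len(rows),
        "mean_score": sum(scores) / len(scores) if scores else None,
        "by_category": dict(Counter(r["risk_category"] for r in rows)),
        # Wallets whose score the anomaly detector pushed down.
        "penalized_by_anomaly": sum(1 for r in rows if float(r["anomaly_adjustment"]) < 0),
    }
```

To run it on real output, open the file with `open("results/credit_scores.csv", newline="")` and pass the file object to `summarize_scores`.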


Data source

By default, the system tries to fetch from the Aave V2 subgraph on The Graph. If that fails (no API key, endpoint unavailable), it falls back to generating synthetic data.

To use live data, get an API key from The Graph and either pass it with --graph-api-key or set it as an environment variable:

export GRAPH_API_KEY=your_key_here
python src/main.py
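The fallback behavior described in this section can be sketched as a small selection function. The three loader callables are hypothetical stand-ins for whatever data_fetcher.py provides, and giving an explicit --input file priority over The Graph is an assumption. The API key comes from the command line first, then from GRAPH_API_KEY.

```python
import logging
import os
from typing import Callable, Optional

Transactions = list[dict]
log = logging.getLogger(__name__)


def load_transactions(
    input_path: Optional[str],
    api_key: Optional[str],
    read_file: Callable[[str], Transactions],        # hypothetical file loader
    fetch_graph: Callable[[str], Transactions],      # hypothetical subgraph fetcher
    generate_synthetic: Callable[[], Transactions],  # hypothetical generator
) -> tuple[str, Transactions]:
    """Pick a data source: explicit file, then The Graph, then synthetic data."""
    if input_path:
        return "file", read_file(input_path)
    key = api_key or os.environ.get("GRAPH_API_KEY")
    if key:
        try:
            return "graph", fetch_graph(key)
        except Exception as exc:  # bad key, endpoint unavailable, network error
            log.warning("The Graph fetch failed (%s); using synthetic data", exc)
    return "synthetic", generate_synthetic()
```

Returning the source name alongside the data makes it easy to record in the output which path was actually taken.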

Running the tests

python -m pytest tests/ -v

Why not just use ML?

The earlier version of this project trained XGBoost, Random Forest, and LightGBM on pseudo-labels generated from the same features. That is circular: the models were just learning to reproduce a formula. The R-squared scores looked good because they measured how well the models replicated the formula that produced their labels, not whether they actually predicted creditworthiness.

The current approach uses the heuristics directly as the score, which is more honest. The ML (IsolationForest) is only used for anomaly detection, which is a legitimate unsupervised use case.


License

MIT License - see LICENSE file.

About

Credit scoring system for Aave V2 wallets using on-chain transaction behavior. Scores 0-1000 based on repayment history, liquidations and account activity.
