ML Data Drift Detection with Evidently AI & Valohai | MLOps Monitoring

Automated data drift detection for machine learning models using Evidently AI and Valohai — with conditional retraining when drift is detected.


What This Project Does

This repository shows how to detect data drift in ML pipelines with Evidently AI on Valohai. It includes:

  • Data preprocessing and model training (scikit-learn)
  • Drift monitoring with Evidently AI reports (JSON, HTML)
  • Conditional retraining when drift is detected (with optional human approval)

Use it as a reference for ML monitoring, model reliability, and production ML pipelines.


What is Data Drift?

Data drift is a change over time in the distribution of a model's input data. A change in the input–output relationship is usually called concept drift. Either can degrade model accuracy in production, so monitoring for drift helps you decide when to retrain and keeps models reliable.
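
To make the idea concrete, here is a minimal standard-library sketch that scores the shift between a reference sample and a current sample with the Population Stability Index (PSI). PSI is used only as an illustration and is not necessarily the statistical test Evidently applies to a given column. The function name, bin count, and synthetic data are assumptions made for this example.

```python
import math
import random


def population_stability_index(reference, current, bins=10):
    """Return the PSI between two numeric samples using equal-width bins."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Synthetic data (assumption): one sample from the same distribution,
# one whose mean has moved by a full standard deviation.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(f"PSI, same distribution:    {psi_same:.3f}")
print(f"PSI, shifted distribution: {psi_shifted:.3f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. Treat those cut-offs as a starting point rather than a fixed standard.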


Features

  • Evidently AI integration for data drift reports
  • Valohai pipelines: training and inference + drift detection
  • Conditional retraining triggered by drift (with approval step)
  • California Housing dataset example; works with your own data
  • Reports in JSON and HTML for analysis and dashboards

Tech Stack

| Component       | Technology   |
|-----------------|--------------|
| Drift detection | Evidently AI |
| Orchestration   | Valohai      |
| ML framework    | scikit-learn |
| Data            | pandas       |
| Language        | Python 3.9   |

Training Pipeline

Preprocesses data and trains the model.

Steps

  1. Data preprocessing

    • Load dataset from Valohai inputs or fetch California Housing if not provided.
    • Preprocess and save with a Valohai alias.
  2. Model training

    • Load preprocessed data, train with scikit-learn, save the model with a Valohai alias.
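
The input lookup and the alias-tagged save in the steps above could be sketched as follows. This is not the repository's actual code. It assumes Valohai's usual directory layout (inputs under `VH_INPUTS_DIR`, outputs under `VH_OUTPUTS_DIR`) and assumes an alias can be requested through a `.metadata.json` sidecar file with a `valohai.alias` key; check both against the Valohai documentation. The input name `dataset`, the file name `model.pkl`, and the alias `latest-model` are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Optional


def find_input_file(input_name: str) -> Optional[Path]:
    """Return the first file provided for a Valohai input, or None if absent."""
    inputs_root = Path(os.environ.get("VH_INPUTS_DIR", "/valohai/inputs"))
    input_dir = inputs_root / input_name
    if not input_dir.is_dir():
        return None
    files = sorted(p for p in input_dir.iterdir() if p.is_file())
    return files[0] if files else None


def save_with_alias(data: bytes, filename: str, alias: str) -> Path:
    """Write an output file plus a metadata sidecar that requests an alias."""
    outputs_root = Path(os.environ.get("VH_OUTPUTS_DIR", "/valohai/outputs"))
    outputs_root.mkdir(parents=True, exist_ok=True)
    out_path = outputs_root / filename
    out_path.write_bytes(data)
    # Assumption: Valohai reads "<filename>.metadata.json" next to the output.
    sidecar = outputs_root / f"{filename}.metadata.json"
    sidecar.write_text(json.dumps({"valohai.alias": alias}))
    return out_path
```

A preprocessing step might call `find_input_file("dataset")` and fall back to fetching California Housing when it returns `None`; a training step might call `save_with_alias(model_bytes, "model.pkl", "latest-model")` after serializing the model.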

Pipeline in Valohai

[Figure: training pipeline in Valohai]


Drift Detection Pipeline

Runs inference and drift analysis with Evidently AI.

Steps

  1. Inference and drift detection

    • Load reference data, current data, and trained model.
    • Run inference on current data.
    • Generate Evidently AI drift reports (e.g. Data Drift preset).
    • Save reports (JSON, HTML).
  2. Conditional retraining

    • Evaluate drift from reports.
    • If drift is detected: update status and trigger retraining (with approval).
    • If no drift: stop the pipeline.
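
The "evaluate drift from reports" decision could look like the sketch below. It assumes the JSON report holds a `metrics` list whose entries carry a `result` object with a boolean `dataset_drift` field, which matches the author's understanding of Evidently's older `Report` output for the Data Drift preset. The layout differs between Evidently versions, so verify it against a report you have actually generated. The sample report and the action names are hypothetical.

```python
import json


def dataset_drift_detected(report: dict) -> bool:
    """Return the dataset-level drift flag from an Evidently-style report dict."""
    for entry in report.get("metrics", []):
        result = entry.get("result", {})
        if "dataset_drift" in result:
            return bool(result["dataset_drift"])
    raise KeyError("no 'dataset_drift' field found in report")


def next_action(report: dict) -> str:
    """Map the drift decision to one of the two pipeline branches."""
    if dataset_drift_detected(report):
        return "request-retraining-approval"
    return "stop"


# Hand-written sample mimicking the assumed layout; not real Evidently output.
sample_json = json.dumps({
    "metrics": [
        {
            "metric": "DatasetDriftMetric",
            "result": {
                "number_of_columns": 8,
                "number_of_drifted_columns": 5,
                "dataset_drift": True,
            },
        }
    ]
})

report = json.loads(sample_json)
print(next_action(report))
```

Raising an error when the flag is missing, rather than defaulting to "no drift", keeps a malformed or unexpected report from silently skipping retraining.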

Pipeline in Valohai

[Figure: drift detection pipeline in Valohai]


Project Flow

  1. Preprocess and store data.
  2. Train and evaluate the model.
  3. Run inference on new data and detect drift with Evidently.
  4. If drift is detected → trigger retraining (with human approval).
  5. If no drift → stop the pipeline.

Flow overview

[Figure: project flow (preprocessing, training, drift detection, conditional retraining)]


Getting Started


Running on Valohai

1. Configure the repository

  1. Install Valohai CLI:
    pip install valohai-cli
  2. Log in:
    vh login
  3. Create and enter a project directory, then create a Valohai project:
    mkdir valohai-evidently-example
    cd valohai-evidently-example
    vh project create
  4. Clone this repository into that directory:
    git clone https://github.com/KuchikiRenji/evidently-drift-detection.git .

2. Run executions (single steps)

vh execution run <step-name> --adhoc

Example (preprocess):

vh execution run preprocess --adhoc

3. Run pipelines (full flow)

vh pipeline run <pipeline-name> --adhoc

Example (drift detection pipeline):

vh pipeline run inference-drift-detection-pipeline --adhoc

Secrets & Environment Variables

The step call-retrain.py uses the Valohai API and needs a private token. Do not commit the token; use Valohai secrets instead.

In Valohai you can:

  • Set environment variables when creating an execution (Create Execution → Environment Variables). They apply only to that execution.
  • Set project environment variables (Project Settings → Environment Variables, mark as Secret). They apply to all executions in the project.
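
Inside a step, the token can then be read from the environment instead of the source tree. The sketch below is not the contents of call-retrain.py. The variable name `VALOHAI_API_TOKEN` is hypothetical (use whatever name you configured), and the `Token` authorization scheme reflects the author's understanding of the Valohai REST API.

```python
import os


def valohai_auth_headers(env_var: str = "VALOHAI_API_TOKEN") -> dict:
    """Build request headers from a token stored in an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(
            f"Set the {env_var} environment variable (as a Valohai secret)."
        )
    return {"Authorization": f"Token {token}"}
```

Pass the returned dictionary as the `headers` argument of whichever HTTP client the step uses.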

Author & Contact

Author: KuchikiRenji
Email: KuchikiRenji@outlook.com
GitHub: github.com/KuchikiRenji
Discord: kuchiki_renji

This project demonstrates ML data drift detection and MLOps monitoring with Evidently AI and Valohai.
