Automated data drift detection for machine learning models using Evidently AI and Valohai — with conditional retraining when drift is detected.
- What This Project Does
- What is Data Drift?
- Features
- Tech Stack
- Training Pipeline
- Drift Detection Pipeline
- Project Flow
- Getting Started
- Running on Valohai
- Secrets & Environment Variables
- Author & Contact
This repository shows how to detect data drift in ML pipelines with Evidently AI on Valohai. It includes:
- Data preprocessing and model training (scikit-learn)
- Drift monitoring with Evidently AI reports (JSON, HTML)
- Conditional retraining when drift is detected (with optional human approval)
Use it as a reference for ML monitoring, model reliability, and production ML pipelines.
Data drift is a change over time in the distribution of a model's input data. A change in the input–output relationship is closely related and is often called concept drift. Either can degrade model accuracy in production, so monitoring for drift helps you decide when to retrain and keeps models reliable.
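To make the idea concrete, here is a minimal standard-library sketch that compares a reference sample with a current sample using the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). It illustrates one common drift statistic and is not necessarily what Evidently computes for a given column; the synthetic Gaussian data and the function name `ks_statistic` are assumptions made for this example.

```python
import bisect
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    largest_gap = 0.0
    # The gap between two step functions can only peak at an observed value.
    for value in ref_sorted + cur_sorted:
        ref_cdf = bisect.bisect_right(ref_sorted, value) / len(ref_sorted)
        cur_cdf = bisect.bisect_right(cur_sorted, value) / len(cur_sorted)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


# Synthetic data (an assumption for illustration, not the project's dataset).
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same_source = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # mean moved by one standard deviation

print(f"same distribution: {ks_statistic(reference, same_source):.3f}")
print(f"shifted mean:      {ks_statistic(reference, shifted):.3f}")
```

A statistic near 0 means the two samples look alike; the shifted sample yields a clearly larger value. In practice a threshold or p-value decides whether that gap counts as drift.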
- Evidently AI integration for data drift reports
- Valohai pipelines: training and inference + drift detection
- Conditional retraining triggered by drift (with approval step)
- California Housing dataset example; works with your own data
- Reports in JSON and HTML for analysis and dashboards
| Component | Technology |
|---|---|
| Drift detection | Evidently AI |
| Orchestration | Valohai |
| ML framework | scikit-learn |
| Data | pandas |
| Language | Python 3.9 |
Preprocesses data and trains the model.
- Data preprocessing
  - Load the dataset from Valohai inputs, or fetch California Housing if none is provided.
  - Preprocess and save with a Valohai alias.
- Model training
  - Load preprocessed data, train with scikit-learn, save the model with a Valohai alias.
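Saving an output "with a Valohai alias" can be sketched as writing the file together with a metadata sidecar. The snippet below uses only the standard library and rests on several assumptions: that Valohai picks up a `<filename>.metadata.json` sidecar with a `valohai.alias` key, that the outputs directory is exposed through `VH_OUTPUTS_DIR`, and that `model.pkl` and `latest-model` are placeholder names rather than the ones used in this repository. Check the repository's scripts and the Valohai documentation for the exact mechanism.

```python
import json
import os
import tempfile
from pathlib import Path


def save_with_alias(output_dir, filename, payload, alias):
    """Write payload plus a `<filename>.metadata.json` sidecar that names the alias."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    data_path = output_dir / filename
    data_path.write_bytes(payload)
    # Assumption: the alias is read from the "valohai.alias" key of the sidecar file.
    sidecar_path = output_dir / f"{filename}.metadata.json"
    sidecar_path.write_text(json.dumps({"valohai.alias": alias}))
    return data_path, sidecar_path


# Falls back to a temporary directory when run outside Valohai.
output_dir = os.environ.get("VH_OUTPUTS_DIR") or tempfile.mkdtemp()
data_path, sidecar_path = save_with_alias(
    output_dir, "model.pkl", b"serialized model bytes", "latest-model"
)
print(sidecar_path.read_text())
```

An alias gives later steps a stable name to reference, so the drift detection pipeline does not need to know which execution produced the latest model.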
Runs inference and drift analysis with Evidently AI.
- Inference and drift detection
  - Load reference data, current data, and trained model.
  - Run inference on current data.
  - Generate Evidently AI drift reports (e.g. Data Drift preset).
  - Save reports (JSON, HTML).
- Conditional retraining
  - Evaluate drift from reports.
  - If drift is detected: update status and trigger retraining (with approval).
  - If no drift: stop the pipeline.
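The "evaluate drift from reports" step can be sketched as reading the saved JSON report and branching on a dataset-level drift flag. The layout assumed here (a top-level `metrics` list whose entries carry a `result` object with a `dataset_drift` boolean) matches some Evidently releases but varies between versions, so treat it as an assumption and inspect your own report first. The function name and the sample report are hypothetical and hand-written for illustration.

```python
import json


def dataset_drift_detected(report):
    """Return the dataset-level drift flag from the first metric that reports one."""
    for metric in report.get("metrics", []):
        result = metric.get("result", {})
        # Assumption: the flag lives under result["dataset_drift"].
        if isinstance(result, dict) and "dataset_drift" in result:
            return bool(result["dataset_drift"])
    raise KeyError("no 'dataset_drift' field found in report")


# Hand-written stand-in for a saved report, not real Evidently output.
sample_report = json.loads("""
{"metrics": [{"metric": "DatasetDriftMetric",
              "result": {"dataset_drift": true, "share_of_drifted_columns": 0.6}}]}
""")

if dataset_drift_detected(sample_report):
    print("drift detected: request approval and trigger retraining")
else:
    print("no drift: stop the pipeline")
```

Raising an error when the flag is missing, rather than defaulting to "no drift", keeps a report format change from silently disabling retraining.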
- Preprocess and store data.
- Train and evaluate the model.
- Run inference on new data and detect drift with Evidently.
- If drift is detected → trigger retraining (with human approval).
- If no drift → stop the pipeline.
- Clone the repo and follow Running on Valohai.
- Make sure you have a Valohai account. Evidently is installed via valohai.yaml, as used in the code.
- For secrets (e.g. Valohai API token), see Secrets & Environment Variables.
- Install Valohai CLI:
pip install valohai-cli
- Log in:
vh login
- Create and enter a project directory, then create a Valohai project:
mkdir valohai-evidently-example
cd valohai-evidently-example
vh project create
- Clone this repository into that directory:
git clone https://github.com/KuchikiRenji/evidently-drift-detection.git .
- Run a single step:
vh execution run <step-name> --adhoc
Example (preprocess):
vh execution run preprocess --adhoc
- Run a pipeline:
vh pipeline run <pipeline-name> --adhoc
Example (drift detection pipeline):
vh pipeline run inference-drift-detection-pipeline --adhoc
The step call-retrain.py uses the Valohai API and needs a private token. Do not commit the token; use Valohai secrets instead.
In Valohai you can:
- Set environment variables when creating an execution (Create Execution → Environment Variables). They apply only to that execution.
- Set project environment variables (Project Settings → Environment Variables, mark as Secret). They apply to all executions in the project.
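Inside a step, a token stored as a secret environment variable can be read at runtime instead of being hard-coded. The sketch below is not taken from call-retrain.py: the variable name `VALOHAI_API_TOKEN` and the helper `build_auth_headers` are assumptions, so use whatever name you configured in the project settings. It builds the `Authorization: Token ...` header and fails early when the variable is missing.

```python
import os


def build_auth_headers(env=None, variable="VALOHAI_API_TOKEN"):
    """Build API request headers from a token stored in an environment variable."""
    env = os.environ if env is None else env
    token = env.get(variable, "").strip()
    if not token:
        raise RuntimeError(f"{variable} is not set; add it as a secret environment variable")
    # Token-based authorization header; the token itself never appears in the code.
    return {"Authorization": f"Token {token}"}


# Demonstration with a fake environment so no real secret is needed.
headers = build_auth_headers({"VALOHAI_API_TOKEN": "example-not-a-real-token"})
print(sorted(headers))  # print header names only, never the token value
```

Passing the environment as a parameter makes the helper easy to test without touching real secrets.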
| Author | KuchikiRenji |
|---|---|
| Email | KuchikiRenji@outlook.com |
| GitHub | github.com/KuchikiRenji |
| Discord | kuchiki_renji |
This project demonstrates ML data drift detection and MLOps monitoring with Evidently AI and Valohai.


