Pima Indians Diabetes Prediction – Azure Machine Learning

Architecture Overview

Summary

This project uses the Pima Indians Diabetes Dataset to train and deploy a logistic regression model that predicts the onset of diabetes.
Two models were trained and compared:

Logistic Regression (LR)
Support Vector Machine (SVM)

The logistic regression model, having better performance, was deployed using:

Azure Container Instances (ACI) for quick testing
Azure ML Online Endpoints (v2) for production-grade deployment

Project Structure

File	Description
`pimadiabetes.csv`	Raw dataset from Kaggle
`data_diabetesPima_clean.csv`	Cleaned dataset ready for training
`preprocessing.py`	Python script for data cleaning and preparation
`submit_preprocessing.ipynb`	Notebook to run preprocessing as a job on Azure ML
`train_and_log.py`	Training script with MLflow logging
`submit_train.ipynb`	Notebook to submit training job to Azure ML
`score.py`	Script used for model inference (used by deployment endpoint)
`requirements.txt`	Pip dependencies for ACI deployment
`environment.yaml`	Conda environment for Online Endpoint (SDK v2)
`endpoint.yaml`	YAML file to define the Online Endpoint v2
`deployment.yaml`	YAML file to deploy the model to the endpoint
`deploiement_aci_pima.ipynb`	Notebook to deploy the model using ACI
`test.json`	JSON input file for testing the deployed model

Requirements

Python ≥ 3.7
Azure CLI installed and configured (az login)
Azure ML CLI extension (az extension add -n ml)
Active Azure subscription with an ML workspace

Dataset

Download the dataset from Kaggle:
https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

Rename it to:

pimadiabetes.csv

Place it in the root directory of the project.

Pipeline Overview

1. Preprocessing

python preprocessing.py --input_path ./ --output_path ./

Or run on Azure ML:

submit_preprocessing.ipynb

Output:

data_diabetesPima_clean.csv
correlation_matrix.csv

2. Model Training

python train_and_log.py

Or run on Azure ML:

submit_train.ipynb

Output:

metrics_summary.csv
roc_combined.csv
learning_curves_combined.csv

Model Deployment

A. Deployment to ACI

Run on Azure ML::

deploiement_aci_pima.ipynb

B. Production Deployment via Online Endpoint (SDK v2)

az login
az account set --subscription <YOUR_SUBSCRIPTION_ID>
az ml workspace set --name <YOUR_WORKSPACE_NAME> --resource-group <YOUR_RESOURCE_GROUP>

# Create the endpoint
az ml online-endpoint create --file endpoint.yaml --workspace-name <YOUR_WORKSPACE_NAME> --resource-group <YOUR_RESOURCE_GROUP>

# Deploy the model to the endpoint
az ml online-deployment create --file deployment.yaml   --workspace-name <YOUR_WORKSPACE_NAME>   --resource-group <YOUR_RESOURCE_GROUP>   --endpoint-name pima-diabetes-endpoint   --all-traffic

Testing the Endpoint

az ml online-endpoint invoke   --name pima-diabetes-endpoint   --request-file test.json   --resource-group <YOUR_RESOURCE_GROUP>   --workspace-name <YOUR_WORKSPACE_NAME>

Example test.json file:

{
  "data": [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50]
  ]
}

Visualization (Power BI)

Use the following CSV files:

data_diabetesPima_clean.csv
correlation_matrix.csv
metrics_summary.csv
roc_combined.csv
learning_curves_combined.csv

Power BI Dashboard Preview

Reproducibility Notice

To keep this repository lightweight:

Only the original dataset pimadiabetes.csv is included.
Intermediate files like:

data_diabetesPima_clean.csv
correlation_matrix.csv
metrics_summary.csv
roc_combined.csv
learning_curves_combined.csv

... are generated by executing the scripts provided.

Author

Edoh Kodji
edoh.kodji@gmail.com

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pima Indians Diabetes Prediction – Azure Machine Learning

Architecture Overview

Table of Contents

Summary

Project Structure

Requirements

Dataset

Pipeline Overview

1. Preprocessing

2. Model Training

Model Deployment

A. Deployment to ACI

B. Production Deployment via Online Endpoint (SDK v2)

Testing the Endpoint

Visualization (Power BI)

Power BI Dashboard Preview

Reproducibility Notice

Author

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
deploiement_aci_pima.ipynb		deploiement_aci_pima.ipynb
deployment.yaml		deployment.yaml
endpoint.yaml		endpoint.yaml
environment.yaml		environment.yaml
pimadiabetes.csv		pimadiabetes.csv
pipeline_architecture.gif		pipeline_architecture.gif
powerbi_dashboard.PNG		powerbi_dashboard.PNG
preprocessing.py		preprocessing.py
requirements.txt		requirements.txt
score.py		score.py
submit_preprocessing.ipynb		submit_preprocessing.ipynb
submit_train.ipynb		submit_train.ipynb
test.txt		test.txt
train_and_log.py		train_and_log.py

Folders and files

Latest commit

History

Repository files navigation

Pima Indians Diabetes Prediction – Azure Machine Learning

Architecture Overview

Table of Contents

Summary

Project Structure

Requirements

Dataset

Pipeline Overview

1. Preprocessing

2. Model Training

Model Deployment

A. Deployment to ACI

B. Production Deployment via Online Endpoint (SDK v2)

Testing the Endpoint

Visualization (Power BI)

Power BI Dashboard Preview

Reproducibility Notice

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages