This is an unofficial implementation of InstantID for Stable Diffusion 1.5.
SD1.5 has many fine-tuned models, and you can combine any of them with the InstantID components to get great results.
The official InstantID works only with SDXL and provides inference code only.
This repository contains both training and inference code.
Training used only 10M images from the LAION-Face 50M dataset (the original InstantID used 50M LAION-Face images plus 10M custom images).
Feel free to adapt it for your own purposes. I will be glad if somebody finds it useful.
Examples with epiCPhotoGasm model + styles from original InstantID.
Examples with Disney Pixar Cartoon Type A model + styles from original InstantID.

InstantID SD1.5 components are not compatible with InstantID SDXL. In this work, the model was trained with additional facial keypoint information.
Keypoints visualization:
Links:
- Code training/inference (gradio, jupyter notebooks, .py files)
- Checkpoints (Controlnet, resampler, ip-adapter)
- InsightFace models for keypoints (antelopev2 and others)
Clone this repo and install requirements.
git clone https://github.com/TheDenk/InstantID-SD1.5.git
cd InstantID-SD1.5
pip install -r requirements.txt
- Clone Stable Diffusion 1.5 into the models dir:
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 ./models/stable-diffusion-v1-5
- Clone the InstantID-SD1.5 models from Hugging Face:
git clone https://huggingface.co/TheDenk/InstantID-SD1.5 ./models/instantid-components
- Download the antelopev2 archive from this post, move it to the models directory, and unzip it.
The folder tree should look like this:
.
├── models
│ ├── stable-diffusion-v1-5/*
│ ├── antelopev2/*.onnx
│ ├── instantid-components/*.ckpt
│ └── additional-unets/*.safetensors (optional)
├── instantid
├── gradio
├── inference.py
├── inference.ipynb
└── README.md
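Before running inference, it can help to confirm that the model directories match the tree above. The following is a minimal sketch: the directory names and file patterns come from the tree, while the helper `missing_model_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Directory names and glob patterns are taken from the folder tree above.
# The optional additional-unets directory is deliberately not checked.
REQUIRED = {
    "models/stable-diffusion-v1-5": "*",
    "models/antelopev2": "*.onnx",
    "models/instantid-components": "*.ckpt",
}


def missing_model_dirs(repo_root="."):
    """Return required directories that are absent or hold no matching files."""
    root = Path(repo_root)
    missing = []
    for rel_dir, pattern in REQUIRED.items():
        directory = root / rel_dir
        if not directory.is_dir() or not any(directory.glob(pattern)):
            missing.append(rel_dir)
    return missing


if __name__ == "__main__":
    # Run from the repository root (assumption).
    for rel_dir in missing_model_dirs("."):
        print(f"missing or empty: {rel_dir}")
```

An empty result means all three required directories exist and contain at least one matching file. It does not verify that the files themselves are complete.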
CUDA_VISIBLE_DEVICES="0" python3 inference.py \
--image_path=examples/faces/rock.jpg \
--prompt="the professional high quality photo of the man, high quality, best quality, masterpiece" \
--style="Film Noir" \
--height=640 \
--width=768 \
--num_inference_steps=25 \
--guidance_scale=8.0 \
--num_images_per_prompt=4

CUDA_VISIBLE_DEVICES="0" python3 inference.py \
--pretrained_model_path=models/stable-diffusion-v1-5 \
--adapter_ckpt_path=models/instantid-components/ip-state.ckpt \
--image_proj_ckpt_path=models/instantid-components/image_proj.ckpt \
--controlnet_ckpt_path=models/instantid-components/controlnet.ckpt \
--additional_unet_path=models/additional-unets/epicphotogasm_lastUnicorn.safetensors \
--image_path=examples/faces/rock.jpg \
--prompt="the professional high quality photo of the man, best quality, masterpiece" \
--style="Film Noir" \
--height=640 \
--width=768 \
--num_inference_steps=25 \
--guidance_scale=8.0 \
--num_images_per_prompt=4

CUDA_VISIBLE_DEVICES="0" python3 gradio/app.py

CUDA_VISIBLE_DEVICES="0" python3 gradio/app.py \
--pretrained_model_path=models/stable-diffusion-v1-5 \
--adapter_ckpt_path=models/instantid-components/ip-state.ckpt \
--image_proj_ckpt_path=models/instantid-components/image_proj.ckpt \
--controlnet_ckpt_path=models/instantid-components/controlnet.ckpt \
--additional_unet_path=models/additional-unets/epicphotogasm_lastUnicorn.safetensors

Or use the code in the Jupyter notebook (inference.ipynb).
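When generating many images, the inference.py command line shown above can also be assembled from Python. The sketch below only builds the argument list. The flag names and default values are copied from the commands above, and the helpers `build_inference_command` and `run_inference` are hypothetical, not part of this repository.

```python
import os
import shlex
import subprocess


def build_inference_command(image_path, prompt, style="Film Noir", **overrides):
    """Build the argv list for inference.py using the flags shown above."""
    options = {
        "image_path": image_path,
        "prompt": prompt,
        "style": style,
        "height": 640,
        "width": 768,
        "num_inference_steps": 25,
        "guidance_scale": 8.0,
        "num_images_per_prompt": 4,
    }
    # Extra keyword arguments override or extend the defaults,
    # e.g. additional_unet_path="models/additional-unets/...".
    options.update(overrides)
    return ["python3", "inference.py"] + [f"--{k}={v}" for k, v in options.items()]


def run_inference(cmd, gpu="0"):
    """Run the command from the repository root on the selected GPU."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    return subprocess.run(cmd, check=True, env=env)


if __name__ == "__main__":
    cmd = build_inference_command(
        "examples/faces/rock.jpg",
        "the professional high quality photo of the man",
    )
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain commas or quotes.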
All models were trained for 780K steps on 3 A6000 GPUs with batch_size=20, resolution=512, lr=1e-5, using only 10M images from the LAION-Face dataset.
1 Download the data from LAION-Face and prepare the images following the official instructions.
2 Filter the dataset with the train/process_laion_dataset.py script, which uses multiprocessing to speed up processing. Example:
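As a rough sanity check on these figures, the number of training samples processed can be estimated from the step count. The sketch below assumes no gradient accumulation and shows both readings of batch_size=20 (per GPU, or across all GPUs), since the text does not say which applies.

```python
# Figures quoted in the text above.
STEPS = 780_000
BATCH_SIZE = 20
NUM_GPUS = 3
DATASET_SIZE = 10_000_000


def samples_seen(steps, batch_size, num_gpus, per_device=True):
    """Estimate samples processed, assuming no gradient accumulation."""
    effective_batch = batch_size * num_gpus if per_device else batch_size
    return steps * effective_batch


if __name__ == "__main__":
    for per_device in (True, False):
        total = samples_seen(STEPS, BATCH_SIZE, NUM_GPUS, per_device=per_device)
        label = "per-GPU batch" if per_device else "global batch"
        print(f"{label}: {total:,} samples, {total / DATASET_SIZE:.2f} passes over 10M images")
```

Under the per-GPU reading this gives 46.8M samples (about 4.7 passes over the 10M images); under the global reading it gives 15.6M samples (about 1.6 passes).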
CUDA_VISIBLE_DEVICES="0" python3 process_laion_dataset.py \
--data_root={DATASET_ROOT} \
--split_name=split_00000 \
--n_jobs=4

Replace {DATASET_ROOT} with the path to your LAION-Face dataset, for example ../LAION-Face.
It creates four directories in your {DATASET_ROOT}: extracted_images, extracted_keypoints, embeddings, csv.
- extracted_images contains filtered and resized *.jpg images.
- extracted_keypoints contains *.jpg images with facial landmarks.
- embeddings contains *.pt files with extracted facial embeddings, landmarks, boxes and some other information.
- csv contains *.csv files with filtered image paths and textual descriptions.
The folder tree should look like this:
.
└──{DATASET_ROOT}
├── extracted_images/*.jpg
├── extracted_keypoints/*.jpg
├── embeddings/*.pt
└── csv/*.csv
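After processing a split, a quick file count per output directory can show whether the run produced anything. This is a minimal sketch: the directory names and extensions come from the tree above, it assumes the files sit directly inside each directory as the tree shows, and the helper `count_outputs` is hypothetical, not part of this repository.

```python
from pathlib import Path

# Directory names and extensions are taken from the folder tree above.
OUTPUT_PATTERNS = {
    "extracted_images": "*.jpg",
    "extracted_keypoints": "*.jpg",
    "embeddings": "*.pt",
    "csv": "*.csv",
}


def count_outputs(dataset_root):
    """Count matching files in each output directory (0 if it is missing)."""
    root = Path(dataset_root)
    counts = {}
    for name, pattern in OUTPUT_PATTERNS.items():
        directory = root / name
        if directory.is_dir():
            counts[name] = sum(1 for _ in directory.glob(pattern))
        else:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    # "../LAION-Face" is the example path used in the text above.
    for name, count in count_outputs("../LAION-Face").items():
        print(f"{name}: {count} files")
```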
process_laion_dataset.py also skips images that are too small or whose faces are too small.
You can control this with the min_h, min_w and min_head_coef parameters. Defaults: min_head_coef=0.3, min_h=512, min_w=512.
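To illustrate how such thresholds could interact, here is a hedged sketch of a filter predicate. The parameter names and defaults come from the text above, but the exact definition of min_head_coef is not given there. This example assumes it is the face box size relative to the image size, which may differ from what process_laion_dataset.py actually computes. The function `keep_image` is hypothetical.

```python
def keep_image(img_h, img_w, face_box, min_h=512, min_w=512, min_head_coef=0.3):
    """Decide whether an image passes the size and face-size thresholds.

    face_box is (x1, y1, x2, y2) in pixels.
    """
    # Reject images smaller than the minimum resolution.
    if img_h < min_h or img_w < min_w:
        return False
    x1, y1, x2, y2 = face_box
    # ASSUMPTION: the head coefficient is the larger of the face-to-image
    # width and height ratios. The real script may define it differently.
    head_coef = max((x2 - x1) / img_w, (y2 - y1) / img_h)
    return head_coef >= min_head_coef


if __name__ == "__main__":
    print(keep_image(1024, 1024, (100, 100, 500, 500)))  # large enough face
    print(keep_image(1024, 1024, (0, 0, 100, 100)))      # face too small
    print(keep_image(400, 800, (0, 0, 300, 300)))        # image too small
```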
CUDA_VISIBLE_DEVICES="0" accelerate launch train.py \
--dataset_root="{DATASET_ROOT}" \
--pretrained_model_name_or_path="./models/stable-diffusion-v1-5" \
--output_dir="./output/instant_training" \
--resolution=512 \
--learning_rate=1e-5 \
--validation_prompt "the professional photo of a beautiful girl, high resolution, awesome detailed, 4k, 8k" "beautiful redhead girl, high resolution, awesome detailed, 4k, 8k" \
--validation_negative_prompt "lowres, worst quality, low quality" "lowres, worst quality, low quality" \
--validation_image "./examples/valid/valid_keypoints.png" \
--valid_embeddings "./examples/valid/valid_embeddings.pt" \
--train_batch_size=20 \
--dataloader_num_workers=32 \
--validation_steps=2500 \
--num_validation_images=4 \
--num_train_epochs=1 \
--checkpointing_steps=5000 \
--mixed_precision=bf16

The validation image was taken from the LAION-Face dataset (a random image with its extracted data).
Using only models without special style prompts.
Examples with Aniflatmix model + styles from original InstantID.

- InstantID and InstantX Team.
- IP-Adapter and ControlNet.






