Commit 4923118

Merge branch 'development' of github.com:lstein/stable-diffusion into development
2 parents 16f6a67 + defafc0 commit 4923118

26 files changed

Lines changed: 926 additions & 1014 deletions

README.md

Lines changed: 53 additions & 810 deletions
Large diffs are not rendered by default.

CHANGELOG.md renamed to docs/CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -134,4 +134,4 @@
134134

135135
## Links
136136

137-
- **[Read Me](readme.md)**
137+
- **[Read Me](../readme.md)**

docs/CONTRIBUTORS.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
# Contributors
2+
3+
The list of all the amazing people who have contributed to the various features that you get to experience in this fork.
4+
5+
We thank them for all of their time and hard work.
6+
7+
_Original Author:_
8+
9+
- Lincoln D. Stein <lincoln.stein@gmail.com>
10+
11+
_Contributions by:_
12+
13+
- [Sean McLellan](https://github.com/Oceanswave)
14+
- [Kevin Gibbons](https://github.com/bakkot)
15+
- [Tesseract Cat](https://github.com/TesseractCat)
16+
- [blessedcoolant](https://github.com/blessedcoolant)
17+
- [David Ford](https://github.com/david-ford)
18+
- [yunsaki](https://github.com/yunsaki)
19+
- [James Reynolds](https://github.com/magnusviri)
20+
- [David Wager](https://github.com/maddavid123)
21+
- [Jason Toffaletti](https://github.com/toffaletti)
22+
- [tildebyte](https://github.com/tildebyte)
23+
- [Cragin Godley](https://github.com/cgodley)
24+
- [BlueAmulet](https://github.com/BlueAmulet)
25+
- [Benjamin Warner](https://github.com/warner-benjamin)
26+
- [Cora Johnson-Roberson](https://github.com/corajr)
27+
- [veprogames](https://github.com/veprogames)
28+
- [JigenD](https://github.com/JigenD)
29+
- [Niek van der Maas](https://github.com/Niek)
30+
- [Henry van Megen](https://github.com/hvanmegen)
31+
- [Håvard Gulldahl](https://github.com/havardgulldahl)
32+
- [greentext2](https://github.com/greentext2)
33+
- [Simon Vans-Colina](https://github.com/simonvc)
34+
- [Gabriel Rotbart](https://github.com/gabrielrotbart)
35+
- [Eric Khun](https://github.com/erickhun)
36+
- [Brent Ozar](https://github.com/BrentOzar)
37+
- [nderscore](https://github.com/nderscore)
38+
- [Mikhail Tishin](https://github.com/tishin)
39+
- [Tom Elovi Spruce](https://github.com/ilovecomputers)
40+
- [spezialspezial](https://github.com/spezialspezial)
41+
- [Yosuke Shinya](https://github.com/shinya7y)
42+
- [Andy Pilate](https://github.com/Cubox)
43+
- [Muhammad Usama](https://github.com/SMUsamaShah)
44+
- [Arturo Mendivil](https://github.com/artmen1516)
45+
- [Paul Sajna](https://github.com/sajattack)
46+
- [Samuel Husso](https://github.com/shusso)
47+
- [nicolai256](https://github.com/nicolai256)
48+
49+
_Original CompVis Authors:_
50+
51+
- [Robin Rombach](https://github.com/rromb)
52+
- [Patrick von Platen](https://github.com/patrickvonplaten)
53+
- [ablattmann](https://github.com/ablattmann)
54+
- [Patrick Esser](https://github.com/pesser)
55+
- [owenvincent](https://github.com/owenvincent)
56+
- [apolinario](https://github.com/apolinario)
57+
- [Charles Packer](https://github.com/cpacker)
58+
59+
---
60+
61+
_If you have contributed and don't see your name on the list of contributors, please let one of the collaborators know about the omission, or feel free to make a pull request._
Lines changed: 41 additions & 43 deletions
@@ -1,5 +1,6 @@
11
# Original README from CompViz/stable-diffusion
2-
*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:*
2+
3+
_Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:_
34

45
[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)<br/>
56
[Robin Rombach](https://github.com/rromb)\*,
@@ -12,16 +13,15 @@
1213

1314
which is available on [GitHub](https://github.com/CompVis/latent-diffusion). PDF at [arXiv](https://arxiv.org/abs/2112.10752). Please also visit our [Project page](https://ommer-lab.com/research/latent-diffusion-models/).
1415

15-
![txt2img-stable2](assets/stable-samples/txt2img/merged-0006.png)
16+
![txt2img-stable2](../assets/stable-samples/txt2img/merged-0006.png)
1617
[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
1718
model.
18-
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
19-
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487),
19+
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
20+
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487),
2021
this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
2122
With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
2223
See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).
2324

24-
2525
## Requirements
2626

2727
A suitable [conda](https://conda.io/) environment named `ldm` can be created
@@ -44,16 +44,16 @@ pip install -e .
4444

4545
Stable Diffusion v1 refers to a specific configuration of the model
4646
architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
47-
and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and
47+
and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and
4848
then finetuned on 512x512 images.
4949

50-
*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
51-
in its training data.
50+
\*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
51+
in its training data.
5252
Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion).
5353
Research into the safe deployment of general text-to-image models is an ongoing effort. To prevent misuse and harm, we currently provide access to the checkpoints only for [academic research purposes upon request](https://stability.ai/academia-access-form).
54-
**This is an experiment in safe and community-driven publication of a capable and general text-to-image model. We are working on a public release with a more permissive license that also incorporates ethical considerations.***
54+
**This is an experiment in safe and community-driven publication of a capable and general text-to-image model. We are working on a public release with a more permissive license that also incorporates ethical considerations.\***
5555

56-
[Request access to Stable Diffusion v1 checkpoints for academic research](https://stability.ai/academia-access-form)
56+
[Request access to Stable Diffusion v1 checkpoints for academic research](https://stability.ai/academia-access-form)
5757

5858
### Weights
5959

@@ -64,36 +64,37 @@ which were trained as follows,
6464
194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
6565
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
6666
515k steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
67-
filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
67+
filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
6868
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-improved-aesthetics" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
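
The filtering criteria for "laion-improved-aesthetics" listed above can be read as a simple predicate over image metadata. The following is only an illustrative sketch: the `ImageMeta` record and its field names are hypothetical and do not correspond to the actual LAION metadata schema. Only the three thresholds come from the text, and the sketch assumes the `>= 512x512` condition applies to both width and height.

```python
from dataclasses import dataclass


@dataclass
class ImageMeta:
    # Hypothetical record; field names are assumptions, not the LAION schema.
    width: int
    height: int
    aesthetics_score: float
    watermark_probability: float


def keep_for_improved_aesthetics(meta: ImageMeta) -> bool:
    """Return True if the image passes the three thresholds quoted above."""
    return (
        meta.width >= 512
        and meta.height >= 512
        and meta.aesthetics_score > 5.0
        and meta.watermark_probability < 0.5
    )
```

Note that the aesthetics and watermark comparisons are strict, so an image scoring exactly `5.0` would be excluded under this reading.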
6969

7070
Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
7171
5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
7272
steps show the relative improvements of the checkpoints:
73-
![sd evaluation results](assets/v1-variants-scores.jpg)
74-
75-
73+
![sd evaluation results](../assets/v1-variants-scores.jpg)
7674

7775
### Text-to-Image with Stable Diffusion
78-
![txt2img-stable2](assets/stable-samples/txt2img/merged-0005.png)
79-
![txt2img-stable2](assets/stable-samples/txt2img/merged-0007.png)
8076

81-
Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder.
77+
![txt2img-stable2](../assets/stable-samples/txt2img/merged-0005.png)
78+
![txt2img-stable2](../assets/stable-samples/txt2img/merged-0007.png)
8279

80+
Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder.
8381

8482
#### Sampling Script
8583

8684
After [obtaining the weights](#weights), link them
85+
8786
```
8887
mkdir -p models/ldm/stable-diffusion-v1/
89-
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
88+
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
9089
```
90+
9191
and sample with
92+
9293
```
93-
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
94+
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
9495
```
9596

96-
By default, this uses a guidance scale of `--scale 7.5`, [Katherine Crowson's implementation](https://github.com/CompVis/latent-diffusion/pull/51) of the [PLMS](https://arxiv.org/abs/2202.09778) sampler,
97+
By default, this uses a guidance scale of `--scale 7.5`, [Katherine Crowson's implementation](https://github.com/CompVis/latent-diffusion/pull/51) of the [PLMS](https://arxiv.org/abs/2202.09778) sampler,
9798
and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type `python scripts/txt2img.py --help`).
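
To illustrate what the guidance scale mentioned above does, here is a minimal sketch of the classifier-free guidance combination step, written with plain Python lists so that it runs without dependencies. The function name `apply_guidance` and the toy inputs are hypothetical; a real sampler would apply the same formula to tensors of predicted noise.

```python
from typing import List


def apply_guidance(
    eps_uncond: List[float], eps_cond: List[float], scale: float = 7.5
) -> List[float]:
    """Combine unconditional and text-conditioned noise predictions.

    scale = 1.0 returns the conditional prediction unchanged; larger values
    push the result further in the direction indicated by the prompt.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy values standing in for per-element noise predictions.
guided = apply_guidance([0.0, 1.0], [1.0, 1.0], scale=7.5)
print(guided)  # [7.5, 1.0]
```

Elements where the two predictions agree are left untouched, while any difference between them is amplified by the scale.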
9899

99100
```commandline
@@ -131,73 +132,72 @@ optional arguments:
131132
evaluate at this precision
132133
133134
```
134-
Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints.
135+
136+
Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints.
135137
For this reason `use_ema=False` is set in the configuration, otherwise the code will try to switch from
136138
non-EMA to EMA weights. If you want to examine the effect of EMA vs no EMA, we provide "full" checkpoints
137139
which contain both types of weights. For these, `use_ema=False` will load and use the non-EMA weights.
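
For readers unfamiliar with the term, EMA weights are an exponential moving average of the raw training weights. The sketch below shows the standard update rule on plain floats; the function name `ema_update` and the decay values are illustrative assumptions, not taken from this repository.

```python
from typing import List


def ema_update(
    shadow: List[float], params: List[float], decay: float = 0.999
) -> List[float]:
    """Return the new EMA ("shadow") weights after one training step."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


shadow = [0.0]
for _ in range(3):
    # The raw weight stays at 1.0 here; the EMA copy approaches it gradually.
    shadow = ema_update(shadow, [1.0], decay=0.5)
print(shadow)  # [0.875]
```

A checkpoint that stores both the raw and the averaged copy is what the note above calls a "full" checkpoint.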
138140

139-
140141
#### Diffusers Integration
141142

142143
Another way to download and sample Stable Diffusion is by using the [diffusers library](https://github.com/huggingface/diffusers/tree/main#new--stable-diffusion-is-now-fully-compatible-with-diffusers)
144+
143145
```py
144146
# make sure you're logged in with `huggingface-cli login`
145147
from torch import autocast
146148
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
147149

148150
pipe = StableDiffusionPipeline.from_pretrained(
149-
"CompVis/stable-diffusion-v1-3-diffusers",
151+
"CompVis/stable-diffusion-v1-3-diffusers",
150152
use_auth_token=True
151153
)
152154

153155
prompt = "a photo of an astronaut riding a horse on mars"
154156
with autocast("cuda"):
155-
image = pipe(prompt)["sample"][0]
156-
157+
image = pipe(prompt)["sample"][0]
158+
157159
image.save("astronaut_rides_horse.png")
158160
```
159161

160-
161-
162162
### Image Modification with Stable Diffusion
163163

164-
By using a diffusion-denoising mechanism as first proposed by [SDEdit](https://arxiv.org/abs/2108.01073), the model can be used for different
165-
tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script,
166-
we provide a script to perform image modification with Stable Diffusion.
164+
By using a diffusion-denoising mechanism as first proposed by [SDEdit](https://arxiv.org/abs/2108.01073), the model can be used for different
165+
tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script,
166+
we provide a script to perform image modification with Stable Diffusion.
167167

168168
The following describes an example where a rough sketch made in [Pinta](https://www.pinta-project.com/) is converted into a detailed artwork.
169+
169170
```
170171
python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
171172
```
172-
Here, strength is a value between 0.0 and 1.0, that controls the amount of noise that is added to the input image.
173+
174+
Here, strength is a value between 0.0 and 1.0, that controls the amount of noise that is added to the input image.
173175
Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input. See the following example.
174176

175177
**Input**
176178

177-
![sketch-in](assets/stable-samples/img2img/sketch-mountains-input.jpg)
179+
![sketch-in](../assets/stable-samples/img2img/sketch-mountains-input.jpg)
178180

179181
**Outputs**
180182

181-
![out3](assets/stable-samples/img2img/mountains-3.png)
182-
![out2](assets/stable-samples/img2img/mountains-2.png)
183+
![out3](../assets/stable-samples/img2img/mountains-3.png)
184+
![out2](../assets/stable-samples/img2img/mountains-2.png)
183185

184186
This procedure can, for example, also be used to upscale samples from the base model.
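
The role of `--strength` described above can be sketched as follows, under the assumption that the script converts the strength into a number of noising steps by scaling the total number of sampler steps. The helper name `encoding_steps` and the default of 50 steps are illustrative, not a statement of the script's exact internals.

```python
def encoding_steps(strength: float, total_steps: int = 50) -> int:
    """Number of noising steps applied to the input image (assumed mapping)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return int(strength * total_steps)


print(encoding_steps(0.8))  # 40
```

Under this reading, a strength of `0.0` adds no noise and leaves the input essentially unchanged, while `1.0` noises the image for the full schedule, so little of the input survives.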
185187

186-
187-
## Comments
188+
## Comments
188189

189190
- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
190-
and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
191-
Thanks for open-sourcing!
192-
193-
- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
191+
and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
192+
Thanks for open-sourcing!
194193

194+
- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
195195

196196
## BibTeX
197197

198198
```
199199
@misc{rombach2021highresolution,
200-
title={High-Resolution Image Synthesis with Latent Diffusion Models},
200+
title={High-Resolution Image Synthesis with Latent Diffusion Models},
201201
author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
202202
year={2021},
203203
eprint={2112.10752},
@@ -206,5 +206,3 @@ Thanks for open-sourcing!
206206
}
207207
208208
```
209-
210-
File renamed without changes.
File renamed without changes.
File renamed without changes.
