Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, a Latent Diffusion Model was trained on 512×512 images from a subset of the LAION-5B database. Similar to Google’s Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight and runs on a GPU with at least 10 GB of VRAM.
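To get a feel for why those parameter counts count as lightweight, here is a rough back-of-the-envelope sketch in Python. It only uses the two figures quoted above (860M for the UNet, 123M for the text encoder). The helper name weight_memory_gb and the choice of 4 bytes per parameter for fp32 and 2 bytes for fp16 are my own assumptions for illustration. It counts only the stored weights of these two parts, so it ignores the image autoencoder, activations and anything else that takes up VRAM during generation, which is why real usage is noticeably higher.

```python
# Parameter counts quoted in the text above.
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


total_params = UNET_PARAMS + TEXT_ENCODER_PARAMS

# Assumption: 4 bytes per parameter in fp32, 2 bytes in fp16.
fp32_gb = weight_memory_gb(total_params, 4)
fp16_gb = weight_memory_gb(total_params, 2)

print(f"Total parameters: {total_params / 1e6:.0f}M")
print(f"Weights in fp32: {fp32_gb:.2f} GB")
print(f"Weights in fp16: {fp16_gb:.2f} GB")
```

So the weights of these two components alone come to roughly 4 GB in full precision and about half that in half precision. The rest of the VRAM goes to everything the estimate leaves out.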

In this tutorial I’ll go through everything you need to get started with Stable Diffusion, from installation to finished image. We’ll talk about txt2img, img2img, prompting, sampling methods, inpainting, upscalers and more! Start using Stable Diffusion today and forget about Midjourney and DALL·E 2.

Author: Sebastian Kamph

SD Resources