Stable Diffusion - Model Versions



The Stable Diffusion model has improved significantly since its release, with each version building on the lessons of the previous one. This chapter compares the features of the different Stable Diffusion versions.

Stable Diffusion 1.x

The first generation of Stable Diffusion models, known as the 1.x series, includes versions 1.1, 1.2, 1.3, 1.4, and 1.5. These models can generate a wide range of styles and need relatively modest computational resources.

Stable Diffusion 2.x

The 2.x series includes versions 2.0 and 2.1. It was developed to produce higher-resolution images and to interpret more expressive and complex prompts.

Stable Diffusion XL 1.0

Stable Diffusion XL 1.0 is a widely used open-source version that creates high-resolution images with improved color grading and composition. It also handles complex prompts and concepts better than the earlier versions.

Stable Diffusion XL Turbo (SDXL Turbo) is a distilled variant of SDXL 1.0, designed for rapid image generation in as little as a single step.
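Single-step generation with SDXL Turbo might look like the following sketch. It assumes the Hugging Face diffusers library, the "stabilityai/sdxl-turbo" model ID, and a CUDA-capable GPU, none of which are named in this chapter. The names TURBO_KWARGS and generate_with_sdxl_turbo are hypothetical and chosen only for illustration.

```python
# Sampling settings commonly used with SDXL Turbo: a single denoising step
# with classifier-free guidance switched off (guidance_scale=0.0).
TURBO_KWARGS = {"num_inference_steps": 1, "guidance_scale": 0.0}


def generate_with_sdxl_turbo(prompt: str):
    """Generate one image from `prompt` in a single step (sketch, needs a GPU)."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo",  # assumed Hugging Face model ID
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe = pipe.to("cuda")  # assumption: a CUDA device is available

    result = pipe(prompt=prompt, **TURBO_KWARGS)
    return result.images[0]  # a PIL image
```

Calling generate_with_sdxl_turbo("a watercolor lighthouse at dawn") would download the model weights on first use and return an image that can be saved with its save() method.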

Stable Diffusion 3

Stable Diffusion 3 is the latest version, announced by Stability AI in February 2024, with improvements in prompt interpretation, image quality and resolution, and spelling (rendering text within images). The model is still in its preview stage and is not yet available to the public.

Comparing Stable Diffusion Models

The following table summarizes the features and improvements across the versions of Stable Diffusion −

| Features | SD 1.5 | SD 2.0 | SD 2.1 | SD XL 1.0 |
|---|---|---|---|---|
| Release Date | October 2022 | November 2022 | December 2022 | July 2023 |
| Resolution | 512x512 | 768x768 | 768x768 | 1024x1024 |
| Prompt Technology (text encoder) | OpenAI's CLIP ViT-L/14 | LAION's OpenCLIP-ViT/H | LAION's OpenCLIP-ViT/H | OpenCLIP-ViT/G and CLIP-ViT/L |
| Strengths | Beginner friendly, better performance on landscape and architectural subjects | Improved handling and interpretation of complex prompts, better image resolution | Improved conceptual understanding, better color grading and image quality | Better portraits, high resolution and image quality, shorter prompts |
| Limitations | Poor prompt interpretation | More restrictive in generations, NSFW filtering | More "censored," especially when generating celebrities and art styles | Requires more computational resources to run locally |
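The release dates, resolutions, and text encoders in the comparison table can also be captured as a small lookup structure, which may help when a script has to pick a version for a target image size. This is only an illustrative sketch: ModelSpec, MODEL_SPECS, and versions_for_resolution are hypothetical names, not part of any Stable Diffusion library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    release: str
    native_resolution: int  # side length in pixels of the square native image
    text_encoder: str


# Values copied from the comparison table.
MODEL_SPECS = {
    "SD 1.5": ModelSpec("October 2022", 512, "OpenAI CLIP ViT-L/14"),
    "SD 2.0": ModelSpec("November 2022", 768, "LAION OpenCLIP-ViT/H"),
    "SD 2.1": ModelSpec("December 2022", 768, "LAION OpenCLIP-ViT/H"),
    "SD XL 1.0": ModelSpec("July 2023", 1024, "OpenCLIP-ViT/G and CLIP-ViT/L"),
}


def versions_for_resolution(target: int) -> list:
    """Return the versions whose native resolution is at least `target` pixels."""
    return [
        name
        for name, spec in MODEL_SPECS.items()
        if spec.native_resolution >= target
    ]
```

For example, versions_for_resolution(768) lists the 2.x models and SD XL 1.0, while versions_for_resolution(1024) leaves only SD XL 1.0.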