Stable Diffusion 2.1

Stable Diffusion is a latent diffusion model that can generate images from text. It was created by a team of researchers and engineers from CompVis, Stability AI, and LAION. Stable Diffusion v2 refers to a specific configuration of the model architecture that uses a downsampling-factor-8 autoencoder with an 865M-parameter UNet and an OpenCLIP ViT-H/14 text encoder for the diffusion model. The SD 2-v model produces 768x768 px images and conditions generation on the penultimate text embeddings of the OpenCLIP ViT-H/14 text encoder.
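
Outside this API, the same model is openly available. The following is a minimal local-generation sketch, assuming the Hugging Face diffusers library and the public stabilityai/stable-diffusion-2-1 checkpoint; the scheduler choice and generation settings are illustrative, not requirements of this API.

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# Load the SD 2.1 weights; float16 keeps GPU memory usage reasonable
model_id = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# SD 2-v is trained for 768x768 output, which is the pipeline's default size
prompt = "calico cat wearing a cosmonaut suit, 3d render, pixar style"
image = pipe(prompt, num_inference_steps=25, guidance_scale=7.5).images[0]
image.save("calico_cosmonaut.png")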


API

If you're looking for an API, you can call this model from your preferred programming language; the example below uses Python.

POST
import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/sd2.1-txt2img"

# Request payload
data = {
    "samples": 1,
    "prompt": "calico cat wearing a cosmonaut suit, 3d render, pixar style, 8k, high resolution",
    "negative_prompt": "None",
    "scheduler": "DDIM",
    "num_inference_steps": 25,
    "guidance_scale": 7.5,
    "strength": 1,
    "seed": 17123564234,
    "img_width": 512,
    "img_height": 512,
    "base64": False
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the generated image
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Image generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing
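
Putting the codes above to use, a minimal sketch of handling the API response might look like this; it assumes the response object from the Python example above.

# Sketch: save the image on success, surface the documented error codes otherwise
if response.status_code == 200:
    # 200 - OK: the body is the generated image (image/jpeg)
    with open("output.jpg", "wb") as f:
        f.write(response.content)
elif response.status_code == 401:
    raise RuntimeError("Unauthorized: check the x-api-key header")
elif response.status_code == 406:
    raise RuntimeError("Not enough credits for this request")
else:
    raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")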

Attributes


samples (int, default: 1) Affects Pricing

Number of samples to generate.

min : 1,

max : 10


prompt (str)

Prompt to render.


negative_prompt (str, default: None)

Prompts to exclude, e.g. 'bad anatomy, bad hands, missing fingers'.


scheduler (enum: str)

Type of scheduler.

Allowed values:


num_inference_steps (int) Affects Pricing

Number of denoising steps.

min : 10,

max : 40


guidance_scale (float)

Scale for classifier-free guidance

min : 1,

max : 15
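
As a rough sketch of what this parameter controls: classifier-free guidance mixes an unconditional noise prediction with a prompt-conditioned one at every denoising step. The helper below is illustrative only; the function name and arguments are hypothetical and not part of this API.

def apply_classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions."""
    # Higher guidance_scale pushes the result harder toward the text prompt
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)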


strength (float, default: 1)

How much to transform the reference image


seed (int)

Seed for image generation.


img_width (int) Affects Pricing

Width of the output image, in pixels.

min : 512,

max : 1024


img_height (int) Affects Pricing

Height of the output image, in pixels.

min : 512,

max : 1024


base64 (bool)

When true, the output image is returned as a Base64-encoded string instead of raw image bytes.

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
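
A small sketch of reading that header, continuing the Python example above; the header name comes from this page, and the rest is standard requests usage.

# After the requests.post() call from the example above
remaining_credits = response.headers.get("x-remaining-credits")
print(f"Remaining credits: {remaining_credits}")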

Stable Diffusion 2.1

Stable Diffusion 2.1 is a state-of-the-art machine learning model introduced in 2022, primarily designed for High-Resolution Image Synthesis with Latent Diffusion. The model excels in generating detailed and high-quality images conditioned on text descriptions. In addition to this, it can be effectively utilized for a variety of tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.

Stable Diffusion v2 denotes a particular configuration of the model's architecture that employs an autoencoder with a downsampling factor of 8, an 865M-parameter UNet, and an OpenCLIP ViT-H/14 text encoder for the diffusion process. This configuration of the Stable Diffusion v2 model generates output images at a resolution of 768x768 pixels. The architecture is divided into three main components: a text encoder, a U-Net, and an autoencoder. The text encoder is the first step in the process: it tokenizes the input prompt and converts it into text embeddings, a numerical representation of the prompt that the downstream components can understand and use to guide generation of the final image.
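
Those three components are visible directly in the open-source release. As a small sketch, assuming the Hugging Face diffusers library and the stabilityai/stable-diffusion-2-1 checkpoint, you can load the pipeline and inspect them:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
print(type(pipe.tokenizer))     # turns the prompt into tokens
print(type(pipe.text_encoder))  # OpenCLIP ViT-H/14 text encoder producing embeddings
print(type(pipe.unet))          # 865M-parameter UNet that performs the denoising
print(type(pipe.vae))           # autoencoder that decodes latents into the final image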

The embeddings produced by the text encoder are then passed to the U-Net, which is where the diffusion process takes place. The U-Net works in a latent space, a compressed representation in which similar images lie close together. The model starts from a noisy latent and gradually refines it over roughly 20 to 30 denoising steps until it represents a sensible image. Finally, the denoised latent is passed to the autoencoder, whose decoder converts it into the final image that you see on the screen. This multi-step process allows Stable Diffusion 2.1 to generate high-quality images from text descriptions, making it a powerful tool for a variety of applications.
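
For readers who want to see those steps spelled out, here is a schematic version of that loop written against the diffusers components mentioned above. It is a simplified sketch under those assumptions: classifier-free guidance and other details are omitted, and the regular pipeline call handles all of this internally.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

# 1) Text encoder: tokenize the prompt and turn it into conditioning embeddings
prompt = "calico cat wearing a cosmonaut suit"
tokens = pipe.tokenizer(prompt, padding="max_length",
                        max_length=pipe.tokenizer.model_max_length,
                        return_tensors="pt")
text_emb = pipe.text_encoder(tokens.input_ids)[0]

# 2) U-Net: start from random noise in latent space and refine it step by step
pipe.scheduler.set_timesteps(25)
latents = torch.randn(1, pipe.unet.config.in_channels, 96, 96)  # 96x96 latents -> 768x768 pixels
latents = latents * pipe.scheduler.init_noise_sigma
for t in pipe.scheduler.timesteps:
    latent_in = pipe.scheduler.scale_model_input(latents, t)
    noise_pred = pipe.unet(latent_in, t, encoder_hidden_states=text_emb).sample
    latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample

# 3) Autoencoder: decode the final latent into the output image
image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample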

Its ability to generate high-resolution images from text descriptions opens up a world of possibilities for content creation. The model's architecture, combining a text encoder, a U-Net, and an autoencoder, allows for a robust and versatile image generation process. Furthermore, the model's diffusion process ensures the generation of sensible and high-quality images, even from initially noisy inputs.

Applications and use cases of Stable Diffusion 2.1

Stable Diffusion 2.1's ability to generate high-quality images from text descriptions has a wide range of applications across various industries. Here are some of the key use cases:

  1. Character Creation for Visual Effects: In the film and gaming industries, Stable Diffusion 2.1 can be used to create detailed and unique characters based on text descriptions. This can significantly speed up the character design process and allow for the creation of characters that might be difficult to design manually.

  2. E-Commerce Product Imaging: For e-commerce, Stable Diffusion 2.1 can generate different views of a product based on a few images and text descriptions. This can eliminate the need for costly and time-consuming product photo shoots.

  3. Image Editing: Stable Diffusion 2.1 can be used to edit images in a variety of ways, such as changing the color of objects, adding or removing elements, or changing the background. This can be particularly useful for photo editing apps or services.

  4. Fashion: In the fashion industry, Stable Diffusion 2.1 can be used to generate images of clothing items in different colors or styles, or to show what a person would look like wearing a particular item of clothing. This can provide a more interactive and personalized shopping experience for customers.

  5. Gaming Asset Creation: Stable Diffusion 2.1 can be used to generate assets for video games, such as characters, environments, or items. This can significantly speed up the game development process and allow for the creation of unique and detailed game assets.

  6. Web Design: For web design, Stable Diffusion 2.1 can generate images for website layouts or themes based on text descriptions. This can make the web design process more efficient and allow for the creation of unique and personalized website designs.

Stable Diffusion 2.1 license

The license for the Stable Diffusion 2.1 model, known as the "CreativeML Open RAIL-M" license, is designed to promote both open and responsible use of the model. You may add your own copyright statement to your modifications and provide additional or different license terms for your modifications. You are accountable for the output you generate using the model, and no use of the output can contravene any provision as stated in the license.