import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/sdxl1.0-txt2img"

# Request payload
data = {
    "prompt": "cinematic film still, 4k, realistic, ((cinematic photo:1.3)) of panda wearing a blue spacesuit, sitting in a bar, Fujifilm XT3, long shot, ((low light:1.4)), ((looking straight at the camera:1.3)), upper body shot, somber, shallow depth of field, vignette, highly detailed, high budget Hollywood movie, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
    "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
    "style": "base",
    "samples": 1,
    "scheduler": "UniPC",
    "num_inference_steps": 25,
    "guidance_scale": 8,
    "strength": 0.2,
    "high_noise_fraction": 0.8,
    "seed": 468685,
    "img_width": 896,
    "img_height": 1152,
    "refiner": True,
    "base64": False
}

headers = {'x-api-key': api_key}
response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the generated image
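Since the request above sets "base64" to False, the response body is the raw image bytes. A minimal helper for writing them to disk (the function name and the output filename are illustrative, not part of the API):

```python
def save_image_bytes(response, path="output.jpg"):
    """Write raw image bytes from a successful API response to disk.

    Assumes the request was made with "base64": False, so
    response.content holds the image itself.
    """
    if response.status_code != 200:
        raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```

Usage after the call above would be `save_image_bytes(response, "panda.jpg")`.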
Request parameters:

prompt: Prompt to render.
negative_prompt: Prompts to exclude, e.g. 'bad anatomy, bad hands, missing fingers'.
style: Style preset for Stable Diffusion. Allowed values:
samples: Number of samples to generate. min: 1, max: 4.
scheduler: Type of scheduler. Allowed values:
num_inference_steps: Number of denoising steps. min: 20, max: 100.
guidance_scale: Scale for classifier-free guidance. min: 1, max: 25.
strength: How much to transform the reference image. min: 0.1, max: 1.
high_noise_fraction: Fraction of inference steps to be run on each expert. min: 0, max: 1.
seed: Seed for image generation. min: -1, max: 999999999999999.
img_width: Image width; can be between 512 and 2048, in multiples of 8.
img_height: Image height; can be between 512 and 2048, in multiples of 8.
refiner: If true, improves the quality of the output. Note: does not work when high_noise_fraction is 1.
base64: If true, the output image is returned base64-encoded instead of as raw bytes.
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
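A small helper that pulls that header out of a requests response (the helper name and the numeric parsing are assumptions; the API may return the value as an integer string):

```python
def remaining_credits(response):
    """Read the x-remaining-credits header from a Segmind API response.

    Returns the value as a float, or None if the header is absent.
    """
    value = response.headers.get("x-remaining-credits")
    return float(value) if value is not None else None
```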
Stable Diffusion SDXL 1.0, a product of Stability AI, is a groundbreaking development in the realm of image generation. It is a quantum leap from its predecessors, Stable Diffusion 1.5 and 2.1, boasting superior advancements in image and facial composition. This revolutionary tool leverages a latent diffusion model for text-to-image synthesis, rendering it an essential asset in the visual arts landscape in 2023. The real magic lies in its ability to create descriptive images from succinct prompts and to generate legible words within images, setting a new standard for AI-generated visuals.
In terms of its technical architecture, SDXL deploys a larger UNet backbone, housing more attention blocks and an extended cross-attention context thanks to its second text encoder. SDXL operates a mixture-of-experts pipeline for latent diffusion: it first uses the base model to generate noisy latents, which are then refined during the final denoising steps. SDXL also employs a two-stage pipeline with a high-resolution model, applying a technique called SDEdit, or "img2img", to the latents generated from the base model, a process that enhances the quality of the output image but may take a bit more time. It was trained on 1024x1024 images, versus 512x512 for SD 1.5.
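The interplay of num_inference_steps and high_noise_fraction in this two-stage pipeline can be sketched with a bit of arithmetic. The split rule below (the base model takes the first high_noise_fraction of the steps and the refiner takes the rest) is an assumption based on the description above, not the service's exact implementation:

```python
def split_steps(num_inference_steps, high_noise_fraction):
    """Split denoising steps between the base model and the refiner.

    Assumes the base model handles the first `high_noise_fraction`
    of the schedule and the refiner handles the remainder
    (illustrative only).
    """
    base_steps = round(num_inference_steps * high_noise_fraction)
    return base_steps, num_inference_steps - base_steps

# With the payload values above (25 steps, fraction 0.8),
# the base model would run 20 steps and the refiner the final 5.
```

Note that at high_noise_fraction = 1 the refiner would receive zero steps, which matches the documented caveat that the refiner has no effect in that case.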
It outperforms its predecessors and stands tall among current state-of-the-art image generators. The model exhibits significant improvements in visual fidelity, rendering stunning visuals and realistic aesthetics. The introduction of a refinement model has been a game-changer, improving the quality of the output generated by SDXL. The training on multiple aspect ratios contributes to the versatility of SDXL, making it a preferred tool in diverse visual settings.
Art and Design: Create stunning visuals and graphics for digital media.
Marketing and Advertising: Generate attention-grabbing imagery for campaigns.
Entertainment and Gaming: Develop detailed graphics for video games and interactive content.
Education: Simplify complex concepts with easy-to-understand visuals.
Research: Visualize data and research findings for better comprehension.
As for licensing, Stable Diffusion SDXL 1.0 operates under the OpenRAIL++ license. While not traditionally classified as open source, this license is comprehensive and accommodating for a wide variety of uses. It allows for the distribution, sublicensing, and commercial utilization of the model, thereby promoting its widespread adoption. This makes it a versatile tool, encouraging innovation while upholding the rights of the creators.