import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/sdxl1.0-txt2img"

# Request payload
data = {
    "prompt": "cinematic film still, 4k, realistic, ((cinematic photo:1.3)) of panda wearing a blue spacesuit, sitting in a bar, Fujifilm XT3, long shot, ((low light:1.4)), ((looking straight at the camera:1.3)), upper body shot, somber, shallow depth of field, vignette, highly detailed, high budget Hollywood movie, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
    "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
    "style": "base",
    "samples": 1,
    "scheduler": "UniPC",
    "num_inference_steps": 25,
    "guidance_scale": 8,
    "strength": 0.2,
    "high_noise_fraction": 0.8,
    "seed": 468685,
    "img_width": 896,
    "img_height": 1152,
    "refiner": True,
    "base64": False
}

headers = {'x-api-key': api_key}
response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the generated image
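Since the request above sets "base64" to False, the response body is the raw image bytes. A minimal helper for writing them to disk (the function name and the output filename are illustrative, not part of the API):

```python
def save_image_bytes(response, path="output.jpg"):
    """Write raw image bytes from a successful API response to disk.

    Assumes the request was made with "base64": False, so
    response.content holds the image itself.
    """
    if response.status_code != 200:
        raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```

Usage after the call above would be `save_image_bytes(response, "panda.jpg")`.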
Request parameters:

prompt: Prompt to render.
negative_prompt: Prompts to exclude, e.g. 'bad anatomy, bad hands, missing fingers'.
style: Style preset for Stable Diffusion. Allowed values:
samples: Number of samples to generate. min: 1, max: 4.
scheduler: Type of scheduler. Allowed values:
num_inference_steps: Number of denoising steps. min: 20, max: 100.
guidance_scale: Scale for classifier-free guidance. min: 1, max: 25.
strength: How much to transform the reference image. min: 0.1, max: 1.
high_noise_fraction: Fraction of inference steps to be run on each expert. min: 0, max: 1.
seed: Seed for image generation. min: -1, max: 999999999999999.
img_width: Image width; can be between 512 and 2048, in multiples of 8.
img_height: Image height; can be between 512 and 2048, in multiples of 8.
refiner: If true, improves the quality of the output. Note: does not work when high_noise_fraction is 1.
base64: If true, the output image is returned base64-encoded instead of as raw bytes.
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
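A small helper that pulls that header out of a requests response (the helper name and the numeric parsing are assumptions; the API may return the value as an integer string):

```python
def remaining_credits(response):
    """Read the x-remaining-credits header from a Segmind API response.

    Returns the value as a float, or None if the header is absent.
    """
    value = response.headers.get("x-remaining-credits")
    return float(value) if value is not None else None
```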
Stable Diffusion SDXL 1.0, a product of Stability AI, is a groundbreaking development in the realm of image generation. It is a quantum leap from its predecessors, Stable Diffusion 1.5 and 2.1, boasting superior advancements in image and facial composition. This revolutionary tool leverages a latent diffusion model for text-to-image synthesis, rendering it an essential asset in the visual arts landscape in 2023. The real magic lies in its ability to create descriptive images from succinct prompts and to generate legible words within images, setting a new standard for AI-generated visuals.
In terms of its technical architecture, SDXL deploys a larger UNet backbone, housing more attention blocks and an extended cross-attention context thanks to its second text encoder. SDXL operates a mixture-of-experts pipeline for latent diffusion: it first uses the base model to generate noisy latents, which are then refined during the final denoising steps. SDXL also employs a two-stage pipeline with a high-resolution model, applying a technique called SDEdit, or "img2img", to the latents generated from the base model, a process that enhances the quality of the output image but may take a bit more time. It was trained on 1024x1024 images, versus 512x512 for SD 1.5.
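The interplay of num_inference_steps and high_noise_fraction in this two-stage pipeline can be sketched with a bit of arithmetic. The split rule below (the base model takes the first high_noise_fraction of the steps and the refiner takes the rest) is an assumption based on the description above, not the service's exact implementation:

```python
def split_steps(num_inference_steps, high_noise_fraction):
    """Split denoising steps between the base model and the refiner.

    Assumes the base model handles the first `high_noise_fraction`
    of the schedule and the refiner handles the remainder
    (illustrative only).
    """
    base_steps = round(num_inference_steps * high_noise_fraction)
    return base_steps, num_inference_steps - base_steps

# With the payload values above (25 steps, fraction 0.8),
# the base model would run 20 steps and the refiner the final 5.
```

Note that at high_noise_fraction = 1 the refiner would receive zero steps, which matches the documented caveat that the refiner has no effect in that case.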
It outperforms its predecessors and stands tall among current state-of-the-art image generators. The model exhibits significant improvements in visual fidelity, rendering stunning visuals and realistic aesthetics. The introduction of a refinement model has been a game-changer, improving the quality of the output generated by SDXL. The training on multiple aspect ratios contributes to the versatility of SDXL, making it a preferred tool in diverse visual settings.
Art and Design: Create stunning visuals and graphics for digital media.
Marketing and Advertising: Generate attention-grabbing imagery for campaigns.
Entertainment and Gaming: Develop detailed graphics for video games and interactive content.
Education: Simplify complex concepts with easy-to-understand visuals.
Research: Visualize data and research findings for better comprehension.
As for licensing, Stable Diffusion SDXL 1.0 operates under the OpenRAIL++ license. While not traditionally classified as open source, this license is comprehensive and accommodating for a wide variety of uses. It allows for the distribution, sublicensing, and commercial utilization of the model, thereby promoting its widespread adoption. This makes it a versatile tool, encouraging innovation while upholding the rights of the creators.