Kandinsky 2.2

Kandinsky inherits best practices from DALL-E 2 and latent diffusion while introducing some new ideas.

API

The model is available through an HTTP API that can be called from the programming language of your choice. The example below uses Python.

POST

import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/kandinsky2.2-txt2img"

# Request payload
data = {
  "prompt": "masterpiece, best quality, portrait of an old man, 50mm, solo, natural skin texture, realistic eye and face details, dark, deep shadow, darkness, moonlight, award winning photo, extremely detailed, fine detail, highly detailed, extremely detailed eyes and face, piercing red eyes, detailed clothes, skinny, gothic, native american clothing, analog film, stock photograph,",
  "negative_prompt": "lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands",
  "samples": 1,
  "num_inference_steps": 25,
  "img_width": 512,
  "img_height": 768,
  "prior_steps": 25,
  "seed": 9863172,
  "base64": False
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the generated image

RESPONSE

image/jpeg
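
Because a successful call returns the image itself (image/jpeg), the response body can be written straight to disk. The following is a minimal sketch, not taken from the API documentation: save_image_response is a hypothetical helper name, and it assumes the request was sent with "base64": False so that the body holds raw image bytes.

```python
from pathlib import Path


def save_image_response(status_code, content_type, body, out_path):
    """Write an image response body to out_path and return the path.

    Hypothetical helper: the arguments mirror response.status_code,
    response.headers.get("Content-Type", "") and response.content
    from the requests library.
    """
    if status_code != 200:
        raise RuntimeError(f"Request failed with HTTP status {status_code}")
    if not content_type.startswith("image/"):
        raise ValueError(f"Expected an image, got content type {content_type!r}")
    path = Path(out_path)
    path.write_bytes(body)  # raw bytes, assuming "base64": False in the request
    return path
```

With the request above, this could be called as save_image_response(response.status_code, response.headers.get("Content-Type", ""), response.content, "output.jpg").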

HTTP Response Codes

200 - OK: Image generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: Server had some issue with processing
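
The response codes listed above can be turned into readable errors on the client side. This is a rough sketch under stated assumptions: describe_status and check_status are hypothetical helper names, and the messages are simply copied from the list above.

```python
# Messages copied from the response code list above.
STATUS_MESSAGES = {
    200: "Image generated",
    401: "User authentication failed",
    404: "The requested URL does not exist",
    405: "The requested HTTP method is not allowed",
    406: "Not enough credits",
    500: "Server had some issue with processing",
}


def describe_status(status_code):
    """Return the documented message for a status code, or a generic fallback."""
    return STATUS_MESSAGES.get(status_code, f"Unexpected HTTP status {status_code}")


def check_status(status_code):
    """Raise RuntimeError for any status other than 200 (hypothetical helper)."""
    if status_code != 200:
        raise RuntimeError(f"{status_code}: {describe_status(status_code)}")
```

After a request, check_status(response.status_code) would raise on any of the documented failure codes, for example when the account has run out of credits (406).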

Attributes

prompt (str, required)

Prompt to render

negative_prompt (str, default: None)

Prompts to exclude, e.g. 'bad anatomy, bad hands, missing fingers'.

samples (int, default: 1). Affects pricing.

Number of samples to generate.

min: 1, max: 4

num_inference_steps (int, default: 20). Affects pricing.

Number of denoising steps.

min: 20, max: 100

img_width (enum: int, default: 768). Affects pricing.

Image width in pixels.

Allowed values:

img_height (enum: int, default: 768). Affects pricing.

Image height in pixels.

Allowed values:

prior_steps (int, default: 25)

Number of denoising steps for the prior model.

min: 1, max: 100

seed (int, default: -1)

Seed for image generation.

base64 (boolean, default: 1)

Whether to return the output image base64-encoded.
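
The documented ranges above can be checked on the client before a request is sent, which avoids spending a call on a payload the API would reject. This is only a sketch: validate_payload is a hypothetical helper, the non-empty check on prompt is an assumption, and img_width and img_height are left unchecked because their allowed values are not listed here.

```python
# Integer ranges taken from the attribute list above: name -> (min, max).
INT_RANGES = {
    "samples": (1, 4),
    "num_inference_steps": (20, 100),
    "prior_steps": (1, 100),
}


def validate_payload(payload):
    """Return a list of problems found in a request payload (empty if none).

    Hypothetical client-side check; img_width and img_height are not
    validated because their allowed values are not listed here.
    """
    problems = []
    prompt = payload.get("prompt")
    # Assumption: an empty or whitespace-only prompt counts as missing.
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")
    for name, (low, high) in INT_RANGES.items():
        if name not in payload:
            continue  # optional attribute; the API default applies
        value = payload[name]
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an integer")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

Calling validate_payload(data) on the example payload shown earlier should return an empty list, since its samples, num_inference_steps and prior_steps values all fall inside the documented ranges.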

To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
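
Reading the credit balance from a response might look like the sketch below. The helper name remaining_credits is hypothetical, and the code assumes the header value is a plain number; the exact format is not specified here, so anything unparseable is reported as None.

```python
def remaining_credits(headers):
    """Return the x-remaining-credits value as a float, or None if unavailable.

    Accepts any mapping of header names to values, such as response.headers
    from the requests library. Header names are matched case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "x-remaining-credits":
            try:
                return float(value)  # assumption: the value is numeric text
            except (TypeError, ValueError):
                return None
    return None
```

A caller could then compare remaining_credits(response.headers) against a threshold of its own choosing and warn before the balance runs out.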

Kandinsky 2.2

Kandinsky 2.2 is a significant advancement over its predecessor, Kandinsky 2.1. It integrates the CLIP-ViT-G image encoder and adds ControlNet support, improving both the aesthetics of generated images and the model's text comprehension.

At the heart of Kandinsky 2.2 lies the state-of-the-art CLIP-ViT-G image encoder, a transformative addition that amplifies the model's ability to craft visually stunning images while enhancing its text understanding capabilities. Complementing this is the ControlNet mechanism, a strategic inclusion designed to offer users unparalleled control over the image generation process.

Advantages

Enhanced Image Aesthetics: The CLIP-ViT-G encoder ensures the generation of visually richer and more captivating images.
Superior Text Understanding: With the new encoder, the model boasts an improved comprehension of text, bridging the gap between textual prompts and visual outputs.
Precision Control: The ControlNet support empowers users to guide the image generation process, ensuring outputs that align with their vision.
Optimized Performance: The combined power of CLIP-ViT-G and ControlNet results in a significant boost in the model's overall performance.

Use Cases

Digital Art Creation: Artists can harness Kandinsky 2.2 to craft digital artworks that resonate with depth and detail.
Content Generation: Ideal for content creators seeking to generate visuals based on textual prompts or narratives.
Interactive Design: Designers can iteratively shape their designs, making real-time adjustments guided by text.
Educational Tools: Can be integrated into learning platforms, allowing students to explore the interplay between text and visuals.
Gaming and AR: Enhance user immersion in games or AR experiences by generating visuals based on in-game narratives or user prompts.

Kandinsky 2.2 License

Kandinsky 2.2's permissive license ensures that users, be they individual creators, businesses, or developers, can utilize the model for a myriad of commercial purposes without the constraints typically associated with restrictive licenses.
