import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/sadtalker"

# Request payload
data = {
    "input_image": image_url_to_base64("https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad-talker-input.png"),  # Or use image_file_to_base64("IMAGE_PATH")
    "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3",
    "pose_style": 4,
    "expression_scale": 1.4,
    "preprocess": "full",
    "image_size": 256,
    "enhancer": True,
    "base64": False
}

headers = {'x-api-key': api_key}
response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the generated video
input_image : Input image of a talking head.
input_audio : Input audio file. Avoid special symbols in the filename, as they may cause ffmpeg errors.
pose_style : Pose style. min: 0, max: 45
expression_scale : A larger value makes the expression motion stronger. min: 1, max: 3
preprocess : Method used to preprocess the image.
image_size : The image size of the facerender. Allowed values:
enhancer : Enhance the output video.
base64 : Base64 encoding of the output video.
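Since the example payload above sets "base64" to False, the response body should be the raw bytes of the generated video. Below is a minimal sketch for saving it to disk; the output filename and .mp4 extension are assumptions for illustration, not confirmed by these docs.

# Minimal sketch: persist the video returned by the request above.
# Assumes base64=False, so response.content holds raw video bytes;
# the "sadtalker_output.mp4" name and extension are assumptions.
if response.status_code == 200:
    with open("sadtalker_output.mp4", "wb") as f:
        f.write(response.content)
else:
    print(f"Request failed with status {response.status_code}: {response.text}")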
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
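For example, a small helper (a hypothetical name, shown for illustration) can read that header from any response object:

# Read the remaining-credit balance from a response's headers.
# The x-remaining-credits header name comes from the docs above;
# parsing it as a float is an assumption about its format.
def get_remaining_credits(response):
    value = response.headers.get('x-remaining-credits')
    return float(value) if value is not None else None

remaining = get_remaining_credits(response)
if remaining is not None:
    print(f"Remaining credits: {remaining}")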
SadTalker generates natural-looking, 3D facial expressions synchronized with audio input. It takes a single image of a face and, based on the audio it receives, animates the face with realistic movements that correspond to the spoken words. This has the potential to revolutionize various fields, from filmmaking and animation to video conferencing and education.
Here's what sets SadTalker apart:
Unmatched Realism: SadTalker directly learns the connection between audio and facial expressions. This results in incredibly natural and nuanced animations that capture the subtle details of human speech.
Stylized Output: SadTalker offers the flexibility to create stylized animations. Imagine generating videos with exaggerated expressions for comedic effect or subtle movements for a more dramatic tone.
Single Image Sufficiency: SadTalker can work wonders with just a single image, making it incredibly user-friendly and adaptable.
Real-world applications of SadTalker:
Film and Animation: Bring characters to life with emotional depth and authenticity. SadTalker can animate characters in real time, enabling more efficient animation workflows.
Video Conferencing: Enhance video calls with lifelike facial expressions, fostering a more engaging and interactive experience. Imagine video meetings where avatars mirror your emotions, creating a more natural connection.
Education: Create engaging and interactive educational content. SadTalker can be used to animate historical figures, language tutors, or even educational mascots, making learning more fun and immersive.
Gaming: Develop next-generation in-game characters with dynamic facial expressions that react to gameplay events, creating a deeper sense of immersion for gamers.