If you want to call the API directly, you can do so from your preferred programming language; the example below uses Python.
import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/video-captioner"

# Request payload
data = {
    "MaxChars": 10,
    "bg_blur": False,
    "bg_color": "None",
    "color": "white",
    "font": "Poppins/Poppins-ExtraBold.ttf",
    "fontsize": 7,
    "highlight_color": "yellow",
    "input_video": "https://segmind-sd-models.s3.amazonaws.com/display_images/hallo_output.mp4",
    "kerning": -2,
    "opacity": 0,
    "right_to_left": False,
    "stroke_color": "black",
    "stroke_width": 2,
    "subs_position": "bottom75"
}

headers = {'x-api-key': api_key}
response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the captioned video
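Rather than printing the raw bytes, you will usually want to write the response to a file. A minimal sketch for doing so, assuming the API returns the captioned video as raw MP4 bytes on success (verify the status code and content type in your own code):

# Save the captioned video returned by the API (assumes raw MP4 bytes on success)
if response.status_code == 200:
    with open("captioned_video.mp4", "wb") as f:
        f.write(response.content)
else:
    print("Request failed:", response.status_code, response.text)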
The request payload accepts the following parameters:

MaxChars: Maximum character space for subtitles. 20 is good for videos, 10 is good for reels.
bg_blur: Blur the background color of the subtitles.
bg_color: Background color of the subtitles.
color: Color of the subtitles.
font: Font for the subtitles.
fontsize: Font size. 7.0 is good for videos, 4.0 is good for reels.
highlight_color: Highlight color for the subtitles.
input_video: URL of the input video to be captioned.
kerning: Kerning (spacing between individual letters or characters) for the subtitles.
opacity: Opacity for the subtitles background.
right_to_left: Right-to-left subtitles, for right-to-left languages. Only Arial fonts are supported.
stroke_color: Stroke color for the subtitles.
stroke_width: Stroke width for the subtitles.
subs_position: Position of the subtitles.
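For instance, following the guidance above, a payload tuned for vertical reels might look like the sketch below, reusing url and headers from the Python example (the input_video URL here is a placeholder, not a real asset):

# Hypothetical payload tuned for reels: smaller font and shorter lines
reels_data = {
    "MaxChars": 10,        # 10 suits reels; 20 suits standard videos
    "fontsize": 4.0,       # 4.0 suits reels; 7.0 suits standard videos
    "color": "white",
    "highlight_color": "yellow",
    "stroke_color": "black",
    "stroke_width": 2,
    "subs_position": "bottom75",
    "input_video": "https://example.com/my_reel.mp4"  # placeholder URL
}
response = requests.post(url, json=reels_data, headers=headers)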
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
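A minimal sketch for reading that header after a call, using the response object from the example above:

# Check remaining credits from the response headers
remaining = response.headers.get("x-remaining-credits")
if remaining is not None:
    print("Remaining credits:", remaining)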
The Video Captioner model is engineered to revolutionize the way you handle video subtitle integration, enhancing both accessibility and viewer engagement. Leveraging state-of-the-art algorithms, this tool provides a seamless process for generating precise video captions with customized stylistic options.
Dynamic Subtitle Positioning: Configure subtitles to display at your desired position for optimal readability, with placement preferences such as bottom, top, left, right, etc.
Customizable Aesthetics: Tailor subtitle appearance with comprehensive settings including color adjustments (e.g., white subtitles, yellow highlight, black stroke), font selection (such as Poppins ExtraBold), and precise font sizing.
Background and Opacity Control: Adjust subtitle transparency and background color to ensure clarity and visibility against varied video backgrounds (see the sketch after this list).
Text Handling and Kerning: Fine-tune your text with a maximum character setting and kerning adjustments to achieve precise alignment and spacing for multilingual subtitles that adhere to right-to-left language requirements.
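These styling controls map directly onto the bg_color, bg_blur, and opacity fields of the request payload. A hedged sketch of a semi-opaque background configuration, reusing data, url, and headers from the Python example above; the specific color names and the opacity scale accepted by the API should be verified against the parameter reference:

# Start from the example payload and override the background styling
styled = dict(data)
styled.update({
    "bg_color": "black",   # assumed to be an accepted color name
    "bg_blur": True,       # blur the subtitle background
    "opacity": 50,         # assumed 0-100 scale; verify against the API docs
})
response = requests.post(url, json=styled, headers=headers)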
Educational Content: Enhance online courses, lectures, and tutorials with clear and accurate subtitles, improving comprehension and accessibility for diverse learners.
Corporate Training: Facilitate employee training programs by providing captioned videos that cater to multilingual staff and those with hearing impairments.
Social Media Marketing: Boost engagement on platforms like YouTube, Instagram, and Facebook by adding eye-catching captions to videos, ensuring content is accessible even when muted.
Film and TV Production: Streamline the post-production process by efficiently generating subtitles, enabling faster distribution across different languages and regions.
E-Learning Platforms: Offer inclusive learning experiences by integrating subtitles in courses, allowing institutions to cater to global audiences.