import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/sts-eleven-labs"

# Request payload
data = {
    "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3",
    "voice": "Sarah"
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response body is the generated audio
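Printing raw bytes is rarely useful for audio, so a common next step is to write the response body to a file. The following is a minimal sketch, not part of the official example: the helper name `save_audio`, the default filename, and the `.mp3` extension are assumptions, since the output format is not stated here.

```python
from pathlib import Path


def save_audio(status_code: int, body: bytes, out_path: str = "converted_audio.mp3") -> Path:
    """Write the response body to disk if the request succeeded.

    Hypothetical helper: the name, default filename, and .mp3 extension
    are illustrative assumptions, not part of the API.
    """
    if status_code != 200:
        # Surface the first bytes of the body, which usually hold the error message.
        raise RuntimeError(f"Request failed with status {status_code}: {body[:200]!r}")
    path = Path(out_path)
    path.write_bytes(body)
    return path
```

With the request above, this would be called as `save_audio(response.status_code, response.content)`.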
input_audio: Input audio URL
voice: Voice name
Allowed values:
To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
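As a rough sketch of how that header could be checked in Python, the helper below reads `x-remaining-credits` from a headers mapping. The function name `remaining_credits` and the assumption that the value parses as a number are illustrative, not confirmed by the documentation above.

```python
from typing import Mapping, Optional


def remaining_credits(headers: Mapping[str, str]) -> Optional[float]:
    """Return the remaining credits from response headers, or None if unavailable.

    Assumes the x-remaining-credits value is numeric; returns None otherwise.
    """
    raw = headers.get("x-remaining-credits")
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None
```

Calling `remaining_credits(response.headers)` works with the `requests` response from the example, because `response.headers` matches header names case-insensitively. A plain `dict` would need the exact lowercase key.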
Eleven Labs Speech-to-Speech (STS) uses deep learning to convert the voice in recorded speech. It lets users modify various aspects of spoken audio for applications in content creation, media production, and accessibility.
Speaker Identity Conversion: Transform the speaker's voice in an audio file while preserving the original content. Choose from a library of diverse voice styles and genders for a customized output.
Emotional Style Transfer: Infuse the converted speech with desired emotions, such as happiness, anger, or sadness. This functionality enhances the expressiveness and impact of audio content.
Language Translation with Voice Conversion: Achieve seamless audio translation while maintaining a natural-sounding voice in the target language. This feature expands the reach and accessibility of multilingual content.
Real-time Voice Cloning: Generate a synthetic voice clone that replicates a specific speaker's voice characteristics. This allows for voiceover creation or speech modification tasks.
Advanced Audio Editing: Utilize functionalities like noise reduction, silence removal, and audio mixing for professional-grade audio editing within the Eleven Labs platform.
Content Personalization: Enhance the engagement of your audience by tailoring the voice and emotional delivery of audio content.
Accessibility Improvements: Create multilingual audio content with natural-sounding voices, removing language barriers for global audiences.
Streamlined Content Creation: Generate voiceovers or modify existing audio speech efficiently, accelerating production workflows.
Preserving Speaker Identity: Maintain the speaker's voice characteristics while enhancing audio quality or modifying language for broader reach.
Creative Voice Exploration: Experiment with diverse voice styles and emotions to inject new life into your audio projects.