POST
import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/sts-eleven-labs"

# Request payload
data = {
    "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3",
    "voice": "Sarah"
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response body contains the generated audio
RESPONSE
image/jpeg
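The request example prints the raw response bytes, which is rarely useful in practice. A minimal sketch of writing them to disk instead is shown below. The helper name `save_response_body` and the output path are hypothetical, and the sketch deliberately assumes nothing about the file format: choose the extension to match the content type the API actually returns.

```python
from pathlib import Path
from typing import Union


def save_response_body(content: bytes, path: Union[str, Path]) -> int:
    """Write raw response bytes to `path` and return the number of bytes written.

    Hypothetical helper, not part of the Segmind API. The caller chooses the
    file name and extension.
    """
    target = Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_bytes(content)
```

With the request example, this could be called as `save_response_body(response.content, "output_file")` once the request has succeeded.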
HTTP Response Codes
200 - OK: Image Generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing
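The status codes listed above can be turned into readable error messages on the client side. The following is a sketch only: the lookup table copies the descriptions from this page, while the names `STATUS_MESSAGES` and `describe_status` are hypothetical and not part of any Segmind client library.

```python
# Descriptions copied from the HTTP Response Codes list on this page.
STATUS_MESSAGES = {
    200: "OK: Image Generated",
    401: "Unauthorized: User authentication failed",
    404: "Not Found: The requested URL does not exist",
    405: "Method Not Allowed: The requested HTTP method is not allowed",
    406: "Not Acceptable: Not enough credits",
    500: "Server Error: Server had some issue with processing",
}


def describe_status(status_code: int) -> str:
    """Return the documented description for a status code.

    Hypothetical helper; codes not listed on this page get a generic message.
    """
    return STATUS_MESSAGES.get(status_code, f"Unexpected status code {status_code}")
```

In the request example, `describe_status(response.status_code)` could be logged before the response body is used, so that a 406 is reported as a credit problem rather than surfacing later as a malformed file.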

Attributes


input_audio  str *

Input Audio URL


voice  enum:str (default: Sarah)

Voice name

Allowed values:

To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
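Reading the credit balance from a response might look like the sketch below. It assumes the x-remaining-credits value is a plain number, which this page does not state explicitly. The helper name `remaining_credits` is hypothetical.

```python
from typing import Mapping, Optional


def remaining_credits(headers: Mapping[str, str]) -> Optional[float]:
    """Parse the x-remaining-credits header, or return None if unavailable.

    Assumption: the header value is numeric. A missing or non-numeric value
    yields None instead of raising.
    """
    raw = headers.get("x-remaining-credits")
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None
```

With the requests library, `remaining_credits(response.headers)` works directly because `response.headers` is a case-insensitive mapping. A plain dict would need the key spelled exactly as shown.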

Elevenlabs Speech To Speech

Eleven Labs Speech-to-Speech (STS) uses deep learning to convert the voice in an audio recording. It lets users modify various aspects of recorded speech, with applications in content creation, media production, and accessibility.

Core Functionalities of Eleven Labs Speech-to-Speech

  • Speaker Identity Conversion: Transform the speaker's voice in an audio file while preserving the original content. Choose from a library of diverse voice styles and genders for a customized output.

  • Emotional Style Transfer: Infuse the converted speech with desired emotions, such as happiness, anger, or sadness. This functionality enhances the expressiveness and impact of audio content.

  • Language Translation with Voice Conversion: Achieve seamless audio translation while maintaining a natural-sounding voice in the target language. This feature expands the reach and accessibility of multilingual content.

  • Real-time Voice Cloning: Generate a synthetic voice clone that replicates a specific speaker's voice characteristics. This allows for voiceover creation or speech modification tasks.

  • Advanced Audio Editing: Utilize functionalities like noise reduction, silence removal, and audio mixing for professional-grade audio editing within the Eleven Labs platform.

Benefits of Utilizing Eleven Labs Speech-to-Speech

  • Content Personalization: Enhance the engagement of your audience by tailoring the voice and emotional delivery of audio content.

  • Accessibility Improvements: Create multilingual audio content with natural-sounding voices, removing language barriers for global audiences.

  • Streamlined Content Creation: Generate voiceovers or modify existing audio speech efficiently, accelerating production workflows.

  • Preserving Speaker Identity: Maintain the speaker's voice characteristics while enhancing audio quality or modifying language for broader reach.

  • Creative Voice Exploration: Experiment with diverse voice styles and emotions to inject new life into your audio projects.