POST
import requests
import base64

# Use this function to convert a file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch a file from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/openvoice"

# Request payload
data = {
    "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/openvoice-ip.mp3",
    "language": "EN_NEWEST",
    "speed": 1,
    "text": "Did you ever hear a folk tale about a giant turtle?"
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the generated audio
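Because the endpoint returns binary audio rather than text, in practice you will usually write the response body to a file instead of printing it. A minimal sketch using the response object from the snippet above (the output filename and .wav extension are assumptions; check the Content-Type response header for the actual format):

if response.ok:
    # Persist the raw audio bytes returned by the API
    with open("output.wav", "wb") as f:
        f.write(response.content)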
RESPONSE
audio/wav
HTTP Response Codes
200 - OK : Audio generated
401 - Unauthorized : User authentication failed
404 - Not Found : The requested URL does not exist
405 - Method Not Allowed : The requested HTTP method is not allowed
406 - Not Acceptable : Not enough credits
500 - Server Error : Server had some issue with processing
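
These status codes can be checked directly on the response object from the snippet above. A minimal sketch of branching on the documented codes (the printed messages are illustrative):

if response.status_code == 200:
    print("Audio generated successfully")
elif response.status_code == 401:
    print("Authentication failed: check your x-api-key header")
elif response.status_code == 406:
    print("Not enough credits in your account")
else:
    print(f"Request failed with status {response.status_code}")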

Attributes


input_audio  str *

Input reference audio (5-120 seconds) of a person speaking, used for training an audio model to capture voice characteristics


language  enum:str ( default: EN_NEWEST )

The language of the audio to be generated. British English, American English, Indian English, French, Chinese, Japanese, and Korean are supported.


speed  float *

Speed at which the output audio is generated (see the payload sketch after this attribute list)

min: 0.5, max: 2


text  str *

Text to be spoken or processed
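
To show how these attributes fit together, here is a minimal sketch that builds and validates a request payload before sending it. The build_payload helper and its error messages are illustrative, not part of the API:

def build_payload(input_audio, text, language="EN_NEWEST", speed=1.0):
    # Enforce the documented speed range of 0.5 to 2
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be between 0.5 and 2")
    if not text:
        raise ValueError("text is required")
    return {
        "input_audio": input_audio,  # reference audio, 5-120 seconds
        "language": language,        # e.g. the default EN_NEWEST
        "speed": speed,
        "text": text,
    }

data = build_payload(
    input_audio="https://segmind-sd-models.s3.amazonaws.com/display_images/openvoice-ip.mp3",
    text="Did you ever hear a folk tale about a giant turtle?",
)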

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
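
For example, the header can be read from the same response object used in the snippet above; requests exposes headers through a case-insensitive mapping, so the lookup below works regardless of capitalization:

remaining = response.headers.get("x-remaining-credits")
if remaining is not None:
    print(f"Remaining credits: {remaining}")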

OpenVoice Model: Instant Voice Cloning with Multi-Lingual Support

The OpenVoice model is a state-of-the-art voice cloning technology developed by MyShell and MIT. This versatile model excels in replicating the tone and style of a reference speaker’s voice using just a short audio clip. OpenVoice supports multiple languages, including English, Spanish, French, Chinese, Japanese, and Korean, making it a powerful tool for global applications.

Key Features of OpenVoice

  • Accurate Tone Color Cloning: OpenVoice can precisely replicate the reference speaker’s tone, ensuring high fidelity in voice cloning.

  • Flexible Voice Style Control: Users can adjust various voice style parameters such as emotion, accent, rhythm, pauses, and intonation.

  • Zero-Shot Cross-Lingual Voice Cloning: OpenVoice can generate speech in languages not present in the training dataset, offering unparalleled flexibility.

  • High-Quality Audio Output: The model adopts advanced training strategies to deliver superior audio quality.

  • Free for Commercial Use: Both OpenVoice V1 and V2 are released under the MIT License, allowing free commercial use.

Use Cases

  • Media Content Creation: Enhance videos, podcasts, and other media with high-quality voiceovers.

  • Interactive AI Interfaces: Improve the user experience in chatbots and virtual assistants with natural-sounding voices.

  • Voice Preservation: Preserve the voice of loved ones or historical figures for future generations.