OpenVoice

OpenVoice is a versatile voice cloning model that supports multiple languages and offers precise tone replication, flexible style control, and zero-shot cross-lingual capabilities.


API

To call the API, pick the example for your preferred programming language.

POST
    import requests

    api_key = "YOUR_API_KEY"
    url = "https://api.segmind.com/v1/openvoice"

    # Request payload
    data = {
        "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/openvoice-ip.mp3",
        "language": "EN_NEWEST",
        "speed": 1,
        "text": "Did you ever hear a folk tale about a giant turtle?"
    }

    # Authenticate with the API key in the x-api-key header
    headers = {'x-api-key': api_key}

    response = requests.post(url, json=data, headers=headers)
    print(response.content)  # The response body is the generated audio
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Audio generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing
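The response codes listed above can be turned into readable errors on the client side. The following is a minimal sketch, not part of the official client: `STATUS_MESSAGES`, `describe_status`, and `check_response` are hypothetical names introduced here for illustration, and the messages are copied from the list above.

```python
# Messages taken from the HTTP Response Codes list; names are illustrative only.
STATUS_MESSAGES = {
    200: "OK",
    401: "Unauthorized: user authentication failed",
    404: "Not Found: the requested URL does not exist",
    405: "Method Not Allowed: the requested HTTP method is not allowed",
    406: "Not Acceptable: not enough credits",
    500: "Server Error: the server had an issue with processing",
}


def describe_status(status_code: int) -> str:
    """Return a readable message for a status code (hypothetical helper)."""
    return STATUS_MESSAGES.get(
        status_code, f"Unexpected status code {status_code}"
    )


def check_response(status_code: int) -> None:
    """Raise RuntimeError for any status other than 200 (hypothetical helper)."""
    if status_code != 200:
        raise RuntimeError(describe_status(status_code))
```

With the `requests` example, this could be called as `check_response(response.status_code)` right after `requests.post(...)` and before using `response.content`.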

Attributes


input_audio str *

Input reference audio (5-120 seconds) of a person speaking, used for training an audio model to capture voice characteristics.


language enum:str (default: EN_NEWEST)

The language of the audio to be generated. British English, American English, Indian English, French, Chinese, Japanese, and Korean are supported.

Allowed values:


speed float *

Speed at which the output audio is generated

min: 0.5, max: 2


text str *

Text to be spoken or processed
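The attribute constraints above can be checked locally before a request is sent. This is a minimal sketch under stated assumptions: `build_payload` is a hypothetical helper, not part of the API; it checks only the required fields and the documented speed range (0.5 to 2). It does not verify the 5-120 second length of the reference audio or the set of allowed language values.

```python
def build_payload(input_audio: str, text: str, speed: float,
                  language: str = "EN_NEWEST") -> dict:
    """Build a request payload from the documented attributes (hypothetical helper)."""
    if not input_audio:
        raise ValueError("input_audio is required")
    if not text:
        raise ValueError("text is required")
    # Documented range for speed: min 0.5, max 2
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be between 0.5 and 2")
    return {
        "input_audio": input_audio,
        "language": language,
        "speed": speed,
        "text": text,
    }
```

The returned dictionary can be passed as the `json` argument of `requests.post`, in place of a hand-written payload.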

To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
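Reading the credit header might look like the sketch below. It assumes the header value is numeric, which the page does not state; `remaining_credits` is a hypothetical helper that accepts any mapping of header names to values and returns None when the header is missing or cannot be parsed.

```python
from typing import Mapping, Optional


def remaining_credits(headers: Mapping[str, str]) -> Optional[float]:
    """Return the x-remaining-credits value as a float, or None (hypothetical helper)."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase
        if name.lower() == "x-remaining-credits":
            try:
                return float(value)
            except (TypeError, ValueError):
                # Assumption: a non-numeric value is treated as unknown
                return None
    return None
```

With the `requests` example, the call would be `remaining_credits(response.headers)`.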

OpenVoice Model: Instant Voice Cloning with Multi-Lingual Support

The OpenVoice model is a state-of-the-art voice cloning technology developed by MyShell and MIT. This versatile model excels in replicating the tone and style of a reference speaker’s voice using just a short audio clip. OpenVoice supports multiple languages, including English, Spanish, French, Chinese, Japanese, and Korean, making it a powerful tool for global applications.

Key Features of OpenVoice

  • Accurate Tone Color Cloning: OpenVoice can precisely replicate the reference speaker’s tone, ensuring high fidelity in voice cloning.

  • Flexible Voice Style Control: Users can adjust various voice style parameters such as emotion, accent, rhythm, pauses, and intonation.

  • Zero-Shot Cross-Lingual Voice Cloning: OpenVoice can generate speech in languages not present in the training dataset, without additional training for the new language.

  • High-Quality Audio Output: The model adopts advanced training strategies to deliver superior audio quality.

  • Free for Commercial Use: Both OpenVoice V1 and V2 are released under the MIT License, allowing free commercial use.

Use Cases

  • Media Content Creation: Enhance videos, podcasts, and other media with high-quality voiceovers.

  • Interactive AI Interfaces: Improve the user experience in chatbots and virtual assistants with natural-sounding voices.

  • Voice Preservation: Preserve the voice of loved ones or historical figures for future generations.
