import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')
api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/openvoice"

# Request payload
data = {
    "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/openvoice-ip.mp3",
    "language": "EN_NEWEST",
    "speed": 1,
    "text": "Did you ever hear a folk tale about a giant turtle?"
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the generated audio
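The example above only prints the raw bytes; to actually use the clip you will usually want to write it to disk. Below is a minimal helper for that, where the `save_audio` name and the `.mp3` extension are illustrative assumptions based on the sample input file:

```python
def save_audio(audio_bytes, path):
    # Write the raw audio bytes returned in response.content to a file;
    # returns the number of bytes written.
    with open(path, "wb") as f:
        return f.write(audio_bytes)

# Example usage (assumes the request above succeeded):
# save_audio(response.content, "output.mp3")
```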
input_audio: Input reference audio (5-120 seconds) of a person speaking, used to train an audio model that captures the speaker's voice characteristics.
language: The language of the audio to be generated. British English, American English, Indian English, French, Chinese, Japanese, and Korean are supported.
speed: Speed at which the output audio is generated (min: 0.5, max: 2).
text: Text to be spoken.
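The parameters above can be assembled into a request payload with the documented speed bounds enforced client-side. This is a sketch: `build_payload` is a hypothetical helper, and the clamping is an illustration of the documented range rather than behavior the API itself performs:

```python
def build_payload(text, input_audio, language="EN_NEWEST", speed=1.0):
    # Clamp speed into the documented range (min: 0.5, max: 2)
    speed = max(0.5, min(2.0, speed))
    return {
        "input_audio": input_audio,
        "language": language,
        "speed": speed,
        "text": text,
    }
```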
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
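The credit check can be done programmatically by reading the header off the response object. Here `remaining_credits` is a hypothetical helper name, and the header value is assumed to be a plain integer string:

```python
def remaining_credits(response):
    # Read the x-remaining-credits header from an API response;
    # returns None if the header is absent.
    value = response.headers.get("x-remaining-credits")
    return int(value) if value is not None else None
```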
The OpenVoice model is a state-of-the-art voice cloning technology developed by MyShell and MIT. This versatile model excels in replicating the tone and style of a reference speaker’s voice using just a short audio clip. OpenVoice supports multiple languages, including English, Spanish, French, Chinese, Japanese, and Korean, making it a powerful tool for global applications.
Accurate Tone Color Cloning: OpenVoice can precisely replicate the reference speaker’s tone, ensuring high fidelity in voice cloning.
Flexible Voice Style Control: Users can adjust various voice style parameters such as emotion, accent, rhythm, pauses, and intonation.
Zero-Shot Cross-Lingual Voice Cloning: OpenVoice can generate speech in languages not present in the training dataset, offering unparalleled flexibility.
High-Quality Audio Output: The model adopts advanced training strategies to deliver superior audio quality.
Free for Commercial Use: Both OpenVoice V1 and V2 are released under the MIT License, allowing free commercial use.
Media Content Creation: Enhance videos, podcasts, and other media with high-quality voiceovers.
Interactive AI Interfaces: Improve the user experience in chatbots and virtual assistants with natural-sounding voices.
Voice Preservation: Preserve the voice of loved ones or historical figures for future generations.