MyShell Text To Speech

MyShell's Voice Cloning and Text to Speech - Transform your audio content with realistic, personalized voices. Experience high-quality, efficient, and cost-effective audio synthesis.

API

The API can be called from the programming language of your choice. The example below uses Python.

POST

import requests

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/myshell-tts"

# Request payload
data = {
  "voice": "michael",       # name of the voice used for the output audio
  "language": "EN_NEWEST",  # default language value
  "text": "Did you ever hear a folk tale about a giant turtle?",
  "speed": 1                # allowed range: 0.5 to 2
}

# The API key is sent in the x-api-key request header
headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response body is the generated audio, as raw bytes

RESPONSE

Binary audio data (the generated speech)

HTTP Response Codes

200 - OK: Audio generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: Server had some issue with processing
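The status codes listed above can be turned into readable errors on the client side. The following is a minimal sketch, not part of the official client: `STATUS_MESSAGES`, `TTSRequestError`, and `check_status` are hypothetical names introduced here for illustration, and the messages are copied from the list above.

```python
# Descriptions copied from the HTTP response code list above.
STATUS_MESSAGES = {
    401: "Unauthorized: user authentication failed",
    404: "Not Found: the requested URL does not exist",
    405: "Method Not Allowed: the requested HTTP method is not allowed",
    406: "Not Acceptable: not enough credits",
    500: "Server Error: server had some issue with processing",
}


class TTSRequestError(Exception):
    """Hypothetical exception raised for any non-200 response."""


def check_status(status_code):
    """Return normally on 200; raise TTSRequestError for anything else."""
    if status_code == 200:
        return
    message = STATUS_MESSAGES.get(status_code, "Unexpected status")
    raise TTSRequestError(f"{status_code} - {message}")
```

A typical call would be `check_status(response.status_code)` immediately after `requests.post(...)`, before reading `response.content`.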

Attributes

voice (enum:str) *

Select the name of the voice to generate output audio.

Allowed values:

language (enum:str, default: EN_NEWEST)

Language of the text or audio

Allowed values:

text (str) *

Text to be spoken or processed

speed (float) *

Speed at which the text or audio is processed

min: 0.5, max: 2
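The attribute constraints above can be checked locally before a request is sent. This is a rough sketch under stated assumptions: `build_payload` is a hypothetical helper, not part of the API, and it only enforces what is listed above (non-empty voice and text, speed between 0.5 and 2, language defaulting to EN_NEWEST). It does not validate voice or language names, since the allowed values are not listed here.

```python
def build_payload(voice, text, speed=1.0, language="EN_NEWEST"):
    """Build the request body, rejecting values outside the documented limits."""
    if not voice:
        raise ValueError("voice must be a non-empty string")
    if not text:
        raise ValueError("text must be a non-empty string")
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be between 0.5 and 2")
    return {
        "voice": voice,
        "language": language,
        "text": text,
        "speed": speed,
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, in place of the hand-written payload in the example request.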

To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
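Reading the credit balance from a response could look like the sketch below. The header name x-remaining-credits comes from the text above; `remaining_credits` is a hypothetical helper, and treating the header value as a number is an assumption, so the function returns None when the header is missing or cannot be parsed.

```python
def remaining_credits(headers):
    """Return the x-remaining-credits value as a float, or None if unavailable."""
    value = headers.get("x-remaining-credits")
    if value is None:
        return None
    try:
        # Assumption: the header carries a numeric value.
        return float(value)
    except ValueError:
        return None
```

With the `requests` library, the call would be `remaining_credits(response.headers)`; a result of None means the balance could not be determined from that response.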

MyShell Voice Cloning and Text-to-Speech Technology

MyShell Voice Cloning and Text-to-Speech (TTS) technology represents a significant advancement in audio synthesis. By leveraging state-of-the-art deep learning techniques, it offers exceptional realism, flexibility, and cost-effectiveness.

Key Features

Advanced TTS: The TTS engine converts written text into natural-sounding speech, mimicking human vocal characteristics with high fidelity.
State-of-the-Art Voice Cloning: With just a brief voice sample, the model can accurately replicate a speaker's unique vocal identity, enabling the creation of highly personalized and realistic audio content.
Efficiency and Cost-Effectiveness: MyShell's technology offers substantial cost reductions compared to traditional TTS methods, making advanced audio synthesis accessible to a wider range of users and applications.

Use Cases

Content Creation: Generate realistic voiceovers for videos, podcasts, and audiobooks.
Gaming and Virtual Assistants: Develop engaging and personalized virtual characters.
Accessibility: Provide audio alternatives for text-based content, making it accessible to individuals with visual impairments.
Business and Marketing: Create branded voice experiences for advertising, customer service, and interactive campaigns.
