ElevenLabs Dubbing

Instantly dubs audio and video into 29 languages while preserving each speaker's original voice.

~92.92s
~$0.25

Inputs

  • Source URL — URL of the source video/audio file to be dubbed. Supports MP3, MP4, and other common formats.
  • Target language — the language to dub the content into. Use 'hi' for Hindi, 'es' for Spanish, 'fr' for French.
  • Source language — language of the source audio. Use 'auto' to detect it automatically, or specify it for faster processing.
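The three inputs above can be sketched as a request payload. This is a minimal illustration, not the SDK's actual call: the field names (`video_url`, `target_lang`, `source_lang`) are assumptions chosen to match the descriptions on this page, so check the SDK reference for the exact spelling.

```python
def build_dubbing_request(video_url: str, target_lang: str,
                          source_lang: str = "auto") -> dict:
    """Collect the three core dubbing inputs into one payload dict.

    Field names here are illustrative assumptions, not confirmed API names.
    """
    if not video_url.startswith(("http://", "https://")):
        raise ValueError("video_url must be a URL to the source media")
    return {
        "video_url": video_url,
        "target_lang": target_lang,   # e.g. 'hi', 'es', 'fr'
        "source_lang": source_lang,   # 'auto' enables auto-detection
    }

payload = build_dubbing_request("https://example.com/clip.mp4", "hi")
```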


ElevenLabs Dubbing: AI-Powered Audio & Video Translation API

What is ElevenLabs Dubbing?

ElevenLabs Dubbing is an AI-powered dubbing API that translates audio and video content into 29 languages while preserving each speaker's original voice, tone, emotion, and timing. Unlike traditional localization workflows that require re-recording with human voice actors, ElevenLabs Dubbing automates the entire pipeline — from speaker separation and transcription to translation, speech synthesis, and audio re-sync — in a single API call.

Built on ElevenLabs' Multilingual v2 model, it handles complex real-world content: overlapping dialogue, background music, ambient noise, whispers, and shouted lines. The result is natural-sounding, voice-cloned multilingual audio that maintains your original speaker's identity across languages.


Key Features

  • 29-language support — English, Hindi, Spanish, Japanese, Arabic, French, German, Korean, Tamil, and more
  • Automatic speaker separation — detects and isolates multiple overlapping voices with zero manual configuration
  • Voice cloning with emotion retention — preserves accent, tone, and emotional nuance using Multilingual v2
  • Background audio preservation — music, SFX, and ambient noise survive the dubbing process intact
  • Segment-level dubbing — use start_time and end_time parameters to dub specific clips within longer files
  • Auto language detection — set source_lang: auto for hands-free source identification
  • Advanced controls — profanity filtering, highest-resolution output, voice cloning toggle, and CSV-based manual mode
  • Python and TypeScript SDKs — production-ready async API with straightforward status polling
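The async submit-then-poll workflow mentioned in the last bullet can be sketched as follows. `StubClient` stands in for the real SDK client, and its method names (`submit`, `get_status`) are illustrative assumptions rather than the SDK's actual signatures; only the overall pattern — submit a job, receive an ID, poll until a terminal status — is taken from this page.

```python
import itertools

class StubClient:
    """Stand-in for the real ElevenLabs SDK client (method names assumed)."""

    def __init__(self):
        # Simulate a job that reports "dubbing" twice, then "dubbed".
        self._status = itertools.chain(["dubbing", "dubbing"],
                                       itertools.repeat("dubbed"))

    def submit(self, payload: dict) -> str:
        return "job-123"  # the real API returns a dubbing job ID

    def get_status(self, dubbing_id: str) -> str:
        return next(self._status)

def wait_for_dub(client, dubbing_id: str, max_polls: int = 10) -> str:
    """Poll until the job reaches a terminal status."""
    for _ in range(max_polls):
        status = client.get_status(dubbing_id)
        if status in ("dubbed", "failed"):
            return status
    raise TimeoutError("gave up polling")

client = StubClient()
job_id = client.submit({"target_lang": "es"})
final = wait_for_dub(client, job_id)
```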

Best Use Cases

  • Content creators and YouTubers localizing videos for Hindi, Spanish, or Arabic-speaking audiences
  • Podcast producers generating multilingual versions of long-form audio without re-recording
  • Media studios dubbing trailers, courses, or documentary content at scale
  • EdTech platforms delivering educational video in regional languages without hiring voice actors
  • App developers building programmatic translation pipelines for UGC platforms or streaming products
  • Corporate teams localizing training videos and product demos for global rollouts

Prompt Tips and Output Quality

Start with source_lang: auto unless you know the source language precisely — auto-detection is accurate and simplifies your workflow. For content with a known fixed language, specifying it directly speeds up processing.

Set num_speakers manually for dense dialogue. The default auto-detection works well for 1–3 speakers, but for panel discussions, interviews, or multi-character audio, providing an explicit count improves speaker separation quality significantly.

Use start_time and end_time for iteration. When testing output quality on long-form video, dub a representative 2–3 minute segment first before committing to full-file processing.
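A small helper can pick that representative test window before a full run. The 150-second default matches the 2–3 minute suggestion above; the helper itself (`pick_test_segment`) is an illustrative convenience, not part of the API — only the `start_time`/`end_time` parameters it feeds are.

```python
def pick_test_segment(duration_s: float, clip_s: float = 150.0) -> tuple:
    """Return (start_time, end_time) for a clip centered in the file.

    A centered window tends to sample typical dialogue rather than intros
    or credits. Short files are dubbed whole.
    """
    if duration_s <= clip_s:
        return (0.0, duration_s)
    start = (duration_s - clip_s) / 2
    return (round(start, 1), round(start + clip_s, 1))

# A 30-minute (1800 s) video yields a 2.5-minute window in the middle.
segment = pick_test_segment(1800.0)
```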

Keep drop_background_audio: false for most content. ElevenLabs Dubbing's ability to retain background music is a core differentiator — disabling it is best reserved for clean voiceover or podcast-only content.

Enable highest_resolution: true when dubbing video destined for broadcast, YouTube, or professional distribution.

Avoid CSV/manual mode in production. The manual mode with custom CSV transcripts is experimental and better suited for testing edge cases, not live pipelines.
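The tips above can be collected into a single options dict. The parameter names mirror the ones this page mentions (`source_lang`, `num_speakers`, `start_time`, `end_time`, `drop_background_audio`, `highest_resolution`), but the exact SDK spelling may differ, and the helper function is purely illustrative.

```python
def dubbing_options(num_speakers=None, segment=None, broadcast=False):
    """Assemble recommended dubbing options (parameter names assumed)."""
    opts = {
        "source_lang": "auto",            # let the API detect the source
        "drop_background_audio": False,   # keep music/SFX intact (default)
        "highest_resolution": broadcast,  # enable for broadcast/YouTube
    }
    if num_speakers is not None:
        # Explicit speaker count for dense dialogue (panels, interviews).
        opts["num_speakers"] = num_speakers
    if segment is not None:
        # Dub only a test clip before committing to the full file.
        opts["start_time"], opts["end_time"] = segment
    return opts

opts = dubbing_options(num_speakers=4, segment=(0, 180), broadcast=True)
```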


FAQs

How long does dubbing take for a 30-minute video? Processing is asynchronous and scales with content length. A 30-minute video can take several minutes to process. Use the status polling endpoint to check job completion — avoid setting fixed timeouts.
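Since processing time scales with content length, a capped exponential backoff is a reasonable way to poll without a fixed timeout. The backoff schedule below is a suggested pattern, not something prescribed by the API; only the "poll the status endpoint" advice comes from this page.

```python
def backoff_delays(base=5.0, factor=1.5, cap=60.0, n=6):
    """Generate capped exponentially growing poll delays (in seconds).

    Starts with quick checks and backs off for long-running jobs,
    instead of guessing one fixed timeout up front.
    """
    delays, d = [], base
    for _ in range(n):
        delays.append(min(d, cap))
        d *= factor
    return delays

# Sleep these durations between successive status checks.
schedule = backoff_delays()
```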

Which audio and video formats are supported? The API accepts MP3, MP4, and most common audio/video formats via URL. You can also pass direct URLs from YouTube, TikTok, or cloud storage buckets.

Does voice cloning work for all 29 languages? Yes — voice cloning is applied by default across all supported languages using the Multilingual v2 model. Set disable_voice_cloning: true if you prefer generic ElevenLabs library voices instead.

What happens to background music during dubbing? By default, background audio (music, ambient sound, SFX) is separated and re-layered into the dubbed output. Set drop_background_audio: true only if you want a clean speech-only track.

Can I target a specific accent for dubbed voices? The target_accent parameter (e.g., "american", "british") is available but experimental. It's not recommended for production use and may produce inconsistent results across languages.

Is there a character or length limit? The API applies a character limit of approximately 3,000 characters per minute of content. Plan your content segmentation accordingly for very long or text-dense files.
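The ~3,000 characters-per-minute limit stated above translates into a simple budget check for segmentation planning. This arithmetic helper is illustrative; the limit itself is approximate, as noted.

```python
CHARS_PER_MINUTE = 3000  # approximate per-minute limit noted above

def char_budget(duration_minutes: float) -> int:
    """Approximate character budget for a file of the given length."""
    return int(duration_minutes * CHARS_PER_MINUTE)

def fits(script_chars: int, duration_minutes: float) -> bool:
    """Check whether a script stays within the budget for its runtime."""
    return script_chars <= char_budget(duration_minutes)

# A 30-minute file allows roughly 90,000 characters of dialogue.
budget_30min = char_budget(30)
```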