What is Text-to-Speech (TTS)?

Technology that converts written text into spoken audio. Modern TTS systems produce natural-sounding voices with emotion, pacing, and accent control. Puppetry offers 500+ AI voices across 65+ languages.

Related Terms

Neural Voice

A synthetic voice generated by deep neural networks, as opposed to older concatenative TTS, which stitches together pre-recorded speech fragments. Neural voices sound significantly more natural, with proper intonation, breathing, and emotional range.

Voice Cloning

Creating a synthetic replica of a specific person's voice using AI. Users record a short sample (30 seconds to 5 minutes), and the AI learns to reproduce their speech patterns, tone, and accent. Used for personalized video content.
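
As a rough illustration of the sample-length guideline above, the following sketch checks whether a recorded WAV file falls in the 30-second to 5-minute range before it would be submitted for cloning. The function names and the choice of WAV input are assumptions made for this example; they do not describe Puppetry's actual upload checks.

```python
import wave

# Range taken from the definition above: 30 seconds to 5 minutes.
MIN_SECONDS = 30.0
MAX_SECONDS = 5 * 60.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_sample(duration_seconds):
    """True if the recording length is within the suggested range."""
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS
```

A caller might run `is_usable_sample(wav_duration_seconds("my_voice.wav"))` and ask the user to re-record if it returns `False`; the file name here is only a placeholder.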

Speech-to-Text (STT)

The reverse of TTS: converting spoken audio into written text. Modern STT systems like OpenAI's Whisper handle accents, background noise, and many languages. Puppetry uses STT internally to align speech to visemes for accurate lip sync.
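
To make the viseme alignment idea more concrete, here is a minimal sketch of how timed phonemes could be turned into mouth-shape keyframes. It assumes an upstream recognition step has already produced phonemes with start and end times. The phoneme table, class names, and viseme labels are hypothetical and heavily simplified; this is not Puppetry's actual pipeline.

```python
from dataclasses import dataclass

# Hypothetical, simplified phoneme-to-viseme table for illustration only.
PHONEME_TO_VISEME = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "teeth_on_lip", "V": "teeth_on_lip",
    "AA": "open", "AE": "open",
    "UW": "rounded", "OW": "rounded",
    "IY": "wide", "EH": "wide",
}


@dataclass
class TimedPhoneme:
    phoneme: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class VisemeKeyframe:
    viseme: str
    start: float
    end: float


def phonemes_to_visemes(phonemes):
    """Map timed phonemes to viseme keyframes, merging adjacent repeats."""
    keyframes = []
    for p in phonemes:
        # Unknown phonemes fall back to a neutral mouth shape.
        viseme = PHONEME_TO_VISEME.get(p.phoneme, "neutral")
        last = keyframes[-1] if keyframes else None
        if last and last.viseme == viseme and abs(last.end - p.start) < 1e-6:
            # Same mouth shape with no gap: extend the previous keyframe.
            last.end = p.end
        else:
            keyframes.append(VisemeKeyframe(viseme, p.start, p.end))
    return keyframes
```

Merging adjacent identical visemes keeps the timeline short, so a renderer would only need to animate a change when the mouth shape actually differs.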

AI Dubbing

Automatically replacing the original audio in a video with a synthesized translation, while keeping mouth movement convincingly aligned. Puppetry supports AI dubbing across 65+ languages — paste a script in a new language and the lip sync re-renders for that audio.
