AI Video Glossary
Clear definitions of the terms you'll encounter when creating AI-powered talking head videos. From lip sync to voice cloning — we explain it all.
Talking Head Video
A video format featuring a person (or AI-generated character) speaking directly to the camera. Commonly used in education, marketing, and social media content. Puppetry turns any photo into a talking head video using AI lip-sync technology.
Lip Sync / Lip Syncing
The process of matching mouth movements to spoken audio. In AI video, lip sync algorithms analyze audio waveforms and generate realistic mouth shapes frame by frame. Puppetry uses LivePortrait and Wav2Lip for production-quality lip sync across 29 languages.
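To make "frame by frame" concrete, here is a minimal sketch of how audio can be lined up with video frames. It is a toy illustration only, not how Wav2Lip or Puppetry works: it assumes 16 kHz mono audio and 25 fps video, and it uses loudness as a crude stand-in for mouth openness. The function names `samples_per_frame` and `frame_openness` are hypothetical.

```python
import math


def samples_per_frame(sample_rate: int, fps: int) -> int:
    """Number of audio samples that play during one video frame."""
    return sample_rate // fps


def frame_openness(samples: list[float], sample_rate: int = 16000, fps: int = 25) -> list[float]:
    """Toy mapping: one mouth-openness value in [0, 1] per complete video frame.

    Loudness (RMS) is only a rough proxy; real lip sync models predict
    mouth shapes from much richer audio features.
    """
    hop = samples_per_frame(sample_rate, fps)
    rms_per_frame = []
    # Walk the audio in non-overlapping windows, one window per video frame.
    for start in range(0, len(samples) - hop + 1, hop):
        window = samples[start:start + hop]
        rms = math.sqrt(sum(s * s for s in window) / hop)
        rms_per_frame.append(rms)
    # Normalize by the loudest frame so values fall in [0, 1].
    peak = max(rms_per_frame, default=0.0)
    if peak == 0.0:
        return [0.0] * len(rms_per_frame)
    return [r / peak for r in rms_per_frame]
```

Under these assumptions, each video frame covers 640 audio samples (40 ms of sound), so one second of audio yields 25 openness values, one per frame.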
AI Avatar / AI Presenter
A digital character animated by artificial intelligence. Unlike deepfakes (which impersonate real people), AI avatars are created from photos with the consent of the person pictured, for legitimate use cases like education, marketing, and accessibility.
Text-to-Speech (TTS)
Technology that converts written text into spoken audio. Modern TTS systems like ElevenLabs and Kokoro produce natural-sounding voices with emotion, pacing, and accent control. Puppetry offers 500+ AI voices across 29 languages.
Voice Cloning
Creating a synthetic replica of a specific person's voice using AI. Users record a short sample (30 seconds to 5 minutes), and the AI learns to reproduce their speech patterns, tone, and accent. Used for personalized video content.
AI Puppet
A still image (photo, illustration, or 3D render) that can be animated to speak using AI. Unlike traditional puppets, AI puppets require no physical manipulation — you upload a photo and the AI handles lip sync, head movement, and expressions.
LivePortrait
An open-source AI model for portrait animation. It generates natural head movements, facial expressions, and eye blinks from a single photo. Combined with Wav2Lip for lip sync, it forms the core of Puppetry's animation pipeline.
Wav2Lip
A neural network that generates accurate lip movements from audio input. It takes a face image and an audio waveform, then produces video frames with closely synced mouth movements. Known for high accuracy across languages and accents.
Text-to-Video
The process of generating video content from text input. In Puppetry, this means typing a script, selecting a voice, and getting a fully animated talking head video — no camera, studio, or editing skills needed.
Photo-to-Video
Converting a static photograph into an animated video. AI analyzes facial features in the photo and generates realistic motion including lip sync, head turns, and expressions. Works with real photos, illustrations, and 3D renders.
Neural Voice
A synthetic voice generated by deep neural networks (as opposed to older concatenative TTS). Neural voices sound significantly more natural, with proper intonation, breathing, and emotional range. ElevenLabs and OpenAI are leading providers.
Deepfake vs AI Avatar
Deepfakes impersonate real people without consent, often for deception. AI avatars are created from photos with the permission of the person pictured, for legitimate purposes (education, marketing, accessibility). Puppetry is designed for ethical AI avatar creation: users animate their own photos or stock characters.
Ready to create your first AI video?
Upload a photo, type a script, pick a voice. Your talking head video is ready in under 90 seconds.