Question 1

What is a Talking Head Video?

Accepted Answer

A video format featuring a person (or AI-generated character) speaking directly to the camera. Commonly used in education, marketing, and social media content. Puppetry turns any photo into a talking head video using AI lip-sync technology.

Question 2

What is Lip Sync / Lip Syncing?

Accepted Answer

The process of matching mouth movements to audio speech. In AI video, lip sync algorithms analyze audio waveforms and generate realistic mouth shapes frame-by-frame. Puppetry uses LivePortrait + Wav2Lip for production-quality lip sync across 65+ languages.
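To make "frame-by-frame" concrete, here is a minimal Python sketch of the crudest form of lip sync: slice the audio into one chunk per video frame and open the mouth in proportion to each chunk's loudness. This is not how Wav2Lip works (Wav2Lip learns mouth shapes from spectrogram features). The function names and the 16 kHz / 25 fps figures are illustrative assumptions.

```python
import math
from typing import List, Sequence


def frame_loudness(samples: Sequence[float], sample_rate: int, fps: int) -> List[float]:
    """Split mono audio into one chunk per video frame and return each chunk's RMS level."""
    # Assumes sample_rate is divisible by fps (16000 / 25 = 640 samples per frame).
    samples_per_frame = sample_rate // fps
    levels = []
    for start in range(0, len(samples), samples_per_frame):
        chunk = samples[start:start + samples_per_frame]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels


def mouth_openness(levels: List[float]) -> List[float]:
    """Normalise RMS levels to 0.0 (closed) .. 1.0 (fully open)."""
    peak = max(levels, default=0.0)
    if peak == 0.0:
        return [0.0] * len(levels)
    return [level / peak for level in levels]


# One second of silence followed by one second of a 220 Hz tone, sampled at 16 kHz.
sr, fps = 16000, 25
audio = [0.0] * sr + [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
openness = mouth_openness(frame_loudness(audio, sr, fps))
print(len(openness))  # 50
```

The mouth stays shut for the first 25 frames and opens for the last 25. A real lip-sync model replaces the loudness rule with learned mouth shapes, but the one-audio-slice-per-frame bookkeeping is the same idea.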

Question 3

What is an AI Avatar / AI Presenter?

Accepted Answer

A digital character animated by artificial intelligence. Unlike deepfakes (which impersonate real people), AI avatars are created from photos with the owner's consent for legitimate use cases like education, marketing, and accessibility.

Question 4

What is Text-to-Speech (TTS)?

Accepted Answer

Technology that converts written text into spoken audio. Modern TTS systems produce natural-sounding voices with emotion, pacing, and accent control. Puppetry offers 500+ AI voices across 65+ languages.

Question 5

What is Voice Cloning?

Accepted Answer

Creating a synthetic replica of a specific person's voice using AI. Users record a short sample (30 seconds to 5 minutes), and the AI learns to reproduce their speech patterns, tone, and accent. Used for personalized video content.
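A recording tool might check that a sample falls inside the 30 second to 5 minute range before uploading it. This sketch uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file; the function names are hypothetical and are not part of any Puppetry API.

```python
import wave

MIN_SECONDS = 30.0   # lower bound quoted above
MAX_SECONDS = 300.0  # upper bound quoted above (5 minutes)


def sample_duration(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_voice_sample(path: str) -> tuple[bool, float]:
    """Report whether a recording is long enough, but not too long, for cloning."""
    seconds = sample_duration(path)
    return MIN_SECONDS <= seconds <= MAX_SECONDS, seconds
```

Usage: `ok, seconds = check_voice_sample("my_voice_sample.wav")`, where the filename is a placeholder. Compressed formats such as MP3 would need a decoding library first.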

Question 6

What is an AI Puppet?

Accepted Answer

A still image (photo, illustration, or 3D render) that can be animated to speak using AI. Unlike traditional puppets, AI puppets require no physical manipulation — you upload a photo and the AI handles lip sync, head movement, and expressions.

Question 7

What is LivePortrait?

Accepted Answer

An open-source AI model for portrait animation. From a single photo, it generates natural head movements, facial expressions, and eye blinks by transferring motion from a driving video or motion template. Combined with Wav2Lip for lip sync, it forms the core of Puppetry's animation pipeline.

Question 8

What is Wav2Lip?

Accepted Answer

A neural network that generates accurate lip movements from audio input. It takes a face (a still image or video) and an audio waveform, then produces video frames whose mouth movements are synced to the speech. Known for high accuracy across languages and accents.
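Wav2Lip does not look at a single audio sample per frame; it pairs each video frame with a short window of a mel spectrogram. The sketch below reproduces that indexing as we understand the public Wav2Lip inference script: 80 mel frames per second of audio (16 kHz with a hop of 200 samples) and a 16-frame (about 0.2 s) window per video frame. Treat both constants as assumptions and check them against the version you run; `mel_windows` is a hypothetical helper name.

```python
from typing import List, Tuple

MEL_FRAMES_PER_SECOND = 80  # assumption: 16 kHz audio with a hop length of 200 samples
MEL_WINDOW = 16             # assumption: mel frames fed to the model per video frame (~0.2 s)


def mel_windows(num_mel_frames: int, fps: float) -> List[Tuple[int, int]]:
    """Return (start, end) mel-frame indices of the audio window for each video frame."""
    if num_mel_frames < MEL_WINDOW:
        raise ValueError("audio is shorter than one mel window")
    step = MEL_FRAMES_PER_SECOND / fps  # mel frames that elapse per video frame
    windows = []
    i = 0
    while True:
        start = int(i * step)
        if start + MEL_WINDOW > num_mel_frames:
            # Last window: pin it to the end of the audio instead of running past it.
            windows.append((num_mel_frames - MEL_WINDOW, num_mel_frames))
            break
        windows.append((start, start + MEL_WINDOW))
        i += 1
    return windows


# One second of audio at 25 fps: the window advances 3.2 mel frames per video frame.
windows = mel_windows(80, fps=25)
print(windows[:3], windows[-1])  # [(0, 16), (3, 19), (6, 22)] (64, 80)
```

Overlapping windows give the network some audio context on either side of each frame, which helps it shape the mouth for sounds that begin slightly before or after the frame itself.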

Question 9

What is Text-to-Video?

Accepted Answer

The process of generating video content from text input. In Puppetry, this means typing a script, selecting a voice, and getting a fully animated talking head video — no camera, studio, or editing skills needed.

Question 10

What is Photo-to-Video?

Accepted Answer

Converting a static photograph into an animated video. AI analyzes facial features in the photo and generates realistic motion including lip sync, head turns, and expressions. Works with real photos, illustrations, and 3D renders.

Question 11

What is a Neural Voice?

Accepted Answer

A synthetic voice generated by deep neural networks (as opposed to older concatenative TTS, which stitches together prerecorded speech fragments). Neural voices sound significantly more natural, with proper intonation, breathing, and emotional range. Most leading TTS providers now offer neural voices.

Question 12

What is the difference between a Deepfake and an AI Avatar?

Accepted Answer

Deepfakes impersonate real people without consent, often for deception. AI avatars are created from photos with owner permission for legitimate purposes (education, marketing, accessibility). Puppetry is designed for ethical AI avatar creation — users animate their own photos or stock characters.

Question 13

What is a Viseme?

Accepted Answer

The visual mouth shape that corresponds to a phoneme (a unit of sound). English uses roughly a dozen distinct visemes — for example, the "OO" lip-rounding shape covers several different phonemes that look the same on camera. AI lip-sync engines map audio to visemes frame-by-frame to produce believable mouth movement.
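The many-to-one relationship between phonemes and visemes is easy to see as a lookup table. The grouping below is a hypothetical, heavily simplified set keyed by ARPAbet phoneme symbols (the notation used by the CMU Pronouncing Dictionary); production systems define their own viseme classes and labels.

```python
# Hypothetical, simplified phoneme-to-viseme table (ARPAbet symbols, stress digits removed).
# Real engines define their own viseme classes; these labels are illustrative only.
PHONEME_TO_VISEME = {
    "P": "closed", "B": "closed", "M": "closed",                    # lips pressed together
    "F": "lip_teeth", "V": "lip_teeth",                             # lower lip against upper teeth
    "UW": "rounded", "W": "rounded", "OW": "rounded",               # lips pushed forward
    "AA": "open", "AE": "open", "AH": "open",                       # jaw dropped
    "T": "tongue_ridge", "D": "tongue_ridge", "N": "tongue_ridge",  # tongue behind upper teeth
}


def to_visemes(phonemes: list[str]) -> list[str]:
    """Map a phoneme sequence to mouth shapes; unknown phonemes fall back to 'neutral'."""
    # CMUdict-style vowels carry a stress digit (e.g. "AE1"), which this simple table ignores.
    return [PHONEME_TO_VISEME.get(p.rstrip("012"), "neutral") for p in phonemes]


# "pat", "bat" and "mat" differ only in P/B/M, so they look identical on camera.
for word, phones in {"pat": ["P", "AE1", "T"], "bat": ["B", "AE1", "T"], "mat": ["M", "AE1", "T"]}.items():
    print(word, to_visemes(phones))  # each prints ['closed', 'open', 'tongue_ridge']
```

This is why lip reading is ambiguous and why a lip-sync engine can get away with far fewer mouth shapes than there are sounds.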

Question 14

What is a Phoneme?

Accepted Answer

The smallest unit of sound that distinguishes one word from another in a language. Many speech-recognition and lip-sync systems first decompose audio into phonemes, then map each phoneme to a viseme (mouth shape) for animation. English has roughly 44 phonemes.
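Putting the two steps together: once phonemes have timings (for example from a forced aligner, which is assumed here rather than shown), an animation loop can sample the phoneme active at the centre of each video frame and look up its viseme. `PhonemeSpan`, `VISEME` and `viseme_track` are hypothetical names, and the table is a tiny illustrative subset.

```python
from dataclasses import dataclass


@dataclass
class PhonemeSpan:
    phoneme: str  # ARPAbet symbol, e.g. "M"
    start: float  # seconds
    end: float    # seconds (exclusive)


# Tiny illustrative subset of a phoneme-to-viseme table (hypothetical labels).
VISEME = {"P": "closed", "B": "closed", "M": "closed", "AA": "open", "AE": "open", "UW": "rounded"}


def viseme_track(spans: list[PhonemeSpan], fps: float, n_frames: int) -> list[str]:
    """Return one viseme label per video frame; frames with no active phoneme are 'rest'."""
    track = []
    for frame in range(n_frames):
        t = (frame + 0.5) / fps  # sample at the centre of the frame
        label = "rest"
        for span in spans:
            if span.start <= t < span.end:
                label = VISEME.get(span.phoneme, "neutral")
                break
        track.append(label)
    return track


# "ma": lips closed for /M/, then the jaw opens for /AA/.
spans = [PhonemeSpan("M", 0.0, 0.08), PhonemeSpan("AA", 0.08, 0.28)]
print(viseme_track(spans, fps=25, n_frames=10))
# ['closed', 'closed', 'open', 'open', 'open', 'open', 'open', 'rest', 'rest', 'rest']
```

A production engine would also blend between neighbouring visemes instead of switching shapes abruptly at frame boundaries.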

Question 15

What is AI Dubbing?

Accepted Answer

Automatically replacing the original audio in a video with a synthesized translation, while keeping mouth movement convincingly aligned. Puppetry supports AI dubbing across 65+ languages — paste a script in a new language and the lip sync re-renders for that audio.

Question 16

What is Video Translation?

Accepted Answer

Translating the script of a spoken video into another language and re-rendering the speaker so their lips match the new audio. Unlike subtitles, which only overlay translated text on the original audio, a translated video replaces the speech itself, so the speaker appears to talk in the target language and the result feels native to that audience.

Question 17

What is an AI Spokesperson?

Accepted Answer

A digital presenter generated and animated by AI for marketing, training, or product videos. It avoids the cost and scheduling overhead of hiring on-camera talent. Companies typically use a single AI spokesperson identity across many videos for brand consistency.

Question 18

What is a Digital Human?

Accepted Answer

A photorealistic, AI-driven digital character designed to interact with people in video, web, or real-time settings. Sometimes used interchangeably with "AI avatar," though digital human usually implies higher fidelity and full-body presence.

Question 19

What is Speech-to-Text (STT)?

Accepted Answer

The reverse of TTS: converting spoken audio into written text. Modern STT systems like OpenAI's Whisper handle accents, background noise, and many languages. Puppetry uses STT internally to align speech to visemes for accurate lip sync.
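As one illustration of aligning speech to the video frame grid, the sketch below converts word-level timestamps into frame ranges. It assumes the result layout that the open-source `openai-whisper` package returns from `transcribe(..., word_timestamps=True)`: a `segments` list whose entries carry a `words` list with `word`, `start` and `end` in seconds. Verify the field names against your installed version. Puppetry's internal alignment is not documented here, so this is not its implementation, and `word_frames` is a hypothetical name.

```python
import math
from typing import Any


def word_frames(result: dict[str, Any], fps: float) -> list[tuple[str, int, int]]:
    """Turn word timings in seconds into (word, first_frame, end_frame), end_frame exclusive."""
    frames = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            first = math.floor(word["start"] * fps)
            # Round the end up so a word covers every frame it overlaps, and never zero frames.
            end = max(first + 1, math.ceil(word["end"] * fps))
            frames.append((word["word"].strip(), first, end))
    return frames


# Hand-written stand-in for a transcription result (assumed layout, not real output).
example = {"segments": [{"words": [
    {"word": " Hello", "start": 0.0, "end": 0.5},
    {"word": " world", "start": 0.5, "end": 0.9},
]}]}
print(word_frames(example, fps=25))  # [('Hello', 0, 13), ('world', 12, 23)]
```

Frame 12 belongs to both words because the boundary at 0.5 s falls in the middle of that frame; a downstream step can decide which word's mouth shape wins.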

Question 20

What is Synthetic Media?

Accepted Answer

Any media (image, audio, video) generated or substantially modified by AI rather than captured from the real world. Encompasses AI avatars, AI voice, generated music, and AI-translated video. The legitimate, consent-based end of the spectrum is sometimes called "synthetic content."

AI Video Glossary

Talking Head Video

Lip Sync / Lip Syncing

AI Avatar / AI Presenter

Text-to-Speech (TTS)

Voice Cloning

AI Puppet

LivePortrait

Wav2Lip

Text-to-Video

Photo-to-Video

Neural Voice

Deepfake vs AI Avatar

Viseme

Phoneme

AI Dubbing

Video Translation

AI Spokesperson

Digital Human

Speech-to-Text (STT)

Synthetic Media
