What is Lip Sync / Lip Syncing?
The process of matching mouth movements to spoken audio. In AI video, lip-sync algorithms analyze the audio waveform and generate realistic mouth shapes frame by frame. Puppetry uses LivePortrait + Wav2Lip for production-quality lip sync across 65+ languages.
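The frame-by-frame idea can be made concrete with a small sketch: given timed mouth-shape segments derived from the audio, assign one shape to each video frame. The segment format, label names, and `rest` default below are illustrative assumptions, not Puppetry's actual internal representation.

```python
def visemes_per_frame(segments, duration_s, fps=25):
    """Assign one viseme label to each video frame.

    segments: list of (start_s, end_s, viseme) covering the audio.
    Returns one label per frame; gaps default to "rest" (mouth closed).
    """
    frames = []
    for i in range(int(duration_s * fps)):
        t = i / fps  # timestamp of this frame in seconds
        label = "rest"
        for start, end, viseme in segments:
            if start <= t < end:
                label = viseme
                break
        frames.append(label)
    return frames

# 0.4 s of audio at 25 fps -> 10 frames, each tagged with a mouth shape
print(visemes_per_frame([(0.0, 0.12, "AA"), (0.12, 0.3, "OO")], 0.4))
```

A production engine interpolates between shapes rather than switching hard at segment boundaries, but the time-to-frame alignment works the same way.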
Related Terms
Viseme
The visual mouth shape that corresponds to a phoneme (a unit of sound). English uses roughly a dozen distinct visemes — for example, the "OO" lip-rounding shape covers several different phonemes that look the same on camera. AI lip-sync engines map audio to visemes frame-by-frame to produce believable mouth movement.
Phoneme
The smallest unit of sound that distinguishes one word from another in a language. Speech-recognition and lip-sync systems first decompose audio into phonemes, then map each phoneme to a viseme (mouth shape) for animation. English has roughly 44 phonemes.
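The phoneme-to-viseme step described above is a many-to-one table lookup. The ARPAbet phoneme symbols below are real, but the grouping is a simplified illustration, not the exact mapping any particular engine uses.

```python
# Simplified many-to-one phoneme-to-viseme table (illustrative grouping).
PHONEME_TO_VISEME = {
    # bilabials: lips pressed together
    "P": "MBP", "B": "MBP", "M": "MBP",
    # lip-rounding vowels: all share the "OO" shape on camera
    "UW": "OO", "OW": "OO", "UH": "OO",
    # labiodentals: lower lip against upper teeth
    "F": "FV", "V": "FV",
    # open vowels
    "AA": "AH", "AE": "AH",
}

def to_visemes(phonemes):
    """Map a phoneme sequence to visemes, collapsing consecutive
    repeats so the mouth shape only changes when it looks different."""
    out = []
    for p in phonemes:
        v = PHONEME_TO_VISEME.get(p, "rest")  # unknown phonemes -> rest
        if not out or out[-1] != v:
            out.append(v)
    return out

print(to_visemes(["M", "UW", "V"]))  # three distinct mouth shapes
```

Note how `to_visemes(["P", "B", "M"])` collapses to a single shape: three different sounds, one identical picture, which is exactly why lip sync animates visemes rather than phonemes.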
Wav2Lip
A neural network that generates lip movements matched to arbitrary audio. Given a face image or video and an audio waveform, it produces video frames with closely synced mouth movements. Known for high accuracy across languages and accents.
AI Dubbing
Automatically replacing the original audio in a video with a synthesized translation, while keeping mouth movement convincingly aligned. Puppetry supports AI dubbing across 65+ languages — paste a script in a new language and the lip sync re-renders for that audio.