What is Wav2Lip?
A neural network that generates accurate lip movements from audio. Given a face image or video and an audio waveform, it produces video frames whose mouth movements are synced to the speech; it achieves this by training the generator against a pre-trained lip-sync "expert" discriminator. It is known for robust sync accuracy across languages and accents.
Related Terms
Lip Sync / Lip Syncing
The process of matching mouth movements to audio speech. In AI video, lip-sync algorithms analyze audio waveforms and generate realistic mouth shapes frame by frame. Puppetry uses LivePortrait + Wav2Lip for production-quality lip sync across 65+ languages.
Viseme
The visual mouth shape that corresponds to a phoneme (a unit of sound). English uses roughly a dozen distinct visemes — for example, the "OO" lip-rounding shape covers several different phonemes that look the same on camera. AI lip-sync engines map audio to visemes frame by frame to produce believable mouth movement.
Phoneme
The smallest unit of sound that distinguishes one word from another in a language. Speech-recognition and lip-sync systems first decompose audio into phonemes, then map each phoneme to a viseme (mouth shape) for animation. English has roughly 44 phonemes.
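The phoneme-to-viseme step described above is essentially a many-to-one lookup: several phonemes collapse onto each mouth shape. A minimal Python sketch, using ARPAbet-style phoneme symbols and hypothetical viseme labels (real engines use larger, engine-specific tables):

```python
# Illustrative phoneme-to-viseme table. Phoneme symbols follow ARPAbet;
# the viseme labels ("MBP", "OO", ...) are made up for this sketch.
PHONEME_TO_VISEME = {
    # Bilabials: lips pressed together
    "P": "MBP", "B": "MBP", "M": "MBP",
    # Lip-rounding: the "OO" shape mentioned above
    "UW": "OO", "OW": "OO", "W": "OO",
    # Labiodentals: lower lip against upper teeth
    "F": "FV", "V": "FV",
    # Open vowels
    "AA": "AH", "AE": "AH", "AH": "AH",
}

def to_visemes(phonemes):
    """Map a phoneme sequence to mouth shapes, merging repeats."""
    shapes = [PHONEME_TO_VISEME.get(p, "REST") for p in phonemes]
    # Collapse consecutive duplicates: adjacent identical shapes
    # render as one held mouth pose.
    merged = [shapes[0]]
    for shape in shapes[1:]:
        if shape != merged[-1]:
            merged.append(shape)
    return merged

# "mom" → phonemes M AA M → three mouth poses
print(to_visemes(["M", "AA", "M"]))  # → ['MBP', 'AH', 'MBP']
```

Note how "P" and "B" map to the same pose: they are distinct phonemes but visually identical on camera, which is exactly why viseme sets are much smaller than phoneme sets.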
LivePortrait
An open-source AI model for portrait animation. It generates natural head movements, facial expressions, and eye blinks from a single photo. Combined with Wav2Lip for lip sync, it forms the core of Puppetry's animation pipeline.