What is Text-to-Video?

The process of generating video content from text input. In Puppetry, this means typing a script, selecting a voice, and getting a fully animated talking head video, with no camera, studio, or editing skills needed.
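
The script-to-video flow described above can be pictured as three ordered stages: speech synthesis, lip-sync animation, and final rendering. The sketch below is illustrative only. `VideoJob`, `plan_pipeline`, `estimate_duration_seconds`, and the 150 words-per-minute speaking rate are hypothetical names and assumptions, not Puppetry's actual API or internals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VideoJob:
    """Inputs a user supplies for one text-to-video request (hypothetical shape)."""
    script: str  # text the presenter will speak
    voice: str   # identifier of the chosen TTS voice
    photo: str   # path to the portrait that will be animated


def estimate_duration_seconds(script: str, words_per_minute: int = 150) -> float:
    """Rough narration length from word count; 150 wpm is an assumed speaking rate."""
    word_count = len(script.split())
    return word_count / words_per_minute * 60


def plan_pipeline(job: VideoJob) -> list[str]:
    """Return the ordered stages a job would pass through, as plain descriptions."""
    word_count = len(job.script.split())
    return [
        f"tts: synthesize {word_count} words with voice '{job.voice}'",
        f"lipsync: animate '{job.photo}' to match the synthesized audio",
        "render: combine animated frames and audio into one video file",
    ]


if __name__ == "__main__":
    job = VideoJob(
        script="Hello and welcome to our product tour",
        voice="demo-voice",
        photo="presenter.png",
    )
    for stage in plan_pipeline(job):
        print(stage)
    print(f"estimated length: {estimate_duration_seconds(job.script):.1f} s")
```

Each stage consumes the output of the one before it, which is why the voice has to be chosen before any animation can be generated.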

Related Terms

Text-to-Speech (TTS)

Technology that converts written text into spoken audio. Modern TTS systems produce natural-sounding voices with emotion, pacing, and accent control. Puppetry offers 500+ AI voices across 65+ languages.

Talking Head Video

A video format featuring a person (or AI-generated character) speaking directly to the camera. Commonly used in education, marketing, and social media content. Puppetry turns any photo into a talking head video using AI lip-sync technology.

Photo-to-Video

Converting a static photograph into an animated video. AI analyzes facial features in the photo and generates realistic motion including lip sync, head turns, and expressions. Works with real photos, illustrations, and 3D renders.

AI Spokesperson

A digital presenter generated and animated by AI for marketing, training, or product videos. Removes the cost and scheduling overhead of hiring on-camera talent. Companies typically use a single AI spokesperson identity across many videos for brand consistency.
