Explore the six core HeyGen APIs and learn when to use Video Generation, Video Agent, Template, Translation, Proofread, and Text to Speech for scalable AI video workflows.
The HeyGen API is a suite of REST endpoints for generating, translating, and voicing AI video without cameras, studios, or editing teams. It is organized into six core APIs: Video Generation, Video Agent, Template, Video Translation, Proofread, and Text to Speech. This guide explains what each API does, which endpoint it calls, which parameters matter, and the jobs it is built for, so you can pick the right one for your use case.
If you are evaluating HeyGen for programmatic video, the short version is this: use Video Generation or Video Agent to create a video; the Template API to scale variations of it; Video Translation and Proofread to localize it; and Text to Speech when you only need the audio.
The 6 HeyGen APIs at a glance
| API | What it does | Endpoint | Best for |
| --- | --- | --- | --- |
| Video Generation | Creates an avatar-led video from a script or audio file | POST /v3/videos | Onboarding, L&D, product how-tos |
| Video Agent | Turns a text prompt into a finished video end to end | POST /v3/video-agents | Wiki or knowledge base to video, fast drafts |
| Template | Generates on-brand video variations from a reusable template | POST /v2/template/{template_id}/generate | Personalized video at scale |
| Video Translation | Translates and dubs a video into 175+ languages with lip-sync | POST /v3/video-translations | Localizing launches and training |
| Proofread | Extracts an editable transcript to review before translating | POST /v3/video-translations/proofreads | Accuracy control before localization |
| Text to Speech | Synthesizes natural speech audio from text | POST /v3/voices/speech | Voiceovers, narration, audio tracks |
All six use the same authentication and async conventions, covered in the common conventions section below.
What is the HeyGen Video Generation API?
The HeyGen Video Generation API creates an avatar-led video from a text script or a pre-recorded audio file, with no camera or studio required. It is the foundational way to produce a single avatar delivering a message, and it is built to automate onboarding and L&D videos.
What it does:
- Drives a HeyGen avatar, including studio avatars, digital twins, and photo avatars, by either a text script paired with a voice_id, or an uploaded audio track for lip-sync via audio_url or audio_asset_id. Script and audio are mutually exclusive.
- Supports the Avatar IV and Avatar V engines. Avatar IV is the default, and you set the engine field to select Avatar V for eligible avatars. Avatar III generation uses the legacy v1 or v2 API.
- Outputs at 4k, 1080p, or 720p, in aspect ratios including 16:9, 9:16, 4:5, 1:1, and auto, as either an MP4 or a WebM with a transparent background.
- Adds backgrounds and background removal, burned-in or sidecar captions, and a custom watermark for select Enterprise customers.
- For photo avatars on Avatar IV, it accepts a motion_prompt and an expressiveness level to control body motion.
When to use it: You have a script or audio track and want one avatar to deliver it programmatically, at volume, for onboarding, training, or product walkthroughs.
Endpoint: POST /v3/videos
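The script-versus-audio rule above can be enforced before a request is ever sent. The following is a minimal sketch, not HeyGen's documented schema: voice_id, audio_url, audio_asset_id, and engine come from the description above, while avatar_id, script, the flat body layout, and the rule that only one audio source may be set are assumptions made for illustration.

```python
def build_video_request(avatar_id, *, script=None, voice_id=None,
                        audio_url=None, audio_asset_id=None, engine=None):
    """Return a JSON-ready body for POST /v3/videos (body layout is assumed)."""
    audio_sources = [v for v in (audio_url, audio_asset_id) if v is not None]

    # Script and audio are mutually exclusive, and one of them is required.
    if script is not None and audio_sources:
        raise ValueError("script and audio are mutually exclusive")
    if script is None and not audio_sources:
        raise ValueError("provide either a script or an audio source")
    # Assumption: only one audio source may be supplied at a time.
    if len(audio_sources) > 1:
        raise ValueError("provide audio_url or audio_asset_id, not both")
    # A text script has to be paired with a voice.
    if script is not None and voice_id is None:
        raise ValueError("a script must be paired with a voice_id")

    body = {"avatar_id": avatar_id}  # hypothetical field name
    if script is not None:
        body["script"] = script  # hypothetical field name
        body["voice_id"] = voice_id
    elif audio_url is not None:
        body["audio_url"] = audio_url
    else:
        body["audio_asset_id"] = audio_asset_id
    if engine is not None:
        body["engine"] = engine  # omit to keep the default engine
    return body
```

Validating locally keeps malformed jobs out of the render queue. Check the exact field names against the endpoint's request schema before relying on this shape.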
What is the HeyGen Video Agent API?
The HeyGen Video Agent API turns a single text prompt into a finished video, handling scripting, avatar selection, scene composition, and automatic rendering. It is the fastest path from an idea or a document to a watchable first draft.
What it does:
- Runs in two modes. generate is one-shot and fire-and-forget, auto-proceeding through the storyboard to produce a video. Meanwhile, chat is multi-turn, pausing for real decisions such as picking a voice, and allowing revisions and follow-up videos.
- Accepts up to 20 file attachments, so you can ground a video in an internal wiki, a product doc, or a knowledge base article.
- Takes optional avatar_id, voice_id, style_id, and brand_kit_id to apply specific avatars, voices, curated visual styles, and brand colors, fonts, and logos.
- Auto-detects orientation from the content when orientation is not provided.
When to use it: You want a video from a prompt or a piece of internal documentation, you want a fast first draft, or you want non-technical teammates to create a video from a brief.
Endpoint: POST /v3/video-agents
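As a rough illustration of the options above, the sketch below assembles a Video Agent request body and checks the two limits the description states: the mode must be generate or chat, and at most 20 files may be attached. The field names prompt, mode, and files are assumptions, not confirmed schema; the optional ID fields and orientation are the ones named above.

```python
MAX_ATTACHMENTS = 20
MODES = ("generate", "chat")
OPTIONAL_FIELDS = ("avatar_id", "voice_id", "style_id", "brand_kit_id", "orientation")


def build_agent_request(prompt, mode, files=(), **options):
    """Return a JSON-ready body for POST /v3/video-agents (layout is assumed)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    files = list(files)
    if len(files) > MAX_ATTACHMENTS:
        raise ValueError(f"at most {MAX_ATTACHMENTS} file attachments are accepted")
    unknown = set(options) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")

    body = {"prompt": prompt, "mode": mode}  # hypothetical field names
    if files:
        body["files"] = files  # hypothetical field name
    # Leave unset options out so the service can apply its own defaults,
    # for example auto-detecting orientation.
    body.update({k: v for k, v in options.items() if v is not None})
    return body
```

For example, `build_agent_request("Summarize our onboarding wiki", "generate", brand_kit_id="bk_1")` produces a body with only the prompt, the mode, and the brand kit set.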
What is the HeyGen Template API?
The HeyGen Template API generates video variations from a reusable template by swapping placeholder variables. You define the avatar, voice, layout, and branding once, then produce many on-brand versions at scale.
What it does:
- Replaces template placeholders through a variables map. Each variable is typed as text, image, video, audio, voice, or character, and carries a type-specific properties payload, for example, replacement copy, a media URL or asset ID, a voice_id, or a character_id for an avatar or talking photo.
- Restricts a render to a subset of scenes with scene_ids, overrides output dimension and fps, and adds burned-in subtitles.
- Applies a brand glossary for translation and pronunciation rules, organizes output into a folder, and can render in test mode at lower quality without deducting quota.
When to use it: You need personalized video at scale, such as account-based sales videos or localized variants of one layout, while keeping brand consistency across every render.
Endpoint: POST /v2/template/{template_id}/generate. Note this is a v2 endpoint.
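To make the variables map concrete, here is a minimal sketch that builds one request per recipient from a shared template. The six variable types, scene_ids, and test come from the description above. The nesting of each variable (name, type, properties) and the content key used for text are assumptions for illustration, so confirm them against the endpoint schema.

```python
VARIABLE_TYPES = {"text", "image", "video", "audio", "voice", "character"}


def make_variable(name, var_type, properties):
    """Describe one typed template variable (structure is assumed)."""
    if var_type not in VARIABLE_TYPES:
        raise ValueError(f"unknown variable type: {var_type}")
    return {"name": name, "type": var_type, "properties": dict(properties)}


def build_template_request(template_id, variables, *, test=False, scene_ids=None):
    """Return (path, body) for POST /v2/template/{template_id}/generate."""
    path = f"/v2/template/{template_id}/generate"
    body = {"variables": {v["name"]: v for v in variables}, "test": test}
    if scene_ids:
        body["scene_ids"] = list(scene_ids)  # render only these scenes
    return path, body


def requests_for_recipients(template_id, first_names):
    """One request per recipient, swapping a single text placeholder."""
    return [
        build_template_request(
            template_id,
            # "content" is a hypothetical property key for replacement copy.
            [make_variable("first_name", "text", {"content": name})],
        )
        for name in first_names
    ]
```

Rendering with test=True first is a cheap way to check that every placeholder resolves before spending quota on the full batch.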
What is the HeyGen Video Translation API?
The HeyGen Video Translation API translates and dubs an existing video into one or more target languages, with voice cloning and lip-sync. It localizes training and product launches in 175+ languages and dialects with 99% lip-sync accuracy.
What it does:
- Returns one video_translation_id per language. Pass a single language for one translation, or several for a batch.
- Offers two quality modes. speed is the default for fast turnaround. precision produces higher lip-sync quality using avatar inference.
- Includes controls for translate_audio_only, captions, speaker separation via speaker_num, partial translation with start_time and end_time, background music removal, and speech enhancement.
- Applies a brand glossary so custom terms translate correctly, for example, treating "Reformer" as the Pilates equipment rather than a political activist.
When to use it: You have a finished source video and need faithful, lip-synced versions in other markets for launches, training, or product education.
Endpoint: POST /v3/video-translations
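The controls above combine into a single request body. The sketch below is one plausible way to assemble it, assuming the source video is passed as video_url, the target languages as output_languages, and the quality mode as mode. Those three names are illustrative guesses for this endpoint, and the language values are placeholders; translate_audio_only, speaker_num, start_time, and end_time are the controls named above.

```python
QUALITY_MODES = ("speed", "precision")


def build_translation_request(video_url, output_languages, *, mode="speed",
                              translate_audio_only=False, speaker_num=None,
                              start_time=None, end_time=None):
    """Return a JSON-ready body for POST /v3/video-translations (layout assumed)."""
    languages = list(output_languages)
    if not languages:
        raise ValueError("at least one target language is required")
    if mode not in QUALITY_MODES:
        raise ValueError(f"mode must be one of {QUALITY_MODES}")
    # Partial translation only makes sense for a forward time window.
    if start_time is not None and end_time is not None and start_time >= end_time:
        raise ValueError("start_time must be earlier than end_time")

    body = {
        "video_url": video_url,          # hypothetical field name
        "output_languages": languages,   # one translation job per entry
        "mode": mode,                    # speed is the stated default
        "translate_audio_only": translate_audio_only,
    }
    for key, value in (("speaker_num", speaker_num),
                       ("start_time", start_time),
                       ("end_time", end_time)):
        if value is not None:
            body[key] = value
    return body
```

Because the API returns one video_translation_id per language, a batch of three languages should be tracked as three separate jobs on the caller's side.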
What is the HeyGen Proofread API?
The HeyGen Proofread API extracts editable subtitles from a video, enabling you to review and correct the transcript before final translation and rendering. It is the quality-assurance step before Video Translation.
What it does:
- Creates a proofread session that surfaces the source transcript as editable subtitles, so you can fix names, jargon, brand terms, or transcription errors before any languages are produced.
- Carries the same localization controls as the Translation API, including brand glossary, speaker_num, the speed and precision modes, music removal, and speech enhancement.
- Accepts one or more output_languages, so you can prepare a single proofread or batch several at once.
When to use it: Transcript accuracy matters before you localize, for example, with technical terminology, regulated content, or brand names that must not be mistranslated.
Endpoint: POST /v3/video-translations/proofreads
What is the HeyGen Text to Speech API?
The HeyGen Text to Speech API synthesizes speech audio from text using a chosen voice. It is a standalone voice engine for narration and audio tracks, with strong consistency, low latency, and emotional control.
What it does:
- Uses voices that support the starfish engine. Find compatible voices with GET /v3/voices?engine=starfish.
- Accepts plain text or SSML markup, synthesizes up to 5000 characters per request, and supports a speed multiplier from 0.5 to 2.0x.
- Auto-detects the language from the text, or lets you set it explicitly with a language or a BCP-47 locale tag.
- Returns a URL to the generated audio file along with its duration and optional word-level timestamps.
When to use it: You need a voiceover or narration track on its own, or audio that you then feed into the Video Generation API for lip-sync. The word-level timestamps are useful for captioning and precise synchronization.
Endpoint: POST /v3/voices/speech
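The two numeric limits above, 5000 characters per request and a speed multiplier between 0.5 and 2.0, are easy to check on the client. The sketch below splits long plain text into request-sized pieces and builds one body per piece. The body field names text and speed are assumptions, and the whitespace-based splitting is only suitable for plain text, since it could cut SSML markup apart.

```python
MAX_CHARS = 5000
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def chunk_text(text, limit=MAX_CHARS):
    """Split plain text on whitespace into pieces of at most `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word exceeds the character limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) > limit:
            chunks.append(current)  # current is full, start a new piece
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_speech_request(text, voice_id, *, speed=None, locale=None):
    """Return a JSON-ready body for POST /v3/voices/speech (layout assumed)."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters, split it first")
    if speed is not None and not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    body = {"text": text, "voice_id": voice_id}  # "text" is a hypothetical name
    if speed is not None:
        body["speed"] = speed  # hypothetical field name
    if locale is not None:
        body["locale"] = locale  # BCP-47 tag; omit to rely on auto-detection
    return body
```

A long narration would then be sent as `[build_speech_request(piece, voice_id) for piece in chunk_text(script)]`, with the resulting audio files joined afterwards.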
Which HeyGen API should you use?
Match the job to the API:
- I want a video of an avatar reading my script. Use the Video Generation API.
- I want a video from just a prompt or a wiki article. Use the Video Agent API.
- I need many on-brand variations of the same video. Use the Template API.
- I have a finished video and need it in other languages. Use the Video Translation API.
- I want to correct the transcript before translating. Use the Proofread API.
- I only need an audio voiceover. Use the Text to Speech API.
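In code, this job-to-API mapping can live in a small lookup table so the rest of an integration never hard-codes paths. This is a sketch: the methods and paths are the ones listed in this guide, while the dictionary keys and the helper name are made up for illustration.

```python
ENDPOINTS = {
    "video_generation": ("POST", "/v3/videos"),
    "video_agent": ("POST", "/v3/video-agents"),
    "template": ("POST", "/v2/template/{template_id}/generate"),
    "video_translation": ("POST", "/v3/video-translations"),
    "proofread": ("POST", "/v3/video-translations/proofreads"),
    "text_to_speech": ("POST", "/v3/voices/speech"),
}


def endpoint_for(api, **path_params):
    """Return (method, path) for an API, filling in any path parameters."""
    method, path = ENDPOINTS[api]
    return method, path.format(**path_params)
```

For example, `endpoint_for("template", template_id="abc")` yields the v2 template path with the ID substituted.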
How the HeyGen APIs work together
The six APIs are designed to chain into pipelines:
- Prompt to localized video: Draft with the Video Agent API or script with the Video Generation API, run the Proofread API to verify the transcript, then dub into target languages with the Video Translation API.
- Audio-first production: Generate a track with the Text to Speech API, then lip-sync an avatar to it through the Video Generation API using audio_url or audio_asset_id.
- Scaled personalization: Build a layout once and render hundreds of variants with the Template API, optionally supplying voices from Text to Speech.
They also share building blocks. Brand glossary IDs are reused across Translation, Proofread, and Template to keep terminology consistent, and async completion is reported through callback_url webhooks on the endpoints that support them.
Common conventions across the HeyGen API
- Authentication: Every endpoint requires your HeyGen API key in the x-api-key header. Obtain it from your HeyGen dashboard.
- Safe retries: Mutation endpoints accept an optional Idempotency-Key header. A retry within 24 hours that reuses the key replays the original response, so you can retry safely without creating duplicate jobs.
- Asynchronous results: Video jobs are long-running. Provide a callback_url, and optionally a callback_id, to receive a webhook when rendering completes instead of polling.
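These three conventions can be wrapped in one helper that prepares any mutation request. The sketch below uses only the Python standard library and builds the request without sending it. The base URL https://api.heygen.com and the placement of callback_url and callback_id at the top level of the JSON body are assumptions; the x-api-key and Idempotency-Key headers are the ones described above.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.heygen.com"  # assumed host, confirm in the API docs


def build_request(path, body, api_key, *, callback_url=None,
                  callback_id=None, idempotency_key=None):
    """Prepare (but do not send) an authenticated POST request."""
    payload = dict(body)
    # Assumption: webhook settings travel in the JSON body.
    if callback_url is not None:
        payload["callback_url"] = callback_url
    if callback_id is not None:
        payload["callback_id"] = callback_id

    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json",
        # Reuse the same key when retrying so the job is not duplicated.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. To retry after a timeout, rebuild the request with the same idempotency_key rather than letting the helper generate a fresh one.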
Getting started
Pick the API that matches your job from the table above, authenticate with your x-api-key header, and provide a callback_url to receive results when rendering finishes. Full request and response schemas are in the HeyGen API documentation for every endpoint.
Frequently asked questions
What is the HeyGen API?
The HeyGen API is a set of REST endpoints for creating, translating, and voicing AI video programmatically, without cameras or studios. It spans six core APIs covering generation, prompt-to-video, templated video, translation, transcript proofreading, and text to speech.
How many APIs does HeyGen have, and what are they?
HeyGen offers six core video APIs: Video Generation, Video Agent, Template, Video Translation, Proofread, and Text to Speech.
Which HeyGen API translates video?
The HeyGen Video Translation API translates and dubs video into other languages with voice cloning and lip-sync, at POST /v3/video-translations.
What is the difference between the Video Generation API and the Video Agent API?
The Video Generation API renders an avatar speaking a script or audio you provide, giving you direct control over the avatar, voice, and output. The Video Agent API takes a single prompt and handles scripting, avatar selection, scene composition, and rendering for you, which is faster but gives you less direct control over each step.
Does HeyGen have a text to speech API?
Yes. The HeyGen Text to Speech API synthesizes speech from text using starfish-engine voices, supports plain text or SSML, a 0.5 to 2.0x speed range, and returns an audio URL with duration and optional word-level timestamps, at POST /v3/voices/speech.
How many languages does HeyGen Video Translation support?
The HeyGen Video Translation API supports 175+ languages and dialects with 99% lip-sync accuracy.
What is the HeyGen Proofread API for?
The HeyGen Proofread API extracts an editable transcript from a video, enabling you to correct names, jargon, and errors before translating. This improves the accuracy of the final localized videos.
Can I generate personalized videos at scale with HeyGen?
Yes. The HeyGen Template API lets you define avatar, voice, layout, and branding once, then render many variations by passing values into typed template variables, at POST /v2/template/{template_id}/generate.
How do I authenticate with the HeyGen API?
Send your HeyGen API key in the x-api-key request header. You can obtain the key from your HeyGen dashboard.







