AI Video 11 min read June 2026

The Complete Guide to Prompting LTX-2

LTX-2 rewards long, detailed prompts — the official 7-step formula, text / image / audio modes, camera vocabulary, audio & dialogue, the settings that belong outside the prompt, and the mistakes to avoid.

LTX-2 (Lightricks) is a DiT-based foundation model that generates synchronised audio and video together — up to 4K, with fast and high-fidelity modes. It has one strong, well-documented preference that sets it apart: it wants long, detailed prompts.

This guide follows Lightricks' official LTX-2 README and prompt guide, plus fal.ai's LTX-2 image-to-video guidance. Short, vague prompts under-perform here — the detail is the control.

Think like a cinematographer
Lightricks' own words: write a single flowing paragraph of detailed, chronological description — specific movements, appearances, camera angles, and environmental details. Start with the action, keep it literal and precise, and stay within ~200 words.
Section 01

The golden rule

“LTX responds best to long, detailed prompts — the more specific you are about subject, action, lighting, camera movement, and audio, the closer the output matches your vision.” Longer, descriptive prompts consistently beat short ones.

One flowing paragraph

Present-tense verbs, literal and precise. Start directly with the action — no preamble.

Match length to duration

A 10-word prompt can't fill a 10-second clip. For 8–10s, write enough detail to fill the time.
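One way to make the length rule concrete is a small pre-flight check. The sketch below is a heuristic, not something Lightricks documents: the function name and the `MIN_WORDS_PER_SECOND` threshold are assumptions chosen for illustration, and only the ~200-word ceiling comes from the guidance above.

```python
# Hypothetical pre-flight check: is the prompt detailed enough for the clip,
# and still within the ~200-word ceiling from the prompt guidance?
MAX_WORDS = 200            # ceiling quoted in the guidance
MIN_WORDS_PER_SECOND = 5   # assumed rule of thumb, not an official figure


def check_prompt_length(prompt: str, duration_s: float) -> str:
    """Return 'too long', 'too short', or 'ok' for a prompt and clip length."""
    words = len(prompt.split())
    if words > MAX_WORDS:
        return "too long"
    if words < MIN_WORDS_PER_SECOND * duration_s:
        return "too short"
    return "ok"


print(check_prompt_length("A nice video of nature.", duration_s=10))
```

Tune the threshold to taste; the point is only that the word budget should grow with the clip.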

Let the model expand a sketch
LTX pipelines support automatic prompt enhancement (an enhance_prompt option) that uses an LLM to expand a short prompt — handy when you want detail without writing it all by hand.
Section 02

The 7-step formula

Lightricks' official ordering for building that paragraph:

01 Main action
Open with the core action in a single sentence.
e.g. “A lighthouse keeper walks along the rocky shore at sunset.”

02 Movements & gestures
Add specific detail about how things move.
e.g. “he steps carefully over wet stones, coat flaring”

03 Appearances
Describe characters/objects precisely.
e.g. “weathered face, heavy oilskin coat, lantern in hand”

04 Environment
Background and setting detail.
e.g. “waves crashing in the foreground, seabirds overhead”

05 Camera
Angles and movement.
e.g. “the camera tracks alongside him with smooth lateral movement”

06 Lighting & colour
Light, grade, atmosphere.
e.g. “golden light gradually intensifies toward the horizon”

07 Changes / events
Note shifts or sudden events over time.
e.g. “the sun dips and the sky shifts to deep amber”
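If you draft shots in a script or spreadsheet, the seven steps map neatly onto a small data structure. The sketch below is illustrative only: `ShotDescription` and `build_prompt` are hypothetical names, not part of any LTX tooling, and the example phrases are adapted from the steps above.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotDescription:
    """One field per step, declared in the order the formula lists them."""
    main_action: str
    movements: str
    appearances: str
    environment: str
    camera: str
    lighting: str
    changes: str


def build_prompt(shot: ShotDescription) -> str:
    """Join the seven parts, in order, into a single flowing paragraph."""
    sentences = []
    for f in fields(shot):
        text = getattr(shot, f.name).strip()
        if not text:
            continue  # skip steps this shot does not need
        if text[-1] not in ".!?":
            text += "."
        sentences.append(text[0].upper() + text[1:])
    return " ".join(sentences)


shot = ShotDescription(
    main_action="A lighthouse keeper walks along the rocky shore at sunset",
    movements="he steps carefully over wet stones, coat flaring",
    appearances="he has a weathered face and wears a heavy oilskin coat, a lantern in hand",
    environment="waves crash in the foreground and seabirds circle overhead",
    camera="the camera tracks alongside him with smooth lateral movement",
    lighting="golden light gradually intensifies toward the horizon",
    changes="the sun dips and the sky shifts to deep amber",
)
prompt = build_prompt(shot)
print(prompt)
```

Keeping the fields in formula order means the paragraph always opens with the action, as the guidance asks.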

Section 03

Text / image / audio

The mode decides where your detail goes:

| Mode | Where to put the detail |
| --- | --- |
| Text-to-video | Describe everything: subject, action, environment, lighting, camera, audio. Detail is your primary lever. |
| Image-to-video | Describe the motion and what happens next, not the static elements already visible in the image. |
| Keyframe | Provide guiding keyframes; describe the transition between them for smoother interpolation. |
| Audio-to-video | The audio anchors the timing; use the prompt to describe the visual interpretation of that audio. |

Image-to-video: prompt the change, not the picture
fal.ai frames an image-to-video prompt as instructions for temporal evolution in three layers: subject action (what moves and how) → camera movement (perspective shifts, in cinematographic terms) → environmental dynamics (atmospheric elements that sell time passing). Re-describing the input image just wastes the prompt.
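The per-mode advice can be kept as a checklist while drafting. This is a minimal sketch: the mode keys are labels invented for this example (not API values), and the checklist items paraphrase the guidance above.

```python
# What the prompt should cover in each mode, paraphrasing the guidance above.
# The mode names are labels for this sketch, not API parameter values.
PROMPT_FOCUS = {
    "text-to-video": ["subject", "action", "environment", "lighting", "camera", "audio"],
    "image-to-video": ["subject action", "camera movement", "environmental dynamics"],
    "keyframe": ["transition between keyframes"],
    "audio-to-video": ["visual interpretation of the audio"],
}


def missing_elements(mode: str, covered: set) -> list:
    """List the checklist items for `mode` that the draft has not covered yet."""
    return [item for item in PROMPT_FOCUS[mode] if item not in covered]


print(missing_elements("image-to-video", {"subject action"}))
```

Note that the image-to-video list deliberately omits the static subject and setting, since the input image already supplies them.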
Section 04

Camera & motion

LTX understands standard cinematographic terms. Pair a move with an intensity word.

| Move | What it does |
| --- | --- |
| Dolly | Camera moves toward or away from the subject |
| Track | Camera moves laterally alongside the subject |
| Pan | Camera rotates horizontally on a fixed axis |
| Tilt | Camera rotates vertically on a fixed axis |
| Orbit | Camera moves in an arc around the subject |

Intensity qualifiers

subtle, gentle, slight, steady, gradual, smooth, dramatic, rapid, sweeping

Useful vocabulary the guide cites: shot types like macro lens, tracking shot, wide establishing shot, over-the-shoulder; movement verbs like follows, circles around, pushes in, pulls back; and depth/light cues like shallow depth of field, backlighting, rim light.
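Pairing a named move with an intensity word is easy to enforce with a lookup. The sketch below assumes nothing about LTX itself: `camera_phrase` and the sentence templates are hypothetical, and the vocabulary is taken from the move list and qualifiers above.

```python
# Sentence templates per move; the wording is illustrative, not official.
MOVE_TEMPLATES = {
    "dolly": "The camera makes a {intensity} dolly toward {target}.",
    "track": "The camera tracks alongside {target} with {intensity} lateral movement.",
    "pan": "The camera makes a {intensity} pan across {target}.",
    "tilt": "The camera makes a {intensity} tilt up toward {target}.",
    "orbit": "The camera makes a {intensity} orbit around {target}.",
}
INTENSITIES = {
    "subtle", "gentle", "slight", "steady", "gradual",
    "smooth", "dramatic", "rapid", "sweeping",
}


def camera_phrase(move: str, intensity: str, target: str = "the subject") -> str:
    """Build one camera sentence from a named move and an intensity qualifier."""
    if move not in MOVE_TEMPLATES:
        raise ValueError(f"unknown move {move!r}; expected one of {sorted(MOVE_TEMPLATES)}")
    if intensity not in INTENSITIES:
        raise ValueError(f"unknown intensity {intensity!r}")
    return MOVE_TEMPLATES[move].format(intensity=intensity, target=target)


print(camera_phrase("track", "smooth", target="him"))
```

Rejecting unknown words is the useful part: it stops a vague phrase such as "moves around" from slipping into the prompt.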

Section 05

Audio & dialogue

Because LTX-2 generates sound with the picture, audio deserves real prompt attention. Describe the acoustic environment, the character's voice qualities, and any ambient sounds you want.

Dialogue formatting
Break dialogue into short phrases with acting directions between each line, and use physical cues rather than emotional labels to direct performance — e.g. “he pauses, looks to the side, then continues speaking with a cracking voice.” Put spoken lines in quotes and specify language/accent when needed.
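The dialogue advice amounts to interleaving physical directions with short quoted lines. Here is a minimal sketch of that layout; `format_dialogue` is a hypothetical helper and the spoken lines are invented for illustration.

```python
def format_dialogue(beats, voice: str = "") -> str:
    """Interleave acting directions with short quoted lines.

    `beats` is a list of (physical direction, spoken line) pairs; leave the
    spoken line empty for a silent beat such as a pause or a glance.
    """
    parts = []
    if voice.strip():
        parts.append(voice.strip().rstrip(".") + ".")  # voice quality, accent
    for direction, line in beats:
        direction = direction.strip().rstrip(".,")
        line = line.strip()
        if line:
            parts.append(f'{direction}, "{line}"')
        else:
            parts.append(f"{direction}.")
    return " ".join(parts)


dialogue = format_dialogue(
    [
        ("He looks up from the lantern and says", "The storm is coming."),
        ("He pauses and looks to the side", ""),
        ("He continues in a cracking voice", "We should go inside."),
    ],
    voice="He has a low, gravelly voice with a Scottish accent",
)
print(dialogue)
```

Each beat carries a physical cue rather than an emotional label, and every spoken line stays short and in quotes.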
Section 06

Settings — keep these out of the prompt text

Numbers belong in settings, not the prose
A documented LTX mistake is “numerical specifications instead of natural language”. Keep resolution, fps, and step counts in the generation settings, and write the prompt in plain descriptive language.

| Setting | Documented value / guidance |
| --- | --- |
| Resolution / FPS | Up to 4K; 24 / 25 / 48 / 50 FPS (longer clips may require 25 FPS + 1080p). |
| Duration | Hosted flows support up to ~20 seconds. |
| Frame count | Must follow an 8k+1 format (e.g. 97, 121, 193). |
| Steps | ≈40 default; distilled variants run far fewer for speed. |
| CFG scale | Typical 2.0–5.0 (1.0 disables). |
| STG scale | Spatio-temporal guidance for coherence, typical 0.5–1.5. |
| Negative prompt | Documented default: “worst quality, low quality, blurry, distorted”. |

Performance variants: a Fast flow for quick feedback loops and a Pro flow for high-fidelity, stable, detailed results across longer sequences.
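The 8k+1 frame-count rule is simple arithmetic: 97 = 8·12 + 1, 121 = 8·15 + 1, 193 = 8·24 + 1. The sketch below snaps a requested duration to the nearest valid count, assuming frame count is roughly duration × fps. The dictionary keys are hypothetical names for illustration, not the parameter names of any particular API; the values come from the table above.

```python
def is_valid_frame_count(n: int) -> bool:
    """True when n has the form 8k + 1 for some positive integer k."""
    return n > 1 and (n - 1) % 8 == 0


def snap_frame_count(duration_s: float, fps: int) -> int:
    """Nearest 8k + 1 frame count for the requested duration (assumes frames ~ duration * fps)."""
    target = duration_s * fps
    k = max(1, round((target - 1) / 8))
    return 8 * k + 1


# Settings live beside the prompt, never inside it. Key names are illustrative.
settings = {
    "fps": 24,
    "num_frames": snap_frame_count(5, 24),
    "steps": 40,
    "cfg_scale": 3.0,   # within the typical 2.0-5.0 range
    "stg_scale": 1.0,   # within the typical 0.5-1.5 range
    "negative_prompt": "worst quality, low quality, blurry, distorted",
}
print(settings["num_frames"])
```

Check the documentation of the endpoint or pipeline you actually use for its real parameter names and limits.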

Section 07

Common mistakes

Too vague

Symptom — “A nice video of nature.”

  • Add specific subject, action, environment, camera, and lighting detail — vagueness gives the model nothing to hold onto.

Over-constrained

Symptom — Numerical specs jammed into the prompt text.

  • Move numbers to settings; write the prompt in natural descriptive language.

Mismatched duration

Symptom — A 10-word prompt for a 10-second video.

  • Scale prompt detail to clip length so the model has enough direction to fill the time.

Conflicting directions

Symptom — “A still, peaceful lake with dramatic waves crashing.”

  • Keep the description internally consistent — contradictory cues fight each other.

Redundant description (image-to-video)

Symptom — Re-describing what's already in the input image.

  • Describe only what changes — motion, camera, light — not the static frame.

Vague camera language

Symptom — “The camera moves around.”

  • Name the move (dolly / track / pan / tilt / orbit) and an intensity (gentle / steady / sweeping).
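Some of these mistakes can be caught mechanically before you spend a generation. The linter below is a rough sketch under stated assumptions: the regular expressions and the words-per-second threshold are invented for this example, and it covers only numeric specs, vague camera language, and short prompts. Vague or contradictory descriptions still need a human read.

```python
import re

# Digits followed by a settings-style unit (24 fps, 1080p, 4K) or a WxH size.
NUMERIC_SPEC = re.compile(
    r"\b\d+\s?(?:fps|p|k|px|steps?)\b|\b\d{3,4}\s?x\s?\d{3,4}\b",
    re.IGNORECASE,
)
# Named moves and movement verbs from the camera vocabulary in this guide.
NAMED_MOVE = re.compile(
    r"\b(?:doll(?:y|ies)|track(?:s|ing)?|pan(?:s|ning)?|tilt(?:s|ing)?"
    r"|orbit(?:s|ing)?|follows|circles|pushes in|pulls back)\b",
    re.IGNORECASE,
)
MIN_WORDS_PER_SECOND = 5  # assumed rule of thumb, not an official figure


def lint_prompt(prompt: str, duration_s: float) -> list:
    """Return warnings for the mistakes this sketch knows how to detect."""
    warnings = []
    if NUMERIC_SPEC.search(prompt):
        warnings.append("numeric spec")
    if re.search(r"\bcamera\b", prompt, re.IGNORECASE) and not NAMED_MOVE.search(prompt):
        warnings.append("vague camera")
    if len(prompt.split()) < MIN_WORDS_PER_SECOND * duration_s:
        warnings.append("too short for duration")
    return warnings


print(lint_prompt("The camera moves around.", duration_s=10))
```

Treat the warnings as prompts for a rewrite rather than hard failures; pattern matching will miss plenty and occasionally flag a legitimate phrase.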
Section 08

Example prompts

Quoted from fal.ai's official LTX-2 image-to-video guide — note the level of detail.

Image-to-video — lighthouse

Do this

A lighthouse keeper walks along the rocky shore at sunset. The camera tracks alongside him with smooth lateral movement. Waves crash against the rocks in the foreground while seabirds circle overhead, and the golden light gradually intensifies as the sun dips toward the horizon.

Image-to-video — subtle portrait

Do this

The subject's eyes slowly shift to look directly at the camera. A gentle breeze causes strands of hair to move softly across her face. Natural light from a window creates subtle shadows that shift imperceptibly.

Your turn

Write the whole shot — in one detailed paragraph.

LTX-2 lives inside Ekly. Describe subject, motion, camera, light, and sound — and let the detail do the work.