The Complete Guide to Prompting LTX-2

LTX-2 (Lightricks) is a DiT-based foundation model that generates audio and video together, in sync, at up to 4K, with fast and high-fidelity modes. One well-documented preference sets it apart: it wants long, detailed prompts.

This guide follows Lightricks' official LTX-2 README and prompt guide, plus fal.ai's LTX-2 image-to-video guidance. Short, vague prompts under-perform here — the detail is the control.

Think like a cinematographer

Lightricks' own words: write a single flowing paragraph of detailed, chronological description — specific movements, appearances, camera angles, and environmental details. Start with the action, keep it literal and precise, and stay within ~200 words.

Section 01

The golden rule

“LTX responds best to long, detailed prompts — the more specific you are about subject, action, lighting, camera movement, and audio, the closer the output matches your vision.” Longer, descriptive prompts consistently beat short ones.

One flowing paragraph

Present-tense verbs, literal and precise. Start directly with the action — no preamble.

Match length to duration

A 10-word prompt can't fill a 10-second clip. For an 8–10 second clip, describe enough movement, camera work, and change over time to fill the duration.
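
A rough way to check length before generating is to count words and divide by the clip length. This is a minimal sketch: the name `prompt_density` is made up for illustration, the 200-word ceiling comes from the guidance quoted earlier, and the words-per-second figure is only a number to compare across your own prompts, not a documented threshold.

```python
def prompt_density(prompt: str, duration_s: float, max_words: int = 200) -> dict:
    """Report word count, words per second of clip, and whether the
    prompt stays within the ~200-word ceiling from the guidance."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    words = len(prompt.split())
    return {
        "words": words,
        "words_per_second": round(words / duration_s, 2),
        "within_limit": words <= max_words,
    }


# The 10-word opener on its own, measured against a 10-second clip.
short = prompt_density(
    "A lighthouse keeper walks along the rocky shore at sunset.", 10
)
print(short)
```

Here the opener alone works out to one word per second of footage, which is the "10-word prompt for a 10-second clip" situation described above.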

Let the model expand a sketch

LTX pipelines support automatic prompt enhancement (an enhance_prompt option) that uses an LLM to expand a short prompt — handy when you want detail without writing it all by hand.

Section 02

The 7-step formula

Lightricks' official ordering for building that paragraph:

Main action

Open with the core action in a single sentence.

e.g. “A lighthouse keeper walks along the rocky shore at sunset.”

Movements & gestures

Add specific detail about how things move.

e.g. “he steps carefully over wet stones, coat flaring”

Appearances

Describe characters/objects precisely.

e.g. “weathered face, heavy oilskin coat, lantern in hand”

Environment

Background and setting detail.

e.g. “waves crashing in the foreground, seabirds overhead”

Camera

Angles and movement.

e.g. “the camera tracks alongside him with smooth lateral movement”

Lighting & colour

Light, grade, atmosphere.

e.g. “golden light gradually intensifies toward the horizon”

Changes / events

Note shifts or sudden events over time.

e.g. “the sun dips and the sky shifts to deep amber”
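
The seven steps above can be sketched as a small template that joins the parts in the listed order. This is an illustrative helper, not part of any Lightricks tooling: the class name `ShotPrompt` and its field names are assumptions chosen to mirror the step names.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    # Field order mirrors the seven steps listed above.
    main_action: str
    movements: str = ""
    appearances: str = ""
    environment: str = ""
    camera: str = ""
    lighting: str = ""
    changes: str = ""

    def to_paragraph(self) -> str:
        """Join the non-empty parts, in order, into a single paragraph."""
        parts = []
        for f in fields(self):
            text = getattr(self, f.name).strip()
            if not text:
                continue
            # Capitalise the first letter and end each part with a full stop.
            text = text[0].upper() + text[1:]
            if text[-1] not in ".!?":
                text += "."
            parts.append(text)
        return " ".join(parts)


shot = ShotPrompt(
    main_action="A lighthouse keeper walks along the rocky shore at sunset.",
    movements="he steps carefully over wet stones, coat flaring",
    appearances="weathered face, heavy oilskin coat, lantern in hand",
    environment="waves crashing in the foreground, seabirds overhead",
    camera="the camera tracks alongside him with smooth lateral movement",
    lighting="golden light gradually intensifies toward the horizon",
    changes="the sun dips and the sky shifts to deep amber",
)
paragraph = shot.to_paragraph()
print(paragraph)
```

The helper only enforces ordering and punctuation. The fragments used here are the guide's own examples, and the result reads more like one flowing paragraph if each field is written as a full sentence.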

Section 03

Text / image / audio

The mode decides where your detail goes:

Mode	Where to put the detail
Text-to-video	Describe everything — subject, action, environment, lighting, camera, audio. Detail is your primary lever.
Image-to-video	Describe the motion and what happens next — not the static elements already visible in the image.
Keyframe	Provide guiding keyframes; describe the transition between them for smoother interpolation.
Audio-to-video	The audio anchors the timing; use the prompt to describe the visual interpretation of that audio.

Image-to-video: prompt the change, not the picture

fal.ai frames an image-to-video prompt as instructions for temporal evolution in three layers: subject action (what moves and how) → camera movement (perspective shifts, in cinematographic terms) → environmental dynamics (atmospheric elements that sell time passing). Re-describing the input image just wastes the prompt.
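
As a sketch of the three-layer framing, the helper below builds an image-to-video prompt from subject action, camera movement, and environmental dynamics, in that order. The function name and parameters are hypothetical. The camera and environment layers are optional here because a subtle shot may not need them; that choice is an assumption, not something fal.ai specifies.

```python
def image_to_video_prompt(
    subject_action: str,
    camera_movement: str = "",
    environmental_dynamics: str = "",
) -> str:
    """Join the three layers in order, skipping any that are empty."""
    if not subject_action.strip():
        raise ValueError("subject_action is required: say what moves and how")
    sentences = []
    for layer in (subject_action, camera_movement, environmental_dynamics):
        text = layer.strip()
        if text:
            # Make sure each layer ends as a sentence.
            sentences.append(text if text[-1] in ".!?" else text + ".")
    return " ".join(sentences)


i2v = image_to_video_prompt(
    subject_action="A lighthouse keeper walks along the rocky shore at sunset",
    camera_movement="The camera tracks alongside him with smooth lateral movement",
    environmental_dynamics="Waves crash against the rocks while seabirds circle overhead",
)
print(i2v)
```

Nothing in the call describes what the keeper or the shore look like; that information is already in the input image.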

Section 04

Camera & motion

LTX understands standard cinematographic terms. Pair a move with an intensity word.

Move	What it does
Dolly	Camera moves toward or away from the subject
Track	Camera moves laterally alongside the subject
Pan	Camera rotates horizontally on a fixed axis
Tilt	Camera rotates vertically on a fixed axis
Orbit	Camera moves in an arc around the subject

Intensity qualifiers

subtle, gentle, slight, steady, gradual, smooth, dramatic, rapid, sweeping

Useful vocabulary the guide cites: shot types like macro lens, tracking shot, wide establishing shot, over-the-shoulder; movement verbs like follows, circles around, pushes in, pulls back; and depth/light cues like shallow depth of field, backlighting, rim light.
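
Pairing a move with an intensity word can be sketched as a tiny phrase builder that rejects anything outside the two vocabularies listed above. The function name `camera_phrase` and the sentence template are assumptions for illustration; the model does not require this exact wording.

```python
CAMERA_MOVES = {"dolly", "track", "pan", "tilt", "orbit"}
INTENSITIES = {
    "subtle", "gentle", "slight", "steady", "gradual",
    "smooth", "dramatic", "rapid", "sweeping",
}


def camera_phrase(move: str, intensity: str, detail: str = "") -> str:
    """Build one camera sentence from a named move and an intensity word."""
    move, intensity = move.lower().strip(), intensity.lower().strip()
    if move not in CAMERA_MOVES:
        raise ValueError(f"unknown camera move: {move!r}")
    if intensity not in INTENSITIES:
        raise ValueError(f"unknown intensity word: {intensity!r}")
    sentence = f"The camera performs a {intensity} {move}"
    if detail.strip():
        sentence += " " + detail.strip()
    return sentence + "."


phrase = camera_phrase("orbit", "steady", "around the subject")
print(phrase)  # The camera performs a steady orbit around the subject.
```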

Section 05

Audio & dialogue

Because LTX-2 generates sound with the picture, audio deserves real prompt attention. Describe the acoustic environment, the character's voice qualities, and any ambient sounds you want.

Dialogue formatting

Break dialogue into short phrases with acting directions between each line, and use physical cues rather than emotional labels to direct performance — e.g. “he pauses, looks to the side, then continues speaking with a cracking voice.” Put spoken lines in quotes and specify language/accent when needed.
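
The dialogue pattern above can be sketched as alternating acting directions and quoted lines. The helper name `format_dialogue` is hypothetical, and the spoken lines in the example are invented purely to show the shape.

```python
def format_dialogue(beats: list[tuple[str, str]]) -> str:
    """Interleave (acting direction, spoken line) pairs, quoting each line."""
    parts = []
    for direction, line in beats:
        # Physical direction first, then the short spoken phrase in quotes.
        parts.append(f'{direction.strip()} "{line.strip()}"')
    return " ".join(parts)


dialogue = format_dialogue([
    ("He pauses, looks to the side, and says in a low voice:", "I kept the light on."),
    ("He swallows, then continues with a cracking voice:", "Every night."),
])
print(dialogue)
```

Each beat carries a physical cue rather than an emotional label, and each spoken phrase stays short. Language or accent can be added to the direction text where needed.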

Section 06

Settings — keep these out of the prompt text

Numbers belong in settings, not the prose

A documented LTX mistake is “numerical specifications instead of natural language”. Keep resolution, fps, and step counts in the generation settings — write the prompt in plain descriptive language.

Setting	Documented value / guidance
Resolution / FPS	Up to 4K; 24 / 25 / 48 / 50 FPS (longer clips may require 25 FPS + 1080p).
Duration	Hosted flows support up to ~20 seconds.
Frame count	Must be of the form 8k + 1, where k is a whole number (e.g. 97, 121, 193).
Steps	≈40 default; distilled variants run far fewer for speed.
CFG scale	Typical 2.0–5.0 (1.0 disables).
STG scale	Spatio-temporal guidance for coherence, typical 0.5–1.5.
Negative prompt	Documented default: “worst quality, low quality, blurry, distorted”.
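
The frame-count rule in the table is plain arithmetic, so it is easy to check before submitting a job. The sketch below validates a count and picks the nearest valid one for a target duration and frame rate. The function names and the keys of the `settings` dictionary are hypothetical, not the parameter names of any particular LTX-2 API.

```python
def is_valid_frame_count(n: int) -> bool:
    """True when n has the form 8k + 1 for a whole number k >= 0."""
    return n >= 1 and (n - 1) % 8 == 0


def nearest_valid_frame_count(duration_s: float, fps: int) -> int:
    """Closest 8k + 1 frame count to duration_s * fps (with k >= 1)."""
    target = duration_s * fps
    k = max(1, round((target - 1) / 8))
    return 8 * k + 1


# Numbers live in the settings, not in the prompt text.
# Key names are illustrative only.
settings = {
    "fps": 24,
    "num_frames": nearest_valid_frame_count(duration_s=5, fps=24),
    "negative_prompt": "worst quality, low quality, blurry, distorted",
}
print(settings["num_frames"])  # 121
```

Five seconds at 24 FPS is 120 frames, which is not of the form 8k + 1; the nearest valid count is 121, one of the examples in the table.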

Performance variants: a Fast flow for quick feedback loops and a Pro flow for high-fidelity, stable, detailed results across longer sequences.

Section 07

Common mistakes

Too vague

Symptom — “A nice video of nature.”

Add specific subject, action, environment, camera, and lighting detail — vagueness gives the model nothing to hold onto.

Over-constrained

Symptom — Numerical specs jammed into the prompt text.

Move numbers to settings; write the prompt in natural descriptive language.

Mismatched duration

Symptom — A 10-word prompt for a 10-second video.

Scale prompt detail to clip length so the model has enough direction to fill the time.

Conflicting directions

Symptom — “A still, peaceful lake with dramatic waves crashing.”

Keep the description internally consistent — contradictory cues fight each other.

Redundant description (image-to-video)

Symptom — Re-describing what's already in the input image.

Describe only what changes — motion, camera, light — not the static frame.

Vague camera language

Symptom — “The camera moves around.”

Name the move (dolly / track / pan / tilt / orbit) and an intensity (gentle / steady / sweeping).
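
Several of the mistakes above can be caught with a rough lint pass before generating. This is a heuristic sketch, not an official checker: the function name, the warning labels, the regular expressions, and the "no more words than seconds" rule (taken from the 10-word, 10-second example) are all assumptions.

```python
import re

# Numbers followed by a settings-style unit, e.g. "4K", "24 fps", "1080p".
NUMERIC_SPEC = re.compile(r"\b\d+(?:\.\d+)?\s?(?:fps|p|k|steps?|frames?)\b", re.IGNORECASE)
# Named moves and movement verbs from the camera vocabulary above.
NAMED_MOVE = re.compile(
    r"\b(?:doll(?:y|ies)|track\w*|pan(?:s|ning)?|tilt\w*|orbit\w*|"
    r"follow\w*|circl\w*|push\w*|pull\w*)\b",
    re.IGNORECASE,
)


def lint_prompt(prompt: str, duration_s: float) -> list[str]:
    """Return warning labels for a few of the common mistakes."""
    warnings = []
    if NUMERIC_SPEC.search(prompt):
        warnings.append("numeric-spec")  # move numbers to the settings
    if "camera" in prompt.lower() and not NAMED_MOVE.search(prompt):
        warnings.append("vague-camera")  # name the move and an intensity
    if len(prompt.split()) <= duration_s:
        warnings.append("too-short")  # not enough detail for the clip length
    return warnings


bad = lint_prompt("A nice video of nature in 4K at 24 fps. The camera moves around.", 10)
good = lint_prompt(
    "A lighthouse keeper walks along the rocky shore at sunset. "
    "The camera tracks alongside him with smooth lateral movement.",
    10,
)
print(bad, good)
```

A clean result only means none of these patterns matched. Vagueness and conflicting directions still need a human read.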

Section 08

Example prompts

Quoted from fal.ai's official LTX-2 image-to-video guide — note the level of detail.

Image-to-video — lighthouse

✓ Do this

A lighthouse keeper walks along the rocky shore at sunset. The camera tracks alongside him with smooth lateral movement. Waves crash against the rocks in the foreground while seabirds circle overhead, and the golden light gradually intensifies as the sun dips toward the horizon.

Image-to-video — subtle portrait

✓ Do this

The subject's eyes slowly shift to look directly at the camera. A gentle breeze causes strands of hair to move softly across her face. Natural light from a window creates subtle shadows that shift imperceptibly.

Built from official sources

Your turn

Write the whole shot — in one detailed paragraph.

LTX-2 lives inside Ekly. Describe subject, motion, camera, light, and sound — and let the detail do the work.
