LTX-2 (Lightricks) is a DiT-based foundation model that generates synchronised audio and video together — up to 4K, with fast and high-fidelity modes. It has one strong, well-documented preference that sets it apart: it wants long, detailed prompts.
This guide follows Lightricks' official LTX-2 README and prompt guide, plus fal.ai's LTX-2 image-to-video guidance. Short, vague prompts under-perform here — the detail is the control.
The golden rule
“LTX responds best to long, detailed prompts — the more specific you are about subject, action, lighting, camera movement, and audio, the closer the output matches your vision.” Longer, descriptive prompts consistently beat short ones.
One flowing paragraph
Present-tense verbs, literal and precise. Start directly with the action — no preamble.
Match length to duration
A 10-word prompt can't fill a 10-second clip. For 8–10s, write enough detail to fill the time.
There is also an automatic prompt enhancer (the enhance_prompt option) that uses an LLM to expand a short prompt, which is handy when you want detail without writing it all by hand.
The 7-step formula
Lightricks' official ordering for building that paragraph:
Main action
Open with the core action in a single sentence.
e.g. “A lighthouse keeper walks along the rocky shore at sunset.”
Movements & gestures
Add specific detail about how things move.
e.g. “he steps carefully over wet stones, coat flaring”
Appearances
Describe characters/objects precisely.
e.g. “weathered face, heavy oilskin coat, lantern in hand”
Environment
Background and setting detail.
e.g. “waves crashing in the foreground, seabirds overhead”
Camera
Angles and movement.
e.g. “the camera tracks alongside him with smooth lateral movement”
Lighting & colour
Light, grade, atmosphere.
e.g. “golden light gradually intensifies toward the horizon”
Changes / events
Note shifts or sudden events over time.
e.g. “the sun dips and the sky shifts to deep amber”
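The ordering above can be sketched as a small helper that joins the seven parts into one flowing paragraph. This is only an illustration: `ShotParts` and `build_prompt` are hypothetical names, not part of any Lightricks or fal.ai API, and the sentence clean-up is an assumption about how you might want to normalise fragments.

```python
from dataclasses import dataclass


@dataclass
class ShotParts:
    """Hypothetical container for the seven parts, in the documented order."""
    main_action: str
    movements: str = ""
    appearances: str = ""
    environment: str = ""
    camera: str = ""
    lighting: str = ""
    changes: str = ""


def _sentence(text: str) -> str:
    """Trim a fragment, capitalise its first letter, and end it with a full stop."""
    text = text.strip()
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


def build_prompt(parts: ShotParts) -> str:
    """Join the non-empty parts into a single paragraph, main action first."""
    ordered = [
        parts.main_action,
        parts.movements,
        parts.appearances,
        parts.environment,
        parts.camera,
        parts.lighting,
        parts.changes,
    ]
    return " ".join(s for s in map(_sentence, ordered) if s)


prompt = build_prompt(
    ShotParts(
        main_action="A lighthouse keeper walks along the rocky shore at sunset",
        movements="he steps carefully over wet stones, coat flaring",
        camera="the camera tracks alongside him with smooth lateral movement",
        changes="the sun dips and the sky shifts to deep amber",
    )
)
print(prompt)
```

Keeping the parts separate until the last moment makes it easy to see which of the seven slots are still empty before you generate.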
Text / image / audio
The mode decides where your detail goes:
| Mode | Where to put the detail |
|---|---|
| Text-to-video | Describe everything — subject, action, environment, lighting, camera, audio. Detail is your primary lever. |
| Image-to-video | Describe the motion and what happens next — not the static elements already visible in the image. |
| Keyframe | Provide guiding keyframes; describe the transition between them for smoother interpolation. |
| Audio-to-video | The audio anchors the timing; use the prompt to describe the visual interpretation of that audio. |
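As a rough sketch of the first two rows of the table, the helper below keeps only the kinds of detail each mode is said to need. The field names and the `select_detail` function are hypothetical, and the image-to-video list (action, camera, lighting) is one reading of "describe the motion, not the static elements", not an official rule. Keyframe and audio-to-video are left out because their guidance does not map cleanly onto a field filter.

```python
# Which notes to keep per mode. This mapping is an assumption drawn from the
# table above, not a documented API.
FIELDS_BY_MODE = {
    "text-to-video": ("subject", "action", "environment", "lighting", "camera", "audio"),
    "image-to-video": ("action", "camera", "lighting"),
}


def select_detail(mode: str, notes: dict) -> str:
    """Return the prompt text for `mode`, dropping notes that mode does not need."""
    if mode not in FIELDS_BY_MODE:
        raise ValueError(f"unsupported mode: {mode!r}")
    kept = (notes.get(field, "").strip() for field in FIELDS_BY_MODE[mode])
    return " ".join(text for text in kept if text)


notes = {
    "subject": "A keeper in a heavy oilskin coat stands on the shore.",
    "action": "He walks toward the lighthouse.",
    "camera": "The camera tracks alongside him.",
}
print(select_detail("image-to-video", notes))
```

For image-to-video the subject description is dropped, since the input image already shows it.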
Camera & motion
LTX understands standard cinematographic terms. Pair a move with an intensity word.
| Move | What it does |
|---|---|
| Dolly | Camera moves toward or away from the subject |
| Track | Camera moves laterally alongside the subject |
| Pan | Camera rotates horizontally on a fixed axis |
| Tilt | Camera rotates vertically on a fixed axis |
| Orbit | Camera moves in an arc around the subject |
Intensity qualifiers
Words such as gentle, steady, or sweeping tell the model how pronounced the move should be.
Useful vocabulary the guide cites: shot types like macro lens, tracking shot, wide establishing shot, over-the-shoulder; movement verbs like follows, circles around, pushes in, pulls back; and depth/light cues like shallow depth of field, backlighting, rim light.
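To make "pair a move with an intensity word" concrete, here is a minimal sketch that builds a camera sentence from the five moves in the table and the intensity words this guide uses (gentle, steady, sweeping). `camera_phrase` is a hypothetical helper, and the sentence template is just one possible wording.

```python
# Moves and their meanings, copied from the table above.
CAMERA_MOVES = {
    "dolly": "camera moves toward or away from the subject",
    "track": "camera moves laterally alongside the subject",
    "pan": "camera rotates horizontally on a fixed axis",
    "tilt": "camera rotates vertically on a fixed axis",
    "orbit": "camera moves in an arc around the subject",
}
INTENSITIES = ("gentle", "steady", "sweeping")


def camera_phrase(move: str, intensity: str, detail: str = "") -> str:
    """Build a sentence naming both the move and its intensity."""
    if move not in CAMERA_MOVES:
        raise ValueError(f"unknown move {move!r}; choose from {sorted(CAMERA_MOVES)}")
    if intensity not in INTENSITIES:
        raise ValueError(f"unknown intensity {intensity!r}; choose from {INTENSITIES}")
    phrase = f"The camera makes a {intensity} {move}"
    if detail.strip():
        phrase += f" {detail.strip()}"
    return phrase + "."


print(camera_phrase("orbit", "steady", "around the lighthouse"))
```

Rejecting unknown moves early guards against falling back on wording like "the camera moves around".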
Audio & dialogue
Because LTX-2 generates sound with the picture, audio deserves real prompt attention. Describe the acoustic environment, the character's voice qualities, and any ambient sounds you want.
Settings — keep these out of the prompt text
| Setting | Documented value / guidance |
|---|---|
| Resolution / FPS | Up to 4K; 24 / 25 / 48 / 50 FPS (longer clips may require 25 FPS + 1080p). |
| Duration | Hosted flows support up to ~20 seconds. |
| Frame count | Must have the form 8k + 1 for a whole number k (e.g. 97, 121, 193). |
| Steps | ≈40 default; distilled variants run far fewer for speed. |
| CFG scale | Typical 2.0–5.0 (1.0 disables). |
| STG scale | Spatio-temporal guidance for coherence, typical 0.5–1.5. |
| Negative prompt | Documented default: “worst quality, low quality, blurry, distorted”. |
Performance variants: a Fast flow for quick feedback loops and a Pro flow for high-fidelity, stable, detailed results across longer sequences.
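The settings above live alongside the prompt rather than inside it. The sketch below shows one way to hold them and to pick a frame count of the form 8k + 1 for a target duration. `GenerationSettings` and its field names are hypothetical, not the parameter names of any particular API, and the default CFG and STG values are arbitrary picks from inside the typical ranges in the table.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_FPS = (24, 25, 48, 50)


def is_valid_frame_count(n: int) -> bool:
    """True when n has the form 8k + 1."""
    return n > 0 and n % 8 == 1


def frame_count_for(duration_s: float, fps: int) -> int:
    """Nearest 8k + 1 frame count to duration_s * fps, with k >= 1."""
    k = max(1, round((duration_s * fps - 1) / 8))
    return 8 * k + 1


@dataclass
class GenerationSettings:
    """Hypothetical settings holder; field names are illustrative only."""
    fps: int
    num_frames: int
    steps: int = 40
    cfg_scale: float = 3.0  # arbitrary pick inside the typical 2.0-5.0 range
    stg_scale: float = 1.0  # arbitrary pick inside the typical 0.5-1.5 range
    negative_prompt: str = "worst quality, low quality, blurry, distorted"

    def problems(self) -> List[str]:
        """List values that fall outside the documented guidance."""
        found = []
        if self.fps not in ALLOWED_FPS:
            found.append(f"fps {self.fps} is not one of {ALLOWED_FPS}")
        if not is_valid_frame_count(self.num_frames):
            found.append(f"num_frames {self.num_frames} is not of the form 8k + 1")
        if self.cfg_scale != 1.0 and not 2.0 <= self.cfg_scale <= 5.0:
            found.append(f"cfg_scale {self.cfg_scale} is outside the typical 2.0-5.0")
        if not 0.5 <= self.stg_scale <= 1.5:
            found.append(f"stg_scale {self.stg_scale} is outside the typical 0.5-1.5")
        return found


settings = GenerationSettings(fps=24, num_frames=frame_count_for(5, 24))
print(settings.num_frames, settings.problems())
```

A 5-second clip at 24 FPS lands on 121 frames, matching one of the example counts in the table.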
Common mistakes
Too vague
Symptom — “A nice video of nature.”
- Add specific subject, action, environment, camera, and lighting detail — vagueness gives the model nothing to hold onto.
Over-constrained
Symptom — Numerical specs jammed into the prompt text.
- Move numbers to settings; write the prompt in natural descriptive language.
Mismatched duration
Symptom — A 10-word prompt for a 10-second video.
- Scale prompt detail to clip length so the model has enough direction to fill the time.
Conflicting directions
Symptom — “A still, peaceful lake with dramatic waves crashing.”
- Keep the description internally consistent — contradictory cues fight each other.
Redundant description (image-to-video)
Symptom — Re-describing what's already in the input image.
- Describe only what changes — motion, camera, light — not the static frame.
Vague camera language
Symptom — “The camera moves around.”
- Name the move (dolly / track / pan / tilt / orbit) and an intensity (gentle / steady / sweeping).
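Three of these mistakes can be caught mechanically before you spend a generation on them. The checker below is a heuristic sketch only: `lint_prompt` is a hypothetical helper, the `min_words_per_second` threshold is an arbitrary assumption (no official ratio is documented), and the patterns are deliberately loose substring and regex matches.

```python
import re
from typing import List

# Numeric specs that belong in settings, e.g. "24fps", "1080p", "4K", "40 steps".
SPEC_PATTERN = re.compile(r"\b(\d+\s*fps|\d{3,4}p|[48]k|\d+\s*steps|cfg)\b", re.IGNORECASE)

# Named moves and movement verbs from the camera section (substring match).
CAMERA_TERMS = (
    "dolly", "dollies", "track", "pan", "tilt", "orbit",
    "follows", "circles", "pushes in", "pulls back",
)


def lint_prompt(prompt: str, duration_s: float, min_words_per_second: float = 4.0) -> List[str]:
    """Return a list of likely problems; an empty list means none were detected."""
    issues = []
    if len(prompt.split()) < duration_s * min_words_per_second:
        issues.append("mismatched duration: too little detail for the clip length")
    if SPEC_PATTERN.search(prompt):
        issues.append("over-constrained: move numeric specs to settings")
    lowered = prompt.lower()
    if "camera" in lowered and not any(term in lowered for term in CAMERA_TERMS):
        issues.append("vague camera: name the move and an intensity")
    return issues


for issue in lint_prompt("The camera moves around.", duration_s=10):
    print(issue)
```

Vagueness and internal contradictions are not checked here, since they need a human read.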
Example prompts
Quoted from fal.ai's official LTX-2 image-to-video guide — note the level of detail.
Image-to-video — lighthouse
A lighthouse keeper walks along the rocky shore at sunset. The camera tracks alongside him with smooth lateral movement. Waves crash against the rocks in the foreground while seabirds circle overhead, and the golden light gradually intensifies as the sun dips toward the horizon.
Image-to-video — subtle portrait
The subject's eyes slowly shift to look directly at the camera. A gentle breeze causes strands of hair to move softly across her face. Natural light from a window creates subtle shadows that shift imperceptibly.