Prompt execution model

Summary

A prompt run applies one prompt to one target with explicit execution settings. Segment-level extraction and video-level synthesis are related but distinct stages.

A prompt run is the execution boundary for prompt processing. It ties together the target, segmentation settings, optional transcription and image embeddings, and the final results.

Segment-driven video analysis keeps timestamped segment evidence as the primary review layer. Video-level synthesis can then summarize selected segment fields without replacing the segment records that search, filters, recovery, and exports depend on.

Targets

Prompt runs accept three public target modes:

index: run against all eligible media in an index
videos: run against a selected list of media IDs, optionally scoped to an index
playground: run against playground media

Use index when the collection itself is the workflow boundary. Use videos when an operator or upstream system has already chosen the exact media items to process.
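
The three target modes can be sketched as request fragments. This is a minimal sketch, not an SDK call: build_target is a hypothetical helper, the field names type, index_id, and video_ids are taken from the example run request on this page, and the exact shape of index and playground targets is an assumption because this page only shows a videos target.

```python
from typing import Optional, Sequence


def build_target(mode: str,
                 index_id: Optional[str] = None,
                 video_ids: Optional[Sequence[str]] = None) -> dict:
    """Build the target object for a run request (hypothetical helper)."""
    if mode == "index":
        # Assumed shape: an index target is identified by index_id alone.
        if not index_id:
            raise ValueError("index target needs an index_id")
        return {"type": "index", "index_id": index_id}
    if mode == "videos":
        if not video_ids:
            raise ValueError("videos target needs at least one media ID")
        target = {"type": "videos", "video_ids": list(video_ids)}
        if index_id:
            # The index scope is optional for a videos target.
            target["index_id"] = index_id
        return target
    if mode == "playground":
        # Assumed shape: playground targets carry no further fields.
        return {"type": "playground"}
    raise ValueError(f"unknown target mode: {mode!r}")
```

For example, build_target("videos", index_id="idx_archive", video_ids=["vid_001", "vid_002"]) yields the same target object as the example run request.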

Segment-level extraction

The segment-level prompt is always driven by:

prompt_text
json_schema
segmentation settings in the run request

Every processed segment produces a structured result plus metadata_text, which is used throughout the public search surface.
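
As an illustration of how these inputs relate to a segment result, the sketch below pairs a prompt definition with one processed segment. Only prompt_text, json_schema, and metadata_text come from this page. The schema contents, the start, end, and result field names, and the conforms helper are hypothetical and may not match the actual response format.

```python
# Hypothetical prompt definition; the schema body is illustrative only.
prompt_definition = {
    "prompt_text": "Describe the on-screen action and list any visible brands.",
    "json_schema": {
        "type": "object",
        "properties": {
            "action": {"type": "string"},
            "brands": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["action"],
    },
}

# Hypothetical shape of one processed segment: a structured result that
# follows json_schema, plus the metadata_text used by search.
segment_record = {
    "start": 12.0,
    "end": 19.5,
    "result": {"action": "host opens the show", "brands": ["Acme"]},
    "metadata_text": "host opens the show; brands: Acme",
}


def conforms(result: dict, schema: dict) -> bool:
    """Shallow check: required keys are present and no unknown keys appear."""
    required = schema.get("required", [])
    allowed = set(schema.get("properties", {}))
    return all(key in result for key in required) and set(result) <= allowed


print(conforms(segment_record["result"], prompt_definition["json_schema"]))
```

The check is deliberately shallow and does not validate value types. A full JSON Schema validator would be the better choice in production code.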

Video-level synthesis

Video-level synthesis is optional and only runs when the prompt definition includes video_level.

The video-level step:

runs after segment-level results exist for a media item
receives the selected included_segment_fields
produces a single media-wide output for that item
does not replace segment results

Use video-level synthesis for whole-program or whole-asset rollups. Use segment-level output for precise evidence and retrieval.
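
The relationship between segment results and the video-level step can be sketched as a field projection. This is a minimal sketch under stated assumptions: the segment record shape (start, end, result) and the build_synthesis_input helper are hypothetical, and only included_segment_fields is a name from this page. The point is that the video-level input is derived from the segment records, which remain unchanged.

```python
from typing import Iterable, List


def build_synthesis_input(segments: Iterable[dict],
                          included_segment_fields: Iterable[str]) -> List[dict]:
    """Project each segment onto the selected fields, keeping timestamps.

    Returns new rows and never mutates the segment records themselves.
    """
    fields = list(included_segment_fields)
    rows = []
    for segment in segments:
        result = segment.get("result", {})
        row = {"start": segment["start"], "end": segment["end"]}
        # Copy only the fields the prompt definition selected.
        row.update({name: result[name] for name in fields if name in result})
        rows.append(row)
    return rows


segments = [
    {"start": 0.0, "end": 8.0, "result": {"action": "intro", "brands": []}},
    {"start": 8.0, "end": 20.0, "result": {"action": "interview", "brands": ["Acme"]}},
]
rows = build_synthesis_input(segments, ["action"])
```

Here rows holds one entry per segment with start, end, and action only, while segments still contains the full results, including brands.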

Segmentation modes

Video segmentation

Video prompt runs support:

smart
fixed
content_aware

The fixed mode also requires video_segment_duration.

Audio segmentation

Audio prompt runs support:

content_aware
fixed

The fixed mode also requires audio_segment_duration.

Images

Images are processed as single-item media with image segmentation semantics rather than time-based segment selection.
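
The video and audio rules above can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not the service's own validation: check_segmentation is a hypothetical helper, and the field names video_segmentation_type and audio_segmentation_type are taken from the example run request on this page.

```python
VIDEO_MODES = {"smart", "fixed", "content_aware"}
AUDIO_MODES = {"content_aware", "fixed"}


def check_segmentation(request: dict) -> list:
    """Return a list of problems with the segmentation settings (empty if none)."""
    errors = []

    video_mode = request.get("video_segmentation_type")
    if video_mode is not None:
        if video_mode not in VIDEO_MODES:
            errors.append(f"unsupported video_segmentation_type: {video_mode}")
        elif video_mode == "fixed" and "video_segment_duration" not in request:
            errors.append("fixed video segmentation requires video_segment_duration")

    audio_mode = request.get("audio_segmentation_type")
    if audio_mode is not None:
        if audio_mode not in AUDIO_MODES:
            errors.append(f"unsupported audio_segmentation_type: {audio_mode}")
        elif audio_mode == "fixed" and "audio_segment_duration" not in request:
            errors.append("fixed audio segmentation requires audio_segment_duration")

    return errors
```

Returning a list instead of raising on the first problem lets a caller report every issue in one pass. Note that smart is accepted for video but not for audio.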

Transcription and image embeddings

Prompt run requests also control two important side behaviors:

enable_transcription
enable_image_embedding

These flags affect public search and run outputs:

transcription contributes searchable text and transcription success/failure state
image embeddings enable visual retrieval workflows

Note

Transcription and image embeddings are run settings, not prompt-definition settings. Different runs of the same prompt can choose different values for those flags.
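
To make the distinction concrete, the sketch below builds two run requests for the same prompt with different flag values. make_run_request is a hypothetical helper. Both flags are required keyword arguments here because this page does not state their defaults.

```python
def make_run_request(prompt_id: str, target: dict, *,
                     enable_transcription: bool,
                     enable_image_embedding: bool) -> dict:
    """Assemble a run request body; the flags belong to the run, not the prompt."""
    return {
        "prompt_id": prompt_id,
        "target": dict(target),  # copy so the two requests do not share state
        "enable_transcription": enable_transcription,
        "enable_image_embedding": enable_image_embedding,
    }


target = {"type": "videos", "index_id": "idx_archive", "video_ids": ["vid_001"]}

# Same prompt, two runs with different side behaviors.
text_run = make_run_request("prompt_episode_extract", target,
                            enable_transcription=True,
                            enable_image_embedding=False)
visual_run = make_run_request("prompt_episode_extract", target,
                              enable_transcription=False,
                              enable_image_embedding=True)
```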

Lifecycle and retry behavior

A run moves through non-terminal states, such as pending and processing, and ends in a terminal state, such as completed, completed_with_failures, failed, or cancelled.

Public lifecycle controls include:

estimate a run without starting it
execute the run
poll or stream status
cancel the run
inspect failed segments
retry a failed segment without creating a replacement run
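
The polling control can be sketched as a loop that stops at the first terminal state. This is a sketch under stated assumptions: wait_for_terminal is a hypothetical helper, get_status stands in for whatever call returns the run's current state, and the terminal set is inferred from the state names listed above, which may not be exhaustive.

```python
import time
from typing import Callable

# Assumed terminal states, inferred from the state names on this page.
TERMINAL_STATES = {"completed", "completed_with_failures", "failed", "cancelled"}


def wait_for_terminal(get_status: Callable[[], str],
                      interval_s: float = 2.0,
                      timeout_s: float = 600.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status until the run reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

A result of completed_with_failures is the cue to inspect failed segments and retry them individually instead of starting a replacement run.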

Example run request

{
  "prompt_id": "prompt_episode_extract",
  "target": {
    "type": "videos",
    "index_id": "idx_archive",
    "video_ids": ["vid_001", "vid_002"]
  },
  "video_segmentation_type": "smart",
  "audio_segmentation_type": "content_aware",
  "processing_model": "gemini-2.5-flash",
  "enable_transcription": true,
  "enable_image_embedding": true
}

Choosing segment-level versus video-level output

Use segment-level output when:

search precision matters
downstream review needs timestamps
the output should be filterable at evidence level

Use video-level output when:

the user needs one answer per media item
the result depends on combining segment evidence
the downstream consumer wants a rollup, not the raw evidence set
