Prompt schema design

Summary

Prompt schemas define what gets extracted, what becomes searchable, and what can be reused in video-level synthesis. This page covers the public schema rules and field-path conventions.

VideoVector uses JSON Schema with a root object to define the output contract for prompt extraction.

Schema-aware metadata extraction means your JSON schema defines the shape of the prompt output before analysis runs. Video, audio, and image workflows can return nested schema outputs, repeated object paths, and metadata_text, all of which later support search, filters, exports, and video-level synthesis.

Segment-level schema

The segment-level schema lives in json_schema.

The root type must be object.
Fields can be primitive values, objects, or arrays.
Nested objects and repeated objects are supported and surfaced in search and filter tooling.
The schema is part of the prompt definition, not part of the prompt run request.

{
  "type": "object",
  "properties": {
    "summary": { "type": "string" },
    "scene": {
      "type": "object",
      "properties": {
        "location": { "type": "string" },
        "people": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "emotion": { "type": "string" }
            }
          }
        }
      }
    }
  }
}

Nested fields and repeated object paths

Nested data is addressable in public search and filter tooling through canonical field paths:

scene.location
scene.people[].name
scene.people[].emotion

Repeated objects use [] in field paths. That same path form appears in condition-style filtering and in search field selection.
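
The path convention above can be derived mechanically from a schema. The following is a minimal sketch, not VideoVector's implementation: field_paths is a hypothetical helper, and the choice to append [] to arrays of primitive values as well as arrays of objects is an assumption.

```python
from typing import Any, Dict, List


def field_paths(schema: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the leaf field paths declared by an object schema."""
    paths: List[str] = []
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        # Each array level appends "[]" to the path (assumed convention).
        while sub.get("type") == "array":
            path += "[]"
            sub = sub.get("items", {})
        if sub.get("type") == "object" and sub.get("properties"):
            # Recurse into nested objects that declare their own fields.
            paths.extend(field_paths(sub, path))
        else:
            paths.append(path)
    return paths


# The segment-level example schema from this page.
SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "scene": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "people": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "emotion": {"type": "string"},
                        },
                    },
                },
            },
        },
    },
}

for path in field_paths(SCHEMA):
    print(path)
```

For the example schema this prints summary followed by the three nested paths listed above.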

Tip

Use stable, explicit property names. Search, filter, and semantic indexing configuration are easier to manage when field paths remain predictable across prompt iterations.

Reserved field-name rules

Prompt field names are part of the public query contract. Avoid special characters that break field-path parsing.

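Do not use a period (.), an opening bracket ([), or a closing bracket (]) inside field names. These characters are the delimiters of field paths.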
Do not use reserved internal field names such as __pydantic_extra__.
Dictionary-like object fields that disallow additional properties should still define at least one nested field.
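
These rules can be checked locally before a schema is submitted. The sketch below is illustrative only: check_field_names is a hypothetical helper, RESERVED_NAMES contains only the one name mentioned on this page (the full reserved list is not given here), and "disallows additional properties" is assumed to mean additionalProperties set to false.

```python
from typing import Any, Dict, List

FORBIDDEN_CHARS = (".", "[", "]")
# Only the reserved name mentioned on this page; the real list may be longer.
RESERVED_NAMES = {"__pydantic_extra__"}


def check_field_names(schema: Dict[str, Any], where: str = "<root>") -> List[str]:
    """Return a list of human-readable rule violations (empty if none)."""
    problems: List[str] = []
    if where == "<root>" and schema.get("type") != "object":
        problems.append("<root>: root type must be object")
    # Look through array wrappers to the item schema.
    node = schema
    while node.get("type") == "array":
        node = node.get("items", {})
    if node.get("type") != "object":
        return problems
    props = node.get("properties", {})
    if node.get("additionalProperties") is False and not props:
        problems.append(f"{where}: closed object defines no nested fields")
    for name, sub in props.items():
        bad = [c for c in FORBIDDEN_CHARS if c in name]
        if bad:
            problems.append(f"{where}: field name {name!r} contains {' '.join(bad)}")
        if name in RESERVED_NAMES:
            problems.append(f"{where}: field name {name!r} is reserved")
        problems.extend(check_field_names(sub, f"{where}/{name}"))
    return problems


# A deliberately broken schema that violates each rule once.
BAD_SCHEMA = {
    "type": "object",
    "properties": {
        "scene.location": {"type": "string"},
        "__pydantic_extra__": {"type": "string"},
        "tags": {"type": "object", "additionalProperties": False},
    },
}

for problem in check_field_names(BAD_SCHEMA):
    print(problem)
```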

Semantic indexing controls

Semantic indexing is configured at the prompt level:

disabled_segment_fields
disabled_video_level_fields

Use these lists when you want fields to remain in structured output but stay out of semantic embedding.

Typical reasons to disable a field:

the field is high-volume but low-value for semantic retrieval
the field contains internal bookkeeping
the field is dynamic or sparse enough that embedding it would add noise
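
To make the effect concrete, the sketch below models how a disabled list could separate structured output from the values that get embedded. It is an illustration, not the platform's indexing code: flatten and embeddable_pairs are hypothetical helpers, and it assumes that entries in disabled_segment_fields use the same field-path form as search and filters.

```python
from typing import Any, Iterator, List, Tuple


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (field_path, leaf_value) pairs, using [] for list items."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for item in value:
            yield from flatten(item, prefix + "[]")
    else:
        yield prefix, value


def embeddable_pairs(output: Any, disabled: List[str]) -> List[Tuple[str, Any]]:
    """Keep every leaf value whose path is not in the disabled list."""
    disabled_set = set(disabled)
    return [(path, leaf) for path, leaf in flatten(output) if path not in disabled_set]


# Sample segment output shaped like the example schema on this page.
segment_output = {
    "summary": "Two people argue.",
    "scene": {
        "location": "kitchen",
        "people": [
            {"name": "Ana", "emotion": "angry"},
            {"name": "Ben", "emotion": "calm"},
        ],
    },
}

# The names stay in segment_output but are skipped for embedding.
disabled_segment_fields = ["scene.people[].name"]
for path, leaf in embeddable_pairs(segment_output, disabled_segment_fields):
    print(path, "=", leaf)
```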

Video-level schema

The optional video_level block adds a second schema for media-wide synthesis:

instructions_text: the video-wide instruction
included_segment_fields: segment fields supplied to the synthesis step
json_schema: the video-level output contract

included_segment_fields can reference declared segment fields and certain system-provided fields such as transcription and metadata_text. Timing context such as start_time and end_time is automatically included where applicable.

{
  "instructions_text": "Summarize the full program and identify the primary incident timeline.",
  "included_segment_fields": ["summary", "scene.people[].emotion", "transcription"],
  "json_schema": {
    "type": "object",
    "properties": {
      "program_summary": { "type": "string" },
      "incident_timeline": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }
}
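
A typo in included_segment_fields is easy to miss, so it can help to compare the list with the segment schema before saving the prompt. The sketch below is a local check under stated assumptions: declared_paths and unknown_included_fields are hypothetical helpers, SYSTEM_FIELDS holds only the two system-provided fields named on this page, and references to intermediate paths such as scene are assumed to be acceptable.

```python
from typing import Any, Dict, Iterator, List

# The two system-provided fields named on this page; the list is not exhaustive.
SYSTEM_FIELDS = {"transcription", "metadata_text"}


def declared_paths(schema: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield every declared path, including intermediate object paths."""
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        while sub.get("type") == "array":
            path += "[]"
            sub = sub.get("items", {})
        yield path
        if sub.get("type") == "object":
            yield from declared_paths(sub, path)


def unknown_included_fields(segment_schema: Dict[str, Any], included: List[str]) -> List[str]:
    """Return the included fields that match no declared or system field."""
    allowed = set(declared_paths(segment_schema)) | SYSTEM_FIELDS
    return [field for field in included if field not in allowed]


# Abbreviated copy of the segment-level example schema.
SEGMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "scene": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "people": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "emotion": {"type": "string"},
                        },
                    },
                },
            },
        },
    },
}

included = ["summary", "scene.people[].emotion", "transcription"]
print(unknown_included_fields(SEGMENT_SCHEMA, included))
```

An empty list means every entry resolves to a declared segment field or one of the assumed system fields.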

Schema testing

The public surface includes schema validation endpoints and matching SDK/MCP methods so you can validate sample data against a schema before storing the prompt. Use that validation step whenever the schema includes nested objects, repeated arrays, or constrained types.
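
The kind of check that validation performs can be previewed locally. The sketch below does not call the VideoVector endpoints or SDK, whose method names are not shown on this page. It is a hand-written validator covering only a small subset of JSON Schema (type, properties, required, items), and validate is a hypothetical helper. A full validator, such as the Python jsonschema package, covers far more of the specification.

```python
from typing import Any, Dict, List

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "null": lambda v: v is None,
}


def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of error messages; an empty list means the sample passed."""
    expected = schema.get("type")
    if isinstance(expected, str) and expected in TYPE_CHECKS:
        if not TYPE_CHECKS[expected](value):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors: List[str] = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required field is missing")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


# A trimmed schema with one nested object and one repeated object.
schema = {
    "type": "object",
    "required": ["summary"],
    "properties": {
        "summary": {"type": "string"},
        "scene": {
            "type": "object",
            "properties": {
                "people": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {"emotion": {"type": "string"}},
                    },
                },
            },
        },
    },
}

# The sample omits summary and gives emotion the wrong type.
sample = {"scene": {"people": [{"emotion": 3}]}}
for error in validate(sample, schema):
    print(error)
```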

Design recommendations

Put evidence-level facts in segment output.
Put rollups, totals, and cross-segment conclusions in video_level.
Keep field names operationally useful because they appear later in search, filters, and exports.
Treat schema changes as contract changes. Existing prompt runs keep their own historical output shape.
