VectorMethods

Prompt schema design

Design VideoVector prompt schemas with nested objects, repeated fields, semantic indexing controls, and video-level synthesis contracts.

  • sdk/videovector/resources/prompts.py
  • frontend/src/components/prompt-schema/schemaFields.ts
  • mcp-server/src/tools/definitions.ts

Summary

Prompt schemas define what gets extracted, what becomes searchable, and what can be reused in video-level synthesis. This page covers the public schema rules and field-path conventions.

VideoVector uses JSON Schema with a root object to define the output contract for prompt extraction.

Schema-aware metadata extraction means your JSON schema fixes the shape of the prompt output before analysis runs. Video, audio, and image workflows can return nested schema outputs, repeated object paths, and a metadata_text field that later supports search, filters, exports, and video-level synthesis.

Segment-level schema

The segment-level schema lives in json_schema.

  • The root type must be object.
  • Fields can be primitive values, objects, or arrays.
  • Nested objects and repeated objects are supported and surfaced in search and filter tooling.
  • The schema is part of the prompt definition, not part of the prompt run request.
{
  "type": "object",
  "properties": {
    "summary": { "type": "string" },
    "scene": {
      "type": "object",
      "properties": {
        "location": { "type": "string" },
        "people": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "emotion": { "type": "string" }
            }
          }
        }
      }
    }
  }
}

Nested fields and repeated object paths

Nested data is addressable in public search and filter tooling through canonical field paths:

  • scene.location
  • scene.people[].name
  • scene.people[].emotion

Repeated objects use [] in field paths. That same path form appears in condition-style filtering and in search field selection.
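
To preview which field paths a schema exposes, you can walk it locally. The following is a minimal sketch, not part of the VideoVector SDK: the helper name field_paths is hypothetical, and it assumes that only arrays of objects receive the [] suffix (this page does not say how arrays of primitive values are addressed, and nested arrays are not handled).

```python
from typing import Any, Dict, List


def field_paths(schema: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return leaf field paths for a JSON Schema object (hypothetical helper)."""
    paths: List[str] = []
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        # Assumption: only repeated objects get the [] marker.
        if sub.get("type") == "array" and sub.get("items", {}).get("type") == "object":
            path += "[]"
            sub = sub["items"]
        if sub.get("type") == "object" and sub.get("properties"):
            # Descend into nested objects and keep building the dotted path.
            paths.extend(field_paths(sub, path))
        else:
            paths.append(path)
    return paths


# The segment-level schema from this page.
SEGMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "scene": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "people": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "emotion": {"type": "string"},
                        },
                    },
                },
            },
        },
    },
}

for path in field_paths(SEGMENT_SCHEMA):
    print(path)
```

For the example schema this prints summary followed by the three scene paths listed above.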

Reserved field-name rules

Prompt field names are part of the public query contract. Avoid special characters that break field-path parsing.

  • Do not use ., [ or ] inside field names.
  • Do not use reserved internal field names such as __pydantic_extra__.
  • Dictionary-like object fields that disallow additional properties should still define at least one nested field.
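
These rules can be checked locally before a schema is submitted. The sketch below is an illustration only: schema_name_problems is a hypothetical helper, RESERVED_NAMES contains just the one reserved name this page mentions (the platform may reserve others), and "disallow additional properties" is assumed to mean additionalProperties set to false.

```python
from typing import Any, List

FORBIDDEN_CHARS = (".", "[", "]")
RESERVED_NAMES = {"__pydantic_extra__"}  # assumption: only the name listed on this page


def schema_name_problems(schema: Any, parent: str = "<root>") -> List[str]:
    """Collect field-name rule violations in a JSON Schema (hypothetical helper)."""
    if not isinstance(schema, dict):
        return []
    if schema.get("type") == "array":
        # Repeated objects: check the item schema under the same parent name.
        return schema_name_problems(schema.get("items", {}), parent)
    if schema.get("type") != "object":
        return []
    problems: List[str] = []
    props = schema.get("properties", {})
    if schema.get("additionalProperties") is False and not props:
        problems.append(f"{parent}: closed object defines no nested fields")
    for name, sub in props.items():
        if any(ch in name for ch in FORBIDDEN_CHARS):
            problems.append(f"{parent}: field name {name!r} contains '.', '[' or ']'")
        if name in RESERVED_NAMES:
            problems.append(f"{parent}: field name {name!r} is reserved")
        problems.extend(schema_name_problems(sub, name))
    return problems


bad_schema = {
    "type": "object",
    "properties": {
        "scene.location": {"type": "string"},      # contains '.'
        "__pydantic_extra__": {"type": "string"},  # reserved name
        "tags": {"type": "object", "additionalProperties": False},  # closed and empty
    },
}

for problem in schema_name_problems(bad_schema):
    print(problem)
```

An empty result means the schema passes these three checks; it does not replace the platform's own validation.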

Semantic indexing controls

Semantic indexing is configured at the prompt level:

  • disabled_segment_fields
  • disabled_video_level_fields

Use those lists when you want fields to remain in structured output but stay out of semantic embedding.

Typical reasons to disable a field:

  • the field is high-volume but low-value for semantic retrieval
  • the field contains internal bookkeeping
  • the field is dynamic or sparse enough that embedding it would add noise
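
As a rough illustration of how the segment-level list interacts with declared fields, the sketch below splits a schema's field paths into those that would still be embedded and those named in disabled_segment_fields. It assumes that entries are canonical field paths matched exactly, which this page does not state; split_for_embedding is a hypothetical helper, not an SDK call.

```python
from typing import Iterable, List, Tuple


def split_for_embedding(
    declared_paths: Iterable[str], disabled: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (paths still embedded, disabled entries that match no declared path)."""
    declared = list(declared_paths)
    disabled_set = set(disabled)
    # Assumption: a field is excluded only when its exact path is listed.
    embedded = [p for p in declared if p not in disabled_set]
    unknown = sorted(disabled_set - set(declared))
    return embedded, unknown


declared = [
    "summary",
    "scene.location",
    "scene.people[].name",
    "scene.people[].emotion",
]

# Hypothetical prompt fragment; the second entry contains a deliberate typo.
prompt_fragment = {
    "disabled_segment_fields": ["scene.people[].name", "scene.peple[].emotion"],
}

embedded, unknown = split_for_embedding(
    declared, prompt_fragment["disabled_segment_fields"]
)
print("embedded:", embedded)
print("unknown disabled entries:", unknown)
```

Flagging unknown entries is useful because a misspelled path would otherwise leave the intended field in the embedding without any visible error on your side.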

Video-level schema

The optional video_level block adds a second schema for media-wide synthesis:

  • instructions_text: the video-wide instruction
  • included_segment_fields: segment fields supplied to the synthesis step
  • json_schema: the video-level output contract

included_segment_fields can reference declared segment fields and certain system-provided fields such as transcription and metadata_text. Timing context such as start_time and end_time is automatically included where applicable.

{
  "instructions_text": "Summarize the full program and identify the primary incident timeline.",
  "included_segment_fields": ["summary", "scene.people[].emotion", "transcription"],
  "json_schema": {
    "type": "object",
    "properties": {
      "program_summary": { "type": "string" },
      "incident_timeline": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }
}
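
Before saving a video_level block, it can help to confirm that every entry in included_segment_fields resolves to something. The sketch below is a local check under stated assumptions: SYSTEM_FIELDS holds only the two system-provided fields named on this page (there may be others), entries are matched exactly against leaf field paths, and unresolved_included_fields is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List

# Assumption: only the system-provided fields named on this page.
SYSTEM_FIELDS = {"transcription", "metadata_text"}


def unresolved_included_fields(
    video_level: Dict[str, Any], declared_paths: Iterable[str]
) -> List[str]:
    """Return included_segment_fields entries that are neither declared nor system fields."""
    allowed = set(declared_paths) | SYSTEM_FIELDS
    return [
        field
        for field in video_level.get("included_segment_fields", [])
        if field not in allowed
    ]


# Field paths declared by the segment-level schema on this page.
declared = [
    "summary",
    "scene.location",
    "scene.people[].name",
    "scene.people[].emotion",
]

video_level = {
    "instructions_text": "Summarize the full program and identify the primary incident timeline.",
    "included_segment_fields": ["summary", "scene.people[].emotion", "transcription"],
}

print(unresolved_included_fields(video_level, declared))
```

start_time and end_time are left out of the check on purpose, since timing context is included automatically and does not need to be listed.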

Schema testing

The public surface includes schema validation endpoints and matching SDK/MCP methods so you can validate sample data against a schema before storing the prompt. Use that validation step whenever the schema includes nested objects, repeated arrays, or constrained types.
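
For a quick local pre-check while drafting, a small validator can catch type mismatches in sample data. This is a minimal sketch that covers only type, properties, items, and required; it is not the VideoVector validation endpoint and does not reproduce its behavior. The validate function and the sample record are illustrative.

```python
from typing import Any, Dict, List

_PY_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list}


def _matches(value: Any, type_name: str) -> bool:
    """Check a value against one JSON Schema type name."""
    if type_name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "null":
        return value is None
    py_type = _PY_TYPES.get(type_name)
    return py_type is None or isinstance(value, py_type)  # unknown type names pass


def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of error messages; an empty list means the sample fits."""
    expected = schema.get("type")
    if isinstance(expected, str) and not _matches(value, expected):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors: List[str] = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required field is missing")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


PERSON = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "emotion": {"type": "string"}},
}
SEGMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "scene": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "people": {"type": "array", "items": PERSON},
            },
        },
    },
}

sample = {
    "summary": "Two people talk in a kitchen.",
    "scene": {
        "location": "kitchen",
        "people": [{"name": "Ana", "emotion": "calm"}, {"name": 7}],
    },
}

for error in validate(sample, SEGMENT_SCHEMA):
    print(error)
```

The sample deliberately contains a numeric name in the second repeated object, so the check reports one error at $.scene.people[1].name.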

Design recommendations

  • Put evidence-level facts in segment output.
  • Put rollups, totals, and cross-segment conclusions in video_level.
  • Keep field names operationally useful because they appear later in search, filters, and exports.
  • Treat schema changes as contract changes. Existing prompt runs keep their own historical output shape.

Related documentation

This guide shows how to define a prompt with nested and repeated fields, validate the schema, and keep the output shape usable for search and filtering.

Add a second prompt layer that rolls segment evidence into one result per media item without replacing the segment-level output.

API reference

Prompts define the extraction contract. The public API supports prompt CRUD, schema testing, usage inspection, and prompt-definition draft generation.