# The Vectros data model

> Concept and mental model: what records, schemas, documents, folders, lookups, and
> version history are — and why the platform is shaped this way.

Vectros gives an application one coherent place to keep its structured data, its
documents, and the relationships between them — schema-validated, isolated per customer,
searchable in a single query stream, and carrying a full version history. This page
explains the pieces and how they fit together. For runnable guides see
[how-to.md](how-to.md); for the exhaustive field/option/limit list see
[reference.md](reference.md).

## The shape of the problem

Most applications built on top of an AI or knowledge layer need more than a search
index but less than a full relational database. A session note has a date, a provider,
a client, and a status. A lead has an email, a source, and a qualification score. A
support ticket has a priority, an assignee, and a body. You want that data typed and
validated, you want to look an entity up by a known key, you want it to come back in
natural-language search alongside your uploaded files, and — in regulated settings —
you want every change recorded.

Vectros models this with four resource families that share one isolation model, one
indexing pipeline, and one audit mechanism:

- **Schemas** declare the shape of a type — its fields, validation rules, which fields
  are searchable, which support indexed lookups, and which are sensitive.
- **Records** are structured JSON entities written against a schema, which may be a bare
  schema that applies no payload validation.
- **Documents** are text or files you ingest — chunked, embedded, and indexed for
  retrieval, optionally carrying a structured payload of their own.
- **Folders** give documents and records a shared organizational hierarchy.

Around those four sit **lookups and references** (find an entity by a known value;
link one record to another) and **version history** (an immutable trail of every
write to an audited type).

## Schemas: define the shape once

A schema declares one type — a `typeName`, a `displayName`, and a set of field
definitions. Each field has a type (`string`, `number`, `boolean`, `date`, `enum`,
`array`, `object`, or `reference`), optional validation rules, and flags that decide
how it participates in the platform:

- `searchable: true` routes the field's text into the full-text search lane.
- `filterable: true` makes the field available as a search filter without it
  influencing relevance ranking.
- `sensitive: true` marks the field as PHI/PII: its value is redacted from audit
  snapshots and diffs at write, excluded from the search index, blind-indexed for
  lookups, and masked on read unless the caller's token carries the reveal scope (see
  [Sensitivity](#sensitivity-three-distinct-mechanisms)).

A schema also declares its **index mode** (`HYBRID`, `SEMANTIC`, `TEXT`, or `NONE`),
its **lookup fields**, the **surfaces** it may bind to (`record`, `document`, `user`,
`org`, `client`), and per-type **capabilities** such as audit history.
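
As a rough illustration, the session-note example could be declared along the following lines. Only `typeName`, `displayName`, the field types, the three flags, the `{fieldName, unique}` lookup shape, the index modes, and the surface names come from this page. The container keys (`fields`, `indexMode`, `lookupFields`, `surfaces`, `capabilities`, `auditHistory`), the `required` and `values` keys, and every field name are assumptions made for the sketch; consult [reference.md](reference.md) for the actual wire format.

```python
# Hypothetical schema body. Key names other than typeName, displayName, the
# field flags, and {fieldName, unique} are illustrative assumptions.
session_note_schema = {
    "typeName": "session_note",
    "displayName": "Session note",
    "indexMode": "HYBRID",                 # one of HYBRID, SEMANTIC, TEXT, NONE
    "surfaces": ["record"],                # record, document, user, org, client
    "capabilities": {"auditHistory": True},
    "lookupFields": [
        "status",                                    # bare name: non-unique lookup
        {"fieldName": "externalId", "unique": True}, # uniqueness enforced on write
    ],
    "fields": {
        "externalId": {"type": "string", "required": True},
        "sessionDate": {"type": "date", "required": True, "filterable": True},
        "status": {"type": "enum", "values": ["draft", "signed"], "filterable": True},
        "summary": {"type": "string", "searchable": True},
        "clientEmail": {"type": "string", "sensitive": True},
    },
}


def fields_with_flag(schema: dict, flag: str) -> list[str]:
    """Return the sorted names of fields that set the given boolean flag."""
    return sorted(name for name, spec in schema["fields"].items() if spec.get(flag))


print(fields_with_flag(session_note_schema, "searchable"))  # ['summary']
print(fields_with_flag(session_note_schema, "sensitive"))   # ['clientEmail']
```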

Schemas are deliberately optional in spirit: a bare schema with nothing but a
`typeName` and `displayName` is valid. Records written against a bare schema are stored
as-is with no payload validation. This removes the usual prototype-vs-production
fork — you get the same API surface whether you are sketching with free-form JSON or
running a fully-validated production type, and you can add fields incrementally as the
data model stabilizes (adding a field is non-breaking).

Schemas are versioned. Every edit increments a public revision counter, and each
record or document is stamped at write with the schema version that governed it — so a
record keeps the version it was written under even after the schema evolves.

> **Note:** schemas are replaced in full on update (PUT). There is no partial-update
> (PATCH) path for schemas — supply the complete intended schema body on every update.

## Records: structured entities

A record is a JSON `payload` written against a schema's `typeName`. On create, the
payload is validated against the schema's field definitions; required fields, types,
enum membership, and validation rules (length, range, pattern) are enforced before the
record is persisted. The record carries optional ownership fields (`userId`, `orgId`,
`clientId`) and an optional `folderId`, plus a system-assigned id, timestamps, and a
monotonically-increasing version.
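
To make the create-time checks concrete, here is a minimal local sketch of that kind of validation. It is not the platform's validator: the spec keys (`required`, `values`, `maxLength`, `pattern`, `min`, `max`) and the error strings are assumptions, and only a few field types are covered.

```python
import re


def validate_payload(fields: dict, payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the payload passes."""
    errors = []
    for name, spec in fields.items():
        if name not in payload:
            if spec.get("required"):
                errors.append(f"{name}: required")
            continue
        value = payload[name]
        kind = spec["type"]
        if kind == "enum":
            if value not in spec["values"]:
                errors.append(f"{name}: not in enum")
        elif kind == "string":
            if not isinstance(value, str):
                errors.append(f"{name}: expected string")
            elif "maxLength" in spec and len(value) > spec["maxLength"]:
                errors.append(f"{name}: too long")
            elif "pattern" in spec and not re.fullmatch(spec["pattern"], value):
                errors.append(f"{name}: pattern mismatch")
        elif kind == "number":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: expected number")
            elif not spec.get("min", value) <= value <= spec.get("max", value):
                errors.append(f"{name}: out of range")
    return errors


# Hypothetical field definitions for the "lead" example.
lead_fields = {
    "email": {"type": "string", "required": True, "pattern": r"[^@\s]+@[^@\s]+"},
    "source": {"type": "enum", "values": ["web", "referral"]},
    "score": {"type": "number", "min": 0, "max": 100},
}

print(validate_payload(lead_fields, {"email": "a@example.com", "source": "web", "score": 72}))
print(validate_payload(lead_fields, {"source": "billboard", "score": 140}))
# A bare schema declares no fields, so any payload passes unchanged.
print(validate_payload({}, {"anything": ["goes"]}))
```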

Records are first-class search content. A record's searchable fields flow into the same
retrieval pipeline as ingested documents, so a single natural-language query can surface
a structured intake record and a clinical note in the same ranked result set.

Two update modes exist:

- **PUT** replaces the record's mutable top-level fields. A top-level field omitted from
  the request is preserved; the `payload`, when supplied, **replaces** the stored payload
  in full (it is not deep-merged). To change one payload field with PUT you must resend
  the entire payload.
- **PATCH (SDK 0.26+)** is a true partial update following RFC 7386 (JSON Merge Patch),
  applied to the record's `payload`. Inside `payload`, keys present in the patch overwrite,
  nested objects recurse, a key set to `null` **deletes** that key, and absent keys are
  preserved — the natural way to change a single field without read-modify-write. Note that
  a **top-level** mutable field (such as `status` or `folderId`) set to `null` is *not* a
  delete: it is rejected with `400`. Clearing a top-level field is not supported in this release.
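
The difference between the two modes is easiest to see on a concrete payload. The function below is a small local implementation of JSON Merge Patch semantics, written for illustration; it is not the SDK's code, and the payload keys are invented.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch to target and return the result (inputs untouched)."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)          # a non-object patch replaces the target
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)            # null deletes the key
        else:
            result[key] = merge_patch(result.get(key), value)  # nested objects recurse
    return result


stored = {
    "summary": "Intake call",
    "provider": {"name": "Dr. Lee", "pager": "555-0100"},
    "tags": ["new"],
}
patch = {"provider": {"pager": None, "phone": "555-0199"}, "tags": ["follow-up"]}

patched = merge_patch(stored, patch)
print(patched)
# {'summary': 'Intake call', 'provider': {'name': 'Dr. Lee', 'phone': '555-0199'}, 'tags': ['follow-up']}

# PUT semantics for comparison: the supplied payload replaces the stored one in full,
# so sending only "tags" would drop "summary" and "provider".
put_payload = {"tags": ["follow-up"]}
after_put = copy.deepcopy(put_payload)
print(after_put)  # {'tags': ['follow-up']}
```

Arrays are replaced wholesale under merge-patch semantics, which is why `tags` ends up as `["follow-up"]` rather than an appended list.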

Records support **optimistic concurrency**: pass the `version` you last read back as
`expectedVersion`, and the write is rejected with a `409 VERSION_CONFLICT` if the
record moved on in the meantime. Omit it for last-write-wins.
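
The following in-memory sketch shows the compare-and-write behavior described above. `RecordStore` and `VersionConflict` are stand-ins invented for the example (the exception models the `409 VERSION_CONFLICT` response), and starting versions at 1 is an assumption.

```python
class VersionConflict(Exception):
    """Stands in for an HTTP 409 VERSION_CONFLICT response."""


class RecordStore:
    """Toy store that bumps a per-record version on every successful write."""

    def __init__(self):
        self._rows = {}  # record id -> {"version": int, "payload": dict}

    def create(self, record_id, payload):
        self._rows[record_id] = {"version": 1, "payload": dict(payload)}
        return 1

    def get(self, record_id):
        row = self._rows[record_id]
        return row["version"], dict(row["payload"])

    def put(self, record_id, payload, expected_version=None):
        row = self._rows[record_id]
        # Omitting expected_version means last-write-wins.
        if expected_version is not None and expected_version != row["version"]:
            raise VersionConflict(f"expected {expected_version}, found {row['version']}")
        row["payload"] = dict(payload)
        row["version"] += 1
        return row["version"]


store = RecordStore()
store.create("ticket-1", {"priority": "low"})

# Two writers read the same version, then both try to write.
version_a, _ = store.get("ticket-1")
version_b, _ = store.get("ticket-1")
store.put("ticket-1", {"priority": "high"}, expected_version=version_a)  # succeeds
try:
    store.put("ticket-1", {"priority": "medium"}, expected_version=version_b)
except VersionConflict as exc:
    print("conflict:", exc)  # the second writer must re-read and retry
```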

Deleting a record is a hard delete: the row and all of its lookup rows are removed
atomically, a tombstone is recorded for audit, and the record is dropped from the
search index on the next cycle. There is no soft-delete status that lingers in the
index — workflow states are expressed through the `status` field instead.

## Documents: text and files

A document is content you ingest for retrieval. Two ingest paths exist:

- **Inline text ingest** — send the raw text directly. The platform chunks it, embeds
  the chunks, and indexes them. Optionally store the raw text for later retrieval.
- **File upload** — request a presigned URL, PUT the file bytes to it directly (no
  Authorization header on that PUT), then poll until indexing completes. Text is
  extracted from the file, then chunked, embedded, and indexed.

Documents carry the same ownership and folder fields as records and, when bound to a
schema via `schemaId`, a structured `payload` of their own that is validated and
lookup-indexed in the same way as a record's payload. Undeclared payload keys pass
through as free-form and remain available
as search filters. Like records, documents support PUT (full replace) and PATCH
(RFC 7386, SDK 0.26+), optimistic concurrency, version history, and hard delete.

Documents move through a processing lifecycle — uploaded, text extracted, queued for
indexing, then `INDEXED` (searchable) or `STORED` (store-only, when index mode is
`NONE`). An update re-runs the pipeline; the old content is removed from the index as
the new content is written.
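
A client typically waits for one of the two terminal states before treating a document as ready. The helper below is a sketch of that polling loop with the status fetch injected as a callable, so it makes no claim about the real endpoint or SDK method. `INDEXED` and `STORED` come from this page; the intermediate status names in the demo are placeholders, and any failure states the platform reports are not modeled.

```python
import time

TERMINAL_STATES = {"INDEXED", "STORED"}


def wait_until_processed(get_status, timeout_s=60.0, interval_s=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"document still {status} after {timeout_s}s")
        sleep(interval_s)


# Demo with a fake status source; a real caller would fetch the document instead.
fake_statuses = iter(["UPLOADED", "EXTRACTED", "QUEUED", "INDEXED"])  # placeholder names
final = wait_until_processed(lambda: next(fake_statuses), sleep=lambda _s: None)
print(final)  # INDEXED
```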

## Folders: shared organization

Folders give documents and records a common hierarchy. A folder has a name, an optional
description, optional ownership fields, and a stable, path-derived slug that makes
folder creation idempotent — re-creating a folder at the same path converges on the same
folder rather than duplicating it. Each context has a protected root folder; unparented
folders are placed under it.

A folder's parent is set at creation only. There is **no move or reparent operation** —
a folder cannot currently be relocated in the hierarchy through the API. Deleting a
folder that still contains children is rejected; empty the folder first.
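
The folder rules can be modeled in a few lines. This sketch assumes a particular slug derivation (lowercase, non-alphanumeric runs collapsed to hyphens), which this page does not specify, and it tracks only child folders, not the documents and records a folder may hold. `FolderTree` is a hypothetical name.

```python
import re


def slugify(name: str) -> str:
    """Assumed slug rule: lowercase, runs of other characters become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


class FolderTree:
    ROOT = "root"  # the protected root folder of a context

    def __init__(self):
        self._parent = {self.ROOT: None}  # folder path -> parent path

    def create(self, name, parent=ROOT):
        """Idempotent create: the same name under the same parent yields the same path."""
        if parent not in self._parent:
            raise KeyError(f"unknown parent folder: {parent}")
        path = f"{parent}/{slugify(name)}"
        self._parent.setdefault(path, parent)  # parent is fixed at creation
        return path

    def delete(self, path):
        if path == self.ROOT:
            raise PermissionError("the root folder is protected")
        if any(parent == path for parent in self._parent.values()):
            raise ValueError("folder still contains children")
        del self._parent[path]


tree = FolderTree()
clients = tree.create("Client Files")
print(clients)                                  # root/client-files
print(tree.create("client files") == clients)   # True: converges on the same folder
```

There is deliberately no `move` method, mirroring the absence of a reparent operation in the API.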

## Lookups and references

**Lookups** are direct, indexed retrievals by a declared field value — no scan. You
declare which fields are lookup fields on the schema (bare field names, or
`{fieldName, unique}` to enforce uniqueness). At write time the platform maintains a
small index row per lookup field; a lookup reads that index directly. Three lookup modes
are supported, **one mode per call**:

- **exact** — match a single value.
- **range** — an inclusive `from`/`to` range, returned in ascending order
  (non-sensitive fields only).
- **prefix** — a string-prefix match in ascending order (string, non-sensitive fields
  only).

Equality is the default; **range** and **prefix** require declaring the field
`rangeEnabled` when you author the schema. That choice is **migration-locked** once the
schema is live: a field's equality-vs-range shape cannot be changed afterwards. Equality
lookups also draw on a small fixed budget of fast index slots, whereas range lookups use
a relationship row instead, so choose each field's shape deliberately.

Lookups come in unique and non-unique flavors. A `unique` lookup returns at most one
record (and the platform enforces uniqueness on write). A non-unique lookup is an
**enumeration** — it returns every record sharing that value, paginated.
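
To show why these lookups need no scan, the sketch below keeps one sorted index per lookup field and answers all three modes with binary search. It is a conceptual model only: `LookupIndex` and its method names are invented here, and the platform's actual index layout is not described on this page.

```python
from bisect import bisect_left, bisect_right


class LookupIndex:
    """Sorted index rows for a single lookup field (value -> record id)."""

    def __init__(self, unique: bool = False):
        self.unique = unique
        self._values = []  # kept sorted
        self._ids = []     # record ids, parallel to _values

    def put(self, value, record_id):
        if self.unique and self.exact(value):
            raise ValueError(f"duplicate value for unique lookup: {value!r}")
        i = bisect_right(self._values, value)
        self._values.insert(i, value)
        self._ids.insert(i, record_id)

    def exact(self, value):
        """All record ids with exactly this value (at most one if unique)."""
        return self.in_range(value, value)

    def in_range(self, low, high):
        """Inclusive range, returned in ascending value order."""
        start = bisect_left(self._values, low)
        end = bisect_right(self._values, high)
        return self._ids[start:end]

    def with_prefix(self, prefix: str):
        """String-prefix match; matching values are contiguous in sorted order."""
        i = bisect_left(self._values, prefix)
        matches = []
        while i < len(self._values) and self._values[i].startswith(prefix):
            matches.append(self._ids[i])
            i += 1
        return matches


by_date = LookupIndex()
for day, record_id in [("2024-03-01", "r4"), ("2024-01-05", "r1"),
                       ("2024-02-20", "r3"), ("2024-02-10", "r2")]:
    by_date.put(day, record_id)

print(by_date.exact("2024-02-10"))                   # ['r2']
print(by_date.in_range("2024-02-01", "2024-02-29"))  # ['r2', 'r3']
print(by_date.with_prefix("2024-02"))                # ['r2', 'r3']
```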

Every record additionally gets three automatic ownership lookups (`userId`, `orgId`,
`clientId`) without declaring them and without counting against the lookup-field cap, so
you can list records owned by a user, org, or client out of the box.

For a **sensitive** lookup field, the value must not appear in a URL (where it could be
captured by access logs or proxies). The platform offers a body-based lookup variant so
the sensitive value travels in the request body and is blind-indexed server-side.
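
This page does not say how the blind index is computed. One common construction, shown here purely as an assumption, is a keyed HMAC over the normalized value: equal inputs produce equal tokens, so exact-match lookups work without storing plaintext. Because the digest does not preserve ordering, the same construction also suggests why range and prefix modes are limited to non-sensitive fields. The key, the normalization rule, and the function name are all hypothetical.

```python
import hashlib
import hmac


def blind_index(secret_key: bytes, field_name: str, value: str) -> str:
    """Deterministic keyed digest: the same field and value always yield the same token."""
    # Binding the field name into the message keeps tokens from colliding across fields.
    message = f"{field_name}\x00{value.strip().lower()}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


key = b"demo-key-not-for-production"
index = {}  # token -> record ids; the plaintext value is never stored here

token = blind_index(key, "clientEmail", "Pat@Example.com")
index.setdefault(token, []).append("rec-1")

# A later lookup recomputes the token from the value sent in the request body.
probe = blind_index(key, "clientEmail", "pat@example.com ")
print(index.get(probe, []))  # ['rec-1']
```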

**References** are typed links between records: a `reference` field declares the
`targetTypeName` it points at and the `targetSurface` that type lives on (`record`,
`document`, `user`, `org`, or `client`), both of which are required, plus the target
field to resolve against and a cardinality (`one` or `many`). By default the target must
exist at write time. The
platform can additionally maintain per-field **reverse-reference rows** (opt-in) so the
inverse direction is indexed.

> **Note:** the reverse-reference *list* endpoint is not yet available — do not depend on
> querying back-references through the API today. Batch record write/lookup/get
> operations are likewise reserved and not yet implemented.

## Version history: an immutable change trail

Every write to an audited type emits an immutable version row. Each row records the
change type (`CREATE`, `UPDATE`, `DELETE`), who made the change, when, the full snapshot
of the state *prior* to the change, and a field-level diff of what changed. The current
state always lives on the entity itself; version history captures the "before" side of
every transition, so you can reconstruct exactly how an entity reached its present
state.
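
A version row of this kind can be modeled as follows. The key names (`changeType`, `actor`, `at`, `before`, `diff`) and the `from`/`to` diff shape are assumptions for the sketch, not the documented envelope; see [reference.md](reference.md) for the real one.

```python
import copy
from datetime import datetime, timezone


def field_diff(before: dict, after: dict) -> dict:
    """Top-level field diff: only keys whose values differ are reported."""
    keys = before.keys() | after.keys()
    return {
        key: {"from": before.get(key), "to": after.get(key)}
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


def version_row(change_type, actor, before, after):
    """Build one immutable history row holding the state prior to the change."""
    return {
        "changeType": change_type,                       # CREATE, UPDATE, or DELETE
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
        "before": copy.deepcopy(before),                 # None for a CREATE
        "diff": field_diff(before or {}, after or {}),
    }


v1 = {"status": "draft", "score": 10}
v2 = {"status": "signed", "score": 10}
history = [
    version_row("CREATE", "user-1", None, v1),
    version_row("UPDATE", "user-2", v1, v2),
]

print(history[1]["diff"])    # {'status': {'from': 'draft', 'to': 'signed'}}
print(history[1]["before"])  # {'status': 'draft', 'score': 10}
```

Walking the rows in order and reading each `before` snapshot, then the current entity, yields every state the entity has passed through.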

Audit history is on by default and controlled per schema through a capability flag.
Turning it off for a high-volume, low-value type saves the write cost; tombstones on
delete are still recorded regardless. The audit trail *is* the data layer — it is not a
separate, bypassable logging path — and heavy historical content is externalized to a
write-once, retention-governed store. The data-model-facing view is simply: **you get
full version history for free on every audited type.** The deeper compliance posture —
retention, the tamper-evident continuity chain, and how sensitive data is handled across
all of this — lives in [the operations & trust documentation](../operations-trust/compliance.md).

## Isolation: built in, not bolted on

Every record, schema, document, and folder is partitioned by an auth-derived context
key that the caller cannot forge. Reads and lookups are confined to the caller's own
context, and cross-context access is closed at the data layer by construction rather
than by a runtime check that a later change could regress. A request that supplies a
known id belonging to another tenant or context gets the same uniform "not found"
response as a request for an id that does not exist — the error message is never a probe
channel.
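
The idea of isolation by construction can be illustrated with a store whose storage key includes the context. There is no separate ownership check to forget, and a foreign id and a nonexistent id fail identically. `ContextStore`, `NotFound`, and the context key values are hypothetical; in the real platform the context key is derived from the caller's credentials rather than passed as an argument.

```python
class NotFound(Exception):
    """Uniform error for both missing ids and ids in another context."""


class ContextStore:
    def __init__(self):
        self._rows = {}  # (context_key, record_id) -> payload

    def put(self, context_key, record_id, payload):
        self._rows[(context_key, record_id)] = dict(payload)

    def get(self, context_key, record_id):
        # The context is part of the key, so another tenant's row is simply absent.
        try:
            return dict(self._rows[(context_key, record_id)])
        except KeyError:
            raise NotFound("record not found") from None


store = ContextStore()
store.put("tenant-a", "rec-1", {"note": "private to tenant A"})


def error_for(context_key, record_id):
    try:
        store.get(context_key, record_id)
    except NotFound as exc:
        return str(exc)
    return None


print(error_for("tenant-b", "rec-1"))      # record not found (exists, but in tenant-a)
print(error_for("tenant-b", "no-such-id")) # record not found (does not exist at all)
```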

Scoped credentials narrow access further. A token whose data scope names a specific
owner can only reach entities owned by that owner; tenant-level (owner-less) entities
are reached only by explicitly opting in. This is what makes a single Vectros tenant
safe to partition across many of *your* customers. The full access model is covered in
the identity & access documentation.

## Sensitivity: three distinct mechanisms

When a schema field is marked `sensitive`, three independent protections apply — it is
worth understanding that they are separate:

1. **Redacted at write.** The sensitive value is removed from audit snapshots and
   change diffs before they are persisted. This is *not* reversible masking — the value
   is not recoverable from the audit trail regardless of any later grant.
2. **Excluded from the search index.** A sensitive field's value never enters the search
   index, so it can never surface through search.
3. **Masked on read.** In normal reads the field is returned masked (`[redacted]`)
   unless the caller's token explicitly carries the reveal scope for that type. Lookups
   on the field still work — it is blind-indexed — without exposing the value.

These combine so that a sensitive field is usable as a key (you can look a record up by
it) while its plaintext never lands in logs, the search index, or an unprivileged
response. The full treatment is in
[the operations & trust documentation](../operations-trust/compliance.md).
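
The three mechanisms act at different points, which the following sketch separates into three functions. Everything here is illustrative: the function names, the scope string `reveal:session_note`, and the use of a `[redacted]` placeholder inside the audit snapshot are assumptions, since this page states only that the value is unrecoverable there.

```python
MASK = "[redacted]"


def audit_snapshot(payload: dict, sensitive: set) -> dict:
    """1. Redacted at write: the plaintext is dropped before the snapshot is kept."""
    return {k: (MASK if k in sensitive else v) for k, v in payload.items()}


def search_projection(payload: dict, searchable: set, sensitive: set) -> dict:
    """2. Excluded from the index: sensitive fields never reach the search lane."""
    return {k: v for k, v in payload.items() if k in searchable and k not in sensitive}


def read_view(payload: dict, sensitive: set, token_scopes: set, reveal_scope: str) -> dict:
    """3. Masked on read unless the token carries the reveal scope."""
    if reveal_scope in token_scopes:
        return dict(payload)
    return {k: (MASK if k in sensitive else v) for k, v in payload.items()}


payload = {"summary": "Follow-up visit", "clientEmail": "pat@example.com"}
sensitive = {"clientEmail"}
searchable = {"summary", "clientEmail"}  # even if flagged searchable, sensitive wins here

print(audit_snapshot(payload, sensitive))
print(search_projection(payload, searchable, sensitive))
print(read_view(payload, sensitive, {"records:read"}, "reveal:session_note"))
print(read_view(payload, sensitive, {"records:read", "reveal:session_note"}, "reveal:session_note"))
```

Only the last call returns the plaintext. The audit snapshot stays redacted regardless of scope because the value was never written to it.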

## Where to go next

- [how-to.md](how-to.md) — runnable guides: define a schema, write and update records,
  PATCH, lookup, paginate, ingest a document, create a folder, read version history.
- [reference.md](reference.md) — every method, field, validation rule, limit, envelope
  shape, and error, plus an honest "what each feature does not do."
- [../operations-trust/compliance.md](../operations-trust/compliance.md) — version
  history retention, the tamper-evident chain, and the full sensitive-data posture.
