# Agentic-SDLC Knowledge Base

> The flagship blueprint: a whole-SDLC **system of record** for an AI development team — architecture decisions, the controls that govern the code, the conventions and gotchas an agent needs at a cold start, the runbooks, the post-mortems — modeled as nine typed schemas, linked into a cross-surface knowledge graph, and recalled by hybrid search + grounded RAG. It's the blueprint we run our own engineering org on.

## What it is — your team's engineering memory, recalled by meaning

The most valuable non-code asset a software team has is the *reasoning* around the code: why a decision was made, the rule before you touch the key boundary, the runbook for a wedged deploy, the gotcha that bit you last quarter. That knowledge usually lives as fragile, un-indexed local markdown: one disk failure away from being lost, and invisible to the agents doing the work.

The `agentic-sdlc` blueprint puts it on Vectros instead, split by a single principle: **content lives as documents, structure lives as records.** ADRs, designs, references, runbooks, and post-mortems are *documents* — the prose is the artifact, so you read it and ask it questions. Controls, conventions, gotchas, and the glossary are *records* — the typed fields are the artifact, so you query and enumerate them. They link into one **cross-surface knowledge graph**, and the whole thing is recalled by meaning: *"why is it shaped this way?"* returns the actual decision, cited.

This is the blueprint we dogfood internally: our own engineering org runs on it. That makes it both the most credible demonstration of the platform and the most complete, exercising typed records, deterministic lookup, a typed reference graph, hybrid search, grounded RAG, unified records and documents, the dual human/agent surface, and governance in one runnable use case.

## What `bootstrap` provisions

One command stands up everything below — no application code.

- **Nine schemas, split content vs structure:**
  - **Documents** (the markdown body is the artifact, searched + answered over): **`decision`** (ADRs), **`design`**, **`reference`**, **`runbook`**, **`postmortem`**. Each carries typed metadata (summary, status, area, tags, date) and a stable `externalId`, with a range/sort index on its date.
  - **Records** (typed fields, exact-queryable): **`control`** (a governance boundary that records its own `evidence`), **`convention`** (distinct `rule` / `why` / `howToApply` fields), **`gotcha`** (symptom → cause → fix), and **`term`** (a glossary entry with a `unique` exact-lookup).
- **A cross-surface knowledge graph** — typed `reference` edges where **records point at documents** (`control.verifiedBy` → the `runbook` that proves it; `convention.establishedBy` / `term.relatedDecision` → the `decision` behind them) and **documents point at documents** (`decision.supersedes`, `design.relatedDecision`, `runbook.bornFrom` → a `postmortem`). Provenance is navigable, not just searchable.
- **A least-privilege access profile** — `records:r/c/u`, `search:r`, `schemas:r`, `inference:r`, `documents:r/c`, `folders:r/c`. Note the **deliberate absence of delete**: knowledge is superseded or retired via a status flip, so the audit trail of how the team's thinking changed stays intact.
- **No bundled seeds.** The blueprint ships seedless — it provisions the nine schemas and a scoped key, and you fill it from your own corpus (next section). So the context starts clean; there's nothing synthetic to remove.
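
To make the access profile in the list above concrete, here is a minimal sketch that expands those scope strings into (resource, action) pairs and confirms that no delete action is granted. The scope strings are copied from the list; the letter meanings (`r` read, `c` create, `u` update, `d` delete) and the helper names `expand` and `allows` are assumptions for illustration, not the platform's own parser.

```python
# Illustrative only: expands the scope shorthand used in this README.
# The letter-to-action mapping is an assumption, not a documented Vectros format.
ACTIONS = {"r": "read", "c": "create", "u": "update", "d": "delete"}

PROFILE = [
    "records:r/c/u",
    "search:r",
    "schemas:r",
    "inference:r",
    "documents:r/c",
    "folders:r/c",
]


def expand(scopes):
    """Turn 'records:r/c/u' style strings into a set of (resource, action) pairs."""
    grants = set()
    for scope in scopes:
        resource, letters = scope.split(":", 1)
        for letter in letters.split("/"):
            grants.add((resource, ACTIONS[letter]))
    return grants


def allows(grants, resource, action):
    """True if the expanded profile grants `action` on `resource`."""
    return (resource, action) in grants


grants = expand(PROFILE)
print(allows(grants, "records", "update"))  # True
print(allows(grants, "records", "delete"))  # False
```

A check like this is handy in a review script: if a forked profile ever gains a `d`, the "supersede, don't delete" guarantee no longer holds.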

The apply is idempotent: re-running converges rather than duplicating, because every item is keyed by its `externalId` — so the knowledge base is **rebuildable** from source at any time.
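
The convergence property can be pictured with a small in-memory model: items are upserted under their `externalId`, so a second run over the same source changes nothing and creates no duplicates. This is only a sketch of the idea. The `apply` function, the dict store, and the sample ids and status values are hypothetical, not the Vectros client or its data.

```python
# Illustrative in-memory model of "apply converges by externalId".
# `apply`, the dict store, and the sample items are hypothetical.

def apply(store, items):
    """Upsert each item under its externalId; return how many entries changed."""
    changed = 0
    for item in items:
        key = item["externalId"]
        if store.get(key) != item:
            store[key] = dict(item)
            changed += 1
    return changed


source = [
    {"externalId": "adr-0001", "type": "decision", "status": "accepted"},
    {"externalId": "conv-logging", "type": "convention", "status": "active"},
]

store = {}
first = apply(store, source)   # writes both items
second = apply(store, source)  # nothing to change, nothing duplicated
print(first, second, len(store))  # 2 0 2
```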

## Before you start

This is an invite-only 0.x preview, so the prerequisites are an **early-access invite** and a short-lived **bridge token** that you mint from the dev portal (the human step that authenticates the CLI before it can provision anything). You also need **Node**, since the CLI and MCP server run via `npx`.

## 1. Bootstrap the blueprint (seedless)

```bash
npm i -g @vectros-ai/cli      # or prefix the commands below with: npx -y @vectros-ai/cli
vectros login                 # one-time, browser sign-in
vectros bootstrap --blueprint agentic-sdlc --no-seed --yes
vectros whoami                # confirm tenant + scoped key
```

This provisions the context, the nine schemas, and a least-privilege `ssk_*` key (written once to `~/.vectros/agentic-sdlc.key.json`), and safe-merges the Vectros MCP server into your **Claude Desktop** config. Use `--client code` for **Claude Code**, or `--print` to emit the snippet for any other MCP client. Add `--tenant test` to provision into the test tenant first for a dry run.

## 2. Ingest your corpus (agent-driven, idempotent)

There are two ingest paths, one per surface. Both are driven by an **ingest agent** pointed at your source files with the bundled orientation prompt: an LLM maps semi-structured docs to the right type far better than a brittle parser, and every write is idempotent by `externalId`.

- **Documents** — the prose artifacts keep their markdown **body as-is**; the agent fills the typed metadata and calls `document_ingest` against the matching schema (`decision` / `design` / `reference` / `runbook` / `postmortem`), with `payload` carrying `summary`, `status`, `area`, `tags`, `date`, and any references.
- **Records** — the structured artifacts are typed fields, not prose; the agent extracts them and calls `record_create` per item (a `convention`'s `rule`/`why`/`howToApply`, a `control`'s `evidence` + `verifiedBy` runbook, a `term`'s unique key + definition).

Cross-surface edges resolve by the target's `externalId`, so **ingest the referenced documents before the records that point at them.** A one-shot backfill is that agent looped over your `docs/`, ADRs, and memory; an ongoing sync re-runs it on change — the same `externalId`s converge.
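
One way to get the ordering right without thinking about it is to sort the corpus so that every referenced target comes before the item that points at it. The sketch below uses Python's standard `graphlib.TopologicalSorter` over a map of `externalId` to the ids it references. The ids and the `ingest_order` helper are made up for illustration; how you extract the references from your own files is left open.

```python
from graphlib import TopologicalSorter

# Hypothetical corpus: externalId -> the externalIds it references.
refs = {
    "control-key-boundary": ["runbook-key-rotation"],   # control.verifiedBy
    "runbook-key-rotation": ["postmortem-cache-outage"],  # runbook.bornFrom
    "postmortem-cache-outage": [],
    "adr-0002": ["adr-0001"],                           # decision.supersedes
    "adr-0001": [],
}


def ingest_order(refs):
    """Return externalIds so every referenced target precedes its referrer.

    TopologicalSorter treats the dict values as predecessors, so targets are
    emitted first. A reference cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(refs).static_order())


order = ingest_order(refs)
print(order.index("runbook-key-rotation") < order.index("control-key-boundary"))  # True
```

Because this orders by reference rather than by surface, it also covers document-to-document edges such as `decision.supersedes`.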

A bulk backfill is exactly the workload that trips the API's **per-minute rate limit** (per tenant, counting writes + searches, shared across all of a tenant's keys). Pace the ingest — for the free tier, roughly one record every couple of seconds is safe — and on an HTTP **429**, honor the `Retry-After` header. Because ingest is idempotent, a backfill that pauses or restarts simply converges; it never double-writes.
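
The pacing advice above can be sketched as a small loop: wait a fixed interval between writes, and when a write comes back 429, sleep for the `Retry-After` value and retry the same item. Everything here is an assumption made for illustration. `send` is a caller-supplied function returning `(status_code, headers)`, `Retry-After` is assumed to be given in seconds (the HTTP-date form is not handled), and the 2-second interval simply mirrors the rough free-tier guidance.

```python
import time


def paced_ingest(items, send, min_interval=2.0, max_attempts=5, sleep=time.sleep):
    """Send items one at a time, pacing writes and honoring Retry-After on 429.

    `send(item)` is a hypothetical transport returning (status_code, headers).
    """
    for item in items:
        for _ in range(max_attempts):
            status, headers = send(item)
            if status == 429:
                # Assumes Retry-After is in seconds; fall back to the pacing interval.
                sleep(float(headers.get("Retry-After", min_interval)))
                continue
            if status >= 400:
                raise RuntimeError(f"write failed with HTTP {status}")
            break
        else:
            raise RuntimeError(f"still rate limited after {max_attempts} attempts")
        sleep(min_interval)


# Fake transport for the demo: the first call is rate limited, later calls succeed.
calls = []


def fake_send(item):
    calls.append(item)
    if len(calls) == 1:
        return 429, {"Retry-After": "3"}
    return 201, {}


waits = []
paced_ingest(["adr-0001", "adr-0002"], fake_send, sleep=waits.append)
print(calls)  # ['adr-0001', 'adr-0001', 'adr-0002']
print(waits)  # [3.0, 2.0, 2.0]
```

Retrying the same item is safe only because the write is idempotent by `externalId`; without that, a retry after an ambiguous failure could double-write.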

## 3. Query it — the recall payoff

| You want… | Call |
|---|---|
| "Why did we decide X?" (grounded, cited) | `rag_ask "why did we choose X?"` — answers over document bodies |
| "Which critical controls are active, and how is each proven?" | `record_query control { criticality:"critical", status:"active" }` → follow `verifiedBy` to the runbook |
| "What's the active rule for area X?" | `record_query convention { area:"<area>", status:"active" }` |
| "Have we hit this failure before?" | `hybrid_search "<symptom>" contentTypes:["documents"], typeName:"postmortem"`; plus `record_query gotcha { area:"deploy", status:"active" }` |
| "Define X" | `record_query term { term:"X" }` (unique lookup) |
| "Latest decisions / search the designs" | `hybrid_search "<topic>" contentTypes:["documents"], typeName:"decision"` (or `"design"`) |

Recall by meaning (`hybrid_search` / `rag_ask`) and deterministic enumeration (`record_query` by lookup field) are both first-class — a knowledge base needs both: search to surface the relevant decision you'd never remember by filename, lookup to enumerate a known slice ("every active control").
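
As a rough picture of the deterministic side, the sketch below enumerates controls by exact field match and then follows each `verifiedBy` edge to its runbook. It is an in-memory stand-in, not the `record_query` tool: the `query` and `proofs` helpers, the sample records, and the `"low"` and `"retired"` values are all made up for illustration.

```python
# In-memory stand-ins; field names follow the table above, the data is made up.
controls = [
    {"externalId": "ctl-key-boundary", "criticality": "critical",
     "status": "active", "verifiedBy": "rb-key-rotation"},
    {"externalId": "ctl-old-gate", "criticality": "critical",
     "status": "retired", "verifiedBy": "rb-legacy"},
    {"externalId": "ctl-lint", "criticality": "low",
     "status": "active", "verifiedBy": "rb-ci"},
]
runbooks = {
    "rb-key-rotation": {"summary": "Rotate the signing key"},
    "rb-legacy": {"summary": "Legacy gate checks"},
    "rb-ci": {"summary": "CI lint job"},
}


def query(records, **filters):
    """Exact-match enumeration: keep records whose fields equal every filter."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


def proofs(controls, runbooks):
    """Map each active critical control to the summary of the runbook proving it."""
    active = query(controls, criticality="critical", status="active")
    return {c["externalId"]: runbooks[c["verifiedBy"]]["summary"] for c in active}


print(proofs(controls, runbooks))  # {'ctl-key-boundary': 'Rotate the signing key'}
```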

## 4. The dual surface — agents capture, your team browses

The same typed context is reachable two ways. **Agents** capture and recall over MCP — the bundled orientation prompt wires the recall-before-acting / capture-after loop, so a cold-start session reads the conventions before it writes code and records the new decision after. **Your team** browses the exact same records and documents in the data-plane app — no separate export, no second copy. One governed store, two surfaces.

## 5. Bridge your issue tracker — don't mirror it

Your tracker (GitLab, Jira, Linear) and this knowledge base are **two planes with different jobs**: the tracker owns *live status* (open/closed, assignee — volatile); the knowledge base owns *durable recall* (why/how/lessons — stable). Mirroring issues in creates a stale shadow copy that buries your decisions under issue churn. Instead, **promote by reference**: when you close out work that carries durable knowledge, distill it into the right type, tag it `issue:<id>`, and note the `externalId` back in the tracker. Be selective — most issues promote nothing; that selectivity is what keeps recall high-signal.

## Keep it healthy

- **Record the why** — rationale is the most-recalled field; a statement without it is a log entry, not knowledge.
- **Supersede, don't delete** — flip `status` so the evolution trail survives (the access profile has no delete by design).
- **Re-ingest is idempotent** — keyed on `externalId`, so a backfill converges and the whole knowledge base can be rebuilt from source at any time.
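
The "supersede, don't delete" rule from the list above can be sketched as a status flip plus a back-reference: the new entry points at the old one through `supersedes`, and the old entry stays in the store with a changed `status`. The `supersede` helper, the dict store, and the `"accepted"` / `"superseded"` status values are assumptions for illustration; use whatever values your `status` enum defines.

```python
# Illustrative in-memory model; the store, helper, and status values are hypothetical.

def supersede(store, old_id, new_item):
    """Add new_item linked to old_id and flip the old entry's status. Nothing is removed."""
    old = store[old_id]
    store[new_item["externalId"]] = {**new_item, "supersedes": old_id}
    store[old_id] = {**old, "status": "superseded"}
    return store


store = {
    "adr-0001": {"externalId": "adr-0001", "status": "accepted"},
}
supersede(store, "adr-0001", {"externalId": "adr-0002", "status": "accepted"})

print(sorted(store))                   # ['adr-0001', 'adr-0002']
print(store["adr-0001"]["status"])     # superseded
print(store["adr-0002"]["supersedes"])  # adr-0001
```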

## Customize

This is a starting point, so fork it for your org. Swap the `area` vocabulary for your subsystems, adjust the `status` / `severity` enums to your lifecycle, and add or remove schemas. Content-heavy types belong on the document surface; structure-heavy types are records. Add a separate type when the *shape* differs or when a first-class type strengthens *references*, not for a near-identical clone. Note that lookups are **migration-locked**: the equality-vs-range choice is fixed once a schema is live, so choose deliberately.
