Agentic-SDLC Knowledge Base

The flagship blueprint: a whole-SDLC system of record for an AI development team — architecture decisions, the controls that govern the code, the conventions and gotchas an agent needs at a cold start, the runbooks, the post-mortems — modeled as nine typed schemas, linked into a cross-surface knowledge graph, and recalled by hybrid search + grounded RAG. It's the blueprint we run our own engineering org on.

What it is — your team's engineering memory, recalled by meaning

The most valuable non-code asset a software team has is the reasoning around the code: why a decision was made, the rule to check before you touch the key boundary, the runbook for a wedged deploy, the gotcha that bit you last quarter. That knowledge usually lives as fragile, un-indexed local markdown: one bad disk away from gone, and invisible to the agents doing the work.

The agentic-sdlc blueprint puts it on Vectros instead, split by a single principle: content lives as documents, structure lives as records. ADRs, designs, references, runbooks, and post-mortems are documents — the prose is the artifact, so you read it and ask it questions. Controls, conventions, gotchas, and the glossary are records — the typed fields are the artifact, so you query and enumerate them. They link into one cross-surface knowledge graph, and the whole thing is recalled by meaning: "why is it shaped this way?" returns the actual decision, cited.

This is the blueprint we dogfood internally — our own engineering org runs on it — which makes it both the most complete demonstration of the platform (typed records + deterministic lookup + a typed reference graph + hybrid search + grounded RAG + records-and-documents unified + the dual human/agent surface + governance, in one runnable use case) and the most credible.

What bootstrap provisions

One command stands up everything below — no application code.

  • Nine schemas, split content vs structure:
    • Documents (the markdown body is the artifact, searched + answered over): decision (ADRs), design, reference, runbook, postmortem. Each carries typed metadata (summary, status, area, tags, date) and a stable externalId, with a range/sort index on its date.
    • Records (typed fields, exact-queryable): control (a governance boundary that records its own evidence), convention (distinct rule / why / howToApply fields), gotcha (symptom → cause → fix), and term (a glossary entry with a unique exact-lookup).
  • A cross-surface knowledge graph — typed reference edges where records point at documents (control.verifiedBy → the runbook that proves it; convention.establishedBy / term.relatedDecision → the decision behind them) and documents point at documents (decision.supersedes, design.relatedDecision, runbook.bornFrom → a postmortem). Provenance is navigable, not just searchable.
  • A least-privilege access profile: records:r/c/u, search:r, schemas:r, inference:r, documents:r/c, folders:r/c. Note the deliberate absence of delete: knowledge is superseded or retired via a status flip, so the audit trail of how the team's thinking changed stays intact.
  • No bundled seeds. The blueprint ships seedless — it provisions the nine schemas and a scoped key, and you fill it from your own corpus (next section). So the context starts clean; there's nothing synthetic to remove.
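The typed reference edges in the list above can be pictured as a small adjacency map. The sketch below is an in-memory illustration only, not the Vectros API: the edge names (verifiedBy, bornFrom, establishedBy, supersedes) come from the list, while every externalId and the helper names build_graph and provenance are invented for the example.

```python
from collections import defaultdict

# (source externalId, edge name, target externalId).
# All IDs here are hypothetical; only the edge names come from the blueprint.
EDGES = [
    ("control:key-boundary", "verifiedBy", "runbook:rotate-keys"),
    ("runbook:rotate-keys", "bornFrom", "postmortem:key-leak"),
    ("convention:no-raw-keys", "establishedBy", "decision:adr-012"),
    ("decision:adr-012", "supersedes", "decision:adr-007"),
]


def build_graph(edges):
    """Group outgoing edges by their source externalId."""
    graph = defaultdict(list)
    for source, name, target in edges:
        graph[source].append((name, target))
    return graph


def provenance(graph, start):
    """Follow outgoing edges from start and return each (edge, target) hop."""
    hops, stack, seen = [], [start], {start}
    while stack:
        node = stack.pop()
        # .get avoids inserting empty entries for leaf nodes.
        for name, target in graph.get(node, []):
            hops.append((name, target))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return hops


graph = build_graph(EDGES)
trail = provenance(graph, "control:key-boundary")
```

Starting from the control, the trail reaches the runbook that proves it and then the postmortem that runbook was born from, which is the sense in which provenance is navigable rather than only searchable.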

The apply is idempotent: re-running converges rather than duplicating, because every item is keyed by its externalId — so the knowledge base is rebuildable from source at any time.
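To make the convergence property concrete, here is a minimal sketch of an upsert keyed by externalId. It is a toy in-memory model under stated assumptions, not how the platform stores data: the apply function, the store dict, and the sample items are all hypothetical.

```python
def apply(store, items):
    """Upsert items keyed by externalId and report what changed."""
    counts = {"created": 0, "updated": 0, "unchanged": 0}
    for item in items:
        key = item["externalId"]
        if key not in store:
            counts["created"] += 1
        elif store[key] != item:
            counts["updated"] += 1
        else:
            counts["unchanged"] += 1
        # Store a copy so later edits to the source list do not leak in.
        store[key] = dict(item)
    return counts


store = {}
corpus = [
    {"externalId": "term:bridge-token", "term": "bridge token", "status": "active"},
    {"externalId": "gotcha:wedged-deploy", "area": "deploy", "status": "active"},
]

first = apply(store, corpus)    # everything is new
second = apply(store, corpus)   # same input again: nothing changes
corpus[0]["status"] = "retired"
third = apply(store, corpus)    # one item changed at the source
```

However many times the same corpus is applied, the store holds one entry per externalId, so a re-run can only create, update, or leave items alone; it cannot duplicate them.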

Before you start

This is an invite-only 0.x preview, so the honest prerequisites: you need an early-access invite, and from the dev portal you mint a short-lived bridge token — the human step that authenticates the CLI before it can provision anything. You also need Node (the CLI and MCP server run via npx).

1. Bootstrap the blueprint (seedless)

npm i -g @vectros-ai/cli      # or prefix the commands below with: npx -y @vectros-ai/cli
vectros login                 # one-time, browser sign-in
vectros bootstrap --blueprint agentic-sdlc --no-seed --yes
vectros whoami                # confirm tenant + scoped key

This provisions the context + the nine schemas + a least-privilege ssk_* (written once to ~/.vectros/agentic-sdlc.key.json) and safe-merges the Vectros MCP server into your Claude Desktop config — use --client code for Claude Code, or --print to emit the snippet for any other MCP client. Add --tenant test to provision into the test tenant first for a dry run.

2. Ingest your corpus (agent-driven, idempotent)

There are two ingest paths, one per surface. Both are driven by an ingest agent pointed at your source files with the bundled orientation prompt: an LLM maps your semi-structured docs to the right type far better than a brittle parser, and because every item is keyed by externalId, re-running the agent is idempotent.

  • Documents — the prose artifacts keep their markdown body as-is; the agent fills the typed metadata and calls document_ingest against the matching schema (decision / design / reference / runbook / postmortem), with payload carrying summary, status, area, tags, date, and any references.
  • Records — the structured artifacts are typed fields, not prose; the agent extracts them and calls record_create per item (a convention's rule/why/howToApply, a control's evidence + verifiedBy runbook, a term's unique key + definition).

Cross-surface edges resolve by the target's externalId, so ingest the referenced documents before the records that point at them. A one-shot backfill is the same agent looped over your docs/, ADRs, and memory; an ongoing sync re-runs it on change, and the same externalIds converge.
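One way to get the "targets first" ordering is a topological sort over the reference edges. The sketch below uses Python's standard graphlib module (3.9+); the references map, its externalIds, and the ingest_order helper are hypothetical, and whether document-to-document edges need the same ordering is an assumption rather than something the blueprint states.

```python
from graphlib import TopologicalSorter

# externalId -> the externalIds it references (all IDs are invented).
references = {
    "decision:adr-012": set(),
    "postmortem:deploy-wedge": set(),
    "runbook:unwedge-deploy": {"postmortem:deploy-wedge"},   # bornFrom
    "control:deploy-gate": {"runbook:unwedge-deploy"},       # verifiedBy
    "convention:no-friday-deploys": {"decision:adr-012"},    # establishedBy
}


def ingest_order(refs):
    """Return externalIds so every referenced item comes before its referrers."""
    # TopologicalSorter treats each mapped set as predecessors of the key.
    return list(TopologicalSorter(refs).static_order())


order = ingest_order(references)
```

If two items reference each other, static_order raises graphlib.CycleError, which is a useful early signal that one of the edges has to be added in a second pass.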

A bulk backfill is exactly the workload that trips the API's per-minute rate limit (per tenant, counting writes + searches, shared across all of a tenant's keys). Pace the ingest — for the free tier, roughly one record every couple of seconds is safe — and on an HTTP 429, honor the Retry-After header. Because ingest is idempotent, a backfill that pauses or restarts simply converges; it never double-writes.
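The pacing advice above can be sketched as a small loop. This is an illustration under assumptions, not a client for the real API: send is a hypothetical callable returning (status_code, headers), the 2-second default interval stands in for "every couple of seconds", and only the seconds form of Retry-After is handled (the header may also carry an HTTP date).

```python
import time


def paced_ingest(items, send, interval=2.0, max_retries=5, sleep=time.sleep):
    """Send items one at a time, pausing between writes and backing off on 429."""
    for position, item in enumerate(items):
        for _attempt in range(max_retries + 1):
            status, headers = send(item)
            if status != 429:
                break
            # Wait as long as the server asks; fall back to the normal interval.
            sleep(float(headers.get("Retry-After", interval)))
        else:
            raise RuntimeError(f"still rate limited after {max_retries} retries: {item!r}")
        if status >= 400:
            raise RuntimeError(f"write failed with HTTP {status}: {item!r}")
        if position < len(items) - 1:
            sleep(interval)


# Demo with a fake sender; pauses are recorded instead of actually sleeping.
responses = iter([(201, {}), (429, {"Retry-After": "7"}), (201, {})])
sent, pauses = [], []


def fake_send(item):
    sent.append(item)
    return next(responses)


paced_ingest(["adr-001", "adr-002"], fake_send, sleep=pauses.append)
```

Because the limit is shared across all of a tenant's keys, running several of these loops in parallel would not raise the ceiling; a single paced loop is the simpler design.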

3. Query it — the recall payoff

| You want… | Call |
| --- | --- |
| "Why did we decide X?" (grounded, cited) | rag_ask "why did we choose X?" (answers over document bodies) |
| "Which critical controls are active, and how is each proven?" | record_query control { criticality:"critical", status:"active" } → follow verifiedBy to the runbook |
| "What's the active rule for area X?" | record_query convention { area:"<area>", status:"active" } |
| "Have we hit this failure before?" | hybrid_search "<symptom>" contentTypes:["documents"], typeName:"postmortem"; plus record_query gotcha { area:"deploy", status:"active" } |
| "Define X" | record_query term { term:"X" } (unique lookup) |
| "Latest decisions / search the designs" | hybrid_search "<topic>" contentTypes:["documents"], typeName:"decision" (or "design") |

Recall by meaning (hybrid_search / rag_ask) and deterministic enumeration (record_query by lookup field) are both first-class — a knowledge base needs both: search to surface the relevant decision you'd never remember by filename, lookup to enumerate a known slice ("every active control").
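The deterministic half of that pairing is easy to model: an exact-match filter returns every item in the slice, with no ranking and no cutoff. The sketch below imitates the "critical, active controls" query against an in-memory list. The filter_records helper and the sample records are hypothetical; they are not the record_query tool itself.

```python
def filter_records(records, **equals):
    """Return every record whose fields equal all the given values."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in equals.items())
    ]


# Invented sample data; field names follow the table above.
controls = [
    {"externalId": "control:key-boundary", "criticality": "critical",
     "status": "active", "verifiedBy": "runbook:rotate-keys"},
    {"externalId": "control:lint-gate", "criticality": "low",
     "status": "active", "verifiedBy": "runbook:lint"},
    {"externalId": "control:old-gate", "criticality": "critical",
     "status": "retired", "verifiedBy": "runbook:old-gate"},
]

active_critical = filter_records(controls, criticality="critical", status="active")

# Follow verifiedBy to see which runbook proves each control.
proofs = {c["externalId"]: c["verifiedBy"] for c in active_critical}
```

A search query could surface the same control, but only an exact filter guarantees the list is complete, which is what an audit-style question like "every active control" needs.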

4. The dual surface — agents capture, your team browses

The same typed context is reachable two ways. Agents capture and recall over MCP — the bundled orientation prompt wires the recall-before-acting / capture-after loop, so a cold-start session reads the conventions before it writes code and records the new decision after. Your team browses the exact same records and documents in the data-plane app — no separate export, no second copy. One governed store, two surfaces.

5. Bridge your issue tracker — don't mirror it

Your tracker (GitLab, Jira, Linear) and this knowledge base are two planes with different jobs: the tracker owns live status (open/closed, assignee; volatile), and the knowledge base owns durable recall (why/how/lessons; stable). Mirroring issues into the knowledge base creates a stale shadow copy that buries your decisions under issue churn. Instead, promote by reference: when you close out work that carries durable knowledge, distill it into the right type, tag it issue:<id>, and note the externalId back in the tracker. Be selective: most issues promote nothing, and that selectivity is what keeps recall high-signal.
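Promotion by reference amounts to building one knowledge item and one back-pointer. The sketch below shows that shape only; the promote helper, the "type:slug" externalId format, the issue number, and the wording of the tracker note are all assumptions made for illustration.

```python
def promote(issue_id, type_name, slug, fields):
    """Build a knowledge item from a closed issue and the note to leave in the tracker."""
    issue_tag = f"issue:{issue_id}"
    # Keep existing tags, and add the issue tag exactly once.
    tags = [tag for tag in fields.get("tags", []) if tag != issue_tag]
    tags.append(issue_tag)

    external_id = f"{type_name}:{slug}"  # hypothetical ID convention
    item = {**fields, "externalId": external_id, "type": type_name, "tags": tags}
    note = f"Promoted to knowledge base as {external_id}"
    return item, note


item, note = promote(
    issue_id="4821",  # invented issue number
    type_name="gotcha",
    slug="wedged-deploy",
    fields={
        "symptom": "deploy hangs at migration step",
        "cause": "stale lock left by a cancelled run",
        "fix": "clear the lock, then re-run",
        "area": "deploy",
        "status": "active",
        "tags": ["deploy"],
    },
)
```

The issue itself is never copied: only the distilled symptom, cause, and fix land in the knowledge base, and each side holds a pointer to the other.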

Keep it healthy

  • Record the why — rationale is the most-recalled field; a statement without it is a log entry, not knowledge.
  • Supersede, don't delete — flip status so the evolution trail survives (the access profile has no delete by design).
  • Re-ingest is idempotent — keyed on externalId, so a backfill converges and the whole knowledge base can be rebuilt from source at any time.
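The "supersede, don't delete" rule can be sketched as two writes and no removals. This is a toy model: the supersede helper and the sample decisions are hypothetical, and the status values "active" and "superseded" are assumed from the wording above rather than taken from the shipped enum.

```python
def supersede(store, old_id, new_item):
    """Add new_item as the active version and flip the old one's status."""
    if old_id not in store:
        raise KeyError(old_id)
    # The old entry stays in place; only its status changes.
    store[old_id] = {**store[old_id], "status": "superseded"}
    store[new_item["externalId"]] = {
        **new_item,
        "supersedes": old_id,
        "status": "active",
    }


store = {
    "decision:adr-007": {
        "externalId": "decision:adr-007",
        "summary": "Store keys in the config file",
        "status": "active",
    },
}

supersede(
    store,
    "decision:adr-007",
    {"externalId": "decision:adr-012", "summary": "Store keys in the secret manager"},
)

active = [key for key, value in store.items() if value["status"] == "active"]
```

Both decisions remain readable afterward, so a later "why did this change?" question can still walk from the new decision back to the one it replaced.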

Customize

This is a starting point — fork it for your org. Swap the area vocabulary for your subsystems; adjust the status / severity enums to your lifecycle; add or remove schemas (content-heavy types belong on the document surface, structure-heavy types are records — add a separate type when the shape differs or a first-class type strengthens references, not for a near-identical clone). Note that lookups are migration-locked — the equality-vs-range choice is fixed once a schema is live, so choose deliberately.
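The equality-versus-range choice matters because the two kinds of lookup answer different questions. The sketch below illustrates the general difference with a hash map and a sorted list; it says nothing about how Vectros implements its indexes, and the class names and sample decisions are invented.

```python
from bisect import bisect_left, bisect_right


class EqualityIndex:
    """Exact-match lookup: good for 'area == deploy', useless for ranges."""

    def __init__(self, items, field):
        self._by_value = {}
        for item in items:
            self._by_value.setdefault(item[field], []).append(item["externalId"])

    def get(self, value):
        return self._by_value.get(value, [])


class RangeIndex:
    """Sorted lookup: supports 'between low and high' and ordered scans."""

    def __init__(self, items, field):
        self._pairs = sorted((item[field], item["externalId"]) for item in items)
        self._keys = [key for key, _ in self._pairs]

    def between(self, low, high):
        start = bisect_left(self._keys, low)
        stop = bisect_right(self._keys, high)
        return [external_id for _, external_id in self._pairs[start:stop]]


# Invented decisions; ISO dates sort correctly as plain strings.
decisions = [
    {"externalId": "decision:adr-001", "date": "2024-01-15", "area": "deploy"},
    {"externalId": "decision:adr-002", "date": "2024-06-03", "area": "auth"},
    {"externalId": "decision:adr-003", "date": "2025-02-20", "area": "deploy"},
]

by_area = EqualityIndex(decisions, "area")
by_date = RangeIndex(decisions, "date")

deploy_decisions = by_area.get("deploy")
recent = by_date.between("2024-06-01", "2025-12-31")
```

A field you will only ever match exactly (an area, a status, a glossary term) suits the first shape; a field you will sort or window by (a date) suits the second. Since the choice is fixed once a schema is live, it is worth listing the queries you expect before provisioning.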