Datachains
A datachain is a concrete instance of a DTPR schema. It describes one data-collecting technology — what it collects, why, who operates it, and what happens to the data downstream.
DTPR ships two wire forms of a datachain. The thin form is the canonical authored shape. The resolved form is its persisted, render-ready, optionally-LLM-authored sibling.
Thin form — DatachainInstance
At the top level, a DatachainInstance carries an id, an optional title and description for the system being described, a schema_version, and an ordered list of elements:
{
"id": "worcester-lpr",
"title": [{ "locale": "en", "value": "Worcester license plate reader" }],
"description": [{ "locale": "en", "value": "Parking enforcement automation." }],
"schema_version": "ai@2026-04-16-beta",
"elements": [
{ "element_id": "purpose.example" },
{ "element_id": "data.camera" }
]
}
The id is opaque (a stable identifier); title and description are localized human-readable strings — what renderers and agents display as the headline of the disclosure. Both default to [], so existing v2 instances keep parsing unchanged; renderers fall back to the id when title is empty.
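The fallback described above can be sketched in TypeScript. This is a minimal illustration, not the renderer's actual code: the names LocalizedString, ThinHeader, and displayTitle are hypothetical, and the locale-matching order (exact match, then first available entry) is an assumption. Only the final fall-back to the id comes from the text.

```typescript
// Hypothetical types for illustration; not the published DTPR definitions.
interface LocalizedString {
  locale: string;
  value: string;
}

interface ThinHeader {
  id: string;
  title?: LocalizedString[]; // treated as [] when omitted
}

// Pick a headline for the disclosure: the title in the requested locale,
// else the first available title (assumed behavior), else the opaque id.
function displayTitle(instance: ThinHeader, locale: string): string {
  const titles = instance.title ?? [];
  const exact = titles.find((t) => t.locale === locale);
  if (exact) return exact.value;
  const first = titles[0];
  if (first) return first.value;
  return instance.id;
}
```

An instance without a title, such as `{ id: "worcester-lpr" }`, yields the id itself as the headline.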
Each entry in elements[] is an element placement — a reference to an element definition from the schema, optionally carrying:
- a label override,
- variables with filled values (the element's description gets interpolated using them),
- a citation.
The schema defines the vocabulary (elements, categories, variables); the datachain picks from it. Element placements do not repeat the category — the category is derived from the element definition at lookup time.
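To make the lookup-time derivation concrete, here is a small TypeScript sketch. The shapes PlacementRef and ElementDef, the category_id field name, and the helper categoriesForPlacements are all assumptions made for illustration; the real element definition may relate elements to categories differently.

```typescript
// Hypothetical shapes; field names other than element_id are assumptions.
interface PlacementRef {
  element_id: string;
}

interface ElementDef {
  id: string;
  category_id: string; // assumed: one category per element definition
}

// Derive each placement's category from the schema's element definitions.
// Placements that reference an unknown element are reported, not guessed.
function categoriesForPlacements(
  placements: PlacementRef[],
  definitions: ElementDef[],
): { byElement: Map<string, string>; unknown: string[] } {
  const index = new Map<string, string>();
  for (const def of definitions) index.set(def.id, def.category_id);

  const byElement = new Map<string, string>();
  const unknown: string[] = [];
  for (const placement of placements) {
    const category = index.get(placement.element_id);
    if (category === undefined) unknown.push(placement.element_id);
    else byElement.set(placement.element_id, category);
  }
  return { byElement, unknown };
}
```

Because the category is looked up rather than stored, a placement never disagrees with its element definition about which category it belongs to.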
Resolved form — ResolvedDatachainInstance
A ResolvedDatachainInstance is a strict superset of DatachainInstance that adds three fields:
- schema_snapshot: the slice of schema content (datachain type, categories, elements) frozen at resolve-time. It decouples rendering from the live schema store, so a deployed disclosure keeps rendering correctly even if the schema evolves or the live version index drops the pinned version.
- suggested_elements: AI-proposed elements that are not present in schema_snapshot.elements. Defaults to []. A non-empty value implies authoring_provenance.kind === 'ai_generated'.
- authoring_provenance: optional authoring telemetry describing who or what produced the disclosure (see Authoring provenance below).
The resolved form is what you persist when you want a disclosure that:
- renders offline against the snapshot, with no live schema-fetch,
- carries AI-suggested elements that have not yet been promoted into the schema,
- records that an AI agent (vs a human reviewer) authored the disclosure.
Schema definition lives in api/src/schema/datachain-instance-resolved.ts.
Resolve
The thin form becomes the resolved form via a pure operation:
- POST /schemas/:version/resolve (REST): see the REST resolve reference.
- resolve_datachain (MCP tool): same operation, same wire shape, MCP soft-failure envelope.
The resolver loads the pinned schema version, intersects it with the elements actually placed (the lean subset rule — only categories and elements referenced by the instance are pinned), and returns a ResolvedDatachainInstance.
POST /api/v2/schemas/ai@2026-04-16-beta/resolve
Content-Type: application/json
{
"schema_version": "ai@2026-04-16-beta",
"elements": [
{ "element_id": "purpose.example" },
{ "element_id": "data.camera" }
]
}
The response echoes the request and adds schema_snapshot and suggested_elements:
{
"schema_version": "ai@2026-04-16-beta",
"elements": [
{ "element_id": "purpose.example" },
{ "element_id": "data.camera" }
],
"schema_snapshot": {
"datachain_type": { "/* DatachainType */": "..." },
"categories": [ "/* only categories referenced by the placements */" ],
"elements": [ "/* only elements referenced by the placements */" ]
},
"suggested_elements": []
}
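The lean subset rule can be sketched as a pure function. This is an illustration under stated assumptions, not the resolver's implementation: the type names, the category_id field, and the helper leanSubset are hypothetical, and the datachain type is left out for brevity.

```typescript
// Hypothetical shapes for illustration only.
interface PlacedElement {
  element_id: string;
}
interface SchemaElement {
  id: string;
  category_id: string; // assumed field linking an element to its category
}
interface SchemaCategory {
  id: string;
}
interface SchemaContent {
  categories: SchemaCategory[];
  elements: SchemaElement[];
}

// Keep only the elements that are actually placed, and only the
// categories those elements belong to.
function leanSubset(schema: SchemaContent, placements: PlacedElement[]): SchemaContent {
  const placedIds = new Set(placements.map((p) => p.element_id));
  const elements = schema.elements.filter((e) => placedIds.has(e.id));
  const categoryIds = new Set(elements.map((e) => e.category_id));
  const categories = schema.categories.filter((c) => categoryIds.has(c.id));
  return { categories, elements };
}
```

Everything the instance does not reference is left out, which is what keeps the pinned snapshot small.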
A resolved payload that exceeds the size cap is rejected with a payload_too_large error envelope rather than being silently truncated. Lean-subset pinning (only referenced categories and elements) keeps real disclosures well under the cap.
Round-trip rule
A ResolvedDatachainInstance whose suggested_elements is empty is structurally equivalent to a DatachainInstance once the three resolved-only fields are stripped. A ResolvedDatachainInstance with non-empty suggested_elements cannot round-trip to a thin instance without losing the proposed elements; this is by design. Round-trip is post-parse equivalence, not byte-identity.
In practice: rendering and offline persistence work against the resolved form. Re-validation against a live schema requires either (a) the empty-suggested_elements case, in which the resolved form strips trivially to a thin instance, or (b) promoting the suggested elements into the schema and re-resolving (see Promoted-element lifecycle below).
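A sketch of the stripping step, assuming simplified shapes. ThinInstance, ResolvedInstance, and toThin are hypothetical names, and the lossless flag is an illustrative way to surface the round-trip rule to the caller rather than part of any DTPR API.

```typescript
// Simplified, hypothetical shapes; the real schemas carry more fields.
interface ThinInstance {
  id: string;
  schema_version: string;
  elements: { element_id: string }[];
}

interface ResolvedInstance extends ThinInstance {
  schema_snapshot: unknown;
  suggested_elements: unknown[];
  authoring_provenance?: unknown;
}

// Strip the three resolved-only fields. `lossless` is false when proposals
// had to be dropped, i.e. the instance cannot round-trip.
function toThin(resolved: ResolvedInstance): { thin: ThinInstance; lossless: boolean } {
  const { schema_snapshot, suggested_elements, authoring_provenance, ...thin } = resolved;
  return { thin, lossless: suggested_elements.length === 0 };
}
```

A caller that needs to re-validate against a live schema can check lossless first and take the promote-and-re-resolve path when it is false.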
Authoring provenance
AuthoringProvenance is an optional, instance-level field on ResolvedDatachainInstance that records authoring telemetry. It is a discriminated union on kind:
{ "authoring_provenance": { "kind": "human" } }
{
"authoring_provenance": {
"kind": "ai_generated",
"model": "claude-sonnet-4-6",
"generated_at": "2026-05-07T18:42:00Z",
"element_provenance": {
"data.camera": {
"rationale": "Operator's privacy notice describes a roof-mounted camera per intersection.",
"confidence": "high",
"source_references": [
{
"quote": "Each enforcement vehicle carries an outward-facing license plate camera.",
"context": "Privacy notice §2 — Equipment"
}
],
"variable_rationale": {
"retention_days": "Stated as 90 days in the same paragraph."
}
},
"purpose.example": {
"rationale": "Inferred from the operator's stated parking-enforcement use case.",
"confidence": "medium"
}
}
}
}
Notes:
- Per-element rationale. rationale, confidence, source_references, and variable_rationale describe one element pick at a time and live under element_provenance[<element_id>]. The whole-disclosure level only carries kind, model, and generated_at: single proposal, single model, single timestamp.
- source_references are verbatim quotes. Each entry is { quote, context? }: the exact text the model lifted from a source document, plus an optional locator (section, page, row id). They are not URLs; link-style citations belong on per-element sources (see Citation vs authoring telemetry below).
- confidence is qualitative. 'high' | 'medium' | 'low': three buckets, no numeric form. Renderers display the value verbatim.
- R14 implication. A non-empty suggested_elements array requires kind: 'ai_generated'. The reverse is not enforced: an AI-authored disclosure with no proposals may still mark itself ai_generated.
- Orphan keys are rejected. Every key in element_provenance must reference a placement element_id on this datachain. The semantic validator emits element_provenance_unknown_element for any key that does not.
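The last two notes can be sketched as a pair of checks. This is an illustration, not the semantic validator itself: the input shape is simplified, checkProvenance is a hypothetical helper, and the code r14_violation is made up for the example. Only element_provenance_unknown_element is a code named in the text.

```typescript
// Simplified, hypothetical input shape for illustration.
interface ProvenanceCheckInput {
  elements: { element_id: string }[];
  suggested_elements: unknown[];
  authoring_provenance?:
    | { kind: "human" }
    | { kind: "ai_generated"; element_provenance?: Record<string, unknown> };
}

interface ProvenanceIssue {
  code: string;
  detail: string;
}

function checkProvenance(instance: ProvenanceCheckInput): ProvenanceIssue[] {
  const issues: ProvenanceIssue[] = [];
  const provenance = instance.authoring_provenance;

  // R14: a non-empty suggested_elements array requires kind 'ai_generated'.
  // "r14_violation" is a placeholder code, not one defined by DTPR.
  if (instance.suggested_elements.length > 0 && provenance?.kind !== "ai_generated") {
    issues.push({ code: "r14_violation", detail: "suggested_elements requires ai_generated" });
  }

  // Orphan keys: every element_provenance key must match a placement.
  if (provenance?.kind === "ai_generated") {
    const placed = new Set(instance.elements.map((e) => e.element_id));
    for (const key of Object.keys(provenance.element_provenance ?? {})) {
      if (!placed.has(key)) {
        issues.push({ code: "element_provenance_unknown_element", detail: key });
      }
    }
  }
  return issues;
}
```

An AI-authored instance with no proposals and no orphan keys returns an empty list, matching the note that the reverse of R14 is not enforced.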
Citation vs authoring telemetry
authoring_provenance is distinct from the existing sources field on element placements (a ProvenanceRef). They answer different questions and coexist on the same shape:
| Field | What it documents |
|---|---|
| sources (per placement) | Citation provenance. Where the claim about the system comes from — the operator's privacy notice, an AIA, a published spec. Authored by humans, asserts a fact about the disclosed system. |
| authoring_provenance (per instance, per-element entries) | Authoring telemetry. Who or what produced the disclosure document — a human reviewer or an AI agent — and (when AI) per-element rationale, confidence, and the verbatim quotes the model leaned on. |
A human-authored disclosure with sources is the common case: humans cite humans. An AI-generated disclosure typically carries both — authoring_provenance.kind: 'ai_generated' documents the authoring loop; per-placement sources document the underlying system claims (which the model may have copied from the operator's published material).
The free-text fields under authoring_provenance.element_provenance[<element_id>] (rationale, variable_rationale values, and the quote / context strings inside source_references) are LLM-authored and MUST be HTML-escaped at every rendering boundary. In Vue, {{ ... }} interpolation is safe; v-html MUST NOT be used on these fields. Inside @dtpr/ui/vue and @dtpr/ui/html, this is enforced at the component layer; downstream consumers that build their own templates are responsible for honoring it.
Trust boundary on schema_snapshot
schema_snapshot is a convenience for offline rendering, not a forgery-resistant attestation. Consumers MUST NOT treat it as a provenance guarantee that the embedded categories, elements, or datachain-type definitions came from the canonical schema store.
The snapshot is whatever the producer chose to embed at resolve-time. A consumer that re-fetches the live schema for the pinned version can compare and detect drift, but only when the pinned version is still served by the schema store. Once a version ages out of the live index, the snapshot is the only copy and there is no canonical comparison to perform.
DTPR runs a snapshot-consistency check during validation (the snapshot_drift semantic error), but only when the pinned schema version is still served. This is a soft drift detector, not a forgery defense: a producer who fabricates a snapshot for a never-published version cannot be detected by it.
A content-hash binding that ties a ResolvedDatachainInstance to a verifiable schema digest is a deferred capability. Until it lands, treat schema_snapshot as authored data, evaluated under the same trust assumptions as the rest of the disclosure.
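For a consumer that wants to run its own comparison while the pinned version is still served, a drift check might look like the sketch below. It is not DTPR's snapshot_drift implementation: the SnapshotElement shape and the helpers stableStringify and driftedElementIds are hypothetical, and comparing elements by a key-sorted serialization is just one reasonable definition of "same content".

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so key order alone never counts as drift.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => stableStringify(v)).join(",") + "]";
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(value[k] as Json)).join(",") + "}";
}

// Hypothetical element shape: an id plus arbitrary JSON content.
interface SnapshotElement {
  id: string;
  [key: string]: Json;
}

// Ids of snapshot elements that are missing from, or differ from, the live schema.
function driftedElementIds(snapshot: SnapshotElement[], live: SnapshotElement[]): string[] {
  const liveById = new Map<string, string>();
  for (const el of live) liveById.set(el.id, stableStringify(el));
  return snapshot
    .filter((el) => liveById.get(el.id) !== stableStringify(el))
    .map((el) => el.id);
}
```

As with the built-in check, an empty result only says the snapshot matches what the store serves today; it does not prove where the snapshot came from.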
Promoted-element lifecycle (fork-forever)
When a suggested_element graduates from a proposal to a first-class element in a new schema version, prior persisted resolved artifacts do not auto-rebase. They stay pinned to the schema version they were resolved against, with the suggested element living on as suggested_elements[] content.
The path to adopt a promoted element on an existing artifact is:
- Take the thin elements from the original disclosure.
- Re-resolve against the new schema version (which now includes the promoted element as a first-class element).
- Persist the new ResolvedDatachainInstance.
There is no in-place rebase. Each authoring round produces a fresh ResolvedDatachainInstance; the old one, if persisted, remains valid against its pinned snapshot indefinitely. The trade-off is deliberate: forking-forever keeps already-published disclosures stable and avoids the schema-evolution surprises that an auto-rebase would invite.
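The re-resolve step can be sketched as a request builder. The /api/v2 path prefix mirrors the resolve example earlier on this page; the baseUrl parameter, the helper name buildReResolveRequest, and the version string used in the usage note are assumptions for illustration.

```typescript
interface ThinPlacement {
  element_id: string;
}

interface ResolveRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build a resolve request that pins the original thin placements to a newer
// schema version. Nothing is patched in place; the response is a new artifact.
function buildReResolveRequest(
  baseUrl: string,
  placements: ThinPlacement[],
  newVersion: string,
): ResolveRequest {
  return {
    // The version is used as-is in the path, matching the resolve example.
    url: `${baseUrl}/api/v2/schemas/${newVersion}/resolve`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ schema_version: newVersion, elements: placements }),
    },
  };
}
```

A caller would pass the result to fetch(request.url, request.init) and persist the returned ResolvedDatachainInstance alongside, not over, the old one. For example, buildReResolveRequest("https://dtpr.example", thinElements, "ai@next") targets a placeholder host and a made-up version name.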
Rejection and discard flow
A stateful per-element edit-and-revalidate loop would require shaping the schema for partial mutation, which v1 deliberately avoids. Each authoring round is a fresh artifact; the previous one is dropped or archived, but never patched in place.
Validation
Datachains are validated against a pinned schema version. Both wire forms have a validator:
- POST /schemas/:version/validate: REST, thin form.
- validate_datachain: MCP, thin form.
- POST /schemas/:version/validate_resolved: REST, resolved form.
- validate_resolved: MCP, resolved form.
All four validate both shape (Zod) and semantics (cardinality, required categories, placement rules). Shape errors surface as parse_error; semantic errors carry stable codes per rule. The resolved-form validators additionally enforce R14 (non-empty suggested_elements ⟹ ai_generated), R15a (no element-id collisions between snapshot and suggestions), and the soft snapshot_drift check when the pinned version is still served.
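As an illustration of the R15a rule, a collision check between snapshot elements and suggestions could look like this. It assumes both kinds of element expose an id field; the IdentifiedElement type and the helper collidingElementIds are hypothetical, not part of the validator.

```typescript
// Assumed minimal shape: snapshot elements and suggestions both carry `id`.
interface IdentifiedElement {
  id: string;
}

// Return the ids of suggested elements that collide with snapshot elements.
// An empty result means the R15a condition holds for this instance.
function collidingElementIds(
  snapshotElements: IdentifiedElement[],
  suggestedElements: IdentifiedElement[],
): string[] {
  const snapshotIds = new Set(snapshotElements.map((e) => e.id));
  return suggestedElements.map((e) => e.id).filter((id) => snapshotIds.has(id));
}
```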
Rendering
Once valid, a datachain can be rendered to HTML:
- render_datachain (MCP Apps): accepts either wire form; produces HTML consumed via resources/read.
- @dtpr/ui/vue + DtprDatachain: render inside a Vue app.
- @dtpr/ui/html renderDatachainDocument: SSR the same components to a standalone HTML document.
When rendering a resolved form, the renderer marks any element pulled from suggested_elements with a "proposed" indicator (default-on). For every placement that has a matching entry in authoring_provenance.element_provenance, an expandable "AI proposal context" section beneath the element surfaces the per-element rationale, source-reference quotes, qualitative high/medium/low confidence label, and per-variable rationales.
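Consumers that build their own templates for this section have to apply the HTML-escaping requirement on LLM-authored fields themselves. A minimal sketch of what that might look like in a hand-rolled string renderer follows; escapeHtml and renderRationale are hypothetical helpers, and the dtpr-rationale class name is invented for the example.

```typescript
// Escape the five characters that matter in HTML text and attribute contexts.
// The ampersand is replaced first so later entities are not double-escaped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render one LLM-authored rationale string as inert text.
function renderRationale(rationale: string): string {
  return `<p class="dtpr-rationale">${escapeHtml(rationale)}</p>`;
}
```

The same helper would apply to variable_rationale values and to the quote and context strings inside source_references before they reach the page.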
See also
- Elements & categories
- Versions & releases
- Content hash
- REST resolve reference (/rest/resolve)
- MCP resolve_datachain reference (/mcp/tools/resolve-datachain)