Datachains

The DTPR instance that describes a data-collecting technology — thin and resolved wire forms.

A datachain is a concrete instance of a DTPR schema. It describes one data-collecting technology — what it collects, why, who operates it, and what happens to the data downstream.

DTPR ships two wire forms of a datachain. The thin form is the canonical authored shape. The resolved form is its persisted, render-ready, optionally-LLM-authored sibling.

Thin form — DatachainInstance

At the top level, a DatachainInstance carries an id, an optional title and description for the system being described, a schema_version, and an ordered list of elements:

{
  "id": "worcester-lpr",
  "title": [{ "locale": "en", "value": "Worcester license plate reader" }],
  "description": [{ "locale": "en", "value": "Parking enforcement automation." }],
  "schema_version": "ai@2026-04-16-beta",
  "elements": [
    { "element_id": "purpose.example" },
    { "element_id": "data.camera" }
  ]
}

The id is opaque (a stable identifier); title and description are localized human-readable strings — what renderers and agents display as the headline of the disclosure. Both default to [], so existing v2 instances keep parsing unchanged; renderers fall back to the id when title is empty.

Each entry in elements[] is an element placement — a reference to an element definition from the schema, optionally carrying:

  • a label override,
  • variables with filled values (the element's description gets interpolated using them),
  • a citation.

The schema defines the vocabulary (elements, categories, variables); the datachain picks from it. Element placements do not repeat the category — the category is derived from the element definition at lookup time.
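The thin shape and the title fallback described above can be sketched in TypeScript. The type names below are hypothetical stand-ins for the fields listed in this section, not the package's actual exports, and the locale-matching order (requested locale, then first entry, then id) is an assumption; the source only states that renderers fall back to the id when title is empty.

```typescript
// Hypothetical types mirroring the fields described in this section.
interface LocalizedString { locale: string; value: string; }
interface ElementPlacement { element_id: string; }
interface DatachainInstance {
  id: string;
  title?: LocalizedString[];       // defaults to [] when absent
  description?: LocalizedString[]; // defaults to [] when absent
  schema_version: string;
  elements: ElementPlacement[];
}

// Pick a headline: the title in the requested locale, else the first title,
// else the opaque id (the documented fallback for an empty title).
function displayTitle(instance: DatachainInstance, locale: string): string {
  const titles = instance.title ?? [];
  const match = titles.find((t) => t.locale === locale) ?? titles[0];
  return match ? match.value : instance.id;
}
```

An instance without a title still yields a usable headline, which is why older instances that omit the field keep rendering.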

Resolved form — ResolvedDatachainInstance

A ResolvedDatachainInstance is a strict superset of DatachainInstance that adds three fields:

  • schema_snapshot — the slice of schema content (datachain type, categories, elements) frozen at resolve-time. Decouples rendering from the live schema store, so a deployed disclosure keeps rendering correctly even if the schema evolves or the live version index drops the pinned version.
  • suggested_elements — AI-proposed elements that are not present in schema_snapshot.elements. Defaults to []. A non-empty value implies authoring_provenance.kind === 'ai_generated'.
  • authoring_provenance — optional authoring telemetry describing who or what produced the disclosure (see Authoring provenance below).

The resolved form is what you persist when you want a disclosure that:

  • renders offline against the snapshot, with no live schema-fetch,
  • carries AI-suggested elements that have not yet been promoted into the schema,
  • records that an AI agent (vs a human reviewer) authored the disclosure.

Schema definition lives in api/src/schema/datachain-instance-resolved.ts.

Resolve

The thin form becomes the resolved form via a pure operation:

  • POST /schemas/:version/resolve (REST) — see the REST resolve reference.
  • resolve_datachain (MCP tool) — same operation, same wire shape, MCP soft-failure envelope.

The resolver loads the pinned schema version, intersects it with the elements actually placed (the lean subset rule — only categories and elements referenced by the instance are pinned), and returns a ResolvedDatachainInstance.

POST /api/v2/schemas/ai@2026-04-16-beta/resolve
Content-Type: application/json

{
  "schema_version": "ai@2026-04-16-beta",
  "elements": [
    { "element_id": "purpose.example" },
    { "element_id": "data.camera" }
  ]
}

{
  "schema_version": "ai@2026-04-16-beta",
  "elements": [
    { "element_id": "purpose.example" },
    { "element_id": "data.camera" }
  ],
  "schema_snapshot": {
    "datachain_type": { "/* DatachainType */": "..." },
    "categories":     [ "/* only categories referenced by the placements */" ],
    "elements":       [ "/* only elements referenced by the placements */" ]
  },
  "suggested_elements": []
}

Operational regime. Resolve runs under a dedicated rate-limit bucket of 15 requests per 60 seconds, a per-route wall-clock budget of 5000 ms, and a 512 KB response cap. Bundles that exceed the cap return a payload_too_large error envelope rather than being silently truncated. Lean-subset pinning (only referenced categories and elements) keeps real disclosures well under the cap.
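The lean subset rule can be illustrated with a small filter. This is a minimal sketch, not the resolver's implementation: the Schema, CategoryDef, and ElementDef shapes are assumptions, in particular the category_id field that links an element definition to its category.

```typescript
// Hypothetical schema shapes; field names are assumptions for illustration.
interface CategoryDef { id: string; }
interface ElementDef { id: string; category_id: string; }
interface Schema {
  datachain_type: unknown;
  categories: CategoryDef[];
  elements: ElementDef[];
}
interface Placement { element_id: string; }

// Keep only the element definitions that are actually placed, and only the
// categories those elements belong to.
function leanSnapshot(schema: Schema, placements: Placement[]): Schema {
  const placedIds = new Set(placements.map((p) => p.element_id));
  const elements = schema.elements.filter((e) => placedIds.has(e.id));
  const categoryIds = new Set(elements.map((e) => e.category_id));
  const categories = schema.categories.filter((c) => categoryIds.has(c.id));
  return { datachain_type: schema.datachain_type, categories, elements };
}
```

Because the snapshot grows with the number of placements rather than with the size of the schema, a typical disclosure stays small relative to the response cap.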

Round-trip rule

The resolved → thin round-trip is conditional. A ResolvedDatachainInstance whose suggested_elements is empty is structurally equivalent to a DatachainInstance once the three resolved-only fields are stripped. A ResolvedDatachainInstance with non-empty suggested_elements cannot round-trip to a thin instance without losing the proposed elements; this is by design. Round-trip is post-parse equivalence, not byte-identity.

In practice: rendering and offline persistence work against the resolved form. Re-validation against a live schema requires either (a) the empty-suggested_elements case, in which the resolved form strips trivially to a thin instance, or (b) promoting the suggested elements into the schema and re-resolving (see Promoted-element lifecycle below).
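A minimal sketch of the strip step, under the assumption that the document is handled as a plain parsed JSON object. The function name stripToThin is hypothetical; it refuses the non-empty suggested_elements case rather than silently dropping proposals.

```typescript
type Json = Record<string, unknown>;

// The three fields that exist only on the resolved form.
const RESOLVED_ONLY = ["schema_snapshot", "suggested_elements", "authoring_provenance"];

// Strip a resolved instance down to its thin form. Throws when proposals
// would be lost, since that case cannot round-trip.
function stripToThin(resolved: Json): Json {
  const suggested = resolved["suggested_elements"];
  if (Array.isArray(suggested) && suggested.length > 0) {
    throw new Error("suggested_elements is non-empty; stripping would lose proposals");
  }
  const thin: Json = { ...resolved }; // shallow copy, input is not mutated
  for (const key of RESOLVED_ONLY) delete thin[key];
  return thin;
}
```

The result is equivalent to the thin instance after parsing; key order and whitespace are not preserved, which matches the post-parse equivalence rule.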

Authoring provenance

AuthoringProvenance is an optional, instance-level field on ResolvedDatachainInstance that records authoring telemetry. It is a discriminated union on kind:

{ "authoring_provenance": { "kind": "human" } }

{
  "authoring_provenance": {
    "kind": "ai_generated",
    "model": "claude-sonnet-4-6",
    "generated_at": "2026-05-07T18:42:00Z",
    "element_provenance": {
      "data.camera": {
        "rationale": "Operator's privacy notice describes a roof-mounted camera per intersection.",
        "confidence": "high",
        "source_references": [
          {
            "quote": "Each enforcement vehicle carries an outward-facing license plate camera.",
            "context": "Privacy notice §2 — Equipment"
          }
        ],
        "variable_rationale": {
          "retention_days": "Stated as 90 days in the same paragraph."
        }
      },
      "purpose.example": {
        "rationale": "Inferred from the operator's stated parking-enforcement use case.",
        "confidence": "medium"
      }
    }
  }
}

Notes:

  • Per-element rationale. rationale, confidence, source_references, and variable_rationale describe one element pick at a time and live under element_provenance[<element_id>]. The whole-disclosure level only carries kind, model, and generated_at — single proposal, single model, single timestamp.
  • source_references are verbatim quotes. Each entry is { quote, context? } — the exact text the model lifted from a source document, plus an optional locator (section, page, row id). They are not URLs; link-style citations belong on per-element sources (see Citation vs authoring telemetry below).
  • confidence is qualitative. 'high' | 'medium' | 'low' — three buckets, no numeric form. Renderers display the value verbatim.
  • R14 implication. A non-empty suggested_elements array requires kind: 'ai_generated'. The reverse is not enforced — an AI-authored disclosure with no proposals may still mark itself ai_generated.
  • Orphan keys are rejected. Every key in element_provenance must reference a placement element_id on this datachain. The semantic validator emits element_provenance_unknown_element for any key that does not.
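The discriminated union and the orphan-key rule can be sketched as follows. The types are hypothetical reconstructions from the example above (which fields are required is an assumption), and findOrphanProvenanceKeys is an illustrative helper, not the semantic validator itself.

```typescript
type Confidence = "high" | "medium" | "low";
interface SourceReference { quote: string; context?: string; }
interface ElementProvenance {
  rationale?: string;
  confidence?: Confidence;
  source_references?: SourceReference[];
  variable_rationale?: Record<string, string>;
}
// Discriminated on `kind`, as described above.
type AuthoringProvenance =
  | { kind: "human" }
  | {
      kind: "ai_generated";
      model: string;
      generated_at: string;
      element_provenance?: Record<string, ElementProvenance>;
    };

// Return every element_provenance key that does not match a placed element_id.
// Each returned key would correspond to one element_provenance_unknown_element error.
function findOrphanProvenanceKeys(
  placements: { element_id: string }[],
  provenance: AuthoringProvenance | undefined,
): string[] {
  if (!provenance || provenance.kind !== "ai_generated") return [];
  const placed = new Set(placements.map((p) => p.element_id));
  return Object.keys(provenance.element_provenance ?? {}).filter((key) => !placed.has(key));
}
```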

Citation vs authoring telemetry

authoring_provenance is distinct from the existing sources field on element placements (a ProvenanceRef). They answer different questions and coexist on the same shape:

Field | What it documents
sources (per placement) | Citation provenance. Where the claim about the system comes from: the operator's privacy notice, an AIA, a published spec. Authored by humans, asserts a fact about the disclosed system.
authoring_provenance (per instance, per-element entries) | Authoring telemetry. Who or what produced the disclosure document (a human reviewer or an AI agent) and, when AI, per-element rationale, confidence, and the verbatim quotes the model leaned on.

A human-authored disclosure with sources is the common case: humans cite humans. An AI-generated disclosure typically carries both — authoring_provenance.kind: 'ai_generated' documents the authoring loop; per-placement sources document the underlying system claims (which the model may have copied from the operator's published material).

Render-time HTML-escape policy. Free-text fields under authoring_provenance.element_provenance[<element_id>] (rationale, variable_rationale values, and the quote / context strings inside source_references) are LLM-authored and MUST be HTML-escaped at every rendering boundary. In Vue, {{ ... }} interpolation is safe; v-html MUST NOT be used on these fields. Inside @dtpr/ui/vue and @dtpr/ui/html, this is enforced at the component layer; downstream consumers that build their own templates are responsible for honoring it.
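For consumers that build their own string templates, a minimal escaping sketch might look like the following. The helpers escapeHtml and renderRationale are hypothetical and are not part of @dtpr/ui; the markup and class name are placeholders.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Hypothetical template helper: the LLM-authored string is escaped before
// it is concatenated into markup.
function renderRationale(rationale: string): string {
  return `<p class="rationale">${escapeHtml(rationale)}</p>`;
}
```

Escaping at the point of concatenation, rather than when the data is stored, keeps the persisted document free of presentation concerns and applies the rule at each rendering boundary.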

Trust boundary on schema_snapshot

schema_snapshot is a convenience for offline rendering, not a forgery-resistant attestation. Consumers MUST NOT treat it as a provenance guarantee that the embedded categories, elements, or datachain-type definitions came from the canonical schema store.

The snapshot is whatever the producer chose to embed at resolve-time. A consumer that re-fetches the live schema for the pinned version can compare and detect drift — but only when the pinned version is still served by the schema store. Once a version ages out of the live index, the snapshot is the only copy and there is no canonical comparison to perform.

DTPR does run a snapshot-consistency check during validation (the snapshot_drift semantic error), but only when the pinned schema version is still served. This is a soft drift detector, not a forgery defense: a producer who fabricates a snapshot for a never-published version cannot be detected by it.

A content-hash binding that ties a ResolvedDatachainInstance to a verifiable schema digest is a deferred capability. Until it lands, treat schema_snapshot as authored data, evaluated under the same trust assumptions as the rest of the disclosure.
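A consumer-side drift comparison could be sketched as below. This is an illustration under assumptions, not the snapshot_drift validator: definitions are compared by a key-order-independent serialization, the Def shape is hypothetical, and a missing live schema is reported as unverifiable rather than as a pass.

```typescript
type Def = { id: string } & Record<string, unknown>;
type DriftResult =
  | { status: "unverifiable" }                 // pinned version no longer served
  | { status: "checked"; drifted: string[] };  // ids whose definitions differ

// Serialize with sorted object keys so key order does not count as drift.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map(canonical).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonical(obj[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Compare snapshot definitions against the live ones, when the live schema
// for the pinned version is still available.
function checkSnapshotDrift(snapshot: Def[], live: Def[] | undefined): DriftResult {
  if (live === undefined) return { status: "unverifiable" };
  const liveById = new Map<string, string>();
  for (const def of live) liveById.set(def.id, canonical(def));
  const drifted = snapshot
    .filter((def) => liveById.get(def.id) !== canonical(def))
    .map((def) => def.id);
  return { status: "checked", drifted };
}
```

The unverifiable branch reflects the limitation stated above: once the pinned version is gone, there is nothing to compare against.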

Promoted-element lifecycle

When a suggested element graduates from a proposal to a first-class element in a new schema version, prior persisted resolved artifacts do not auto-rebase. They stay pinned to the schema version they were resolved against, with the suggested element living on as suggested_elements[] content.

The path to adopt a promoted element on an existing artifact is:

  1. Take the thin elements from the original disclosure.
  2. Re-resolve against the new schema version (which now includes the promoted element as a first-class element).
  3. Persist the new ResolvedDatachainInstance.

There is no in-place rebase. Each authoring round produces a fresh ResolvedDatachainInstance; the old one, if persisted, remains valid against its pinned snapshot indefinitely. The trade-off is deliberate: forking-forever keeps already-published disclosures stable and avoids the schema-evolution surprises that an auto-rebase would invite.
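The first two steps can be sketched as building a fresh resolve request from the old artifact. The helper name and the ResolvedLike shape are hypothetical; the request body follows the schema_version plus elements shape of the resolve example earlier on this page. Whether a placement for the newly promoted element needs to be added is left to the caller, since the source does not spell that out.

```typescript
interface Placement { element_id: string; [key: string]: unknown; }
interface ResolvedLike {
  schema_version: string;
  elements: Placement[];
  suggested_elements?: unknown[];
}
interface ResolveRequest { schema_version: string; elements: Placement[]; }

// Build the body for a new resolve call against `newVersion`.
// The original artifact is left untouched, so it stays valid against its own snapshot.
function buildReResolveRequest(original: ResolvedLike, newVersion: string): ResolveRequest {
  return {
    schema_version: newVersion,
    elements: original.elements.map((placement) => ({ ...placement })),
  };
}
```

The returned object would then be sent to the resolve operation, and the response persisted as a new ResolvedDatachainInstance alongside (not in place of) the old one.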

Rejection and discard flow

Rejection is operational, not stateful. When a reviewer rejects an LLM-authored draft, the documented path is: discard the artifact and re-invoke the skill (with feedback) to produce a new resolved form. The schema does not model an in-product "edit out a single suggested element" mutation.

A stateful per-element edit-and-revalidate loop would require shaping the schema for partial mutation, which v1 deliberately avoids. Each authoring round is a fresh artifact; the previous one is dropped or archived, but never patched in place.

Validation

Datachains are validated against a pinned schema version. Both wire forms have a validator:

All four validate both shape (Zod) and semantics (cardinality, required categories, placement rules). Shape errors surface as parse_error; semantic errors carry stable codes per rule. The resolved-form validators additionally enforce R14 (a non-empty suggested_elements requires kind: 'ai_generated'), R15a (no element-id collisions between snapshot and suggestions), and the soft snapshot_drift check when the pinned version is still served.
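The two resolved-form rules can be sketched as plain checks. This is an illustration only: the ResolvedLike shape assumes that suggested elements and snapshot elements each carry an id field, and the violations are labeled by rule name because the actual stable error codes are not given here.

```typescript
interface ResolvedLike {
  schema_snapshot: { elements: { id: string }[] };
  suggested_elements: { id: string }[];
  authoring_provenance?: { kind: "human" | "ai_generated" };
}
interface RuleViolation { rule: "R14" | "R15a"; message: string; }

function checkResolvedRules(doc: ResolvedLike): RuleViolation[] {
  const violations: RuleViolation[] = [];

  // R14: proposals are only allowed on AI-generated disclosures.
  if (doc.suggested_elements.length > 0 && doc.authoring_provenance?.kind !== "ai_generated") {
    violations.push({ rule: "R14", message: "suggested_elements requires kind 'ai_generated'" });
  }

  // R15a: a suggested element must not reuse an id already in the snapshot.
  const snapshotIds = new Set(doc.schema_snapshot.elements.map((e) => e.id));
  for (const suggestion of doc.suggested_elements) {
    if (snapshotIds.has(suggestion.id)) {
      violations.push({ rule: "R15a", message: `id collision: ${suggestion.id}` });
    }
  }
  return violations;
}
```

Returning a list instead of throwing on the first failure lets a caller report every violated rule in one pass.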

Rendering

Once valid, a datachain can be rendered to HTML:

  • render_datachain (MCP Apps) — accepts either wire form; produces HTML consumed via resources/read.
  • @dtpr/ui/vue + DtprDatachain — render inside a Vue app.
  • @dtpr/ui/html renderDatachainDocument — SSR the same components to a standalone HTML document.

When rendering a resolved form, the renderer marks any element pulled from suggested_elements with a "proposed" indicator (default-on). For every placement that has a matching entry in authoring_provenance.element_provenance, an expandable "AI proposal context" section beneath the element surfaces the per-element rationale, source-reference quotes, qualitative high/medium/low confidence label, and per-variable rationales.
