Organise

Data Asset Inventory & Lineage

Know what data you have, who owns it, and where it came from.

Scans your data estate — warehouses, lakes, APIs, models, dashboards — and maintains a live catalogue: schema, ownership, lineage, freshness, quality score, usage. When a table breaks, you know who owns it. When compliance asks where customer data flows, you trace lineage in minutes. Without it, this knowledge lives in heads and Slack threads.

Shape

Diagram: warehouse, lake, APIs and BI tools feed a scheduled scan, which maintains the asset catalogue (lineage edges, steward, freshness) behind a searchable catalogue.

Operational dimensions

Human supervisor

Person oversees and intervenes by exception.

Scheduled

Fires on a clock.

High data gravity

Owns a system-of-record; expensive to migrate.

Read-only inbound

Consumes external data; does not write back.

Inputs

  • connectors to source systems (warehouse, lake, APIs, BI tools)
  • lineage signals (query logs, ETL job metadata, dbt manifests)
  • steward annotations (owner, sensitivity classification, quality notes)
  • schema change events
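As an illustration of how one lineage signal might be consumed, the sketch below derives lineage edges from a dbt manifest by reading each node's `depends_on.nodes` list. The sample manifest is a hypothetical, heavily trimmed stand-in (the `shop` project and its models are invented), and `lineage_edges` is an illustrative helper, not part of dbt.

```python
import json

# Hypothetical, heavily trimmed stand-in for a dbt manifest.json.
# Real manifests carry many more fields; only depends_on.nodes is used here.
SAMPLE_MANIFEST = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {"depends_on": {"nodes": ["source.shop.raw.orders"]}},
    "model.shop.stg_payments": {"depends_on": {"nodes": ["source.shop.raw.payments"]}},
    "model.shop.revenue": {"depends_on": {"nodes": ["model.shop.stg_orders", "model.shop.stg_payments"]}}
  }
}
""")


def lineage_edges(manifest: dict) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs, sorted for stable output."""
    edges = set()
    for node_id, node in manifest.get("nodes", {}).items():
        for upstream in node.get("depends_on", {}).get("nodes", []):
            edges.add((upstream, node_id))
    return sorted(edges)


edges = lineage_edges(SAMPLE_MANIFEST)
for upstream, downstream in edges:
    print(f"{upstream} -> {downstream}")
```

Query logs and ETL job metadata would need their own parsers, but they can emit the same (upstream, downstream) pair shape so the catalogue stores one kind of edge.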

Outputs

  • asset catalogue (dataset / table / column records + metadata + lineage edges)
  • freshness and quality state per asset
  • stewardship view (owner assignments, classification, deprecation status)
  • searchable data catalogue interface
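One way the per-asset freshness state could be computed is sketched below, assuming each asset has a last-load timestamp and an expected update interval. The three state names and the `grace` multiplier are assumptions made for illustration, not something this page defines.

```python
from datetime import datetime, timedelta, timezone


def freshness_state(last_loaded: datetime, expected_every: timedelta,
                    now: datetime, grace: float = 2.0) -> str:
    """Classify an asset as 'fresh', 'late' or 'stale'.

    fresh: age is within the expected interval
    late:  age exceeds the interval but is within grace x the interval
    stale: age exceeds grace x the interval
    """
    age = now - last_loaded
    if age <= expected_every:
        return "fresh"
    if age <= expected_every * grace:
        return "late"
    return "stale"


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
daily = timedelta(days=1)
state_recent = freshness_state(now - timedelta(hours=3), daily, now)
state_overdue = freshness_state(now - timedelta(hours=30), daily, now)
state_old = freshness_state(now - timedelta(days=5), daily, now)
print(state_recent, state_overdue, state_old)
```

Passing `now` explicitly keeps the function deterministic, which makes the thresholds easy to test.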

Mechanism

Maintains a catalogue of data assets (datasets, tables, columns, files, models, dashboards) with their metadata — schema, ownership, lineage, freshness, quality, usage.
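A minimal in-memory sketch of such a catalogue might look like the following. The `Asset` fields, the `Catalogue` class and the sample asset names are all hypothetical and chosen only to mirror the metadata listed above; a real deployment would sit on a persistent store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    name: str                                               # e.g. "analytics.revenue_daily"
    kind: str                                               # "table", "dashboard", "model", ...
    owner: Optional[str] = None                             # steward; None until assigned
    columns: dict[str, str] = field(default_factory=dict)   # column name -> type
    upstream: list[str] = field(default_factory=list)       # incoming lineage edges
    tags: set[str] = field(default_factory=set)


class Catalogue:
    def __init__(self) -> None:
        self._assets: dict[str, Asset] = {}

    def upsert(self, asset: Asset) -> None:
        self._assets[asset.name] = asset

    def owner_of(self, name: str) -> Optional[str]:
        asset = self._assets.get(name)
        return asset.owner if asset else None

    def search(self, term: str) -> list[str]:
        """Case-insensitive match on asset name, column names and tags."""
        needle = term.lower()
        hits = []
        for asset in self._assets.values():
            haystack = [asset.name, *asset.columns, *asset.tags]
            if any(needle in text.lower() for text in haystack):
                hits.append(asset.name)
        return sorted(hits)


cat = Catalogue()
cat.upsert(Asset("analytics.revenue_daily", "table", owner="finance-data",
                 columns={"day": "date", "revenue": "numeric"}, tags={"finance"}))
cat.upsert(Asset("bi.revenue_dashboard", "dashboard",
                 upstream=["analytics.revenue_daily"]))
print(cat.owner_of("analytics.revenue_daily"))
print(cat.search("revenue"))
```

The owner lookup is the "when a table breaks, you know who owns it" case; the search method stands in for the searchable interface.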

Why this is a primitive

Kept separate from graph-instantiation despite the structural overlap (in principle, inventory is graph-instantiation over an asset meta-schema) because the operation is dominated by a recurring asset lifecycle: scan source systems, detect schema and lineage automatically, attach steward metadata, surface freshness and quality signals, deprecate. That scan-and-maintain loop is what is load-bearing, not arbitrary graph traversal. Treating this as a special case of graph-instantiation would lose the framing of inventory as an operational discipline. CHALLENGE FLAG: it is defensible to delete this primitive and re-express data-catalogue compositions as `graph-instantiation + vocabulary-authoring`; it is kept because the metadata-lifecycle operation is reused by enough compositions (data catalogue, MDM steward views, model registry) that it earns its place.
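The scan-and-maintain loop can be sketched as a reconcile step that merges each scan into the catalogue. Everything here is an assumption made for illustration: the record layout, the event strings, and in particular the policy of flagging assets that disappear from a scan as deprecated, which in practice may be a steward decision rather than an automatic one.

```python
def reconcile(catalogue: dict, scanned: dict) -> list[str]:
    """Merge one scan result into the catalogue and return change events.

    catalogue: name -> {"columns": {...}, "owner": ..., "deprecated": bool}
    scanned:   name -> {column: type} as currently seen in the source system
    """
    events = []
    for name, columns in scanned.items():
        record = catalogue.get(name)
        if record is None:
            catalogue[name] = {"columns": dict(columns), "owner": None, "deprecated": False}
            events.append(f"new asset: {name}")
            continue
        old = record["columns"]
        for col in sorted(columns.keys() - old.keys()):
            events.append(f"{name}: column added: {col}")
        for col in sorted(old.keys() - columns.keys()):
            events.append(f"{name}: column removed: {col}")
        for col in sorted(columns.keys() & old.keys()):
            if columns[col] != old[col]:
                events.append(f"{name}: column type changed: {col} {old[col]} -> {columns[col]}")
        # Refresh the schema but leave steward-supplied fields (owner) untouched.
        record["columns"] = dict(columns)
        record["deprecated"] = False
    # Assumed policy: anything absent from the scan is flagged as deprecated.
    for name, record in catalogue.items():
        if name not in scanned and not record["deprecated"]:
            record["deprecated"] = True
            events.append(f"deprecated (missing from scan): {name}")
    return events


catalogue = {
    "orders": {"columns": {"id": "int", "total": "float"}, "owner": "sales-eng", "deprecated": False},
    "legacy_orders": {"columns": {"id": "int"}, "owner": None, "deprecated": False},
}
scan = {
    "orders": {"id": "int", "total": "decimal", "currency": "text"},
    "payments": {"id": "int"},
}
events = reconcile(catalogue, scan)
for event in events:
    print(event)
```

The point of the sketch is that the scan overwrites only what the source system can assert (schema, existence), while steward annotations such as the owner survive every pass.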

Where it shows up

  • Scale-up: builds a data catalogue over Snowflake + S3 + Fivetran so the analytics team stops asking #data-help which table is the source of truth for revenue
  • Regulated financial institution: maintains asset-level lineage to satisfy model-risk and data-lineage audit requirements without manual documentation sprints
  • Healthcare system: inventories PHI-containing datasets across the estate so the privacy team can assess breach scope and demonstrate HIPAA compliance
  • Data platform team: tracks ML model artefacts, training datasets, and feature tables as first-class assets so model provenance is queryable rather than tribal
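Several of these cases (audit lineage, breach scope, model provenance) reduce to the same query: given one asset, which assets sit downstream of it? A minimal sketch follows, assuming lineage is stored as (upstream, downstream) pairs; the edge list and asset names are invented for the example.

```python
from collections import defaultdict, deque


def downstream_of(edges, start):
    """Return every asset reachable from `start` by following lineage edges.

    edges: iterable of (upstream, downstream) pairs
    """
    children = defaultdict(list)
    for upstream, downstream in edges:
        children[upstream].append(downstream)

    seen = set()
    queue = deque([start])
    while queue:  # breadth-first walk; `seen` guards against cycles
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Hypothetical lineage edges.
EDGES = [
    ("crm.customers", "staging.customers_clean"),
    ("staging.customers_clean", "analytics.customer_ltv"),
    ("staging.customers_clean", "ml.churn_features"),
    ("ml.churn_features", "ml.churn_model"),
    ("billing.invoices", "analytics.revenue_daily"),
]
affected = downstream_of(EDGES, "crm.customers")
print(affected)
```

Reversing each pair before the walk gives the upstream direction, which is the provenance question for a model or dashboard.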

Related primitives

Tags

structured-data, governance, AI, scheduled, data-quality
