Notice what worked

Controlled Experimentation

Turn 'it seemed to work' into a causal claim.

Designs treatment-and-control experiments — A/B, multivariate, holdout, switchback — assigns subjects under a proper randomisation scheme, and delivers a statistically grounded ship/kill verdict. The output isn't a metric trend; it's a causal claim with a confidence interval you can defend to a sceptical stakeholder.

Shape

Operational dimensions

Requires approval

Each output waits on a human decision.

On demand

Fires when a user asks.

Medium data gravity

Holds working state that compounds over runs.

Read-only inbound

Consumes external data; does not write back.

Inputs

hypothesis and treatment definitions
target metric and success criteria
subject population or traffic split
power analysis parameters (MDE, significance threshold, desired power)

Outputs

per-arm outcome distribution
treatment-effect estimate with confidence interval
ship / kill / extend decision input
experiment record for future retrieval

Mechanism

Assigns subjects to treatment and control conditions under an experimental design (A/B, multivariate, holdout, switchback), runs the experiment, and produces a causal readout of treatment effect with statistical confidence.

Why this is a primitive

Cannot be decomposed — the design → randomised-assignment → measure → causal-inference operation is one mathematical machinery (power analysis, randomisation, treatment-effect estimation, multiple-comparison handling). It is distinct from descriptive telemetry refinement: telemetry observes what happened, experimentation imposes the treatment-control structure that lets you claim WHY it happened. Strip the assignment-and-causal-inference layer and you have an A/B-flavoured dashboard with no causal claim.

Where it shows up

Growth team — A/B test of two onboarding flows on 14-day activation rate, with a statistically powered verdict on which flow to roll out

EdTech product — holdout experiment on a new adaptive routing algorithm, measuring whether it improves mastery rate versus the control population

Content team — multivariate test of email subject-line variants against open rate, with multiple-comparison correction to avoid false positives

ML platform — switchback experiment on a new ranking model in a marketplace, alternating treatment and control windows to handle temporal confounds

Related primitives

Composes with

Telemetry-Driven Refinement Loop →Stated-Feedback Capture →Retrospective Capture →

Alternative to

Telemetry-Driven Refinement Loop →