Data Flow Contracts
Configuration
Parameters are separated into two levels. The orchestrator owns the config and routes the right data to each pipeline stage.
Problem-Level Parameters
Shared across the entire run, not specific to any initiative.
Parameter |
Type |
Description |
|---|---|---|
budget |
Currency |
Total budget constraint for ALLOCATE |
scale_sample_size |
int |
Sample size for scale-phase MEASURE runs |
max_workers |
int |
Parallelism for fan-out stages |
Initiative-Level Parameters
Specific to each initiative, known upfront (not produced by pipeline stages).
Parameter |
Type |
Description |
|---|---|---|
initiative_id |
InitiativeId |
Unique identifier |
cost_to_scale |
Currency |
Cost to scale this initiative to production |
Key principle: Initiative-level parameters (e.g.
cost_to_scale) are not passed through pipeline stages. The orchestrator enriches stage inputs with the relevant initiative parameters from the config. This keeps contracts clean — each stage only produces its own outputs.
Measure → Evaluate
Field |
Type |
Description |
|---|---|---|
initiative_id |
InitiativeId |
Identifier for the initiative |
effect_estimate |
float |
Estimated causal effect (point estimate) |
ci_lower |
float |
Lower bound of confidence interval |
ci_upper |
float |
Upper bound of confidence interval |
p_value |
float |
Statistical significance |
sample_size |
int |
Number of observations in pilot |
model_type |
ModelType |
experiment / quasi-experiment / time-series |
diagnostics |
dict |
Model fit diagnostics |
Invariants:
ci_lower <= effect_estimate <= ci_upperp_valuein [0.0, 1.0]sample_size >= 30(minimum for statistical validity)diagnosticsmust containr_squaredorlog_likelihood
Evaluate → Allocate
Field |
Type |
Description |
|---|---|---|
initiative_id |
InitiativeId |
Initiative identifier |
confidence |
Confidence |
Methodology-based confidence (0-1) |
cost |
Currency |
Cost to scale this initiative |
R_best |
float |
Upper CI bound → best-case return |
R_med |
float |
Point estimate → median return |
R_worst |
float |
Lower CI bound → worst-case return |
model_type |
ModelType |
Source methodology (informational) |
Invariants:
R_worst <= R_med <= R_bestcost > 0(zero-cost initiatives don’t need allocation decisions)
Allocate → Scale
Field |
Type |
Description |
|---|---|---|
selected_initiatives |
list[InitiativeId] |
Initiatives approved for scaling |
predicted_returns |
dict[InitiativeId, float] |
Expected returns from pilot (per initiative) |
budget_allocated |
dict[InitiativeId, Currency] |
Budget allocated per initiative |
Invariants:
selected_initiativesmay be empty (if no initiative fits the budget or all have negative expected returns, the orchestrator skips SCALE and returns an empty outcome report)All keys in
predicted_returnsandbudget_allocatedexist inselected_initiativessum(budget_allocated.values()) <= total_budget
Scale → Outcome Report
Field |
Type |
Description |
|---|---|---|
initiative_id |
InitiativeId |
Initiative identifier |
predicted_return |
float |
Return predicted from pilot |
actual_return |
float |
Observed return at scale |
prediction_error |
float |
Difference (actual - predicted) |
sample_size_pilot |
int |
Original pilot sample |
sample_size_scale |
int |
Scale sample size |
budget_allocated |
Currency |
Budget allocated to this initiative |
confidence_score |
Confidence |
Methodology-based confidence from EVALUATE |
model_type |
ModelType |
Source methodology |
Invariants:
prediction_error == actual_return - predicted_returnsample_size_scale >= sample_size_pilot(scale should be larger)sample_size_pilot >= 30