Configuration
Impact Engine uses YAML configuration files to control all aspects of data sourcing, measurement, and output. This guide documents the actual configuration schema as implemented in the code.
Configuration structure
A configuration file has three top-level sections: `DATA`, `MEASUREMENT`, and `OUTPUT`.
```yaml
DATA:
  SOURCE:
    # Data source configuration
  TRANSFORM:
    # Optional data transformation
MEASUREMENT:
  # Model configuration
OUTPUT:
  # Output path configuration
```
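To make the nesting concrete, here is a minimal Python sketch. It treats an already-parsed configuration as a plain dict and reports which of the three top-level sections are missing. The helper name `missing_sections` is hypothetical. Treating all three sections as mandatory is also an assumption, and the engine's own loader may be more lenient.

```python
# Hypothetical helper, not part of Impact Engine's API.
# Assumes the YAML file has already been parsed into nested dicts.
TOP_LEVEL_SECTIONS = ("DATA", "MEASUREMENT", "OUTPUT")


def missing_sections(config):
    """Return the names of top-level sections absent from a parsed config."""
    return [name for name in TOP_LEVEL_SECTIONS if name not in config]


example = {
    "DATA": {"SOURCE": {"type": "simulator", "CONFIG": {}}},
    "MEASUREMENT": {"MODEL": "interrupted_time_series", "PARAMS": {}},
    "OUTPUT": {"PATH": "output"},
}
print(missing_sections(example))       # []
print(missing_sections({"DATA": {}}))  # ['MEASUREMENT', 'OUTPUT']
```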
DATA section
Configures where metrics data comes from and how it’s transformed.
SOURCE configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| `type` | string | No | Data source type, such as `simulator` (default) or `file` |
| `CONFIG` | object | Yes | Source-specific configuration |
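The sketch below shows how the two `SOURCE` keys interact: `type` falls back to `simulator` when omitted, while `CONFIG` must be present. `read_source` is a hypothetical helper. It accepts only the two source types documented on this page, but the engine may register others.

```python
# Hypothetical sketch of reading DATA.SOURCE from a parsed config dict.
# Only the two source types documented on this page are accepted here;
# the engine itself may support more.
DOCUMENTED_SOURCE_TYPES = {"simulator", "file"}


def read_source(data_section):
    """Return (source_type, config) for DATA.SOURCE, applying the default type."""
    source = data_section.get("SOURCE", {})
    source_type = source.get("type", "simulator")  # documented default
    if source_type not in DOCUMENTED_SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")
    if "CONFIG" not in source:
        raise ValueError("DATA.SOURCE.CONFIG is required")
    return source_type, source["CONFIG"]


print(read_source({"SOURCE": {"CONFIG": {"path": "data/products.csv"}}}))
# ('simulator', {'path': 'data/products.csv'})
```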
Simulator CONFIG parameters (default)
The simulator generates synthetic metrics data from a product catalog.
```yaml
DATA:
  SOURCE:
    type: simulator
    CONFIG:
      mode: rule
      seed: 42
      path: data/products.csv
      start_date: "2024-01-01"
      end_date: "2024-01-31"
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `path` | string | Yes | - | Path to products CSV file |
| `start_date` | string | Yes | - | Analysis start date (YYYY-MM-DD) |
| `end_date` | string | Yes | - | Analysis end date (YYYY-MM-DD) |
| `mode` | string | No | | Simulation mode, such as `rule` |
| `seed` | int | No | | Random seed for reproducibility |
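`start_date` and `end_date` are plain ISO date strings. The sketch below parses them and lists the days they span. It assumes daily granularity and an inclusive end date, which this page does not state explicitly. `analysis_days` is a hypothetical name.

```python
from datetime import date, timedelta


def analysis_days(start_date, end_date):
    """Parse YYYY-MM-DD strings and list each day from start to end inclusive.

    Daily granularity and an inclusive end are assumptions of this sketch.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


days = analysis_days("2024-01-01", "2024-01-31")
print(len(days), days[0], days[-1])  # 31 2024-01-01 2024-01-31
```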
File CONFIG parameters
Load metrics from an existing CSV or Parquet file instead of simulating.
```yaml
DATA:
  SOURCE:
    type: file
    CONFIG:
      path: data/metrics.csv
      product_id_column: product_id
      date_column: date
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `path` | string | Yes | - | Path to data file (CSV, Parquet, or partitioned Parquet directory) |
| `product_id_column` | string | No | | Column name for product identifiers |
| `date_column` | string | No | | Column name for dates |
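Because `path` can point to a single CSV file, a single Parquet file, or a partitioned Parquet directory, a loader has to classify the path before reading it. The sketch below shows one way to do that with `pathlib`. It keys on file suffixes and on whether the path is a directory. The helper name and the returned labels are hypothetical.

```python
from pathlib import Path


def detect_format(path):
    """Classify a DATA.SOURCE.CONFIG.path value (hypothetical helper)."""
    p = Path(path)
    if p.is_dir():
        return "partitioned_parquet"  # a directory is treated as a Parquet dataset
    suffix = p.suffix.lower()
    if suffix == ".csv":
        return "csv"
    if suffix == ".parquet":
        return "parquet"
    raise ValueError(f"unsupported file type: {path}")


print(detect_format("data/metrics.csv"))  # csv
```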
Enrichment configuration
Apply synthetic interventions to simulated data for testing causal impact detection.
```yaml
DATA:
  SOURCE:
    type: simulator
    CONFIG:
      mode: rule
      seed: 42
      path: data/products.csv
      start_date: "2024-11-01"
      end_date: "2024-12-15"
  ENRICHMENT:
    FUNCTION: product_detail_boost
    PARAMS:
      quality_boost: 0.15
      enrichment_fraction: 1.0
      enrichment_start: "2024-11-23"
      seed: 42
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `FUNCTION` | string | Yes | Enrichment function name, such as `product_detail_boost` |
| `quality_boost` | float | Yes | Magnitude of the quality score boost (e.g., 0.15) |
| `enrichment_fraction` | float | No | Fraction of products to enrich (0.0-1.0, default 1.0) |
| `enrichment_start` | string | Yes | Date when enrichment begins (YYYY-MM-DD) |
| `seed` | int | No | Random seed for reproducibility |
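The enrichment parameters fit together as follows: `enrichment_fraction` and `seed` pick a reproducible subset of products, and `enrichment_start` gates when the `quality_boost` takes effect. The sketch below reflects one reading of these parameters, not the actual `product_detail_boost` implementation. In particular, applying the boost additively to a quality score is an assumption, and both helper names are hypothetical.

```python
import random
from datetime import date


def select_enriched(product_ids, enrichment_fraction=1.0, seed=None):
    """Pick a reproducible subset of products to enrich."""
    ids = sorted(product_ids)  # sort so the result does not depend on input order
    k = round(len(ids) * enrichment_fraction)
    return set(random.Random(seed).sample(ids, k))


def enriched_quality(score, product_id, day, chosen, quality_boost, enrichment_start):
    """Return the quality score after enrichment for one product on one day."""
    if product_id in chosen and day >= date.fromisoformat(enrichment_start):
        return score + quality_boost  # additive boost: an assumption of this sketch
    return score


chosen = select_enriched(["p1", "p2", "p3", "p4"], enrichment_fraction=0.5, seed=42)
print(len(chosen))  # 2
```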
TRANSFORM configuration
Optional transformation applied to data before model fitting.
```yaml
DATA:
  TRANSFORM:
    FUNCTION: aggregate_by_date
    PARAMS:
      metric: revenue
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `FUNCTION` | string | No | | Transform function name |
| `PARAMS` | object | No | | Function-specific parameters |
Available transforms
Each model typically pairs with a specific transform. The engine selects the transform by name from a registry.
| Transform | Used With | Description | Key Parameters |
|---|---|---|---|
| | Any | No-op default. Passes data through unchanged. | None |
| `aggregate_by_date` | Interrupted Time Series | Sums all numeric columns by date, producing one row per date. | `metric` |
| | Synthetic Control | Adds a … | |
| | Metrics Approximation | Aggregates baseline metric per product into cross-sectional format. | |
| | Metrics Approximation (simulator source) | Converts simulator time-series into before/after quality scores and baseline sales per product. | |
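The registry lookup and the `aggregate_by_date` behaviour described above can be sketched with plain Python lists of dicts. The sketch sums every numeric column per date and returns the rows untouched when no `FUNCTION` is configured. The `TRANSFORMS` dict and `apply_transform` are hypothetical stand-ins for the engine's registry. This version also accepts `metric` but does not use it, because how the engine uses that parameter is not described here.

```python
from collections import defaultdict


def aggregate_by_date(rows, **params):
    """Sum every numeric column per date, producing one row per date.

    Extra PARAMS such as `metric` are accepted but unused in this sketch.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        for column, value in row.items():
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if column != "date" and is_number:
                totals[row["date"]][column] += value
    return [{"date": day, **dict(sums)} for day, sums in sorted(totals.items())]


# Hypothetical registry keyed by the FUNCTION name used in the config.
TRANSFORMS = {"aggregate_by_date": aggregate_by_date}


def apply_transform(rows, transform_cfg=None):
    """Look up DATA.TRANSFORM.FUNCTION; with no FUNCTION, pass data through."""
    cfg = transform_cfg or {}
    name = cfg.get("FUNCTION")
    if name is None:
        return rows  # no-op default
    if name not in TRANSFORMS:
        raise KeyError(f"unknown transform: {name!r}")
    return TRANSFORMS[name](rows, **cfg.get("PARAMS", {}))


rows = [
    {"date": "2024-01-01", "product_id": "a", "revenue": 10.0},
    {"date": "2024-01-01", "product_id": "b", "revenue": 5.0},
    {"date": "2024-01-02", "product_id": "a", "revenue": 7.0},
]
print(apply_transform(rows, {"FUNCTION": "aggregate_by_date", "PARAMS": {"metric": "revenue"}}))
# [{'date': '2024-01-01', 'revenue': 15.0}, {'date': '2024-01-02', 'revenue': 7.0}]
```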
MEASUREMENT section
Configures the statistical model for impact analysis.
Common parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `MODEL` | string | No | Model type, such as `interrupted_time_series`; a default applies when omitted |
| `PARAMS` | object | Yes | Model-specific parameters |
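The two common keys behave asymmetrically: `MODEL` may be omitted, but `PARAMS` may not. The sketch below captures that rule. `resolve_model` is hypothetical, and it takes the default model as an argument because the sketch does not assume what the engine's built-in default is. The set of names is limited to the models documented on this page.

```python
# Names of the models documented on this page.
DOCUMENTED_MODELS = {
    "interrupted_time_series",
    "experiment",
    "subclassification",
    "nearest_neighbour_matching",
    "metrics_approximation",
    "synthetic_control",
}


def resolve_model(measurement, default_model):
    """Return (model_name, params) from a parsed MEASUREMENT section.

    default_model is supplied by the caller; the engine's own default
    is not assumed here.
    """
    model = measurement.get("MODEL", default_model)
    if model not in DOCUMENTED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if "PARAMS" not in measurement:
        raise ValueError("MEASUREMENT.PARAMS is required")
    return model, measurement["PARAMS"]


print(resolve_model({"MODEL": "experiment", "PARAMS": {"formula": "revenue ~ treatment"}},
                    default_model="interrupted_time_series"))
# ('experiment', {'formula': 'revenue ~ treatment'})
```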
Interrupted time series model
```yaml
MEASUREMENT:
  MODEL: interrupted_time_series
  PARAMS:
    intervention_date: "2024-01-15"
    dependent_variable: revenue
    order: [1, 0, 0]
    seasonal_order: [0, 0, 0, 0]
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `intervention_date` | string | Yes | - | Date when intervention occurred (YYYY-MM-DD) |
| `dependent_variable` | string | No | | Column name to analyze |
| `order` | array | No | | ARIMA order (p, d, q) |
| `seasonal_order` | array | No | | Seasonal ARIMA order (P, D, Q, s) |
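`intervention_date` divides the series into a pre-period and a post-period, and the two order arrays must have three and four entries respectively. Both arrays have the same shape as the `order` and `seasonal_order` arguments of statsmodels' `SARIMAX`, although this page does not say which ARIMA implementation the engine calls. In the sketch below, placing the intervention day itself in the post-period is an assumption, and both helper names are hypothetical.

```python
from datetime import date


def split_at_intervention(dates, intervention_date):
    """Return (pre, post) row indices; the intervention day opens the post period."""
    cutoff = date.fromisoformat(intervention_date)
    pre, post = [], []
    for index, value in enumerate(dates):
        if date.fromisoformat(value) < cutoff:
            pre.append(index)
        else:
            post.append(index)
    return pre, post


def check_orders(order, seasonal_order):
    """Validate ARIMA (p, d, q) and seasonal (P, D, Q, s) orders and return tuples."""
    if len(order) != 3:
        raise ValueError("order must have three entries: [p, d, q]")
    if len(seasonal_order) != 4:
        raise ValueError("seasonal_order must have four entries: [P, D, Q, s]")
    return tuple(order), tuple(seasonal_order)


print(split_at_intervention(["2024-01-14", "2024-01-15", "2024-01-16"], "2024-01-15"))
# ([0], [1, 2])
```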
Experiment model
```yaml
MEASUREMENT:
  MODEL: experiment
  PARAMS:
    formula: "revenue ~ treatment + price"
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `formula` | string | Yes | - | R-style formula where all variables must exist in the DataFrame |
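Since every variable in `formula` must exist as a column, a quick pre-flight check can catch typos before fitting. The sketch below extracts bare identifiers with a regular expression. It handles only simple terms joined by operators such as `+`, `*`, or `:`, and would misread function calls like `np.log(price)`. Both helper names are hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def formula_variables(formula):
    """Collect variable names from a simple formula such as 'y ~ a + b'."""
    if "~" not in formula:
        raise ValueError("formula needs an outcome, '~', and predictors")
    return set(IDENTIFIER.findall(formula))


def missing_columns(formula, columns):
    """Names used in the formula that are not columns of the data."""
    return sorted(formula_variables(formula) - set(columns))


print(missing_columns("revenue ~ treatment + price", ["revenue", "treatment"]))  # ['price']
```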
Subclassification model
```yaml
MEASUREMENT:
  MODEL: subclassification
  PARAMS:
    dependent_variable: revenue
    treatment_column: treatment
    covariate_columns:
      - price
    n_strata: 5
    estimand: att
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `dependent_variable` | string | No | | Outcome column name |
| `treatment_column` | string | Yes | - | Binary treatment indicator column |
| `covariate_columns` | list | Yes | - | Columns used for propensity stratification |
| `n_strata` | int | No | | Number of quantile-based strata |
| `estimand` | string | No | | Estimand to compute, such as `att` |
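Subclassification sorts units into quantile-based strata, compares treated and control outcomes within each stratum, and then averages those differences. Under `att`, each stratum is weighted by its number of treated units. The sketch below assumes the propensity score has already been computed from `covariate_columns` and handles `att` plus an `ate` option (weighting by stratum size). The `ate` name, the tuple input format, and skipping strata without overlap are all assumptions of this sketch.

```python
def subclassification_effect(units, n_strata=5, estimand="att"):
    """Stratified treatment effect from (score, treated, outcome) tuples.

    Strata are equal-frequency bins of the score; strata lacking either
    treated or control units are skipped.
    """
    ordered = sorted(units, key=lambda unit: unit[0])
    n = len(ordered)
    strata = [[] for _ in range(n_strata)]
    for rank, unit in enumerate(ordered):
        strata[rank * n_strata // n].append(unit)  # quantile bin by rank

    weighted_sum = total_weight = 0.0
    for stratum in strata:
        treated = [y for _, is_treated, y in stratum if is_treated]
        control = [y for _, is_treated, y in stratum if not is_treated]
        if not treated or not control:
            continue
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        if estimand == "att":
            weight = len(treated)  # weight by treated units in the stratum
        elif estimand == "ate":
            weight = len(stratum)  # weight by all units in the stratum
        else:
            raise ValueError(f"unsupported estimand: {estimand!r}")
        weighted_sum += weight * diff
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / total_weight
```

With equal within-stratum differences, `att` and `ate` agree. They diverge once treated units concentrate in strata with larger differences.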
Nearest neighbour matching model
```yaml
MEASUREMENT:
  MODEL: nearest_neighbour_matching
  PARAMS:
    dependent_variable: revenue
    treatment_column: treatment
    covariate_columns:
      - price
    caliper: 0.2
    replace: false
    ratio: 1
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `dependent_variable` | string | No | | Outcome column name |
| `treatment_column` | string | Yes | - | Binary treatment indicator column |
| `covariate_columns` | list | Yes | - | Columns used for matching |
| `caliper` | float | No | | Maximum distance for a valid match |
| `replace` | bool | No | | Whether to match with replacement |
| `ratio` | int | No | | Number of matches per unit |
| | bool | No | | Shuffle data before matching |
| | int | No | | Random seed for reproducibility |
| | int | No | | Number of parallel jobs |
Metrics approximation model
```yaml
MEASUREMENT:
  MODEL: metrics_approximation
  PARAMS:
    metric_before_column: quality_before
    metric_after_column: quality_after
    baseline_column: baseline_sales
    RESPONSE:
      FUNCTION: linear
      PARAMS:
        coefficient: 0.5
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `metric_before_column` | string | No | | Column name for pre-intervention metric |
| `metric_after_column` | string | No | | Column name for post-intervention metric |
| `baseline_column` | string | No | | Column name for baseline outcome |
| `RESPONSE.FUNCTION` | string | No | | Response function name from the response registry |
| `RESPONSE.PARAMS.coefficient` | float | No | | Coefficient for the linear response function |
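The three column parameters name the fields from which a per-product metric change and baseline are read. The response function then converts that change into an outcome estimate. This page does not give the exact form of the `linear` response. The sketch below therefore assumes `impact = baseline × coefficient × (after − before)` purely for illustration. `approximate_impacts` is a hypothetical helper.

```python
def approximate_impacts(rows, params):
    """Per-row impact under an ASSUMED linear response (not the engine's confirmed formula)."""
    before_col = params["metric_before_column"]
    after_col = params["metric_after_column"]
    baseline_col = params["baseline_column"]
    coefficient = params["RESPONSE"]["PARAMS"]["coefficient"]
    impacts = []
    for row in rows:
        change = row[after_col] - row[before_col]
        # Assumed form: outcome shifts by coefficient * metric change, scaled by baseline.
        impacts.append(row[baseline_col] * coefficient * change)
    return impacts


params = {
    "metric_before_column": "quality_before",
    "metric_after_column": "quality_after",
    "baseline_column": "baseline_sales",
    "RESPONSE": {"FUNCTION": "linear", "PARAMS": {"coefficient": 0.5}},
}
rows = [{"quality_before": 0.25, "quality_after": 0.75, "baseline_sales": 100.0}]
print(approximate_impacts(rows, params))  # [25.0]
```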
Synthetic control model
```yaml
MEASUREMENT:
  MODEL: synthetic_control
  PARAMS:
    treatment_time: 15
    treated_unit: "unit_A"
    outcome_column: revenue
    unit_column: unit_id
    time_column: date
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `treatment_time` | int | Yes | - | Time index when intervention occurred |
| `treated_unit` | string | Yes | - | Name of the treated unit |
| `outcome_column` | string | Yes | - | Column with the outcome variable |
| `unit_column` | string | No | | Column identifying units |
| `time_column` | string | No | | Column identifying time periods |
| | string | No | | Optimization method passed to pysyncon |
| | string | No | | Initial weight strategy for optimization |
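The column parameters describe long-format data: one row per unit and period. `treatment_time` marks where the post-period starts on the time axis. The sketch below reshapes such rows into per-unit series and measures the average post-period gap between the treated unit and a weighted blend of donors. The donor weights are taken as given, because estimating them is the optimizer's job (pysyncon in the engine). Treating `treatment_time` as a 0-based position and assuming every unit shares the same time points are both assumptions. `synthetic_gap` is a hypothetical name.

```python
def synthetic_gap(rows, treated_unit, weights, treatment_time,
                  outcome_column, unit_column, time_column):
    """Average post-period gap between the treated unit and a weighted donor blend.

    rows: long-format dicts; weights: donor unit -> weight (given, not estimated).
    treatment_time is read as a 0-based position on the sorted time axis.
    """
    series = {}
    for row in sorted(rows, key=lambda r: r[time_column]):
        series.setdefault(row[unit_column], []).append(row[outcome_column])
    treated = series[treated_unit]
    synthetic = [sum(w * series[unit][t] for unit, w in weights.items())
                 for t in range(len(treated))]
    post = range(treatment_time, len(treated))
    return sum(treated[t] - synthetic[t] for t in post) / len(post)


data = {"unit_A": [1, 1, 5, 5], "unit_B": [1, 1, 1, 1], "unit_C": [1, 1, 3, 3]}
rows = [{"unit_id": u, "t": t, "revenue": float(v)}
        for u, values in data.items() for t, v in enumerate(values)]
print(synthetic_gap(rows, "unit_A", {"unit_B": 0.5, "unit_C": 0.5}, 2, "revenue", "unit_id", "t"))
# 3.0
```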
OUTPUT section
Configures where results are stored.
```yaml
OUTPUT:
  PATH: output
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `PATH` | string | No | | Directory for output files |
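`PATH` names a directory, so a writer should create it on demand. The sketch below does that and writes a JSON file into it. The file name `results.json` and the fallback directory `output` are assumptions for illustration. Neither is the engine's documented output layout, and `write_results` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def write_results(results, output_cfg, fallback_path="output"):
    """Write results as JSON inside OUTPUT.PATH, creating the directory if needed."""
    out_dir = Path(output_cfg.get("PATH", fallback_path))  # fallback value is assumed
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "results.json"  # hypothetical file name
    target.write_text(json.dumps(results, indent=2))
    return target


with tempfile.TemporaryDirectory() as tmp:
    written = write_results({"effect": 1.5}, {"PATH": f"{tmp}/run_1"})
    print(written.name)  # results.json
```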
Complete example
```yaml
DATA:
  SOURCE:
    type: simulator
    CONFIG:
      mode: rule
      seed: 42
      path: data/products.csv
      start_date: "2024-01-01"
      end_date: "2024-03-31"
  TRANSFORM:
    FUNCTION: aggregate_by_date
    PARAMS:
      metric: revenue
MEASUREMENT:
  MODEL: interrupted_time_series
  PARAMS:
    intervention_date: "2024-02-01"
    dependent_variable: revenue
    order: [1, 0, 0]
    seasonal_order: [0, 0, 0, 7]
OUTPUT:
  PATH: output
```
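A quick consistency check on the complete example is to confirm that `intervention_date` falls inside the simulated window and leaves at least one pre-intervention day. The sketch below performs that check on the example as a parsed dict. Requiring a non-empty pre-period is an assumption about what a time-series fit needs, and `intervention_in_window` is a hypothetical name.

```python
from datetime import date

# The complete example above, as a parsed dict.
config = {
    "DATA": {
        "SOURCE": {
            "type": "simulator",
            "CONFIG": {"mode": "rule", "seed": 42, "path": "data/products.csv",
                       "start_date": "2024-01-01", "end_date": "2024-03-31"},
        },
        "TRANSFORM": {"FUNCTION": "aggregate_by_date", "PARAMS": {"metric": "revenue"}},
    },
    "MEASUREMENT": {
        "MODEL": "interrupted_time_series",
        "PARAMS": {"intervention_date": "2024-02-01", "dependent_variable": "revenue",
                   "order": [1, 0, 0], "seasonal_order": [0, 0, 0, 7]},
    },
    "OUTPUT": {"PATH": "output"},
}


def intervention_in_window(cfg):
    """True if intervention_date lies after start_date and on or before end_date."""
    source = cfg["DATA"]["SOURCE"]["CONFIG"]
    start = date.fromisoformat(source["start_date"])
    end = date.fromisoformat(source["end_date"])
    intervention = date.fromisoformat(cfg["MEASUREMENT"]["PARAMS"]["intervention_date"])
    return start < intervention <= end


print(intervention_in_window(config))  # True
```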