Usage

Workflow

Impact Engine follows the same three steps regardless of which measurement model is used.

1. Prepare a product catalog. Provide a CSV with product characteristics (product_identifier, category, price). In demo notebooks, the catalog simulator generates this automatically.
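A minimal catalog can be built with pandas before the first run. The column names come from the step above; the rows and the path are illustrative:

```python
import os
import pandas as pd

# Illustrative catalog rows; real catalogs come from your product data.
catalog = pd.DataFrame(
    {
        "product_identifier": ["P-001", "P-002", "P-003"],
        "category": ["shoes", "shoes", "hats"],
        "price": [59.99, 74.50, 19.00],
    }
)

# Write the CSV where the config's DATA.SOURCE.CONFIG.path expects it.
os.makedirs("data", exist_ok=True)
catalog.to_csv("data/products.csv", index=False)
```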

2. Write a YAML configuration file. The config has three sections: DATA selects the data source and optional transformations, MEASUREMENT selects the model and its parameters, and OUTPUT sets the storage path. See Configuration for the full parameter reference.

DATA:
  SOURCE:
    type: simulator
    CONFIG:
      path: data/products.csv
      start_date: "2024-01-01"
      end_date: "2024-01-31"
  TRANSFORM:
    FUNCTION: aggregate_by_date
    PARAMS:
      metric: revenue

MEASUREMENT:
  MODEL: interrupted_time_series
  PARAMS:
    intervention_date: "2024-01-15"
    dependent_variable: revenue

OUTPUT:
  PATH: output

3. Run the analysis.

from impact_engine_measure import measure_impact, load_results

job_info = measure_impact(
    config_path="config.yaml",
    storage_url="./results"
)
results = load_results(job_info)
print(results.model_type)         # "interrupted_time_series"
print(results.job_id)             # "job-20260101-abc123"
print(results.impact_results)     # {"schema_version": "2.0", "model_type": ..., "data": {...}}
print(results.transformed_metrics.head())  # DataFrame with aggregated revenue

The engine loads products, retrieves metrics, applies transformations, fits the model, and writes results. measure_impact() returns a JobInfo object; pass it to load_results() to get a typed MeasureJobResult with all artifacts loaded: config, impact_results, products, business_metrics, transformed_metrics, and any model-specific model_artifacts.


Output

Every run produces a standardized output regardless of which model was used.

impact_results.json contains the result envelope:

{
  "schema_version": "2.0",
  "model_type": "<model_name>",
  "data": {
    "model_params": { },
    "impact_estimates": { },
    "model_summary": { }
  },
  "metadata": {
    "executed_at": "2026-02-08T12:00:00+00:00"
  }
}

The three keys inside data are standardized across all models. model_params echoes the input parameters. impact_estimates holds the treatment effect measurements. model_summary provides fit diagnostics and sample sizes.
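Because those three keys are guaranteed, consumers can parse the envelope without model-specific logic. In this sketch the field values are illustrative; only the key structure is taken from the schema above:

```python
import json

# An envelope with the documented structure; values are made up for illustration.
envelope = json.loads("""
{
  "schema_version": "2.0",
  "model_type": "interrupted_time_series",
  "data": {
    "model_params": {"intervention_date": "2024-01-15"},
    "impact_estimates": {"effect": 123.4},
    "model_summary": {"n_obs": 31}
  },
  "metadata": {"executed_at": "2026-02-08T12:00:00+00:00"}
}
""")

# The three standardized keys are always present under "data".
assert set(envelope["data"]) == {"model_params", "impact_estimates", "model_summary"}

params = envelope["data"]["model_params"]       # echoed input parameters
effects = envelope["data"]["impact_estimates"]  # treatment effect measurements
summary = envelope["data"]["model_summary"]     # fit diagnostics, sample sizes
```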

manifest.json lists all output files and their formats, making the output self-describing. Consumers should read the manifest to resolve file paths rather than hardcoding filenames.

Some models produce supplementary artifacts as Parquet files (e.g., per-stratum breakdowns, matched data). These are listed in the manifest and named {model_type}__{artifact_name}.parquet.
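A manifest-driven reader might look like the sketch below. The manifest field names used here (`files`, `name`, `format`) are assumptions for illustration; check them against a real manifest.json in your output directory before relying on them:

```python
import pathlib

# Hypothetical manifest content mirroring the documented behavior:
# one entry per output file, each with a name and a format.
manifest = {
    "files": [
        {"name": "impact_results.json", "format": "json"},
        {"name": "subclassification__strata.parquet", "format": "parquet"},
    ]
}

output_dir = pathlib.Path("output")

def resolve(manifest: dict, fmt: str) -> list[pathlib.Path]:
    """Return paths for all output files of a given format, per the manifest."""
    return [output_dir / f["name"] for f in manifest["files"] if f["format"] == fmt]

# Resolve supplementary Parquet artifacts instead of hardcoding filenames.
parquet_paths = resolve(manifest, "parquet")
```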


Available models

Each model has a demo notebook with a runnable end-to-end example including truth recovery validation and convergence analysis.

| Model | Library | Interface | Description | Demo |
|---|---|---|---|---|
| Experiment | statsmodels | ols() | Linear regression for randomized A/B tests | demo_experiment |
| Interrupted Time Series | statsmodels | SARIMAX() | ARIMA-based pre/post intervention comparison on aggregated time series | demo_interrupted_time_series |
| Nearest Neighbour Matching | causalml | NearestNeighborMatch | Causal matching on covariates for ATT/ATC estimation | demo_nearest_neighbour_matching |
| Subclassification | pandas / NumPy | qcut() + np.average() | Propensity stratification with within-stratum treatment effects | demo_subclassification |
| Synthetic Control | pysyncon | Synth | Synthetic control method for aggregate intervention analysis | demo_synthetic_control |
| Metrics Approximation | (built-in) | Response function registry | Response function approximation using a library of candidate functions | demo_metrics_approximation |