Download the notebook here! Interactive online version: colab

Synthetic Control

This notebook demonstrates synthetic control impact estimation via pysyncon `Synth <https://sdfordham.github.io/pysyncon/synth.html>`__.

The synthetic control method constructs a weighted combination of control units as a counterfactual for the treated unit, then estimates the causal effect as the difference between observed and synthetic outcomes in the post-treatment period.

Workflow overview

  1. User provides products.csv (each product = a “unit” in the panel)

  2. User configures DATA.ENRICHMENT for treatment assignment

  3. User calls measure_impact(config.yaml)

  4. Engine handles everything internally (adapter, enrichment, transform, model)

Initial setup

[1]:
from pathlib import Path

import pandas as pd
from impact_engine_measure import measure_impact, load_results
from impact_engine_measure.core.validation import load_config
from impact_engine_measure.core import apply_transform
from impact_engine_measure.models.factory import get_model_adapter
from online_retail_simulator import enrich, simulate

Step 1 — Product Catalog

We use a small catalog (20 products) because synthetic control treats each product as a separate unit in the donor pool.

[2]:
output_path = Path("output/demo_synthetic_control")
output_path.mkdir(parents=True, exist_ok=True)

catalog_job = simulate("configs/demo_synthetic_control_catalog.yaml", job_id="catalog")
products = catalog_job.load_df("products")

print(f"Generated {len(products)} products")
print(f"Products catalog: {catalog_job.get_store().full_path('products.csv')}")
products.head()
Generated 20 products
Products catalog: /home/runner/work/tools-impact-engine-measure/tools-impact-engine-measure/docs/source/methods/output/demo_synthetic_control/catalog/products.csv
[2]:
product_identifier category price
0 B1P4DZHDS9 Electronics 686.37
1 B1SE4QSNG7 Toys & Games 80.75
2 BXTPQIDT5C Food & Beverage 42.02
3 B3F1ZMC8Q6 Food & Beverage 33.42
4 B2NQRBTF0Y Toys & Games 27.52

Step 2 — Engine configuration

Configure the engine with the following sections.

  • ENRICHMENT — Quality boost applied to ~10% of products starting Nov 15

  • TRANSFORMprepare_for_synthetic_control adds a time-aware treatment column

  • MODELsynthetic_control with one designated treated unit

The panel structure is 20 products x 30 days (Nov 1–30). Enrichment starts Nov 15, giving 14 pre-treatment and 16 post-treatment periods.

With seed=42 and enrichment_fraction=0.1, product BU9XOOP3LG is the treated unit.

[3]:
config_path = "configs/demo_synthetic_control.yaml"

Step 3 — Impact evaluation

A single call to measure_impact() handles everything.

  • Engine creates CatalogSimulatorAdapter

  • Adapter simulates daily metrics (30-day panel)

  • Adapter applies enrichment (quality boost to treated product after Nov 15)

  • prepare_for_synthetic_control transform adds the treatment column

  • SyntheticControlAdapter builds a Dataprep object and fits via pysyncon’s Synth

[4]:
job_info = measure_impact(config_path, str(output_path), job_id="results")
print(f"Job ID: {job_info.job_id}")
Job ID: results

Step 4 — Review results

[5]:
result = load_results(job_info)

data = result.impact_results["data"]
model_params = data["model_params"]
estimates = data["impact_estimates"]
summary = data["model_summary"]

print("=" * 60)
print("SYNTHETIC CONTROL RESULTS")
print("=" * 60)

print(f"\nModel Type: {result.model_type}")
print(f"Treated Unit: {model_params['treated_unit']}")
print(f"Treatment Time: {model_params['treatment_time']}")

print("\n--- Impact Estimates ---")
print(f"ATT:               {estimates['att']:.4f}")
print(f"Standard Error:    {estimates['se']:.4f}")
print(f"CI Lower:          {estimates['ci_lower']:.4f}")
print(f"CI Upper:          {estimates['ci_upper']:.4f}")
print(f"Cumulative Effect: {estimates['cumulative_effect']:.4f}")

print("\n--- Model Summary ---")
print(f"Pre-treatment periods:  {summary['n_pre_periods']}")
print(f"Post-treatment periods: {summary['n_post_periods']}")
print(f"Control units:          {summary['n_control_units']}")
print(f"MSPE:                   {summary['mspe']:.4f}")
print(f"MAE:                    {summary['mae']:.4f}")

print("\n--- Control Unit Weights ---")
for unit, weight in summary["weights"].items():
    if weight > 0.001:
        print(f"  {unit}: {weight:.4f}")

print("\n" + "=" * 60)
print("Demo Complete!")
print("=" * 60)
============================================================
SYNTHETIC CONTROL RESULTS
============================================================

Model Type: synthetic_control
Treated Unit: BU9XOOP3LG
Treatment Time: 2024-11-15 00:00:00

--- Impact Estimates ---
ATT:               -11.2270
Standard Error:    28.5847
CI Lower:          -67.2530
CI Upper:          44.7990
Cumulative Effect: -179.6323

--- Model Summary ---
Pre-treatment periods:  14
Post-treatment periods: 16
Control units:          19
MSPE:                   20932.9987
MAE:                    100.6053

--- Control Unit Weights ---
  B1P4DZHDS9: 0.0530
  B1SE4QSNG7: 0.0491
  BXTPQIDT5C: 0.0491
  B3F1ZMC8Q6: 0.0491
  B2NQRBTF0Y: 0.0491
  B0OL6NCQ2G: 0.0491
  BELIUY7PF3: 0.0491
  BZ13P24N6K: 0.0491
  BY3H2A222X: 0.0491
  BZUQSUBFIE: 0.0491
  BY4UC1UWUT: 0.0491
  BUVBMP9HJO: 0.0645
  B43C4K7KPX: 0.0491
  BA10ZQZF6S: 0.0491
  BFRQFNIKYW: 0.0491
  BNMEM8N6RZ: 0.0491
  BJJ2JZ2Q0J: 0.0491
  BDEQZHZL3U: 0.0963
  BGA1RZXPLW: 0.0491

============================================================
Demo Complete!
============================================================

Step 5 — Model validation

Compare the model’s ATT estimate against the true per-period revenue difference for the treated unit (enriched vs counterfactual).

[6]:
def calculate_true_effect(
    baseline_metrics: pd.DataFrame,
    enriched_metrics: pd.DataFrame,
    treated_unit: str,
    treatment_time: str,
) -> dict:
    """Calculate TRUE per-period effect for the treated unit."""
    treatment_date = pd.Timestamp(treatment_time)

    baseline_unit = baseline_metrics[
        (baseline_metrics["product_id"] == treated_unit) & (pd.to_datetime(baseline_metrics["date"]) >= treatment_date)
    ]
    enriched_unit = enriched_metrics[
        (enriched_metrics["product_id"] == treated_unit) & (pd.to_datetime(enriched_metrics["date"]) >= treatment_date)
    ]

    baseline_mean = baseline_unit["revenue"].mean()
    enriched_mean = enriched_unit["revenue"].mean()
    mean_effect = enriched_mean - baseline_mean

    return {
        "baseline_mean": float(baseline_mean),
        "enriched_mean": float(enriched_mean),
        "mean_effect": float(mean_effect),
    }
[7]:
baseline_metrics = catalog_job.load_df("metrics").rename(columns={"product_identifier": "product_id"})

enrich("configs/demo_synthetic_control_enrichment.yaml", catalog_job)
enriched_metrics = catalog_job.load_df("enriched").rename(columns={"product_identifier": "product_id"})

print(f"Baseline records: {len(baseline_metrics)}")
print(f"Enriched records: {len(enriched_metrics)}")
Baseline records: 600
Enriched records: 600
[8]:
treated_unit = model_params["treated_unit"]
true_effect = calculate_true_effect(baseline_metrics, enriched_metrics, treated_unit, "2024-11-15")

true_me = true_effect["mean_effect"]
model_me = estimates["att"]

if true_me != 0:
    recovery_accuracy = (1 - abs(1 - model_me / true_me)) * 100
else:
    recovery_accuracy = 100 if model_me == 0 else 0

print("=" * 60)
print("TRUTH RECOVERY VALIDATION")
print("=" * 60)
print(f"True mean effect:  {true_me:.4f}")
print(f"Model estimate:    {model_me:.4f}")
print(f"Recovery accuracy: {max(0, recovery_accuracy):.1f}%")
print("=" * 60)
============================================================
TRUTH RECOVERY VALIDATION
============================================================
True mean effect:  0.0000
Model estimate:    -11.2270
Recovery accuracy: 0.0%
============================================================

Convergence analysis

How does the estimate improve as the number of control units increases? We vary the catalog size (keeping 1 treated unit) and observe convergence.

[9]:
control_sizes = [3, 5, 8, 12, 15, 18]
estimates_list = []
truth_list = []

parsed = load_config(config_path)
measurement_params = parsed["MEASUREMENT"]["PARAMS"]

# enrichment_start is auto-injected during measure_impact() but not
# available when calling apply_transform directly — supply it explicitly.
transform_config = {
    "FUNCTION": "prepare_for_synthetic_control",
    "PARAMS": {"enrichment_start": "2024-11-15"},
}

# Determine which product is enriched in the retrieved metrics
# (enrichment assignment may differ from the measure_impact run).
enriched_ids = enriched_metrics[enriched_metrics["enriched"]]["product_id"].unique()
convergence_treated = enriched_ids[0]
all_ids = enriched_metrics["product_id"].unique()
control_pool = [pid for pid in all_ids if pid != convergence_treated]

for n_controls in control_sizes:
    subset_ids = [convergence_treated] + control_pool[:n_controls]
    enriched_sub = enriched_metrics[enriched_metrics["product_id"].isin(subset_ids)]
    baseline_sub = baseline_metrics[baseline_metrics["product_id"].isin(subset_ids)]

    true = calculate_true_effect(baseline_sub, enriched_sub, convergence_treated, "2024-11-15")
    truth_list.append(true["mean_effect"])

    transformed = apply_transform(enriched_sub, transform_config)
    model = get_model_adapter("synthetic_control")
    model.connect(measurement_params)
    result = model.fit(
        data=transformed,
        treatment_time="2024-11-15",
        treated_unit=convergence_treated,
        outcome_column="revenue",
        unit_column="product_id",
        time_column="date",
    )
    estimates_list.append(result.data["impact_estimates"]["att"])

print("Convergence analysis complete.")
Convergence analysis complete.
[10]:
from notebook_support import plot_convergence

plot_convergence(
    control_sizes,
    estimates_list,
    truth_list,
    xlabel="Number of Control Units",
    ylabel="ATT",
    title="Synthetic Control: Convergence of Estimate to True Effect",
)
../_images/methods_demo_synthetic_control_17_0.png