{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Parameter Sensitivity\n", "\n", "Every measurement model has tuning parameters that influence the treatment effect estimate. Subclassification requires choosing the number of strata; nearest neighbour matching requires setting a caliper distance. How sensitive are the results to these choices?\n", "\n", "This notebook answers two questions:\n", "\n", "1. **Single-seed sensitivity** — How does the estimate change as we sweep a tuning parameter?\n", "2. **Sensitivity with uncertainty** — Are the observed patterns robust to sampling variation, or just noise?\n", "\n", "We use the same A/A test design (true effect = 0) so that any deviation from 0 reflects estimator behavior, not a real treatment effect." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initial setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import copy\n", "import os\n", "from pathlib import Path\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import yaml\n", "from impact_engine_measure import measure_impact, load_results\n", "from impact_engine_measure.core.validation import load_config\n", "from online_retail_simulator import simulate" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Number of Monte Carlo replications, configurable via an environment variable for CI\n", "# (reduced values speed up execution). The variable name N_REPS is an assumption.\n", "N_REPS = int(os.environ.get(\"N_REPS\", \"25\"))\n", "\n", "output_path = Path(\"output/demo_parameter_sensitivity\")\n", "output_path.mkdir(parents=True, exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 — Product catalog\n", "\n", "All parameter sweeps use the same product catalog." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"configs/demo_model_selection_catalog.yaml\") as f:\n", "    catalog_config = yaml.safe_load(f)\n", "\n", "tmp_catalog = output_path / \"catalog_config.yaml\"\n", "with open(tmp_catalog, \"w\") as f:\n", "    yaml.dump(catalog_config, f, default_flow_style=False)\n", "\n", "catalog_job = simulate(str(tmp_catalog), job_id=\"catalog\")\n", "products = catalog_job.load_df(\"products\")\n", "\n", "print(f\"Generated {len(products)} products\")\n", "products.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 — Configuration" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "config_path = \"configs/demo_model_selection.yaml\"\n", "true_te = 0  # A/A design: no treatment effect by construction\n", "\n", "base_config = load_config(config_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def run_with_override(base_config, measurement_override, storage_url, job_id, source_seed=None):\n", "    \"\"\"Override MEASUREMENT in base config, write temp YAML, run measure_impact().\n", "\n", "    Optionally override the data-generating seed for Monte Carlo replications.\n", "    Returns the full MeasureJobResult for access to both impact_results and transformed_metrics.\n", "    \"\"\"\n", "    config = copy.deepcopy(base_config)\n", "    config[\"MEASUREMENT\"] = measurement_override\n", "    if source_seed is not None:\n", "        config[\"DATA\"][\"SOURCE\"][\"CONFIG\"][\"seed\"] = source_seed\n", "\n", "    tmp_config_path = Path(storage_url) / f\"config_{job_id}.yaml\"\n", "    tmp_config_path.parent.mkdir(parents=True, exist_ok=True)\n", "    with open(tmp_config_path, \"w\") as f:\n", "        yaml.dump(config, f, default_flow_style=False)\n", "\n", "    job_info = measure_impact(str(tmp_config_path), storage_url, job_id=job_id)\n", "    return load_results(job_info)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "## Step 3 — Parameter sensitivity (single seed)\n", "\n", "For a given model and data, how sensitive is the treatment effect estimate to tuning parameters?\n", "We sweep one parameter at a time while keeping everything else fixed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3a. Subclassification: `n_strata`\n", "\n", "More strata mean a finer partition of the covariate space.\n", "This can reduce covariate imbalance within each stratum, but it may leave some strata without common support, that is, without both treated and control units." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_strata_values = [2, 3, 5, 10, 20, 50, 100]\n", "subclass_estimates = []\n", "strata_used = []\n", "strata_dropped = []\n", "mean_revenue = None\n", "\n", "for n in n_strata_values:\n", "    measurement = {\n", "        \"MODEL\": \"subclassification\",\n", "        \"PARAMS\": {\n", "            \"treatment_column\": \"enriched\",\n", "            \"covariate_columns\": [\"price\"],\n", "            \"n_strata\": n,\n", "            \"estimand\": \"att\",\n", "            \"dependent_variable\": \"revenue\",\n", "        },\n", "    }\n", "    result = run_with_override(base_config, measurement, str(output_path), f\"subclass_strata_{n}\")\n", "    estimates = result.impact_results[\"data\"][\"impact_estimates\"]\n", "    if mean_revenue is None:\n", "        mean_revenue = result.transformed_metrics[\"revenue\"].mean()\n", "\n", "    subclass_estimates.append(estimates[\"treatment_effect\"])\n", "    strata_used.append(estimates[\"n_strata\"])\n", "    strata_dropped.append(estimates[\"n_strata_dropped\"])\n", "\n", "subclass_sensitivity = pd.DataFrame(\n", "    {\n", "        \"n_strata (requested)\": n_strata_values,\n", "        \"Strata Used\": strata_used,\n", "        \"Strata Dropped\": strata_dropped,\n", "        \"Treatment Effect\": subclass_estimates,\n", "        \"Absolute Error\": [abs(est - true_te) for est in subclass_estimates],\n", "        \"Relative Error (%)\": [abs(est - true_te) / mean_revenue * 100 for est in subclass_estimates],\n", "    }\n", ")\n", "\n", "print(\"Subclassification: n_strata 
Sensitivity\")\n", "print(f\"Mean revenue: {mean_revenue:.2f} (used as denominator for relative error)\")\n", "print(\"-\" * 90)\n", "print(subclass_sensitivity.to_string(index=False, float_format=lambda x: f\"{x:.4f}\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from notebook_support import plot_parameter_sensitivity\n", "\n", "plot_parameter_sensitivity(\n", "    param_values=n_strata_values,\n", "    estimates=subclass_estimates,\n", "    true_effect=true_te,\n", "    xlabel=\"Number of Strata (n_strata)\",\n", "    ylabel=\"Treatment Effect\",\n", "    title=\"Subclassification: Sensitivity to n_strata\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3b. Nearest neighbour matching: `caliper`\n", "\n", "The caliper controls the maximum allowed distance between a treated unit and its matched control.\n", "Smaller values enforce tighter matches but may discard units, while larger values allow more matches with worse balance." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "caliper_values = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0]\n", "matching_estimates = []\n", "n_matched_att_list = []\n", "\n", "for cal in caliper_values:\n", "    measurement = {\n", "        \"MODEL\": \"nearest_neighbour_matching\",\n", "        \"PARAMS\": {\n", "            \"treatment_column\": \"enriched\",\n", "            \"covariate_columns\": [\"price\"],\n", "            \"dependent_variable\": \"revenue\",\n", "            \"caliper\": cal,\n", "            \"replace\": True,\n", "            \"ratio\": 1,\n", "        },\n", "    }\n", "    result = run_with_override(base_config, measurement, str(output_path), f\"matching_caliper_{cal}\")\n", "    estimates = result.impact_results[\"data\"][\"impact_estimates\"]\n", "    summary = result.impact_results[\"data\"][\"model_summary\"]\n", "\n", "    matching_estimates.append(estimates[\"att\"])\n", "    n_matched_att_list.append(summary[\"n_matched_att\"])\n", "\n", "matching_sensitivity = pd.DataFrame(\n", "    {\n", "        \"Caliper\": caliper_values,\n", "        \"N Matched (ATT)\": n_matched_att_list,\n", "        \"Treatment Effect (ATT)\": matching_estimates,\n", "        \"Absolute Error\": [abs(est - true_te) for est in matching_estimates],\n", "        \"Relative Error (%)\": [abs(est - true_te) / mean_revenue * 100 for est in matching_estimates],\n", "    }\n", ")\n", "\n", "print(\"Nearest Neighbour Matching: Caliper Sensitivity\")\n", "print(f\"Mean revenue: {mean_revenue:.2f} (used as denominator for relative error)\")\n", "print(\"-\" * 90)\n", "print(matching_sensitivity.to_string(index=False, float_format=lambda x: f\"{x:.4f}\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_parameter_sensitivity(\n", "    param_values=caliper_values,\n", "    estimates=matching_estimates,\n", "    true_effect=true_te,\n", "    xlabel=\"Caliper\",\n", "    ylabel=\"Treatment Effect (ATT)\",\n", "    title=\"Nearest Neighbour Matching: Sensitivity to Caliper\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "## Step 4 — Parameter sensitivity with uncertainty\n", "\n", "Step 3 showed how estimates change with tuning parameters using a single seed.\n", "Here we rerun each parameter value across `N_REPS` replications, each with a different data-generating seed, and plot the mean estimate with a band of one sample standard deviation on either side.\n", "This reveals whether apparent sensitivity is real or just noise." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rng = np.random.default_rng(seed=2024)\n", "mc_seeds = rng.integers(low=0, high=2**31, size=N_REPS).tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4a. Subclassification: `n_strata`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_strata_mc = {n: [] for n in n_strata_values}\n", "\n", "for i, seed in enumerate(mc_seeds):\n", "    for n in n_strata_values:\n", "        measurement = {\n", "            \"MODEL\": \"subclassification\",\n", "            \"PARAMS\": {\n", "                \"treatment_column\": \"enriched\",\n", "                \"covariate_columns\": [\"price\"],\n", "                \"n_strata\": n,\n", "                \"estimand\": \"att\",\n", "                \"dependent_variable\": \"revenue\",\n", "            },\n", "        }\n", "        result = run_with_override(\n", "            base_config,\n", "            measurement,\n", "            str(output_path),\n", "            f\"mc_subclass_{n}_rep{i}\",\n", "            source_seed=seed,\n", "        )\n", "        n_strata_mc[n].append(result.impact_results[\"data\"][\"impact_estimates\"][\"treatment_effect\"])\n", "\n", "    if (i + 1) % 5 == 0:\n", "        print(f\"Subclassification sweep: {i + 1}/{N_REPS} replications\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from notebook_support import plot_parameter_sensitivity_mc\n", "\n", "strata_means = [np.mean(n_strata_mc[n]) for n in n_strata_values]\n", "strata_stds = [np.std(n_strata_mc[n], ddof=1) for n in n_strata_values]\n", "strata_lower = [m - s for m, s in zip(strata_means, strata_stds)]\n", "strata_upper = [m + s for m, s in zip(strata_means, strata_stds)]\n", "\n", 
"plot_parameter_sensitivity_mc(\n", "    param_values=n_strata_values,\n", "    mean_estimates=strata_means,\n", "    lower_band=strata_lower,\n", "    upper_band=strata_upper,\n", "    true_effect=true_te,\n", "    xlabel=\"Number of Strata (n_strata)\",\n", "    ylabel=\"Treatment Effect\",\n", "    title=f\"Subclassification: n_strata Sensitivity ({N_REPS} replications)\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4b. Nearest neighbour matching: `caliper`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "caliper_mc = {c: [] for c in caliper_values}\n", "\n", "for i, seed in enumerate(mc_seeds):\n", "    for cal in caliper_values:\n", "        measurement = {\n", "            \"MODEL\": \"nearest_neighbour_matching\",\n", "            \"PARAMS\": {\n", "                \"treatment_column\": \"enriched\",\n", "                \"covariate_columns\": [\"price\"],\n", "                \"dependent_variable\": \"revenue\",\n", "                \"caliper\": cal,\n", "                \"replace\": True,\n", "                \"ratio\": 1,\n", "            },\n", "        }\n", "        result = run_with_override(\n", "            base_config,\n", "            measurement,\n", "            str(output_path),\n", "            f\"mc_matching_{cal}_rep{i}\",\n", "            source_seed=seed,\n", "        )\n", "        caliper_mc[cal].append(result.impact_results[\"data\"][\"impact_estimates\"][\"att\"])\n", "\n", "    if (i + 1) % 5 == 0:\n", "        print(f\"Matching sweep: {i + 1}/{N_REPS} replications\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cal_means = [np.mean(caliper_mc[c]) for c in caliper_values]\n", "cal_stds = [np.std(caliper_mc[c], ddof=1) for c in caliper_values]\n", "cal_lower = [m - s for m, s in zip(cal_means, cal_stds)]\n", "cal_upper = [m + s for m, s in zip(cal_means, cal_stds)]\n", "\n", "plot_parameter_sensitivity_mc(\n", "    param_values=caliper_values,\n", "    mean_estimates=cal_means,\n", "    lower_band=cal_lower,\n", "    upper_band=cal_upper,\n", "    true_effect=true_te,\n", "    xlabel=\"Caliper\",\n", "    ylabel=\"Treatment Effect (ATT)\",\n", "    title=f\"Nearest Neighbour 
Matching: Caliper Sensitivity ({N_REPS} replications)\",\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 4 }