{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Time Series\n", "\n", "This notebook demonstrates **interrupted time series** (ITS) impact estimation using the [statsmodels](https://www.statsmodels.org/) [`SARIMAX()`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html) model.\n", "\n", "## Workflow overview\n", "\n", "1. Generate product characteristics using the catalog simulator\n", "2. Configure the engine with enrichment\n", "3. Run the impact evaluation\n", "4. Review the results\n", "5. Validate against the known true effect and analyze convergence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initial setup\n", "\n", "Import the required packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import pandas as pd\n", "from impact_engine_measure import measure_impact, load_results\n", "from impact_engine_measure.core.validation import load_config\n", "from impact_engine_measure.core import apply_transform\n", "from impact_engine_measure.models.factory import get_model_adapter\n", "from online_retail_simulator import enrich, simulate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 — Product catalog\n", "\n", "Simulate a product catalog for the demo. In production, this would be your actual product catalog." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "output_path = Path(\"output/demo_interrupted_time_series\")\n", "output_path.mkdir(parents=True, exist_ok=True)\n", "\n", "catalog_job = simulate(\"configs/demo_interrupted_time_series_catalog.yaml\", job_id=\"catalog\")\n", "products = catalog_job.load_df(\"products\")\n", "\n", "print(f\"Generated {len(products)} products\")\n", "print(f\"Products catalog: {catalog_job.get_store().full_path('products.csv')}\")\n", "products.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 — Engine configuration\n", "\n", "The engine configuration file has three sections:\n", "\n", "- `DATA` — Where to get products and how to simulate metrics\n", "- `ENRICHMENT` — Quality boost applied to all products starting 2024-11-15\n", "- `MEASUREMENT` — Interrupted time series model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "config_path = \"configs/demo_interrupted_time_series.yaml\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 — Impact evaluation\n", "\n", "Call `measure_impact()` with the configuration file. The engine handles the following:\n", "\n", "- Loading products\n", "- Simulating daily metrics\n", "- Aggregating data\n", "- Running the interrupted time series model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "job_info = measure_impact(config_path, str(output_path), job_id=\"results\")\n", "print(f\"Job ID: {job_info.job_id}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4 — Review results\n", "\n", "Load and display the impact evaluation results." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = load_results(job_info)\n", "\n", "data = result.impact_results[\"data\"]\n", "model_params = data[\"model_params\"]\n", "estimates = data[\"impact_estimates\"]\n", "summary = data[\"model_summary\"]\n", "\n", "print(\"=\" * 60)\n", "print(\"IMPACT EVALUATION RESULTS\")\n", "print(\"=\" * 60)\n", "print(f\"\\nModel: {result.model_type}\")\n", "print(f\"Intervention Date: {model_params['intervention_date']}\")\n", "print(f\"Dependent Variable: {model_params['dependent_variable']}\")\n", "\n", "print(\"\\n--- Impact Estimates ---\")\n", "print(f\"Pre-intervention mean: ${estimates['pre_intervention_mean']:,.2f}\")\n", "print(f\"Post-intervention mean: ${estimates['post_intervention_mean']:,.2f}\")\n", "print(f\"Absolute change: ${estimates['absolute_change']:,.2f}\")\n", "print(f\"Percent change: {estimates['percent_change']:.1f}%\")\n", "\n", "print(\"\\n--- Model Summary ---\")\n", "print(f\"Observations: {summary['n_observations']}\")\n", "print(f\"Pre-period: {summary['pre_period_length']} days\")\n", "print(f\"Post-period: {summary['post_period_length']} days\")\n", "\n", "print(\"\\n\" + \"=\" * 60)\n", "print(\"Demo Complete!\")\n", "print(\"=\" * 60)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5 — Model validation\n", "\n", "Compare the model's estimate against the **true causal effect** computed from counterfactual vs factual data." 
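, "\n", "\n", "As a sketch of the arithmetic in the cells below, let $\\bar{y}_{\\text{cf}}$ and $\\bar{y}_{\\text{f}}$ be the mean daily revenue on or after the intervention date in the baseline (counterfactual) and enriched (factual) data. These symbols are notation introduced here for illustration; they do not appear in the code. The true effect in percent is then\n", "\n", "$$\\tau = 100 \\cdot \\frac{\\bar{y}_{\\text{f}} - \\bar{y}_{\\text{cf}}}{\\bar{y}_{\\text{cf}}}$$\n", "\n", "With $\\hat{\\tau}$ denoting the model's `percent_change`, the recovery accuracy reported below is\n", "\n", "$$\\text{accuracy} = 100 \\cdot \\left(1 - \\left|1 - \\frac{\\hat{\\tau}}{\\tau}\\right|\\right),$$\n", "\n", "floored at 0 for display. For example, with purely illustrative numbers, a true effect of 10% and an estimate of 8% give $100 \\cdot (1 - |1 - 0.8|) = 80$, that is, 80% accuracy. An estimate of 12% gives the same 80%, because the measure penalizes over- and under-estimation symmetrically."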
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def calculate_true_effect(\n", "    baseline_metrics: pd.DataFrame,\n", "    enriched_metrics: pd.DataFrame,\n", "    intervention_date: str,\n", "    metric: str = \"revenue\",\n", ") -> dict:\n", "    \"\"\"Calculate the true causal effect by comparing factual vs counterfactual data.\"\"\"\n", "    intervention = pd.Timestamp(intervention_date)\n", "\n", "    # Sum the metric across products to get one value per day.\n", "    baseline_daily = baseline_metrics.groupby(\"date\")[metric].sum().reset_index()\n", "    enriched_daily = enriched_metrics.groupby(\"date\")[metric].sum().reset_index()\n", "    baseline_daily[\"date\"] = pd.to_datetime(baseline_daily[\"date\"])\n", "    enriched_daily[\"date\"] = pd.to_datetime(enriched_daily[\"date\"])\n", "\n", "    # Keep only the days on or after the intervention.\n", "    baseline_post = baseline_daily[baseline_daily[\"date\"] >= intervention][metric]\n", "    enriched_post = enriched_daily[enriched_daily[\"date\"] >= intervention][metric]\n", "\n", "    baseline_mean = baseline_post.mean()\n", "    enriched_mean = enriched_post.mean()\n", "    absolute_effect = enriched_mean - baseline_mean\n", "    # Report 0 when the baseline mean is not positive, which also avoids dividing by zero.\n", "    percent_effect = (absolute_effect / baseline_mean * 100) if baseline_mean > 0 else 0\n", "\n", "    return {\n", "        \"counterfactual_mean\": float(baseline_mean),\n", "        \"factual_mean\": float(enriched_mean),\n", "        \"absolute_effect\": float(absolute_effect),\n", "        \"percent_effect\": float(percent_effect),\n", "    }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Counterfactual: simulated metrics without the enrichment.\n", "baseline_metrics = catalog_job.load_df(\"metrics\").rename(columns={\"product_identifier\": \"product_id\"})\n", "\n", "# Factual: apply the enrichment, then load the enriched metrics.\n", "enrich(\"configs/demo_interrupted_time_series_enrichment.yaml\", catalog_job)\n", "enriched_metrics = catalog_job.load_df(\"enriched\").rename(columns={\"product_identifier\": \"product_id\"})\n", "\n", "print(f\"Baseline records: {len(baseline_metrics)}\")\n", "print(f\"Enriched records: {len(enriched_metrics)}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, 
"outputs": [], "source": [ "true_effect = calculate_true_effect(baseline_metrics, enriched_metrics, \"2024-11-15\", \"revenue\")\n", "\n", "true_pct = true_effect[\"percent_effect\"]\n", "its_pct = estimates[\"percent_change\"]\n", "\n", "# Accuracy is 100% when the estimate equals the true effect and falls linearly with the relative error.\n", "if true_pct != 0:\n", "    recovery_accuracy = (1 - abs(1 - its_pct / true_pct)) * 100\n", "else:\n", "    recovery_accuracy = 100 if its_pct == 0 else 0\n", "\n", "print(\"=\" * 60)\n", "print(\"TRUTH RECOVERY VALIDATION\")\n", "print(\"=\" * 60)\n", "print(f\"True effect: {true_pct:.1f}%\")\n", "print(f\"ITS estimate: {its_pct:.1f}%\")\n", "print(f\"Recovery accuracy: {max(0, recovery_accuracy):.1f}%\")\n", "print(\"=\" * 60)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convergence analysis\n", "\n", "How does the estimate converge to the true effect as the number of products in the sample increases? Each run fits the model on the first `n` products and compares the estimate with the true effect for that same subset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sample_sizes = [10, 25, 50, 100, 200, 500]\n", "estimates_list = []\n", "truth_list = []\n", "\n", "parsed = load_config(config_path)\n", "transform_config = {\"FUNCTION\": \"aggregate_by_date\", \"PARAMS\": {\"metric\": \"revenue\"}}\n", "all_product_ids = enriched_metrics[\"product_id\"].unique()\n", "measurement_params = parsed[\"MEASUREMENT\"][\"PARAMS\"]\n", "\n", "for n in sample_sizes:\n", "    # Restrict both datasets to the same first n products.\n", "    subset_ids = all_product_ids[:n]\n", "    enriched_sub = enriched_metrics[enriched_metrics[\"product_id\"].isin(subset_ids)]\n", "    baseline_sub = baseline_metrics[baseline_metrics[\"product_id\"].isin(subset_ids)]\n", "\n", "    true_sub = calculate_true_effect(baseline_sub, enriched_sub, \"2024-11-15\", \"revenue\")\n", "    truth_list.append(true_sub[\"percent_effect\"])\n", "\n", "    # Use a separate name so the Step 4 `result` is not overwritten.\n", "    transformed = apply_transform(enriched_sub, transform_config)\n", "    model = get_model_adapter(\"interrupted_time_series\")\n", "    model.connect(measurement_params)\n", "    fit_result = model.fit(data=transformed, intervention_date=\"2024-11-15\", dependent_variable=\"revenue\")\n", "    estimates_list.append(fit_result.data[\"impact_estimates\"][\"percent_change\"])\n", "\n", "print(\"Convergence analysis complete.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from notebook_support import plot_convergence\n", "\n", "plot_convergence(\n", "    sample_sizes,\n", "    estimates_list,\n", "    truth_list,\n", "    xlabel=\"Number of Products\",\n", "    ylabel=\"Effect Estimate (%)\",\n", "    title=\"ITS: Convergence of Estimate to True Effect\",\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 4 }