{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Metrics Approximation\n", "\n", "This notebook demonstrates **metrics-based impact approximation** via the engine's internal response function registry.\n", "\n", "## Workflow overview\n", "\n", "1. User provides `products.csv`\n", "2. User configures `DATA.ENRICHMENT` section\n", "3. User calls `measure_impact(config.yaml)`\n", "4. Engine handles everything internally (adapter, enrichment, transform, model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initial setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import pandas as pd\n", "from impact_engine_measure import measure_impact, load_results\n", "from impact_engine_measure.core.validation import load_config\n", "from impact_engine_measure.core import apply_transform\n", "from impact_engine_measure.models.factory import get_model_adapter\n", "from online_retail_simulator import enrich, simulate\n", "from online_retail_simulator.simulate import simulate_product_details" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 — Product Catalog\n", "\n", "In production, this would be your actual product catalog." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "output_path = Path(\"output/demo_metrics_approximation\")\n", "output_path.mkdir(parents=True, exist_ok=True)\n", "\n", "catalog_job = simulate(\"configs/demo_metrics_approximation_catalog.yaml\", job_id=\"catalog\")\n", "products = catalog_job.load_df(\"products\")\n", "\n", "print(f\"Generated {len(products)} products\")\n", "print(f\"Products catalog: {catalog_job.get_store().full_path('products.csv')}\")\n", "products.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 — Engine configuration\n", "\n", "Configure the engine with the following sections.\n", "- `ENRICHMENT` — Quality boost parameters\n", "- `TRANSFORM` — Prepare data for approximation\n", "- `MODEL` — `metrics_approximation` with response function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "config_path = \"configs/demo_metrics_approximation.yaml\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 — Impact evaluation\n", "\n", "A single call to `measure_impact()` handles everything.\n", "- Engine creates `CatalogSimulatorAdapter`\n", "- Adapter simulates metrics\n", "- Adapter generates `product_details`\n", "- Adapter applies enrichment (quality boost)\n", "- Transform extracts `quality_before`/`quality_after`\n", "- `MetricsApproximationAdapter` computes impact" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "job_info = measure_impact(config_path, str(output_path), job_id=\"results\")\n", "print(f\"Job ID: {job_info.job_id}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4 — Review results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = load_results(job_info)\n", "\n", "data = result.impact_results[\"data\"]\n", "model_params = data[\"model_params\"]\n", "estimates = data[\"impact_estimates\"]\n", "summary = data[\"model_summary\"]\n", "\n", "print(\"=\" * 60)\n", "print(\"METRICS-BASED IMPACT APPROXIMATION RESULTS\")\n", "print(\"=\" * 60)\n", "\n", "print(f\"\\nModel Type: {result.model_type}\")\n", "print(f\"Response Function: {model_params['response_function']}\")\n", "\n", "print(\"\\n--- Aggregate Impact Estimates ---\")\n", "print(f\"Total Impact: ${estimates['impact']:.2f}\")\n", "print(f\"Number of Products: {summary['n_products']}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Per-product data from model artifacts\n", "per_product_df = result.model_artifacts[\"product_level_impacts\"]\n", "\n", "print(\"\\n--- Per-Product Breakdown (first 10) ---\")\n", "print(\"-\" * 60)\n", "print(f\"{'Product':<20} {'Delta Quality':<15} {'Baseline':<12} {'Impact':<12}\")\n", "print(\"-\" * 60)\n", "for _, p in per_product_df.head(10).iterrows():\n", " print(f\"{p['product_id']:<20} {p['delta_metric']:<15.4f} ${p['baseline_outcome']:<11.2f} ${p['impact']:<11.2f}\")\n", "\n", "print(\"\\n\" + \"=\" * 60)\n", "print(\"Demo Complete!\")\n", "print(\"=\" * 60)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5 — Model validation\n", "\n", "Compare the model's estimate against the **true causal effect** computed from counterfactual vs factual data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def calculate_true_effect(\n", " baseline_metrics: pd.DataFrame,\n", " enriched_metrics: pd.DataFrame,\n", ") -> dict:\n", " \"\"\"Calculate TRUE impact by comparing total revenue with vs without enrichment.\"\"\"\n", " baseline_total = baseline_metrics[\"revenue\"].sum()\n", " enriched_total = enriched_metrics[\"revenue\"].sum()\n", " impact = enriched_total - baseline_total\n", "\n", " return {\n", " \"baseline_total\": float(baseline_total),\n", " \"enriched_total\": float(enriched_total),\n", " \"impact\": float(impact),\n", " }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "baseline_metrics = catalog_job.load_df(\"metrics\").rename(columns={\"product_identifier\": \"product_id\"})\n", "\n", "# Generate product details (adds quality_score to products, required before enrichment)\n", "pd_config = {\"PRODUCT_DETAILS\": {\"FUNCTION\": \"simulate_product_details_mock\"}}\n", "store = catalog_job.get_store()\n", "store.write_yaml(\"product_details_config.yaml\", pd_config)\n", "catalog_job = simulate_product_details(catalog_job, store.full_path(\"product_details_config.yaml\"))\n", "\n", "enrich(\"configs/demo_metrics_approximation_enrichment.yaml\", catalog_job)\n", "enriched_metrics = catalog_job.load_df(\"enriched\").rename(columns={\"product_identifier\": \"product_id\"})\n", "\n", "# Add quality_score (mirrors adapter._apply_enrichment logic)\n", "parsed = load_config(config_path)\n", "enrichment_start = pd.to_datetime(parsed[\"DATA\"][\"ENRICHMENT\"][\"PARAMS\"][\"enrichment_start\"])\n", "\n", "products_original = catalog_job.load_df(\"product_details_original\")\n", "products_enriched = catalog_job.load_df(\"product_details_enriched\")\n", "orig_quality = products_original.set_index(\"product_identifier\")[\"quality_score\"].to_dict()\n", "enr_quality = products_enriched.set_index(\"product_identifier\")[\"quality_score\"].to_dict()\n", "\n", "enriched_metrics[\"date\"] = pd.to_datetime(enriched_metrics[\"date\"])\n", "enriched_metrics[\"quality_score\"] = enriched_metrics.apply(\n", " lambda row: (\n", " orig_quality.get(row[\"product_id\"], 0.5)\n", " if row[\"date\"] < enrichment_start\n", " else enr_quality.get(row[\"product_id\"], 0.5)\n", " ),\n", " axis=1,\n", ")\n", "\n", "print(f\"Baseline records: {len(baseline_metrics)}\")\n", "print(f\"Enriched records: {len(enriched_metrics)}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "true_effect = calculate_true_effect(baseline_metrics, enriched_metrics)\n", "\n", "true_impact = true_effect[\"impact\"]\n", "model_impact = estimates[\"impact\"]\n", "\n", "if true_impact != 0:\n", " recovery_accuracy = (1 - abs(1 - model_impact / true_impact)) * 100\n", "else:\n", " recovery_accuracy = 100 if model_impact == 0 else 0\n", "\n", "print(\"=\" * 60)\n", "print(\"TRUTH RECOVERY VALIDATION\")\n", "print(\"=\" * 60)\n", "print(f\"True impact: ${true_impact:,.2f}\")\n", "print(f\"Model estimate: ${model_impact:,.2f}\")\n", "print(f\"Recovery accuracy: {max(0, recovery_accuracy):.1f}%\")\n", "print(\"=\" * 60)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convergence analysis\n", "\n", "How does the estimate converge to the true effect as sample size increases?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sample_sizes = [5, 10, 25, 50, 100]\n", "estimates_list = []\n", "truth_list = []\n", "\n", "transform_config = parsed[\"DATA\"][\"TRANSFORM\"]\n", "measurement_config = parsed[\"MEASUREMENT\"]\n", "all_product_ids = enriched_metrics[\"product_id\"].unique()\n", "\n", "for n in sample_sizes:\n", " subset_ids = all_product_ids[:n]\n", " enriched_sub = enriched_metrics[enriched_metrics[\"product_id\"].isin(subset_ids)]\n", " baseline_sub = baseline_metrics[baseline_metrics[\"product_id\"].isin(subset_ids)]\n", "\n", " true = calculate_true_effect(baseline_sub, enriched_sub)\n", " truth_list.append(true[\"impact\"])\n", "\n", " transformed = apply_transform(enriched_sub, transform_config)\n", " model = get_model_adapter(\"metrics_approximation\")\n", " model.connect(measurement_config[\"PARAMS\"])\n", " result = model.fit(data=transformed)\n", " estimates_list.append(result.data[\"impact_estimates\"][\"impact\"])\n", "\n", "print(\"Convergence analysis complete.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from notebook_support import plot_convergence\n", "\n", "plot_convergence(\n", " sample_sizes,\n", " estimates_list,\n", " truth_list,\n", " xlabel=\"Number of Products\",\n", " ylabel=\"Impact ($)\",\n", " title=\"Metrics Approximation: Convergence of Estimate to True Effect\",\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 4 }