{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "# Demo\n\nThis notebook provides a high-level overview of the **Online Retail Simulator** package and its capabilities.\n\n## What is Online Retail Simulator?\n\nA Python package for generating **synthetic e-commerce data** for:\n- Testing and demos without exposing real business data\n- ML model training with realistic retail patterns\n- A/B test simulation and experimentation\n- Teaching analytics and data science concepts\n\n## Key Capabilities\n\n- **Rule-based generation**: Fast, configurable synthetic data\n- **ML-based synthesis**: Learn patterns from real data (optional SDV integration)\n- **Reproducible results**: Seed control for deterministic output\n- **8 product categories**: Electronics, Books, Clothing, and more\n- **Funnel metrics**: Impressions, visits, cart adds, orders"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "First, let's install the package (if running in Colab) and import the necessary libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment if running in Google Colab\n",
    "# !pip install online-retail-simulator matplotlib seaborn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "from online_retail_simulator import simulate, load_job_results\n",
    "\n",
    "# Set plot style\n",
    "sns.set_theme(style=\"whitegrid\")\n",
    "plt.rcParams[\"figure.figsize\"] = (10, 6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate Sample Data\n",
    "\n",
    "We'll generate 30 days of synthetic sales data using the settings in `config_demo.yaml`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import os\n\n# Locate the config file next to this notebook. __file__ is undefined in a\n# notebook kernel, so fall back to the current working directory.\nbase_dir = os.path.dirname(__file__) if \"__file__\" in globals() else \".\"\nconfig_path = os.path.join(base_dir, \"config_demo.yaml\")\n\n# Run the simulation\njob_info = simulate(config_path)\n\n# Load results\nresults = load_job_results(job_info)\nproducts_df = results[\"products\"]\nmetrics_df = results[\"metrics\"]\n\nprint(f\"Generated {len(products_df)} products\")\nprint(f\"Generated {len(metrics_df)} metrics records\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploring the Generated Data\n",
    "\n",
    "Let's look at the structure and contents of our synthetic dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Preview the metrics data\nprint(f\"Date range: {metrics_df['date'].min()} to {metrics_df['date'].max()}\")\nprint(f\"Categories: {metrics_df['category'].nunique()}\")\nprint(f\"Total revenue: ${metrics_df['revenue'].sum():,.2f}\")\nprint()\nmetrics_df.head(10)"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Revenue by Category\n",
    "\n",
    "How is revenue distributed across product categories?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Revenue by category\ncategory_revenue = metrics_df.groupby(\"category\")[\"revenue\"].sum().sort_values()\n\nfig, ax = plt.subplots(figsize=(10, 6))\ncategory_revenue.plot(kind=\"barh\", ax=ax, color=sns.color_palette(\"viridis\", len(category_revenue)))\nax.set_xlabel(\"Revenue ($)\")\nax.set_ylabel(\"Category\")\nax.set_title(\"Total Revenue by Category\")\nax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f\"${x:,.0f}\"))\nplt.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Daily Sales Trend\n",
    "\n",
    "How do sales vary over time?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Daily sales trend\ndaily_sales = metrics_df.groupby(\"date\").agg({\n    \"ordered_units\": \"sum\",\n    \"revenue\": \"sum\"\n}).reset_index()\ndaily_sales[\"date\"] = pd.to_datetime(daily_sales[\"date\"])\n\nfig, ax = plt.subplots(figsize=(12, 5))\nax.plot(daily_sales[\"date\"], daily_sales[\"revenue\"], marker=\"o\", linewidth=2, markersize=4)\nax.fill_between(daily_sales[\"date\"], daily_sales[\"revenue\"], alpha=0.3)\nax.set_xlabel(\"Date\")\nax.set_ylabel(\"Revenue ($)\")\nax.set_title(\"Daily Revenue Trend\")\nax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f\"${x:,.0f}\"))\nplt.xticks(rotation=45)\nplt.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conversion Funnel\n",
    "\n",
    "The data includes full customer journey metrics: impressions, visits, cart adds, and orders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Conversion funnel totals (the \"Orders\" stage sums ordered_units)\nfunnel_data = {\n    \"Impressions\": metrics_df[\"impressions\"].sum(),\n    \"Visits\": metrics_df[\"visits\"].sum(),\n    \"Cart Adds\": metrics_df[\"cart_adds\"].sum(),\n    \"Orders\": metrics_df[\"ordered_units\"].sum(),\n}\n\nstages = list(funnel_data.keys())\nvalues = list(funnel_data.values())\n\n# Reverse the order so the first stage (Impressions) appears at the top\nfig, ax = plt.subplots(figsize=(10, 6))\ncolors = sns.color_palette(\"Blues_r\", len(stages))\nbars = ax.barh(stages[::-1], values[::-1], color=colors)\nax.set_xlabel(\"Count\")\nax.set_title(\"Customer Journey Funnel\")\n\n# Add value labels to the right of each bar\nfor bar, val in zip(bars, values[::-1]):\n    ax.text(val + max(values) * 0.01, bar.get_y() + bar.get_height() / 2,\n            f\"{val:,}\", va=\"center\", fontsize=10)\n\n# Print stage-to-stage conversion rates\nprint(\"Conversion Rates:\")\nprint(f\"  Impressions → Visits: {values[1]/values[0]*100:.1f}%\")\nprint(f\"  Visits → Cart Adds: {values[2]/values[1]*100:.1f}%\")\nprint(f\"  Cart Adds → Orders: {values[3]/values[2]*100:.1f}%\")\nprint(f\"  Overall (Impressions → Orders): {values[3]/values[0]*100:.2f}%\")\n\nplt.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Descriptive Analysis\n",
    "\n",
    "Let's dive deeper into the data patterns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Distribution of Revenue per Record"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Distribution of revenue per metrics record\nmean_revenue = metrics_df[\"revenue\"].mean()\nmedian_revenue = metrics_df[\"revenue\"].median()\n\nfig, ax = plt.subplots(figsize=(10, 5))\nsns.histplot(metrics_df[\"revenue\"], bins=50, kde=True, ax=ax)\nax.set_xlabel(\"Revenue ($)\")\nax.set_ylabel(\"Frequency\")\nax.set_title(\"Distribution of Revenue per Record\")\n\n# Mark the mean and median for reference\nax.axvline(mean_revenue, color=\"red\", linestyle=\"--\", label=f\"Mean: ${mean_revenue:,.2f}\")\nax.axvline(median_revenue, color=\"orange\", linestyle=\"--\", label=f\"Median: ${median_revenue:,.2f}\")\nax.legend()\nplt.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ordered Units by Category"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Ordered units by category, sorted by median\nfig, ax = plt.subplots(figsize=(12, 6))\norder = metrics_df.groupby(\"category\")[\"ordered_units\"].median().sort_values().index\n\n# Assigning hue with legend=False avoids the palette-without-hue\n# deprecation warning; this form requires seaborn >= 0.13.\nsns.boxplot(data=metrics_df, x=\"category\", y=\"ordered_units\", order=order,\n            hue=\"category\", hue_order=order, palette=\"viridis\", legend=False, ax=ax)\nax.set_xlabel(\"Category\")\nax.set_ylabel(\"Ordered Units\")\nax.set_title(\"Distribution of Ordered Units by Category\")\nplt.xticks(rotation=45, ha=\"right\")\nplt.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Correlation Between Metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Correlation heatmap of numeric metrics\nnumeric_cols = [\"price\", \"impressions\", \"visits\", \"cart_adds\", \"ordered_units\", \"revenue\"]\ncorrelation_matrix = metrics_df[numeric_cols].corr()\n\nfig, ax = plt.subplots(figsize=(8, 6))\nsns.heatmap(correlation_matrix, annot=True, cmap=\"coolwarm\", center=0,\n            fmt=\".2f\", square=True, ax=ax, linewidths=0.5)\nax.set_title(\"Correlation Matrix of Sales Metrics\")\nplt.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Enrichment: Simulating Treatment Effects\n",
    "\n",
    "The package can simulate treatment effects (e.g., A/B test outcomes) by boosting sales for a subset of products starting at a specific date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "from online_retail_simulator import enrich\n\n# Apply enrichment using config file (boost sales by 50% for 30% of products starting Nov 15)\nenrich_config_path = os.path.join(os.path.dirname(__file__) if \"__file__\" in dir() else \".\", \"config_enrichment.yaml\")\nenriched_job = enrich(enrich_config_path, job_info)\n\n# Load enriched results\nenriched_results = load_job_results(enriched_job)\nenriched_df = enriched_results[\"enriched\"]\nprint(f\"Applied enrichment to {len(enriched_df)} records\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Compare original vs enriched: daily revenue time series\n# Start date of the enrichment (Nov 15), defined once and reused below\nenrichment_start = pd.to_datetime(\"2024-11-15\")\n\ndaily_original = metrics_df.groupby(\"date\")[\"revenue\"].sum().reset_index()\ndaily_original[\"date\"] = pd.to_datetime(daily_original[\"date\"])\n\ndaily_enriched = enriched_df.groupby(\"date\")[\"revenue\"].sum().reset_index()\ndaily_enriched[\"date\"] = pd.to_datetime(daily_enriched[\"date\"])\n\n# Plot comparison\nfig, ax = plt.subplots(figsize=(12, 6))\nax.plot(daily_original[\"date\"], daily_original[\"revenue\"],\n        marker=\"o\", linewidth=2, markersize=4, label=\"Original\", color=\"#1f77b4\")\nax.plot(daily_enriched[\"date\"], daily_enriched[\"revenue\"],\n        marker=\"s\", linewidth=2, markersize=4, label=\"Enriched\", color=\"#2ca02c\")\n\n# Mark enrichment start\nax.axvline(enrichment_start, color=\"red\", linestyle=\"--\", alpha=0.7, label=\"Enrichment Start\")\n\nax.set_xlabel(\"Date\")\nax.set_ylabel(\"Revenue ($)\")\nax.set_title(\"Daily Revenue: Original vs Enriched\")\nax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f\"${x:,.0f}\"))\nax.legend()\nplt.xticks(rotation=45)\nplt.tight_layout()\nplt.show()\n\n# Compute revenue lift over the period on or after the enrichment start\noriginal_post = pd.to_datetime(metrics_df[\"date\"]) >= enrichment_start\nenriched_post = pd.to_datetime(enriched_df[\"date\"]) >= enrichment_start\noriginal_post_revenue = metrics_df.loc[original_post, \"revenue\"].sum()\nenriched_post_revenue = enriched_df.loc[enriched_post, \"revenue\"].sum()\nlift = (enriched_post_revenue / original_post_revenue - 1) * 100\n\nprint(\"\\nPost-enrichment period (Nov 15-30):\")\nprint(f\"  Original revenue:  ${original_post_revenue:,.2f}\")\nprint(f\"  Enriched revenue:  ${enriched_post_revenue:,.2f}\")\nprint(f\"  Revenue lift:      {lift:.1f}%\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Next Steps\n\nThis overview covers the basics of generating and exploring synthetic retail data. For more details:\n\n- **Full Documentation**: [Online Retail Simulator Docs](https://eisenhauerio.github.io/tools-catalog-generator/)\n- **Configuration Reference**: Learn about all available parameters\n- **API Reference**: Detailed function documentation\n- **Demo Scripts**: See `demo/` directory for more examples\n\n### Key Functions\n\n```python\n# Core simulation\nsimulate(config_path)         # Generate complete dataset\nsimulate_products()           # Generate product catalog only\nsimulate_metrics()            # Generate sales metrics\n\n# Enrichment\nenrich(config_path, job)      # Apply treatment effects\n\n# Results management\nload_job_results(job)         # Load all results\nlist_jobs()                   # List saved jobs\ncleanup_old_jobs(days=30)     # Clean up old outputs\n```"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}