# Usage ## Workflow Every simulation follows the same steps regardless of which backend is used. **1. Write a YAML configuration file.** The config selects a generation backend (`RULE` or `SYNTHESIZER`), sets the parameters for product characteristics and metrics, and optionally configures storage. See [Configuration](configuration.md) for the full parameter reference. ```yaml STORAGE: PATH: "output/demo" RULE: CHARACTERISTICS: FUNCTION: simulate_characteristics_rule_based PARAMS: num_products: 50 seed: 42 METRICS: FUNCTION: simulate_metrics_rule_based PARAMS: date_start: "2024-01-01" date_end: "2024-01-31" sale_prob: 0.7 ``` **2. Run the simulation.** ```python from online_retail_simulator import simulate job_info = simulate("config.yaml") ``` The simulator generates product characteristics, adds product details (title, description, brand, features), and produces daily or weekly sales metrics. The return value is a `JobInfo` object that tracks all output artifacts. **3. Load and inspect results.** ```python from online_retail_simulator import load_job_results results = load_job_results(job_info) products_df = results["products"] metrics_df = results["metrics"] ``` --- ## Output Each run creates a job directory under the configured storage path, named `job-YYYYMMDD-HHMMSS-{uuid}`. The directory contains: | File | Description | |------|-------------| | `products.csv` | Product catalog with characteristics, titles, descriptions, brands, and features | | `sales.csv` | Daily or weekly sales metrics per product | | `metadata.json` | Job metadata including configuration, timestamps, and row counts | | `config.yaml` | Copy of the configuration used for this run | Products include a `quality_score` (0.0 to 1.0) reflecting data completeness based on title, description, features, and brand. The score affects conversion probability in metrics simulation. --- ## Optional: Enrichment Enrichment applies controlled treatment effects to simulated data. This is useful for testing causal inference pipelines against known ground truth. **1. Write an enrichment configuration.** ```yaml IMPACT: FUNCTION: "combined_boost" PARAMS: effect_size: 0.5 ramp_days: 7 enrichment_fraction: 0.3 enrichment_start: "2024-01-15" seed: 42 ``` **2. Apply enrichment to an existing simulation.** ```python from online_retail_simulator import enrich enriched_job = enrich("enrichment_config.yaml", job_info) ``` --- ## Available Backends Each backend generates product characteristics and sales metrics using a different strategy. | Backend | Config Key | Description | |---------|------------|-------------| | Rule-based | `RULE` | Deterministic generation using configurable rules and probability distributions | | Synthesizer-based | `SYNTHESIZER` | ML-based generation using SDV (Synthetic Data Vault) learned from real data | --- ## Available Enrichment Functions | Function | Description | |----------|-------------| | `combined_boost` | Gradual rollout with ramp-up period and partial product treatment (most realistic) | | `quantity_boost` | Immediate multiplicative increase in ordered units | | `probability_boost` | Alias for `quantity_boost` (probability reflected in quantity for existing records) | Custom enrichment functions can be registered at runtime. See [Configuration](configuration.md) for parameter details.