Usage

Workflow

Every simulation follows the same steps regardless of which backend is used.

1. Write a YAML configuration file. The config selects a generation backend (RULE or SYNTHESIZER), sets the parameters for product characteristics and metrics, and optionally configures storage. See Configuration for the full parameter reference.

STORAGE:
  PATH: "output/demo"

RULE:
  CHARACTERISTICS:
    FUNCTION: simulate_characteristics_rule_based
    PARAMS:
      num_products: 50
      seed: 42
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-01-01"
      date_end: "2024-01-31"
      sale_prob: 0.7

2. Run the simulation.

from online_retail_simulator import simulate

job_info = simulate("config.yaml")

The simulator generates product characteristics, adds product details (title, description, brand, features), and produces daily or weekly sales metrics. The return value is a JobInfo object that tracks all output artifacts.

3. Load and inspect results.

from online_retail_simulator import load_job_results

results = load_job_results(job_info)
products_df = results["products"]
metrics_df = results["metrics"]

Output

Each run creates a job directory under the configured storage path, named job-YYYYMMDD-HHMMSS-{uuid}. The directory contains:

File	Description
`products.csv`	Product catalog with characteristics, titles, descriptions, brands, and features
`sales.csv`	Daily or weekly sales metrics per product
`metadata.json`	Job metadata including configuration, timestamps, and row counts
`config.yaml`	Copy of the configuration used for this run

Products include a quality_score (0.0 to 1.0) reflecting data completeness based on title, description, features, and brand. The score affects conversion probability in metrics simulation.

Optional: Enrichment

Enrichment applies controlled treatment effects to simulated data. This is useful for testing causal inference pipelines against known ground truth.

1. Write an enrichment configuration.

IMPACT:
  FUNCTION: "combined_boost"
  PARAMS:
    effect_size: 0.5
    ramp_days: 7
    enrichment_fraction: 0.3
    enrichment_start: "2024-01-15"
    seed: 42

2. Apply enrichment to an existing simulation.

from online_retail_simulator import enrich

enriched_job = enrich("enrichment_config.yaml", job_info)

Available Backends

Each backend generates product characteristics and sales metrics using a different strategy.

Backend	Config Key	Description
Rule-based	`RULE`	Deterministic generation using configurable rules and probability distributions
Synthesizer-based	`SYNTHESIZER`	ML-based generation using SDV (Synthetic Data Vault) learned from real data

Available Enrichment Functions

Function	Description
`combined_boost`	Gradual rollout with ramp-up period and partial product treatment (most realistic)
`quantity_boost`	Immediate multiplicative increase in ordered units
`probability_boost`	Alias for `quantity_boost` (probability reflected in quantity for existing records)

Custom enrichment functions can be registered at runtime. See Configuration for parameter details.