Usage

Workflow

Every simulation follows the same steps regardless of which backend is used.

1. Write a YAML configuration file. The config selects a generation backend (RULE or SYNTHESIZER), sets the parameters for product characteristics and metrics, and optionally configures storage. See Configuration for the full parameter reference.

STORAGE:
  PATH: "output/demo"

RULE:
  CHARACTERISTICS:
    FUNCTION: simulate_characteristics_rule_based
    PARAMS:
      num_products: 50
      seed: 42
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-01-01"
      date_end: "2024-01-31"
      sale_prob: 0.7

2. Run the simulation.

from online_retail_simulator import simulate

job_info = simulate("config.yaml")

The simulator generates product characteristics, adds product details (title, description, brand, features), and produces daily or weekly sales metrics. The return value is a JobInfo object that tracks all output artifacts.

3. Load and inspect results.

from online_retail_simulator import load_job_results

results = load_job_results(job_info)
products_df = results["products"]
metrics_df = results["metrics"]
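
Before running step 2, it can help to sanity-check the configuration from step 1. The sketch below is not part of online_retail_simulator: check_config is a hypothetical helper, and the rules it enforces (exactly one backend key, date_start not after date_end, sale_prob within [0, 1]) are assumptions inferred from the example config rather than documented validation rules.

```python
from datetime import date

# Backend keys named in this guide.
BACKENDS = ("RULE", "SYNTHESIZER")


def check_config(config: dict) -> list:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []

    # Assumption: a config selects exactly one backend.
    selected = [key for key in BACKENDS if key in config]
    if len(selected) != 1:
        problems.append(f"expected exactly one backend key, found {selected}")
        return problems

    params = config[selected[0]].get("METRICS", {}).get("PARAMS", {})

    # Dates are quoted in the example YAML, so they arrive as ISO strings.
    start, end = params.get("date_start"), params.get("date_end")
    if start and end and date.fromisoformat(start) > date.fromisoformat(end):
        problems.append("date_start is after date_end")

    # Assumption: sale_prob is a probability.
    prob = params.get("sale_prob")
    if prob is not None and not 0.0 <= prob <= 1.0:
        problems.append("sale_prob is outside [0, 1]")

    return problems


# The example config from step 1, written as the dict a YAML parser would produce.
config = {
    "STORAGE": {"PATH": "output/demo"},
    "RULE": {
        "CHARACTERISTICS": {
            "FUNCTION": "simulate_characteristics_rule_based",
            "PARAMS": {"num_products": 50, "seed": 42},
        },
        "METRICS": {
            "FUNCTION": "simulate_metrics_rule_based",
            "PARAMS": {
                "date_start": "2024-01-01",
                "date_end": "2024-01-31",
                "sale_prob": 0.7,
            },
        },
    },
}

print(check_config(config))  # an empty list means no problem was found
```

In practice the dict would come from parsing config.yaml with a YAML library; it is written inline here to keep the sketch free of third-party dependencies.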

Output

Each run creates a job directory under the configured storage path, named job-YYYYMMDD-HHMMSS-{uuid}. The directory contains:

File           Description
-------------  --------------------------------------------------------------------------------
products.csv   Product catalog with characteristics, titles, descriptions, brands, and features
sales.csv      Daily or weekly sales metrics per product
metadata.json  Job metadata including configuration, timestamps, and row counts
config.yaml    Copy of the configuration used for this run
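
A job directory can also be inspected with the standard library alone. The following is a minimal sketch, not a simulator API: parse_job_dir_name and missing_files are hypothetical helpers, the regular expression assumes the suffix is everything after the third hyphen, and the sample suffix "abc123" is made up for illustration.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple

# job-YYYYMMDD-HHMMSS-{uuid}; the exact shape of the uuid part is an assumption.
JOB_DIR_PATTERN = re.compile(r"^job-(\d{8})-(\d{6})-(.+)$")

# File names listed in the table above.
EXPECTED_FILES = ("products.csv", "sales.csv", "metadata.json", "config.yaml")


def parse_job_dir_name(name: str) -> Optional[Tuple[datetime, str]]:
    """Return (creation time, suffix) or None if the name does not match."""
    match = JOB_DIR_PATTERN.match(name)
    if match is None:
        return None
    day, clock, suffix = match.groups()
    try:
        created = datetime.strptime(day + clock, "%Y%m%d%H%M%S")
    except ValueError:
        # Digits were present but do not form a valid date or time.
        return None
    return created, suffix


def missing_files(job_dir: Path) -> List[str]:
    """List the expected output files that are absent from job_dir."""
    return [name for name in EXPECTED_FILES if not (job_dir / name).is_file()]


print(parse_job_dir_name("job-20240131-093000-abc123"))
print(missing_files(Path("output/demo/job-20240131-093000-abc123")))
```

An empty list from missing_files means all four documented files are present in the directory.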

Each product has a quality_score between 0.0 and 1.0 that reflects how complete its title, description, features, and brand are. The score affects the conversion probability used in the metrics simulation.
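
To make the idea of a completeness score concrete, here is a minimal sketch that weights the four fields equally. The equal weighting and the completeness_score helper are assumptions for illustration only; the simulator's actual formula is not given in this section and may differ.

```python
# Fields that the quality score is described as depending on.
FIELDS = ("title", "description", "features", "brand")


def completeness_score(product: dict) -> float:
    """Hypothetical score: fraction of the four fields that are non-empty."""
    filled = sum(1 for field in FIELDS if product.get(field))
    return filled / len(FIELDS)


# Illustrative product record; the values are made up.
product = {
    "title": "Wireless Mouse",
    "description": "",      # empty string counts as missing
    "features": [],         # empty list counts as missing
    "brand": "Acme",
}

print(completeness_score(product))  # 0.5: two of four fields are filled
```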


Optional: Enrichment

Enrichment applies controlled treatment effects to simulated data. This is useful for testing causal inference pipelines against known ground truth.

1. Write an enrichment configuration.

IMPACT:
  FUNCTION: "combined_boost"
  PARAMS:
    effect_size: 0.5
    ramp_days: 7
    enrichment_fraction: 0.3
    enrichment_start: "2024-01-15"
    seed: 42

2. Apply enrichment to an existing simulation.

from online_retail_simulator import enrich

enriched_job = enrich("enrichment_config.yaml", job_info)
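
The parameters in the enrichment config suggest a treatment that reaches a fraction of products and grows over a ramp period. The sketch below shows one plausible reading of them, not the library's implementation: the linear ramp shape, the rounding of the treated count, and the helper names ramp_multiplier and pick_treated are all assumptions.

```python
import random
from datetime import date


def ramp_multiplier(day: date, start: date, effect_size: float, ramp_days: int) -> float:
    """Assumed linear ramp: 1.0 before start, rising to 1.0 + effect_size."""
    if day < start:
        return 1.0
    if ramp_days <= 0:
        # No ramp configured: the full effect applies immediately.
        return 1.0 + effect_size
    progress = min((day - start).days / ramp_days, 1.0)
    return 1.0 + effect_size * progress


def pick_treated(product_ids, fraction: float, seed: int) -> set:
    """Assumed selection: a seeded random sample of the given fraction of products."""
    rng = random.Random(seed)
    count = round(len(product_ids) * fraction)
    # Sort first so the sample does not depend on the input's iteration order.
    return set(rng.sample(sorted(product_ids), count))


# Values taken from the example enrichment config above.
start = date(2024, 1, 15)
treated = pick_treated(range(50), fraction=0.3, seed=42)

print(len(treated))                                       # 15 of 50 products
print(ramp_multiplier(date(2024, 1, 14), start, 0.5, 7))  # 1.0, before the start date
print(ramp_multiplier(date(2024, 1, 22), start, 0.5, 7))  # 1.5, ramp complete
```

Because the treated set and the multiplier are both known exactly, an estimate produced by a causal inference pipeline can be compared against them directly.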

Available Backends

Each backend generates product characteristics and sales metrics using a different strategy.

Backend            Config Key   Description
-----------------  -----------  -------------------------------------------------------------------------------
Rule-based         RULE         Deterministic generation using configurable rules and probability distributions
Synthesizer-based  SYNTHESIZER  ML-based generation using SDV (Synthetic Data Vault) learned from real data


Available Enrichment Functions

Function           Description
-----------------  ----------------------------------------------------------------------------------
combined_boost     Gradual rollout with ramp-up period and partial product treatment (most realistic)
quantity_boost     Immediate multiplicative increase in ordered units
probability_boost  Alias for quantity_boost (probability reflected in quantity for existing records)

Custom enrichment functions can be registered at runtime. See Configuration for parameter details.