Configuration
The Online Retail Simulator uses YAML configuration files to control all aspects of data generation and enrichment. This guide documents the actual configuration schema as implemented in the code.
See Also: For practical examples, see the Demo notebook.
Configuration Structure
All configuration files follow this hierarchical structure:
STORAGE:                      # Optional: Output storage settings
  PATH: "output/run"          # Directory for job-based storage

RULE:                         # Rule-based mode (use RULE or SYNTHESIZER, not both)
  CHARACTERISTICS:
    FUNCTION: function_name
    PARAMS:
      # Function-specific parameters
  METRICS:
    FUNCTION: function_name
    PARAMS:
      # Function-specific parameters

SYNTHESIZER:                  # ML-based mode (alternative to RULE)
  CHARACTERISTICS:
    FUNCTION: function_name
    PARAMS:
      # Function-specific parameters
  METRICS:
    FUNCTION: function_name
    PARAMS:
      # Function-specific parameters
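The "RULE or SYNTHESIZER, not both" rule can be pictured with a small check on the parsed configuration. This is a minimal sketch, not the simulator's own code: `detect_mode` is a hypothetical helper, and the dictionary stands in for whatever a YAML loader would return.

```python
def detect_mode(config: dict) -> str:
    """Return "RULE" or "SYNTHESIZER", requiring exactly one of them."""
    present = [key for key in ("RULE", "SYNTHESIZER") if key in config]
    if len(present) != 1:
        raise ValueError(
            f"Expected exactly one of RULE or SYNTHESIZER, found: {present or 'none'}"
        )
    return present[0]


# Hypothetical parsed configuration (the shape a YAML loader would produce).
config = {
    "STORAGE": {"PATH": "output/run"},
    "RULE": {
        "CHARACTERISTICS": {"FUNCTION": "function_name", "PARAMS": {}},
        "METRICS": {"FUNCTION": "function_name", "PARAMS": {}},
    },
}
mode = detect_mode(config)
```

A configuration containing both top-level keys, or neither, raises `ValueError` in this sketch.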
Storage Configuration
STORAGE.PATH
Controls where simulation results are saved.
STORAGE:
  PATH: "output/myproject"    # Base directory for job storage
Behavior:
Each simulation creates a unique job directory under this path
Job directories are named: job-YYYYMMDD-HHMMSS-{uuid}
Contains: products.csv, sales.csv, metadata.json, config.yaml
Default: output/run
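To make the naming pattern concrete, here is a rough sketch of how a job directory path of that shape could be built. `make_job_dir` is a hypothetical helper, and the 8-character hexadecimal suffix is an assumption; the simulator's actual `{uuid}` format may differ.

```python
import uuid
from datetime import datetime
from pathlib import Path


def make_job_dir(base_path: str, now: datetime) -> Path:
    """Build a path of the form <base_path>/job-YYYYMMDD-HHMMSS-{uuid}."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Assumption: a shortened random UUID; the real suffix format is not documented here.
    suffix = uuid.uuid4().hex[:8]
    return Path(base_path) / f"job-{stamp}-{suffix}"


job_dir = make_job_dir("output/myproject", datetime(2024, 11, 1, 9, 30, 0))
# job_dir.name starts with "job-20241101-093000-" followed by a random suffix
```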
Rule-Based Configuration
Rule-based mode uses deterministic algorithms. You must specify exactly one of RULE or SYNTHESIZER.
Complete Example
STORAGE:
  PATH: "output/sim_demo"

RULE:
  CHARACTERISTICS:
    FUNCTION: simulate_characteristics_rule_based
    PARAMS:
      num_products: 50
      seed: null                       # Optional: set for reproducibility
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-11-01"
      date_end: "2024-11-30"
      sale_prob: 0.7
      seed: null                       # Optional: set for reproducibility
      granularity: "daily"             # or "weekly"
      impression_to_visit_rate: 0.15
      visit_to_cart_rate: 0.25
      cart_to_order_rate: 0.80
RULE.CHARACTERISTICS Parameters
Function: simulate_characteristics_rule_based
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_products | int | 100 | Total number of products to generate |
| seed | int or null | null | Random seed for reproducibility |
Example:
RULE:
  CHARACTERISTICS:
    FUNCTION: simulate_characteristics_rule_based
    PARAMS:
      num_products: 200
      seed: 42
RULE.METRICS Parameters
Function: simulate_metrics_rule_based
| Parameter | Type | Default | Description |
|---|---|---|---|
| date_start | string | "2024-01-01" | Start date (YYYY-MM-DD) |
| date_end | string | "2024-01-31" | End date (YYYY-MM-DD) |
| sale_prob | float | 0.7 | Daily probability of sale per product (0.0-1.0) |
| seed | int or null | null | Random seed for reproducibility |
| granularity | string | "daily" | Time granularity: "daily" or "weekly" |
| impression_to_visit_rate | float | 0.15 | Funnel: impressions → visits conversion rate |
| visit_to_cart_rate | float | 0.25 | Funnel: visits → cart adds conversion rate |
| cart_to_order_rate | float | 0.80 | Funnel: cart adds → orders conversion rate |
Example:
RULE:
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-01-01"
      date_end: "2024-12-31"
      sale_prob: 0.6
      seed: 42
      granularity: "daily"
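One way to read sale_prob and seed together is as a seeded Bernoulli draw for each product on each day. The sketch below illustrates that reading only. It assumes independent draws and is not the simulator's actual algorithm; `simulate_sale_flags` and the product IDs are hypothetical.

```python
import random
from datetime import date, timedelta


def simulate_sale_flags(product_ids, date_start, date_end, sale_prob, seed=None):
    """Return a list of (product_id, day, sold) with sold ~ Bernoulli(sale_prob)."""
    rng = random.Random(seed)  # seed=None gives a different sequence on every run
    num_days = (date_end - date_start).days + 1  # the date range is inclusive
    rows = []
    for offset in range(num_days):
        day = date_start + timedelta(days=offset)
        for product_id in product_ids:
            rows.append((product_id, day, rng.random() < sale_prob))
    return rows


product_ids = [f"P{i:03d}" for i in range(10)]  # hypothetical product IDs
flags = simulate_sale_flags(
    product_ids, date(2024, 1, 1), date(2024, 1, 31), sale_prob=0.6, seed=42
)
```

With a fixed seed, repeated calls return the same flags, which is the practical meaning of "reproducibility" in the table above.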
Granularity: Daily vs Weekly
The granularity parameter controls output aggregation:
Daily (default):
One row per product per day
Example: 10 products × 31 days = 310 rows
Weekly:
One row per product per week (ISO week: Monday-Sunday)
Date range auto-expanded to full week boundaries
All funnel metrics summed across the week
date column shows Monday of each week
Example: 10 products × 5 weeks = 50 rows
Weekly Example:
RULE:
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-01-03"   # Expanded to 2024-01-01 (Monday)
      date_end: "2024-01-25"     # Expanded to 2024-01-28 (Sunday)
      granularity: "weekly"
      sale_prob: 0.7
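The week-boundary expansion and weekly summing described above can be sketched with the standard library. The function names here are hypothetical and the row layout (product, day, units) is an assumption made for illustration; only the Monday-to-Sunday logic comes from the text.

```python
from datetime import date, timedelta


def expand_to_full_weeks(date_start: date, date_end: date):
    """Move the start back to Monday and the end forward to Sunday."""
    monday = date_start - timedelta(days=date_start.weekday())  # weekday(): Monday == 0
    sunday = date_end + timedelta(days=6 - date_end.weekday())
    return monday, sunday


def aggregate_weekly(daily_rows):
    """Sum units per (product_id, Monday of the week)."""
    totals = {}
    for product_id, day, units in daily_rows:
        key = (product_id, day - timedelta(days=day.weekday()))
        totals[key] = totals.get(key, 0) + units
    return totals


start, end = expand_to_full_weeks(date(2024, 1, 3), date(2024, 1, 25))
num_weeks = ((end - start).days + 1) // 7

weekly = aggregate_weekly([
    ("P1", date(2024, 1, 3), 2),
    ("P1", date(2024, 1, 5), 3),
    ("P1", date(2024, 1, 8), 1),
])
```

For the dates in the weekly example, the expanded range covers four full weeks, so ten products would produce 40 weekly rows.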
Funnel Conversion Rates
Control the customer journey funnel:
Impressions → Visits → Cart Adds → Orders
            ↓15%     ↓25%        ↓80%
Example with custom funnel:
RULE:
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-11-01"
      date_end: "2024-11-30"
      sale_prob: 0.7
      impression_to_visit_rate: 0.20   # 20% of impressions → visits
      visit_to_cart_rate: 0.30         # 30% of visits → cart adds
      cart_to_order_rate: 0.85         # 85% of cart adds → orders
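To see what the three rates imply end to end, the expected counts at each stage can be multiplied through. This is only an arithmetic sketch of expected values. The simulator itself may add randomness or rounding, and `expected_funnel` is a hypothetical helper.

```python
def expected_funnel(impressions, impression_to_visit_rate=0.15,
                    visit_to_cart_rate=0.25, cart_to_order_rate=0.80):
    """Expected counts at each funnel stage for a given number of impressions."""
    visits = impressions * impression_to_visit_rate
    cart_adds = visits * visit_to_cart_rate
    orders = cart_adds * cart_to_order_rate
    return {
        "impressions": impressions,
        "visits": visits,
        "cart_adds": cart_adds,
        "orders": orders,
    }


default_funnel = expected_funnel(1000)                   # default rates
custom_funnel = expected_funnel(1000, 0.20, 0.30, 0.85)  # rates from the custom example
```

With the defaults, the overall impression-to-order rate is 0.15 × 0.25 × 0.80 = 3%; with the custom rates it is 0.20 × 0.30 × 0.85 = 5.1%.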
Synthesizer-Based Configuration
ML-based mode uses SDV (Synthetic Data Vault) for sophisticated pattern learning.
Complete Example
STORAGE:
  PATH: "output/ml_demo"

SYNTHESIZER:
  CHARACTERISTICS:
    FUNCTION: gaussian_copula
    PARAMS:
      training_data_path: "data/real_products.csv"   # Required
      num_rows: 100
      seed: null
  METRICS:
    FUNCTION: gaussian_copula
    PARAMS:
      training_data_path: "data/real_sales.csv"      # Required
      num_rows: 1000
      seed: null
SYNTHESIZER.CHARACTERISTICS Parameters
Function: gaussian_copula
| Parameter | Type | Default | Description |
|---|---|---|---|
| training_data_path | string | Required | Path to CSV file with real product data |
| num_rows | int | 100 | Number of synthetic products to generate |
| seed | int or null | null | Random seed for reproducibility |
SYNTHESIZER.METRICS Parameters
Function: gaussian_copula
| Parameter | Type | Default | Description |
|---|---|---|---|
| training_data_path | string | Required | Path to CSV file with real sales data |
| num_rows | int | 1000 | Number of synthetic sales records to generate |
| seed | int or null | null | Random seed for reproducibility |
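The defaults and the required training_data_path in the tables above suggest a simple resolution step: reject a missing path, then let user values override defaults. The following is a sketch under that assumption; `resolve_synth_params` and the constant name are hypothetical, not part of the simulator's API.

```python
# Defaults taken from the SYNTHESIZER.METRICS table above.
SYNTH_METRICS_DEFAULTS = {"num_rows": 1000, "seed": None}


def resolve_synth_params(params: dict, defaults: dict) -> dict:
    """Fill in defaults and insist on a non-null training_data_path."""
    if params.get("training_data_path") is None:
        raise ValueError("training_data_path is required for synthesizer mode")
    return {**defaults, **params}  # user-supplied values win over defaults


resolved = resolve_synth_params(
    {"training_data_path": "data/real_sales.csv", "seed": 7},
    SYNTH_METRICS_DEFAULTS,
)
```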
Enrichment Configuration
Enrichment configurations are separate YAML files used with the enrich() function.
Structure
IMPACT:
  FUNCTION: "function_name"
  PARAMS:
    # Function-specific parameters
Built-in Enrichment Functions
combined_boost
Gradual rollout with partial treatment (most realistic).
IMPACT:
  FUNCTION: "combined_boost"
  PARAMS:
    effect_size: 0.5                 # 50% boost in ordered_units
    ramp_days: 7                     # Gradual ramp-up over 7 days
    enrichment_fraction: 0.3         # 30% of products get enriched
    enrichment_start: "2024-11-15"   # Start date
    seed: 42                         # Reproducible product selection
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| effect_size | float | 0.5 | Maximum boost (0.5 = 50% increase) |
| ramp_days | int | 7 | Days for effect to reach full strength |
| enrichment_fraction | float | 0.3 | Fraction of products to enrich (0.0-1.0) |
| enrichment_start | string | "2024-11-15" | Start date (YYYY-MM-DD) |
| seed | int | 42 | Random seed for product selection |
Behavior:
Selects random fraction of products for enrichment
Effect ramps up linearly over ramp_days
After ramp period, full effect_size is applied
Increases ordered_units for enriched products
Recalculates revenue = ordered_units × price
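The linear ramp can be illustrated as a per-day multiplier on ordered_units. This sketch assumes the ramp starts at zero effect on enrichment_start and reaches the full effect_size exactly ramp_days later; the simulator's exact day-counting convention is not stated here, and `ramp_multiplier` is a hypothetical name.

```python
from datetime import date


def ramp_multiplier(day: date, enrichment_start: date,
                    effect_size: float = 0.5, ramp_days: int = 7) -> float:
    """Multiplier applied to ordered_units for an enriched product on `day`."""
    days_since_start = (day - enrichment_start).days
    if days_since_start < 0:
        return 1.0  # before the start date there is no effect
    if ramp_days <= 0:
        return 1.0 + effect_size  # no ramp: full effect immediately
    # Assumption: progress grows linearly from 0.0 to 1.0 over ramp_days.
    progress = min(1.0, days_since_start / ramp_days)
    return 1.0 + effect_size * progress


start = date(2024, 11, 15)
multipliers = [
    ramp_multiplier(date(2024, 11, d), start) for d in (14, 15, 22, 30)
]
```

Under these assumptions the multiplier is 1.0 before and on the start date, and 1.5 from 2024-11-22 onward.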
quantity_boost
Simple multiplicative boost (no ramp-up).
IMPACT:
  FUNCTION: "quantity_boost"
  PARAMS:
    effect_size: 0.5
    enrichment_fraction: 0.3
    enrichment_start: "2024-11-15"
    seed: 42
Parameters: Same as combined_boost except no ramp_days.
Behavior: Immediate full effect on enrichment start date.
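As a rough illustration of the immediate boost, the sketch below picks a seeded random subset of products and scales their rows from the start date onward. It uses plain dictionaries in place of the real sales data; the helper names, the row layout, and the unrounded ordered_units are all assumptions.

```python
import random


def select_products(product_ids, enrichment_fraction, seed):
    """Pick a reproducible random subset of products to enrich."""
    rng = random.Random(seed)
    k = round(len(product_ids) * enrichment_fraction)
    return set(rng.sample(sorted(product_ids), k))


def apply_quantity_boost(rows, enriched_ids, enrichment_start, effect_size):
    """Scale ordered_units and recompute revenue for enriched rows on or after the start."""
    boosted = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        # ISO dates (YYYY-MM-DD) compare correctly as strings.
        if row["product_id"] in enriched_ids and row["date"] >= enrichment_start:
            row["ordered_units"] = row["ordered_units"] * (1 + effect_size)
            row["revenue"] = row["ordered_units"] * row["price"]
        boosted.append(row)
    return boosted


selected = select_products([f"P{i}" for i in range(10)], 0.3, seed=42)

rows = [
    {"product_id": "P1", "date": "2024-11-10", "ordered_units": 4, "price": 10.0, "revenue": 40.0},
    {"product_id": "P1", "date": "2024-11-20", "ordered_units": 4, "price": 10.0, "revenue": 40.0},
]
boosted = apply_quantity_boost(rows, {"P1"}, "2024-11-15", effect_size=0.5)
```

Only the second row changes: it falls on or after enrichment_start, so its units become 6.0 and its revenue 60.0.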
probability_boost
Alias for quantity_boost (probability reflected in quantity for existing sales).
IMPACT:
  FUNCTION: "probability_boost"
  PARAMS:
    effect_size: 0.5
    enrichment_fraction: 0.3
    enrichment_start: "2024-11-15"
    seed: 42
Product Details Configuration
Product details generation adds titles, descriptions, brands, and features to products created by simulate_characteristics().
Structure
PRODUCT_DETAILS:
  FUNCTION: "function_name"
Built-in Product Details Functions
simulate_product_details_mock
Rule-based generation using templates (default, no external dependencies).
PRODUCT_DETAILS:
  FUNCTION: simulate_product_details_mock
Generates realistic mock data based on product category. No additional parameters required.
simulate_product_details_ollama
LLM-based generation using local Ollama for more realistic content.
PRODUCT_DETAILS:
  FUNCTION: simulate_product_details_ollama
Requires Ollama running locally at http://localhost:11434 with a compatible model.
Complete Example
STORAGE:
  PATH: "output/product_catalog"

RULE:
  CHARACTERISTICS:
    FUNCTION: simulate_characteristics_rule_based
    PARAMS:
      num_products: 50
      seed: 42

PRODUCT_DETAILS:
  FUNCTION: simulate_product_details_mock
Usage
from online_retail_simulator import (
    load_job_results,
    simulate_characteristics,
    simulate_product_details,
)

# Generate base products
job_info = simulate_characteristics("config.yaml")

# Add product details
job_info = simulate_product_details(job_info, "config.yaml")

# Load enriched products
results = load_job_results(job_info)
products_df = results["products"]
# products_df now includes: title, description, brand, features
Configuration Validation
The config processor validates all configurations and provides clear error messages:
Common Errors:
❌ Including both RULE and SYNTHESIZER (choose one)
❌ Missing required parameters
❌ Typos in parameter names
❌ Invalid parameter types
❌ training_data_path: null for synthesizer mode
Example Validation Error:
ValueError: Unexpected parameters for RULE.METRICS.simulate_metrics_rule_based:
['SALE_PROB']. Expected: ['cart_to_order_rate', 'date_end', 'date_start',
'granularity', 'impression_to_visit_rate', 'sale_prob', 'seed', 'visit_to_cart_rate']
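An error of this shape can be produced by comparing the supplied parameter names against the expected set. The sketch below is one plausible way to do that and is not the config processor's actual implementation; `validate_params` is a hypothetical helper, while the expected names are copied from the message above.

```python
RULE_METRICS_PARAMS = {
    "cart_to_order_rate", "date_end", "date_start", "granularity",
    "impression_to_visit_rate", "sale_prob", "seed", "visit_to_cart_rate",
}


def validate_params(section: str, function_name: str, params: dict, expected: set) -> None:
    """Raise ValueError listing any parameter names that are not expected."""
    unexpected = sorted(set(params) - expected)
    if unexpected:
        raise ValueError(
            f"Unexpected parameters for {section}.{function_name}: "
            f"{unexpected}. Expected: {sorted(expected)}"
        )


# Parameter names are case-sensitive in this sketch, so SALE_PROB is rejected.
try:
    validate_params(
        "RULE.METRICS",
        "simulate_metrics_rule_based",
        {"date_start": "2024-11-01", "SALE_PROB": 0.7},
        RULE_METRICS_PARAMS,
    )
except ValueError as exc:
    message = str(exc)
```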
Complete Configuration Examples
Minimal Configuration
RULE:
  CHARACTERISTICS:
    FUNCTION: simulate_characteristics_rule_based
    PARAMS:
      num_products: 20
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-11-01"
      date_end: "2024-11-07"
      sale_prob: 0.8
Production Configuration
STORAGE:
  PATH: "output/production_sim"

RULE:
  CHARACTERISTICS:
    FUNCTION: simulate_characteristics_rule_based
    PARAMS:
      num_products: 1000
      seed: 42
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-01-01"
      date_end: "2024-12-31"
      sale_prob: 0.65
      seed: 42
      granularity: "daily"
      impression_to_visit_rate: 0.18
      visit_to_cart_rate: 0.28
      cart_to_order_rate: 0.82
Weekly Aggregation Configuration
STORAGE:
  PATH: "output/weekly_metrics"

RULE:
  CHARACTERISTICS:
    FUNCTION: simulate_characteristics_rule_based
    PARAMS:
      num_products: 100
      seed: 42
  METRICS:
    FUNCTION: simulate_metrics_rule_based
    PARAMS:
      date_start: "2024-01-01"
      date_end: "2024-12-31"
      sale_prob: 0.7
      seed: 42
      granularity: "weekly"   # Weekly aggregation
Custom Functions
You can register custom simulation and enrichment functions:
from online_retail_simulator import register_metrics_function


def my_custom_metrics(product_characteristics, config):
    # Your custom implementation
    return sales_df


# Register it
register_metrics_function("my_custom_metrics", my_custom_metrics)
Then use in configuration:
RULE:
  METRICS:
    FUNCTION: my_custom_metrics
    PARAMS:
      # Your custom parameters (no validation)
      my_param1: value1
      my_param2: value2
Note: Custom functions skip parameter validation since their schemas aren’t known at config time.
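The registration mechanism can be pictured as a name-to-function lookup that the FUNCTION key resolves against. The sketch below is a stand-alone illustration with hypothetical names (`register_metrics_sketch`, `resolve_metrics_sketch`, `flat_sales_metrics`), not the library's internals. It also assumes product data arrives as a list of dictionaries and that `config` holds the PARAMS mapping, which may not match the real call signature.

```python
from typing import Callable, Dict

# Hypothetical registry mapping FUNCTION names to callables.
_METRICS_REGISTRY: Dict[str, Callable] = {}


def register_metrics_sketch(name: str, func: Callable) -> None:
    """Store a metrics function under the name used in the YAML FUNCTION key."""
    if not callable(func):
        raise TypeError(f"{name!r} must be callable")
    _METRICS_REGISTRY[name] = func


def resolve_metrics_sketch(name: str) -> Callable:
    """Look up a registered metrics function by name."""
    try:
        return _METRICS_REGISTRY[name]
    except KeyError:
        raise KeyError(f"No metrics function registered as {name!r}") from None


def flat_sales_metrics(product_characteristics, config):
    """Toy custom function: one row per product with a fixed unit count."""
    units = config.get("units_per_product", 1)
    return [
        {"product_id": p["product_id"], "ordered_units": units,
         "revenue": units * p["price"]}
        for p in product_characteristics
    ]


register_metrics_sketch("flat_sales_metrics", flat_sales_metrics)
metrics_fn = resolve_metrics_sketch("flat_sales_metrics")
sales = metrics_fn([{"product_id": "P1", "price": 9.5}], {"units_per_product": 2})
```

Because the registry only stores a callable, nothing in this sketch inspects the custom parameters, which mirrors the note above about skipped validation.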
Next Steps
Examples: See the Demo notebook for practical examples
API: See API Reference for function documentation
Architecture: See Architecture for system internals