API

This section provides detailed documentation for all public APIs in the Online Retail Simulator.

Core Functions

Online Retail Simulator - Generate synthetic retail data for experimentation.

online_retail_simulator.simulate(config_path, products_df=None, job_id=None)[source]

Runs the full pipeline: simulate_products (skipped when products_df is provided), optionally simulate_product_details, and then simulate_metrics.

All results are automatically saved to a job-based directory structure under the configured storage path.

Parameters:

config_path (str) – Path to configuration file
products_df (DataFrame | None) – Optional DataFrame of existing products. If provided, skips product generation and uses this DataFrame instead. Expected columns: product_identifier, category, price
job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Information about the saved job

Return type:

JobInfo
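A minimal usage sketch, assuming the package is installed and a configuration file exists at the path passed in. The helper name run_pipeline and the path "config.yaml" are hypothetical; the sketch uses only functions documented on this page, and the import is deferred until the call so the definition itself does not require the package.

```python
def run_pipeline(config_path):
    """Run the full simulation and load its results (illustrative helper)."""
    # Imported lazily so this sketch can be defined without the package installed.
    from online_retail_simulator import load_job_results, simulate

    # simulate() saves its outputs under the configured storage path and
    # returns a JobInfo describing where they were written.
    job_info = simulate(config_path)

    # load_job_results() returns a dict of the DataFrames available for the job.
    results = load_job_results(job_info)
    return job_info, results


# Example call (hypothetical path):
# job_info, results = run_pipeline("config.yaml")
# print(job_info.job_id, list(results))
```

Passing job_id explicitly to simulate should make the output location predictable, which can help when a later step such as enrich needs to find the same job.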

online_retail_simulator.simulate_products(config_path, job_id=None)[source]

Simulate products using the backend specified in config.

Parameters:

config_path (str) – Path to configuration file
job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Job containing products.csv

Return type:

JobInfo

online_retail_simulator.simulate_product_details(job_info, config_path)[source]

Simulate product details using configured backend.

Loads existing products, enriches with title/description/brand/features, and saves back to the same job.

Config example:

PRODUCT_DETAILS:
    FUNCTION: simulate_product_details_mock  # or simulate_product_details_ollama

Parameters:

job_info (JobInfo) – Job containing products.csv
config_path (str) – Path to configuration file

Returns:

Same job with updated products.csv

Return type:

JobInfo

online_retail_simulator.simulate_metrics(job_info, config_path)[source]

Simulate product metrics using the backend specified in config.

Parameters:

job_info (JobInfo) – Job containing products.csv
config_path (str) – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

online_retail_simulator.enrich(config_path, job_info)[source]

Apply enrichment to metrics data using a config file.

Saves enriched results to the same job directory.

Parameters:

config_path (str) – Path to enrichment config (YAML or JSON)
job_info (JobInfo) – JobInfo object to load metrics data from

Returns:

Same job, now also containing enriched.csv and optionally potential_outcomes.csv

Return type:

JobInfo

online_retail_simulator.register_enrichment_function(name, func)[source]

Register an enrichment function.

Parameters:

name (str)
func (Callable)

Return type:

None

online_retail_simulator.register_enrichment_module(module_name)[source]

Register all compatible functions from a module.

Parameters: module_name (str)
Return type: None

online_retail_simulator.list_enrichment_functions()[source]

List all registered enrichment functions.

Return type: List[str]

online_retail_simulator.clear_enrichment_registry()[source]

Clear all registered enrichment functions.

Return type: None
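The register/list/clear trio above follows a common name-to-callable registry pattern. The sketch below shows that pattern with local stand-ins that mirror the documented names; the module-level dict, the sorted listing, and the toy double_units effect are assumptions for illustration, not the library's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical module-level registry; the library's internal storage may differ.
_REGISTRY: Dict[str, Callable] = {}


def register_enrichment_function(name: str, func: Callable) -> None:
    """Store func under name so a config can refer to it by string."""
    if not callable(func):
        raise TypeError(f"{name!r} must be callable")
    _REGISTRY[name] = func


def list_enrichment_functions() -> List[str]:
    """Return the registered names in sorted order."""
    return sorted(_REGISTRY)


def clear_enrichment_registry() -> None:
    """Remove every registered function."""
    _REGISTRY.clear()


def double_units(metrics, **kwargs):
    """Toy effect: double ordered_units on every record."""
    return [{**row, "ordered_units": row["ordered_units"] * 2} for row in metrics]


register_enrichment_function("double_units", double_units)
names_after_register = list_enrichment_functions()
clear_enrichment_registry()
names_after_clear = list_enrichment_functions()
```

Clearing the registry between tests keeps one test's registrations from leaking into the next, which is presumably why a clear function is exposed.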

online_retail_simulator.register_products_function(name, func)[source]

Register a products generation function.

Parameters:

name (str)
func (Callable)

Return type:

None

online_retail_simulator.register_metrics_function(name, func)[source]

Register a metrics generation function.

Parameters:

name (str)
func (Callable)

Return type:

None

online_retail_simulator.register_simulation_module(module_name, prefix='')[source]

Register all compatible functions from a module.

Functions are automatically detected based on their signatures:

- Products functions: must have a ‘config’ parameter
- Metrics functions: must have ‘products’ and ‘config’ parameters

Parameters:

module_name (str)
prefix (str)

Return type:

None
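Signature-based detection of this kind can be sketched with the standard inspect module. The helper classify_by_signature and the three sample functions are hypothetical; the library may apply additional checks (for example on the prefix argument or on private names) that are not shown here.

```python
import inspect
from typing import Callable, Optional


def classify_by_signature(func: Callable) -> Optional[str]:
    """Return 'metrics', 'products', or None based on parameter names."""
    params = inspect.signature(func).parameters
    # Check the stricter rule first: metrics functions also have 'config'.
    if "products" in params and "config" in params:
        return "metrics"
    if "config" in params:
        return "products"
    return None


def my_products(config):
    return []


def my_metrics(products, config):
    return []


def helper(x):
    return x


kinds = {f.__name__: classify_by_signature(f) for f in (my_products, my_metrics, helper)}
```

Testing for the metrics rule before the products rule matters, because every metrics function also satisfies the weaker "has a config parameter" condition.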

online_retail_simulator.list_simulation_functions()[source]

List all registered simulation functions.

Return type: Dict[str, List[str]]

online_retail_simulator.get_simulation_function(func_type, name)[source]

Get a registered simulation function.

Parameters:

func_type (str) – Type of function (‘products’ or ‘metrics’)
name (str) – Name of the function

Returns:

The registered function

Return type:

Callable

class online_retail_simulator.JobInfo(job_id, storage_path)[source]

Bases: object

Information about a simulation job and its storage location.

Parameters:

job_id (str)
storage_path (str)

job_id: str

storage_path: str

get_store()[source]

Get an ArtifactStore for this job’s directory.

Return type: ArtifactStore

save_df(name, df)[source]

Save a DataFrame to this job’s directory.

Parameters:

name (str)
df (DataFrame)

Return type:

None

load_df(name)[source]

Load a DataFrame from this job’s directory.

Parameters: name (str)
Return type: DataFrame | None

__init__(job_id, storage_path)

Parameters:

job_id (str)
storage_path (str)

Return type:

None

online_retail_simulator.create_job(config, config_path, job_id=None)[source]

Create a new job directory with config.

Parameters:

config (Dict) – Configuration dictionary (expects STORAGE.PATH)
config_path (str) – Path to original config file
job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Information about the created job

Return type:

JobInfo

online_retail_simulator.load_job_results(job_info)[source]

Load simulation results for a job.

Parameters: job_info (JobInfo) – JobInfo containing job details
Returns: Dict of the available DataFrames, keyed by ‘products’, ‘metrics’, and ‘enriched’
Return type: Dict
Raises: FileNotFoundError – If job directory doesn’t exist

online_retail_simulator.load_job_metadata(job_info)[source]

Load metadata for a job.

Parameters: job_info (JobInfo) – JobInfo containing job details
Returns: Job metadata
Return type: Dict
Raises: FileNotFoundError – If job directory or metadata file doesn’t exist

online_retail_simulator.list_jobs(storage_path='.')[source]

List all available job IDs in a storage path.

Parameters: storage_path (str) – Base path where job directories are stored
Returns: List of job IDs sorted by creation time (newest first)
Return type: List[str]

online_retail_simulator.cleanup_old_jobs(storage_path='.', keep_count=10)[source]

Clean up old job directories, keeping only the most recent ones.

Parameters:

storage_path (str) – Base path where job directories are stored
keep_count (int) – Number of recent jobs to keep

Returns:

List of removed job IDs

Return type:

List[str]
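The keep-the-newest behaviour can be sketched with the standard library alone. This is an illustration under stated assumptions: the helper names are hypothetical, every sub-directory is treated as a job, and modification time stands in for creation time, which is not portable across platforms.

```python
import os
import shutil
import tempfile
from typing import List


def list_job_dirs(storage_path: str) -> List[str]:
    """Return sub-directory names, newest first by modification time."""
    names = [
        name for name in os.listdir(storage_path)
        if os.path.isdir(os.path.join(storage_path, name))
    ]
    return sorted(
        names,
        key=lambda name: os.path.getmtime(os.path.join(storage_path, name)),
        reverse=True,
    )


def cleanup_old(storage_path: str, keep_count: int = 10) -> List[str]:
    """Delete all but the keep_count newest directories; return removed names."""
    removed = list_job_dirs(storage_path)[keep_count:]
    for name in removed:
        shutil.rmtree(os.path.join(storage_path, name))
    return removed


# Demonstration in a throwaway directory with explicit timestamps.
with tempfile.TemporaryDirectory() as root:
    for age, name in enumerate(["job-c", "job-b", "job-a"]):
        path = os.path.join(root, name)
        os.mkdir(path)
        # Older jobs get smaller timestamps: job-c is newest, job-a oldest.
        os.utime(path, (1_700_000_000 - age, 1_700_000_000 - age))
    removed = cleanup_old(root, keep_count=2)
    remaining = list_job_dirs(root)
```

Because the removal is permanent, it is worth pointing storage_path at a directory that holds only job output.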

Simulation Module

Simulation module for generating synthetic retail data.

online_retail_simulator.simulate.simulate(config_path, products_df=None, job_id=None)[source]

Runs the full pipeline: simulate_products (skipped when products_df is provided), optionally simulate_product_details, and then simulate_metrics.

All results are automatically saved to a job-based directory structure under the configured storage path.

Parameters:

config_path (str) – Path to configuration file
products_df (DataFrame | None) – Optional DataFrame of existing products. If provided, skips product generation and uses this DataFrame instead. Expected columns: product_identifier, category, price
job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Information about the saved job

Return type:

JobInfo

online_retail_simulator.simulate.simulate_products(config_path, job_id=None)[source]

Simulate products using the backend specified in config.

Parameters:

config_path (str) – Path to configuration file
job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Job containing products.csv

Return type:

JobInfo

online_retail_simulator.simulate.simulate_product_details(job_info, config_path)[source]

Simulate product details using configured backend.

Loads existing products, enriches with title/description/brand/features, and saves back to the same job.

Config example:

PRODUCT_DETAILS:
    FUNCTION: simulate_product_details_mock  # or simulate_product_details_ollama

Parameters:

job_info (JobInfo) – Job containing products.csv
config_path (str) – Path to configuration file

Returns:

Same job with updated products.csv

Return type:

JobInfo

online_retail_simulator.simulate.simulate_metrics(job_info, config_path)[source]

Simulate product metrics using the backend specified in config.

Parameters:

job_info (JobInfo) – Job containing products.csv
config_path (str) – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

online_retail_simulator.simulate.register_products_function(name, func)[source]

Register a products generation function.

Parameters:

name (str)
func (Callable)

Return type:

None

online_retail_simulator.simulate.register_metrics_function(name, func)[source]

Register a metrics generation function.

Parameters:

name (str)
func (Callable)

Return type:

None

online_retail_simulator.simulate.register_simulation_module(module_name, prefix='')[source]

Register all compatible functions from a module.

Functions are automatically detected based on their signatures:

- Products functions: must have a ‘config’ parameter
- Metrics functions: must have ‘products’ and ‘config’ parameters

Parameters:

module_name (str)
prefix (str)

Return type:

None

online_retail_simulator.simulate.list_simulation_functions()[source]

List all registered simulation functions.

Return type: Dict[str, List[str]]

online_retail_simulator.simulate.get_simulation_function(func_type, name)[source]

Get a registered simulation function.

Parameters:

func_type (str) – Type of function (‘products’ or ‘metrics’)
name (str) – Name of the function

Returns:

The registered function

Return type:

Callable

Products Generation

Interface for simulating products. Dispatches to appropriate backend based on config.

online_retail_simulator.simulate.products.simulate_products(config_path, job_id=None)[source]

Simulate products using the backend specified in config.

Parameters:

config_path (str) – Path to configuration file
job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Job containing products.csv

Return type:

JobInfo

Rule-based product simulation.

online_retail_simulator.simulate.products_rule_based.generate_random_product_identifier(rng, prefix='B')[source]

Generate a random product identifier.

- 10 characters total
- Alphanumeric
- Defaults to starting with ‘B’

Parameters:

rng (Generator)
prefix (str)

Return type:

str
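A sketch of an identifier generator with these properties follows. It is not the library's code: the function name, the length parameter, and the uppercase-plus-digits alphabet are assumptions, and the standard library's random.Random stands in for the Generator type in the signature above.

```python
import random
import string

# Assumed alphabet: uppercase letters and digits.
ALPHABET = list(string.ascii_uppercase + string.digits)


def random_product_identifier(rng, prefix="B", length=10):
    """Return prefix followed by random characters, `length` characters in total."""
    if len(prefix) > length:
        raise ValueError("prefix is longer than the identifier length")
    suffix = "".join(rng.choice(ALPHABET) for _ in range(length - len(prefix)))
    return prefix + suffix


rng = random.Random(0)
identifier = random_product_identifier(rng)
```

Passing the random generator in, rather than creating one inside the function, lets a single seeded generator drive a whole simulation reproducibly.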

online_retail_simulator.simulate.products_rule_based.simulate_products_rule_based(config)[source]

Generate synthetic products (rule-based).

Parameters: config (Dict) – Complete configuration dictionary
Returns: DataFrame of products
Return type: DataFrame

Synthesizer-based product simulation. Reads a DataFrame from the path specified in config[‘SYNTHESIZER’][‘dataframe_path’]. No error handling, hard failures only.

online_retail_simulator.simulate.products_synthesizer_based.simulate_products_synthesizer_based(config)[source]

Generate synthetic products using Gaussian Copula synthesizer.

Parameters: config (Dict) – Complete configuration dictionary
Returns: DataFrame of synthetic products
Return type: DataFrame

Metrics Generation

Interface for simulating product metrics. Dispatches to appropriate backend based on config.

online_retail_simulator.simulate.metrics.simulate_metrics(job_info, config_path)[source]

Simulate product metrics using the backend specified in config.

Parameters:

job_info (JobInfo) – Job containing products.csv
config_path (str) – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

Rule-based product metrics simulation (minimal skeleton).

online_retail_simulator.simulate.metrics_rule_based.simulate_metrics_rule_based(products, config)[source]

Generate synthetic product metrics with customer journey funnel (rule-based).

Simulates a realistic conversion funnel: impressions → visits → cart adds → orders.

Parameters:

products (DataFrame) – DataFrame of products
config (Dict) – Complete configuration dictionary

Returns:

DataFrame of product metrics (one row per product per time period). Columns: product_identifier, category, price, date, impressions, visits, cart_adds, ordered_units, revenue.

Return type:

DataFrame
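The funnel idea can be illustrated with a few lines of standard-library Python. This is a sketch under stated assumptions, not the library's rule set: the conversion rates, the price, the independent per-event draws, and the revenue formula (ordered units times price) are all hypothetical.

```python
import random


def simulate_funnel(impressions, rates, rng):
    """Thin each stage from the previous one with independent Bernoulli draws."""
    visits = sum(rng.random() < rates["visit"] for _ in range(impressions))
    cart_adds = sum(rng.random() < rates["cart"] for _ in range(visits))
    ordered_units = sum(rng.random() < rates["order"] for _ in range(cart_adds))
    return {"impressions": impressions, "visits": visits,
            "cart_adds": cart_adds, "ordered_units": ordered_units}


# Hypothetical conversion rates and price, chosen only for illustration.
rates = {"visit": 0.10, "cart": 0.30, "order": 0.50}
rng = random.Random(42)
row = simulate_funnel(1000, rates, rng)
row["revenue"] = round(row["ordered_units"] * 19.99, 2)
```

Drawing each stage from the one before it guarantees the counts never increase along the funnel, which is the property that makes the resulting metrics look plausible.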

Synthesizer-based simulation backend for metrics. Takes a products DataFrame and a configuration dictionary. No error handling, hard failures only.

online_retail_simulator.simulate.metrics_synthesizer_based.simulate_metrics_synthesizer_based(products, config)[source]

Generate synthetic product metrics using Gaussian Copula synthesizer.

Parameters:

products (DataFrame) – DataFrame of products (unused in current implementation)
config (Dict) – Complete configuration dictionary

Returns:

DataFrame of synthetic metrics

Return type:

DataFrame

Enrichment Module

Enrichment module for applying treatments to sales data.

online_retail_simulator.enrich.enrich(config_path, job_info)[source]

Apply enrichment to metrics data using a config file.

Saves enriched results to the same job directory.

Parameters:

config_path (str) – Path to enrichment config (YAML or JSON)
job_info (JobInfo) – JobInfo object to load metrics data from

Returns:

Same job, now also containing enriched.csv and optionally potential_outcomes.csv

Return type:

JobInfo

online_retail_simulator.enrich.register_enrichment_function(name, func)[source]

Register an enrichment function.

Parameters:

name (str)
func (Callable)

Return type:

None

online_retail_simulator.enrich.register_enrichment_module(module_name)[source]

Register all compatible functions from a module.

Parameters: module_name (str)
Return type: None

online_retail_simulator.enrich.list_enrichment_functions()[source]

List all registered enrichment functions.

Return type: List[str]

online_retail_simulator.enrich.clear_enrichment_registry()[source]

Clear all registered enrichment functions.

Return type: None

Interface for applying enrichment treatments to metrics data. Dispatches to impact-based implementation based on config.

online_retail_simulator.enrich.enrichment.parse_impact_spec(impact_spec)[source]

Parse IMPACT specification into module, function, and params.

Supports dict format with capitalized keys:

{"FUNCTION": "product_detail_boost", "PARAMS": {"effect_size": 0.5, "ramp_days": 7}}
{"MODULE": "my_module", "FUNCTION": "my_func", "PARAMS": {…}}  # MODULE ignored, kept for compatibility

Parameters: impact_spec (Dict) – IMPACT specification from config (must be dict)
Returns: Tuple of (module_name, function_name, params_dict)
Return type: Tuple[str, str, Dict[str, Any]]
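The parsing step can be sketched as follows. The function name parse_impact_spec_sketch, the empty-string placeholder for a missing MODULE, and the exact error messages are assumptions; only the key names and the returned tuple shape come from the description above.

```python
from typing import Any, Dict, Tuple


def parse_impact_spec_sketch(impact_spec: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]:
    """Split an IMPACT dict into (module_name, function_name, params)."""
    if not isinstance(impact_spec, dict):
        raise ValueError("IMPACT specification must be a dict")
    if "FUNCTION" not in impact_spec:
        raise ValueError("IMPACT specification needs a FUNCTION key")
    # MODULE is optional; an empty string is this sketch's placeholder.
    module_name = impact_spec.get("MODULE", "")
    # Copy PARAMS so later changes do not leak back into the config.
    params = dict(impact_spec.get("PARAMS") or {})
    return module_name, impact_spec["FUNCTION"], params


parsed = parse_impact_spec_sketch(
    {"FUNCTION": "product_detail_boost", "PARAMS": {"effect_size": 0.5, "ramp_days": 7}}
)
```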

online_retail_simulator.enrich.enrichment.assign_enrichment(products, fraction, seed=None)[source]

Assign enrichment treatment to a fraction of products.

Parameters:

products (List[Dict]) – List of product dictionaries
fraction (float) – Fraction of products to enrich (0.0 to 1.0)
seed (int | None) – Random seed for reproducibility (default: None)

Returns:

List of products with added ‘enriched’ boolean field

Return type:

List[Dict]
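One way to implement seeded assignment is sketched below. The function name, the rounding of the treated count, and the choice to return copies rather than mutate the inputs are assumptions of this sketch; the library may select products differently.

```python
import random
from typing import Dict, List, Optional


def assign_enrichment_sketch(products: List[Dict], fraction: float,
                             seed: Optional[int] = None) -> List[Dict]:
    """Return copies of products with an 'enriched' flag on a random subset."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    n_enriched = round(len(products) * fraction)
    # Sample positions without replacement so exactly n_enriched are chosen.
    chosen = set(rng.sample(range(len(products)), n_enriched))
    return [{**product, "enriched": index in chosen}
            for index, product in enumerate(products)]


products = [{"product_identifier": f"B{i:09d}"} for i in range(10)]
assigned = assign_enrichment_sketch(products, fraction=0.3, seed=42)
n_enriched = sum(p["enriched"] for p in assigned)
```

Fixing the seed makes the treated set repeatable across runs, which matters when the same assignment has to be compared against stored potential outcomes.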

online_retail_simulator.enrich.enrichment.apply_enrichment_to_metrics(metrics, enriched_products, enrichment_start, effect_function, **kwargs)[source]

Apply enrichment treatment effect to metrics data.

Parameters:

metrics (List[Dict]) – List of metric record dictionaries
enriched_products (List[Dict]) – List of products with ‘enriched’ field
enrichment_start (str) – Start date of enrichment (YYYY-MM-DD)
effect_function (Callable) – Treatment effect function to apply
**kwargs – Additional parameters to pass to effect function

Returns:

List of modified metrics with treatment effect applied

Return type:

List[Dict]
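The gating logic described here (only enriched products, only from the start date onward) can be sketched as below. The helper names and the per-record signature of the toy effect function are assumptions; the library's effect functions may instead operate on the whole list of metrics.

```python
from typing import Callable, Dict, List


def apply_effect(metrics: List[Dict], enriched_products: List[Dict],
                 enrichment_start: str, effect_function: Callable, **kwargs) -> List[Dict]:
    """Apply effect_function to records of enriched products from the start date on."""
    enriched_ids = {p["product_identifier"] for p in enriched_products if p.get("enriched")}
    result = []
    for record in metrics:
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        treated = (record["product_identifier"] in enriched_ids
                   and record["date"] >= enrichment_start)
        result.append(effect_function(dict(record), **kwargs) if treated else dict(record))
    return result


def boost_units(record: Dict, effect_size: float = 0.5) -> Dict:
    """Toy per-record effect: scale ordered_units by (1 + effect_size)."""
    record["ordered_units"] = int(record["ordered_units"] * (1 + effect_size))
    return record


metrics = [
    {"product_identifier": "B1", "date": "2024-11-10", "ordered_units": 4},
    {"product_identifier": "B1", "date": "2024-11-20", "ordered_units": 4},
    {"product_identifier": "B2", "date": "2024-11-20", "ordered_units": 4},
]
products = [{"product_identifier": "B1", "enriched": True},
            {"product_identifier": "B2", "enriched": False}]
treated = apply_effect(metrics, products, "2024-11-15", boost_units, effect_size=0.5)
units = [r["ordered_units"] for r in treated]
```

Only the second record changes: the first predates the start date and the third belongs to a product that was not enriched.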

online_retail_simulator.enrich.enrichment.enrich(config_path, df, job_info=None, products_df=None)[source]

Apply enrichment to a DataFrame using a config file.

Parameters:

config_path (str) – Path to enrichment config (YAML or JSON, local or S3)
df (DataFrame) – DataFrame with metrics data (must include product_identifier)
job_info – Optional JobInfo for product-aware enrichment functions
products_df – Optional products DataFrame for product-aware enrichment functions

Returns:

enriched_df: DataFrame with enrichment applied (factual version)
potential_outcomes_df: DataFrame with Y0/Y1 for all products, or None if not provided

Return type:

Tuple of (enriched_df, potential_outcomes_df)

Library of predefined treatment effect functions for catalog enrichment.

online_retail_simulator.enrich.enrichment_library.quantity_boost(metrics, **kwargs)[source]

Boost ordered units by a percentage for enriched products.

Parameters:

metrics (list) – List of metric record dictionaries
**kwargs – Parameters including:
- effect_size: Fractional increase in ordered units (default: 0.5, i.e. a 50% boost)
- enrichment_fraction: Fraction of products to enrich (default: 0.3)
- enrichment_start: Start date of enrichment (default: "2024-11-15")
- seed: Random seed for product selection (default: 42)
- min_units: Minimum units for enriched products with zero sales (default: 1)

Returns:

treated_metrics: List of modified metric dictionaries with treatment applied
potential_outcomes_df: DataFrame with Y0_revenue and Y1_revenue for all products

Return type:

Tuple of (treated_metrics, potential_outcomes_df)
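The Y0/Y1 columns follow the potential-outcomes convention: Y0 is revenue without treatment and Y1 is revenue if the product were treated, computed for every product whether or not it is actually enriched. The sketch below illustrates that bookkeeping with plain dicts; the per-product aggregation, the truncation to whole units, and the omission of enrichment_start and min_units are simplifying assumptions, and the library returns a DataFrame rather than a list.

```python
def potential_outcomes(metrics, effect_size=0.5):
    """Return one row per product with untreated (Y0) and treated (Y1) revenue."""
    totals = {}
    for record in metrics:
        pid = record["product_identifier"]
        y0 = record["ordered_units"] * record["price"]
        # Treated units are scaled by (1 + effect_size) and truncated to whole units.
        y1 = int(record["ordered_units"] * (1 + effect_size)) * record["price"]
        row = totals.setdefault(pid, {"product_identifier": pid,
                                      "Y0_revenue": 0.0, "Y1_revenue": 0.0})
        row["Y0_revenue"] += y0
        row["Y1_revenue"] += y1
    return list(totals.values())


metrics = [
    {"product_identifier": "B1", "price": 10.0, "ordered_units": 2},
    {"product_identifier": "B1", "price": 10.0, "ordered_units": 4},
    {"product_identifier": "B2", "price": 5.0, "ordered_units": 0},
]
outcomes = potential_outcomes(metrics, effect_size=0.5)
```

Keeping both outcomes for every product is what allows a true treatment effect (Y1 minus Y0) to be computed later and compared with an estimate from the factual data.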

online_retail_simulator.enrich.enrichment_library.probability_boost(metrics, **kwargs)[source]

Boost sale probability, approximated by an increase in ordered units.

Parameters:

metrics (list) – List of metric record dictionaries
**kwargs – Same parameters as quantity_boost

Returns:

Tuple of (treated_metrics, potential_outcomes_df) - same as quantity_boost

Return type:

tuple

online_retail_simulator.enrich.enrichment_library.product_detail_boost(metrics, **kwargs)[source]

Product detail regeneration and metrics boost for enrichment experiments.

Selects a fraction of products for treatment, regenerates their product details (title, description, features) while preserving brand/category/price, and applies metrics boost effect.

Parameters:

metrics (list) – List of metric record dictionaries
**kwargs – Parameters including:
- job_info: JobInfo for saving product artifacts (required for saving)
- products: List of product dictionaries (required for product details)
- effect_size: Fractional increase in ordered units (default: 0.5)
- ramp_days: Number of days for ramp-up period (default: 7)
- enrichment_fraction: Fraction of products to enrich (default: 0.3)
- enrichment_start: Start date of enrichment (default: "2024-11-15")
- seed: Random seed for product selection (default: 42)
- prompt_path: Path to custom prompt template file (optional)
- backend: Backend to use for regeneration ("mock" or "ollama", default: "mock")

Returns:

treated_metrics: List of modified metric dictionaries with treatment applied
potential_outcomes_df: DataFrame with Y0_revenue and Y1_revenue for all products

Return type:

Tuple of (treated_metrics, potential_outcomes_df)

Impact-based enrichment registry for custom user-defined enrichment functions.

This module provides a registration system that allows users to register their own impact-based enrichment functions.

online_retail_simulator.enrich.enrichment_registry.register_enrichment_function(name, func)[source]

Register an enrichment function.

Parameters:

name (str)
func (Callable)

Return type:

None

online_retail_simulator.enrich.enrichment_registry.register_enrichment_module(module_name)[source]

Register all compatible functions from a module.

Parameters: module_name (str)
Return type: None

online_retail_simulator.enrich.enrichment_registry.list_enrichment_functions()[source]

List all registered enrichment functions.

Return type: List[str]

online_retail_simulator.enrich.enrichment_registry.clear_enrichment_registry()[source]

Clear all registered enrichment functions.

Return type: None

online_retail_simulator.enrich.enrichment_registry.load_effect_function(module_name, function_name)[source]

Load treatment effect function from registry.

Parameters:

module_name (str) – Name of module (ignored, kept for backward compatibility)
function_name (str) – Name of function in registry

Returns:

Treatment effect function

Return type:

Callable

Configuration Module

Configuration processing with defaults and validation.

online_retail_simulator.config_processor.load_defaults()[source]

Load default configuration from package.

Return type: Dict[str, Any]

online_retail_simulator.config_processor.get_impact_defaults(function_name)[source]

Get default parameters for an IMPACT enrichment function.

Parameters: function_name (str) – Name of the enrichment function (e.g., "product_detail_boost")
Returns: Dictionary of default parameters for the function, or empty dict if not found
Return type: Dict[str, Any]

online_retail_simulator.config_processor.deep_merge(base, override)[source]

Deep merge two dictionaries, with override values taking precedence.

Parameters:

base (Dict) – Base dictionary (defaults)
override (Dict) – Override dictionary (user config)

Returns:

Merged dictionary

Return type:

Dict
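A recursive merge with these semantics can be sketched as follows. The function name deep_merge_sketch and the SEED and EXTRA keys are hypothetical; whether the library copies its inputs or merges in place is not stated, so the non-mutating behaviour here is an assumption.

```python
from typing import Any, Dict


def deep_merge_sketch(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge their contents key by key.
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            # Scalars, lists, and type mismatches are replaced wholesale.
            merged[key] = value
    return merged


defaults = {"STORAGE": {"PATH": "output"}, "SEED": 42}
user_config = {"STORAGE": {"PATH": "runs"}, "EXTRA": True}
merged = deep_merge_sketch(defaults, user_config)
```

Compared with a shallow dict.update, the recursive version lets a user config override a single nested key without having to repeat the rest of that section.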

online_retail_simulator.config_processor.validate_config(config)[source]

Validate configuration has required fields and valid parameters.

Parameters: config (Dict[str, Any])
Return type: None

online_retail_simulator.config_processor.process_config(config_path)[source]

Load, merge with defaults, and validate configuration.

Parameters:

config_path (str) – Path to user configuration file (local or S3)

Returns:

Complete validated configuration

Raises:

FileNotFoundError – If config file doesn’t exist
ValueError – If configuration is invalid

Return type:

Dict[str, Any]