API

This section provides detailed documentation for all public APIs in the Online Retail Simulator.

Core Functions

Online Retail Simulator - Generate synthetic retail data for experimentation.

online_retail_simulator.simulate(config_path, products_df=None, job_id=None)

Runs simulate_products (or uses provided products), optionally simulate_product_details, and simulate_metrics.

All results are automatically saved to a job-based directory structure under the configured storage path.

Parameters:
  • config_path (str) – Path to configuration file

  • products_df (DataFrame | None) – Optional DataFrame of existing products. If provided, skips product generation and uses this DataFrame instead. Expected columns: product_identifier, category, price

  • job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Information about the saved job

Return type:

JobInfo

online_retail_simulator.simulate_products(config_path, job_id=None)

Simulate products using the backend specified in config.

Parameters:
  • config_path (str) – Path to configuration file

  • job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Job containing products.csv

Return type:

JobInfo

online_retail_simulator.simulate_product_details(job_info, config_path)

Simulate product details using the configured backend.

Loads the existing products, enriches them with title, description, brand, and features, and saves them back to the same job.

Config example:

    PRODUCT_DETAILS:
      FUNCTION: simulate_product_details_mock  # or simulate_product_details_ollama

Parameters:
  • job_info (JobInfo) – Job containing products.csv

  • config_path (str) – Path to configuration file

Returns:

Same job with updated products.csv

Return type:

JobInfo

online_retail_simulator.simulate_metrics(job_info, config_path)

Simulate product metrics using the backend specified in config.

Parameters:
  • job_info (JobInfo) – Job containing products.csv

  • config_path (str) – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

online_retail_simulator.enrich(config_path, job_info)

Apply enrichment to metrics data using a config file.

Saves enriched results to the same job directory.

Parameters:
  • config_path (str) – Path to enrichment config (YAML or JSON)

  • job_info (JobInfo) – JobInfo object to load metrics data from

Returns:

Same job, now also containing enriched.csv and optionally potential_outcomes.csv

Return type:

JobInfo

online_retail_simulator.register_enrichment_function(name, func)

Register an enrichment function.

Parameters:
  • name – Name under which the function is registered

  • func – Enrichment function to register

Return type:

None

online_retail_simulator.register_enrichment_module(module_name)

Register all compatible functions from a module.

Parameters:

module_name (str)

Return type:

None

online_retail_simulator.list_enrichment_functions()

List all registered enrichment functions.

Return type:

List[str]

online_retail_simulator.clear_enrichment_registry()

Clear all registered enrichment functions.

Return type:

None

online_retail_simulator.register_products_function(name, func)

Register a products generation function.

Parameters:
  • name – Name under which the function is registered

  • func – Products generation function to register

Return type:

None

online_retail_simulator.register_metrics_function(name, func)

Register a metrics generation function.

Parameters:
  • name – Name under which the function is registered

  • func – Metrics generation function to register

Return type:

None

online_retail_simulator.register_simulation_module(module_name, prefix='')

Register all compatible functions from a module.

Functions are automatically detected based on their signatures:
  • Products functions: must have a ‘config’ parameter

  • Metrics functions: must have ‘products’ and ‘config’ parameters

Parameters:
  • module_name (str)

  • prefix (str)

Return type:

None
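
The signature-based detection described above can be illustrated with the standard inspect module. This is a minimal sketch and not the library's implementation: classify_function and the three sample functions are hypothetical names, and checking the metrics rule before the products rule is an assumption.

```python
import inspect
from typing import Callable, Optional


def classify_function(func: Callable) -> Optional[str]:
    """Return 'metrics', 'products', or None based on parameter names."""
    params = inspect.signature(func).parameters
    # Check the stricter rule first: metrics functions take both parameters.
    if "products" in params and "config" in params:
        return "metrics"
    if "config" in params:
        return "products"
    return None


# Hypothetical user-defined backends used only for illustration.
def my_products(config):
    return []


def my_metrics(products, config):
    return []


def helper(x):
    return x


print(classify_function(my_products))  # products
print(classify_function(my_metrics))   # metrics
print(classify_function(helper))       # None
```

A function such as helper, which has neither parameter, would simply be skipped under this scheme.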

online_retail_simulator.list_simulation_functions()

List all registered simulation functions.

Return type:

Dict[str, List[str]]

online_retail_simulator.get_simulation_function(func_type, name)

Get a registered simulation function.

Parameters:
  • func_type (str) – Type of function (‘products’ or ‘metrics’)

  • name (str) – Name of the function

Returns:

The registered function

Return type:

Callable

class online_retail_simulator.JobInfo(job_id, storage_path)

Bases: object

Information about a simulation job and its storage location.

Parameters:
  • job_id (str)

  • storage_path (str)

job_id: str

storage_path: str

get_store()

Get an ArtifactStore for this job’s directory.

Return type:

ArtifactStore

save_df(name, df)

Save a DataFrame to this job’s directory.

Parameters:
  • name (str)

  • df (DataFrame)

Return type:

None

load_df(name)

Load a DataFrame from this job’s directory.

Parameters:

name (str)

Return type:

DataFrame | None

__init__(job_id, storage_path)

Parameters:
  • job_id (str)

  • storage_path (str)

Return type:

None

online_retail_simulator.create_job(config, config_path, job_id=None)

Create a new job directory with config.

Parameters:
  • config (Dict) – Configuration dictionary (expects STORAGE.PATH)

  • config_path (str) – Path to original config file

  • job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Information about the created job

Return type:

JobInfo

online_retail_simulator.load_job_results(job_info)

Load simulation results for a job.

Parameters:

job_info (JobInfo) – JobInfo containing job details

Returns:

Dict of the available DataFrames, keyed by ‘products’, ‘metrics’, and ‘enriched’

Return type:

Dict

Raises:

FileNotFoundError – If job directory doesn’t exist
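
The core functions documented so far can be chained into a single workflow. The sketch below relies only on the signatures listed in this section; run_pipeline is a hypothetical helper name, and the two config paths are placeholders supplied by the caller.

```python
from typing import Any, Dict


def run_pipeline(config_path: str, enrichment_config_path: str) -> Dict[str, Any]:
    """Simulate, enrich, and load the results for one job."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from online_retail_simulator import enrich, load_job_results, simulate

    job_info = simulate(config_path)                     # products.csv and metrics.csv
    job_info = enrich(enrichment_config_path, job_info)  # adds enriched.csv
    return load_job_results(job_info)                    # dict of available DataFrames
```

A call such as run_pipeline("config.yaml", "enrichment.yaml") would then return the dictionary described under load_job_results; both file names here are placeholders.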

online_retail_simulator.load_job_metadata(job_info)

Load metadata for a job.

Parameters:

job_info (JobInfo) – JobInfo containing job details

Returns:

Job metadata

Return type:

Dict

Raises:

FileNotFoundError – If job directory or metadata file doesn’t exist

online_retail_simulator.list_jobs(storage_path='.')

List all available job IDs in a storage path.

Parameters:

storage_path (str) – Base path where job directories are stored

Returns:

List of job IDs sorted by creation time (newest first)

Return type:

List[str]

online_retail_simulator.cleanup_old_jobs(storage_path='.', keep_count=10)

Clean up old job directories, keeping only the most recent ones.

Parameters:
  • storage_path (str) – Base path where job directories are stored

  • keep_count (int) – Number of recent jobs to keep

Returns:

List of removed job IDs

Return type:

List[str]
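
The retention behaviour of list_jobs and cleanup_old_jobs can be sketched with the standard library. This is an illustration under stated assumptions, not the package's code: it assumes each job is a subdirectory of the storage path, it uses modification time as a stand-in for creation time, and list_job_dirs and cleanup_old_job_dirs are hypothetical names.

```python
import os
import shutil
import tempfile
from typing import List


def list_job_dirs(storage_path: str = ".") -> List[str]:
    """Return subdirectory names sorted by modification time, newest first."""
    entries = [entry for entry in os.scandir(storage_path) if entry.is_dir()]
    entries.sort(key=lambda entry: entry.stat().st_mtime, reverse=True)
    return [entry.name for entry in entries]


def cleanup_old_job_dirs(storage_path: str = ".", keep_count: int = 10) -> List[str]:
    """Delete all but the keep_count newest subdirectories; return removed names."""
    removed = list_job_dirs(storage_path)[keep_count:]
    for name in removed:
        shutil.rmtree(os.path.join(storage_path, name))
    return removed


# Demonstrate on a throwaway directory with explicit timestamps.
with tempfile.TemporaryDirectory() as root:
    for i in range(5):
        path = os.path.join(root, f"job-{i}")
        os.mkdir(path)
        os.utime(path, (1_700_000_000 + i, 1_700_000_000 + i))
    removed = cleanup_old_job_dirs(root, keep_count=2)
    remaining = list_job_dirs(root)
```

Here the three oldest directories are removed and the two newest are kept.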

Simulation Module

Simulation module for generating synthetic retail data.

online_retail_simulator.simulate.simulate(config_path, products_df=None, job_id=None)

Runs simulate_products (or uses provided products), optionally simulate_product_details, and simulate_metrics.

All results are automatically saved to a job-based directory structure under the configured storage path.

Parameters:
  • config_path (str) – Path to configuration file

  • products_df (DataFrame | None) – Optional DataFrame of existing products. If provided, skips product generation and uses this DataFrame instead. Expected columns: product_identifier, category, price

  • job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Information about the saved job

Return type:

JobInfo

online_retail_simulator.simulate.simulate_products(config_path, job_id=None)

Simulate products using the backend specified in config.

Parameters:
  • config_path (str) – Path to configuration file

  • job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Job containing products.csv

Return type:

JobInfo

online_retail_simulator.simulate.simulate_product_details(job_info, config_path)

Simulate product details using the configured backend.

Loads the existing products, enriches them with title, description, brand, and features, and saves them back to the same job.

Config example:

    PRODUCT_DETAILS:
      FUNCTION: simulate_product_details_mock  # or simulate_product_details_ollama

Parameters:
  • job_info (JobInfo) – Job containing products.csv

  • config_path (str) – Path to configuration file

Returns:

Same job with updated products.csv

Return type:

JobInfo

online_retail_simulator.simulate.simulate_metrics(job_info, config_path)

Simulate product metrics using the backend specified in config.

Parameters:
  • job_info (JobInfo) – Job containing products.csv

  • config_path (str) – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

online_retail_simulator.simulate.register_products_function(name, func)

Register a products generation function.

Parameters:
  • name – Name under which the function is registered

  • func – Products generation function to register

Return type:

None

online_retail_simulator.simulate.register_metrics_function(name, func)

Register a metrics generation function.

Parameters:
  • name – Name under which the function is registered

  • func – Metrics generation function to register

Return type:

None

online_retail_simulator.simulate.register_simulation_module(module_name, prefix='')

Register all compatible functions from a module.

Functions are automatically detected based on their signatures:
  • Products functions: must have a ‘config’ parameter

  • Metrics functions: must have ‘products’ and ‘config’ parameters

Parameters:
  • module_name (str)

  • prefix (str)

Return type:

None

online_retail_simulator.simulate.list_simulation_functions()

List all registered simulation functions.

Return type:

Dict[str, List[str]]

online_retail_simulator.simulate.get_simulation_function(func_type, name)

Get a registered simulation function.

Parameters:
  • func_type (str) – Type of function (‘products’ or ‘metrics’)

  • name (str) – Name of the function

Returns:

The registered function

Return type:

Callable

Products Generation

Interface for simulating products. Dispatches to appropriate backend based on config.

online_retail_simulator.simulate.products.simulate_products(config_path, job_id=None)

Simulate products using the backend specified in config.

Parameters:
  • config_path (str) – Path to configuration file

  • job_id (str | None) – Optional job ID, auto-generated if not provided

Returns:

Job containing products.csv

Return type:

JobInfo

Rule-based product simulation.

online_retail_simulator.simulate.products_rule_based.generate_random_product_identifier(rng, prefix='B')

Generate a random product identifier:
  • 10 characters total

  • Alphanumeric

  • Starts with ‘B’ by default

Parameters:
  • rng (Generator)

  • prefix (str)

Return type:

str
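
The identifier format described above (10 alphanumeric characters with a default ‘B’ prefix) can be sketched as follows. This is an illustration only: it swaps in the standard library's random.Random for the documented Generator argument, the restriction to uppercase letters and digits is an assumption, and random_product_identifier is a hypothetical name.

```python
import random
import string

# Assumed alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def random_product_identifier(rng: random.Random, prefix: str = "B", length: int = 10) -> str:
    """Return an identifier of `length` characters that starts with `prefix`."""
    suffix = "".join(rng.choice(ALPHABET) for _ in range(length - len(prefix)))
    return prefix + suffix


rng = random.Random(0)
identifier = random_product_identifier(rng)
print(identifier)
```

Seeding the generator makes the identifiers reproducible across runs.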

online_retail_simulator.simulate.products_rule_based.simulate_products_rule_based(config)

Generate synthetic products (rule-based).

Parameters:

config (Dict) – Complete configuration dictionary

Returns:

DataFrame of products

Return type:

DataFrame

Synthesizer-based product simulation. Reads a DataFrame from the path specified in config[‘SYNTHESIZER’][‘dataframe_path’]. No error handling, hard failures only.

online_retail_simulator.simulate.products_synthesizer_based.simulate_products_synthesizer_based(config)

Generate synthetic products using a Gaussian Copula synthesizer.

Parameters:

config (Dict) – Complete configuration dictionary

Returns:

DataFrame of synthetic products

Return type:

DataFrame

Metrics Generation

Interface for simulating product metrics. Dispatches to appropriate backend based on config.

online_retail_simulator.simulate.metrics.simulate_metrics(job_info, config_path)

Simulate product metrics using the backend specified in config.

Parameters:
  • job_info (JobInfo) – Job containing products.csv

  • config_path (str) – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

Rule-based product metrics simulation (minimal skeleton).

online_retail_simulator.simulate.metrics_rule_based.simulate_metrics_rule_based(products, config)

Generate synthetic product metrics with customer journey funnel (rule-based).

Simulates a realistic conversion funnel: impressions → visits → cart adds → orders.

Parameters:
  • products (DataFrame) – DataFrame of products

  • config (Dict) – Complete configuration dictionary

Returns:

DataFrame of product metrics (one row per product per time period). Columns: product_identifier, category, price, date, impressions, visits, cart_adds, ordered_units, revenue.

Return type:

DataFrame
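
The funnel described above (impressions → visits → cart adds → orders) can be pictured as repeated random thinning, where each stage keeps a fraction of the previous one. The sketch below is not the library's algorithm: the conversion rates, the revenue formula (ordered units times price), and the name simulate_funnel_row are all assumptions made for illustration.

```python
import random
from typing import Dict


def simulate_funnel_row(rng: random.Random, price: float, impressions: int,
                        visit_rate: float = 0.10, cart_rate: float = 0.30,
                        order_rate: float = 0.50) -> Dict[str, float]:
    """Simulate one product-day of funnel metrics with hypothetical rates."""

    def thin(count: int, rate: float) -> int:
        # Each unit independently advances to the next stage with probability `rate`.
        return sum(1 for _ in range(count) if rng.random() < rate)

    visits = thin(impressions, visit_rate)
    cart_adds = thin(visits, cart_rate)
    ordered_units = thin(cart_adds, order_rate)
    return {
        "impressions": impressions,
        "visits": visits,
        "cart_adds": cart_adds,
        "ordered_units": ordered_units,
        "revenue": round(ordered_units * price, 2),  # assumed revenue definition
    }


row = simulate_funnel_row(random.Random(7), price=19.99, impressions=1000)
print(row)
```

Because each stage is drawn from the one before it, the counts can never increase as one moves down the funnel.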

Synthesizer-based simulation backend for metrics. Takes products DataFrame and config path. No error handling, hard failures only.

online_retail_simulator.simulate.metrics_synthesizer_based.simulate_metrics_synthesizer_based(products, config)

Generate synthetic product metrics using a Gaussian Copula synthesizer.

Parameters:
  • products (DataFrame) – DataFrame of products (unused in current implementation)

  • config (Dict) – Complete configuration dictionary

Returns:

DataFrame of synthetic metrics

Return type:

DataFrame

Enrichment Module

Enrichment module for applying treatments to sales data.

online_retail_simulator.enrich.enrich(config_path, job_info)

Apply enrichment to metrics data using a config file.

Saves enriched results to the same job directory.

Parameters:
  • config_path (str) – Path to enrichment config (YAML or JSON)

  • job_info (JobInfo) – JobInfo object to load metrics data from

Returns:

Same job, now also containing enriched.csv and optionally potential_outcomes.csv

Return type:

JobInfo

online_retail_simulator.enrich.register_enrichment_function(name, func)

Register an enrichment function.

Parameters:
  • name – Name under which the function is registered

  • func – Enrichment function to register

Return type:

None

online_retail_simulator.enrich.register_enrichment_module(module_name)

Register all compatible functions from a module.

Parameters:

module_name (str)

Return type:

None

online_retail_simulator.enrich.list_enrichment_functions()

List all registered enrichment functions.

Return type:

List[str]

online_retail_simulator.enrich.clear_enrichment_registry()

Clear all registered enrichment functions.

Return type:

None

Interface for applying enrichment treatments to metrics data. Dispatches to impact-based implementation based on config.

online_retail_simulator.enrich.enrichment.parse_impact_spec(impact_spec)

Parse IMPACT specification into module, function, and params.

Supports dict format with capitalized keys:

    {"FUNCTION": "product_detail_boost", "PARAMS": {"effect_size": 0.5, "ramp_days": 7}}
    {"MODULE": "my_module", "FUNCTION": "my_func", "PARAMS": {…}}  # MODULE ignored, kept for compatibility

Parameters:

impact_spec (Dict) – IMPACT specification from config (must be dict)

Returns:

Tuple of (module_name, function_name, params_dict)

Return type:

Tuple[str, str, Dict[str, Any]]
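
A minimal sketch of how such a specification might be unpacked is shown below. It is not the library's implementation: parse_impact_spec_sketch is a hypothetical name, and the error types and the empty-string default for a missing MODULE key are assumptions.

```python
from typing import Any, Dict, Tuple


def parse_impact_spec_sketch(impact_spec: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]:
    """Split an IMPACT dict into (module_name, function_name, params)."""
    if not isinstance(impact_spec, dict):
        raise TypeError("IMPACT specification must be a dict")
    if "FUNCTION" not in impact_spec:
        raise KeyError("IMPACT specification requires a FUNCTION key")
    # Assumed default: an empty module name when MODULE is omitted.
    module_name = impact_spec.get("MODULE", "")
    function_name = impact_spec["FUNCTION"]
    # Copy PARAMS so later changes do not leak back into the config.
    params = dict(impact_spec.get("PARAMS", {}))
    return module_name, function_name, params


spec = {"FUNCTION": "product_detail_boost", "PARAMS": {"effect_size": 0.5, "ramp_days": 7}}
parsed = parse_impact_spec_sketch(spec)
print(parsed)
```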

online_retail_simulator.enrich.enrichment.assign_enrichment(products, fraction, seed=None)

Assign enrichment treatment to a fraction of products.

Parameters:
  • products (List[Dict]) – List of product dictionaries

  • fraction (float) – Fraction of products to enrich (0.0 to 1.0)

  • seed (int | None) – Random seed for reproducibility

Returns:

List of products with added ‘enriched’ boolean field

Return type:

List[Dict]
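
Seeded random assignment of this kind can be sketched with the standard library. The example is an assumption-laden illustration, not the package's code: assign_enrichment_sketch is a hypothetical name, the number of treated products is obtained by truncating len(products) * fraction, and the input dictionaries are copied rather than modified in place.

```python
import random
from typing import Dict, List, Optional


def assign_enrichment_sketch(products: List[Dict], fraction: float,
                             seed: Optional[int] = None) -> List[Dict]:
    """Return copies of the products with a boolean 'enriched' field added."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    n_enriched = int(len(products) * fraction)  # assumed rounding rule: truncate
    chosen = set(rng.sample(range(len(products)), n_enriched))
    # Copy each product so the input list is left unmodified.
    return [{**product, "enriched": index in chosen}
            for index, product in enumerate(products)]


products = [{"product_identifier": f"B{index:09d}"} for index in range(10)]
assigned = assign_enrichment_sketch(products, fraction=0.3, seed=42)
print(sum(product["enriched"] for product in assigned))
```

Using the same seed twice selects the same products, which keeps treatment assignment reproducible between runs.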

online_retail_simulator.enrich.enrichment.apply_enrichment_to_metrics(metrics, enriched_products, enrichment_start, effect_function, **kwargs)

Apply enrichment treatment effect to metrics data.

Parameters:
  • metrics (List[Dict]) – List of metric record dictionaries

  • enriched_products (List[Dict]) – List of products with ‘enriched’ field

  • enrichment_start (str) – Start date of enrichment (YYYY-MM-DD)

  • effect_function (Callable) – Treatment effect function to apply

  • **kwargs – Additional parameters to pass to effect function

Returns:

List of modified metrics with treatment effect applied

Return type:

List[Dict]

online_retail_simulator.enrich.enrichment.enrich(config_path, df, job_info=None, products_df=None)

Apply enrichment to a DataFrame using a config file.

Parameters:
  • config_path (str) – Path to enrichment config (YAML or JSON, local or S3)

  • df (DataFrame) – DataFrame with metrics data (must include product_identifier)

  • job_info (JobInfo | None) – Optional JobInfo for product-aware enrichment functions

  • products_df (DataFrame | None) – Optional products DataFrame for product-aware enrichment functions

Returns:

  • enriched_df: DataFrame with enrichment applied (factual version)

  • potential_outcomes_df: DataFrame with Y0/Y1 for all products, or None if not provided

Return type:

Tuple of (enriched_df, potential_outcomes_df)

Library of predefined treatment effect functions for catalog enrichment.

online_retail_simulator.enrich.enrichment_library.quantity_boost(metrics, **kwargs)

Boost ordered units by a percentage for enriched products.

Parameters:
  • metrics (list) – List of metric record dictionaries

  • **kwargs – Parameters including:
      - effect_size: Percentage increase in ordered units (default: 0.5 for 50% boost)
      - enrichment_fraction: Fraction of products to enrich (default: 0.3)
      - enrichment_start: Start date of enrichment (default: “2024-11-15”)
      - seed: Random seed for product selection (default: 42)
      - min_units: Minimum units for enriched products with zero sales (default: 1)

Returns:

  • treated_metrics: List of modified metric dictionaries with treatment applied

  • potential_outcomes_df: DataFrame with Y0_revenue and Y1_revenue for all products

Return type:

Tuple of (treated_metrics, potential_outcomes_df)
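
To make the effect of effect_size, enrichment_start, and min_units concrete, the sketch below applies a boost to a handful of records. It is a simplified illustration and not the library's implementation: boost_ordered_units is a hypothetical name, the enriched products are passed in directly instead of being sampled, revenue is assumed to equal ordered units times price, boosted units are truncated to an integer, and the potential outcomes DataFrame is omitted.

```python
from datetime import date
from typing import Dict, List, Set


def boost_ordered_units(metrics: List[Dict], enriched_ids: Set[str],
                        enrichment_start: str = "2024-11-15",
                        effect_size: float = 0.5, min_units: int = 1) -> List[Dict]:
    """Return copies of the records with a boost applied to treated rows."""
    start = date.fromisoformat(enrichment_start)
    treated = []
    for row in metrics:
        row = dict(row)  # copy so the caller's records stay untouched
        is_treated = (row["product_identifier"] in enriched_ids
                      and date.fromisoformat(row["date"]) >= start)
        if is_treated:
            boosted = int(row["ordered_units"] * (1 + effect_size))
            row["ordered_units"] = max(boosted, min_units)
            row["revenue"] = round(row["ordered_units"] * row["price"], 2)
        treated.append(row)
    return treated


metrics = [
    {"product_identifier": "B000000001", "date": "2024-11-10", "price": 10.0, "ordered_units": 4, "revenue": 40.0},
    {"product_identifier": "B000000001", "date": "2024-11-20", "price": 10.0, "ordered_units": 4, "revenue": 40.0},
    {"product_identifier": "B000000001", "date": "2024-11-21", "price": 10.0, "ordered_units": 0, "revenue": 0.0},
    {"product_identifier": "B000000002", "date": "2024-11-20", "price": 5.0, "ordered_units": 2, "revenue": 10.0},
]
treated = boost_ordered_units(metrics, enriched_ids={"B000000001"})
print([row["ordered_units"] for row in treated])  # [4, 6, 1, 2]
```

Only rows of the enriched product dated on or after the start date change; the zero-sales row is lifted to min_units.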

online_retail_simulator.enrich.enrichment_library.probability_boost(metrics, **kwargs)

Boost sale probability (simulated by ordered units increase as proxy).

Parameters:
  • metrics (list) – List of metric record dictionaries

  • **kwargs – Same parameters as quantity_boost

Returns:

Tuple of (treated_metrics, potential_outcomes_df) - same as quantity_boost

Return type:

tuple

online_retail_simulator.enrich.enrichment_library.product_detail_boost(metrics, **kwargs)

Product detail regeneration and metrics boost for enrichment experiments.

Selects a fraction of products for treatment, regenerates their product details (title, description, features) while preserving brand/category/price, and applies metrics boost effect.

Parameters:
  • metrics (list) – List of metric record dictionaries

  • **kwargs – Parameters including:
      - job_info: JobInfo for saving product artifacts (required for saving)
      - products: List of product dictionaries (required for product details)
      - effect_size: Percentage increase in ordered units (default: 0.5)
      - ramp_days: Number of days for ramp-up period (default: 7)
      - enrichment_fraction: Fraction of products to enrich (default: 0.3)
      - enrichment_start: Start date of enrichment (default: “2024-11-15”)
      - seed: Random seed for product selection (default: 42)
      - prompt_path: Path to custom prompt template file (optional)
      - backend: Backend to use for regeneration (“mock” or “ollama”, default: “mock”)

Returns:

  • treated_metrics: List of modified metric dictionaries with treatment applied

  • potential_outcomes_df: DataFrame with Y0_revenue and Y1_revenue for all products

Return type:

Tuple of (treated_metrics, potential_outcomes_df)

Impact-based enrichment registry for custom user-defined enrichment functions.

This module provides a registration system that allows users to register their own impact-based enrichment functions.

online_retail_simulator.enrich.enrichment_registry.register_enrichment_function(name, func)

Register an enrichment function.

Parameters:
  • name – Name under which the function is registered

  • func – Enrichment function to register

Return type:

None

online_retail_simulator.enrich.enrichment_registry.register_enrichment_module(module_name)

Register all compatible functions from a module.

Parameters:

module_name (str)

Return type:

None

online_retail_simulator.enrich.enrichment_registry.list_enrichment_functions()

List all registered enrichment functions.

Return type:

List[str]

online_retail_simulator.enrich.enrichment_registry.clear_enrichment_registry()

Clear all registered enrichment functions.

Return type:

None

online_retail_simulator.enrich.enrichment_registry.load_effect_function(module_name, function_name)

Load treatment effect function from registry.

Parameters:
  • module_name (str) – Name of module (ignored, kept for backward compatibility)

  • function_name (str) – Name of function in registry

Returns:

Treatment effect function

Return type:

Callable
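
The register, list, load, and clear operations in this module follow a common name-to-callable registry pattern, sketched below. This is a generic illustration rather than the package's code: all function names are hypothetical, the sorted listing and the error types are assumptions, and the toy double_units function returns only a list instead of the tuple that library enrichment functions return.

```python
from typing import Callable, Dict, List

# Module-level mapping from registered name to callable.
_REGISTRY: Dict[str, Callable] = {}


def register(name: str, func: Callable) -> None:
    """Store func under name, replacing any earlier registration."""
    if not callable(func):
        raise TypeError(f"{name!r} is not callable")
    _REGISTRY[name] = func


def list_registered() -> List[str]:
    """Return the registered names in sorted order."""
    return sorted(_REGISTRY)


def load(function_name: str) -> Callable:
    """Look up a registered function by name."""
    try:
        return _REGISTRY[function_name]
    except KeyError:
        raise KeyError(f"No enrichment function registered as {function_name!r}") from None


def clear() -> None:
    """Remove every registration."""
    _REGISTRY.clear()


# Toy effect function used only to exercise the registry.
def double_units(metrics, **kwargs):
    return [{**row, "ordered_units": row["ordered_units"] * 2} for row in metrics]


register("double_units", double_units)
names_after_register = list_registered()
result = load("double_units")([{"ordered_units": 3}])
clear()
names_after_clear = list_registered()
```

Clearing the registry is mainly useful in tests, where each case should start from a known empty state.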

Configuration Module

Configuration processing with defaults and validation.

online_retail_simulator.config_processor.load_defaults()

Load default configuration from package.

Return type:

Dict[str, Any]

online_retail_simulator.config_processor.get_impact_defaults(function_name)

Get default parameters for an IMPACT enrichment function.

Parameters:

function_name (str) – Name of the enrichment function (e.g., “product_detail_boost”)

Returns:

Dictionary of default parameters for the function, or empty dict if not found

Return type:

Dict[str, Any]

online_retail_simulator.config_processor.deep_merge(base, override)

Deep merge two dictionaries, with override values taking precedence.

Parameters:
  • base (Dict) – Base dictionary (defaults)

  • override (Dict) – Override dictionary (user config)

Returns:

Merged dictionary

Return type:

Dict
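
A recursive merge with override precedence can be sketched as follows. This is a minimal illustration of the technique, not the package's implementation: deep_merge_sketch is a hypothetical name, the sample values are made up, and treating non-dict values (including lists) as whole replacements is an assumption.

```python
from typing import Any, Dict


def deep_merge_sketch(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where override values win and nested dicts are merged."""
    merged = dict(base)  # shallow copy; untouched nested dicts are shared with base
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            # Otherwise the override replaces the base value outright.
            merged[key] = value
    return merged


defaults = {"STORAGE": {"PATH": "output"},
            "IMPACT": {"PARAMS": {"effect_size": 0.5, "ramp_days": 7}}}
user_config = {"IMPACT": {"PARAMS": {"effect_size": 0.2}}}
merged = deep_merge_sketch(defaults, user_config)
print(merged["IMPACT"]["PARAMS"])  # {'effect_size': 0.2, 'ramp_days': 7}
```

The user only has to state the values that differ; sibling keys such as ramp_days are inherited from the defaults.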

online_retail_simulator.config_processor.validate_config(config)

Validate configuration has required fields and valid parameters.

Parameters:

config (Dict[str, Any])

Return type:

None

online_retail_simulator.config_processor.process_config(config_path)

Load, merge with defaults, and validate configuration.

Parameters:

config_path (str) – Path to user configuration file (local or S3)

Returns:

Complete validated configuration

Raises:
Return type:

Dict[str, Any]