API

This section provides detailed documentation for all public APIs in the Online Retail Simulator.

Core Functions

Online Retail Simulator - Generate synthetic retail data for experimentation.

class online_retail_simulator.JobInfo(job_id: str, storage_path: str)

Bases: object

Information about a simulation job and its storage location.

get_store() → ArtifactStore

Get an ArtifactStore for this job’s directory.

job_id: str

load_df(name: str) → DataFrame | None

Load a DataFrame from this job’s directory.

save_df(name: str, df: DataFrame) → None

Save a DataFrame to this job’s directory.

storage_path: str

online_retail_simulator.cleanup_old_jobs(storage_path: str = '.', keep_count: int = 10) → List[str]

Clean up old job directories, keeping only the most recent ones.

Parameters:

storage_path – Base path where job directories are stored
keep_count – Number of recent jobs to keep

Returns:

List of removed job IDs
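The keep-the-newest behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: it assumes every sub-directory of the storage path is a job and that modification time stands in for creation time. The names `list_job_dirs` and `cleanup_sketch` are hypothetical.

```python
import os
import shutil
import tempfile
from typing import List


def list_job_dirs(storage_path: str) -> List[str]:
    """Return sub-directory names sorted by modification time, newest first."""
    entries = [e for e in os.scandir(storage_path) if e.is_dir()]
    entries.sort(key=lambda e: e.stat().st_mtime, reverse=True)
    return [e.name for e in entries]


def cleanup_sketch(storage_path: str = ".", keep_count: int = 10) -> List[str]:
    """Remove all but the `keep_count` newest job directories (assumed layout)."""
    removed = list_job_dirs(storage_path)[keep_count:]
    for job_id in removed:
        shutil.rmtree(os.path.join(storage_path, job_id))
    return removed


# Demo in a throwaway directory holding five fake jobs.
with tempfile.TemporaryDirectory() as base:
    for i in range(5):
        path = os.path.join(base, f"job-{i}")
        os.mkdir(path)
        # Give each directory a distinct timestamp; job-4 is the newest.
        os.utime(path, (1_000_000 + i, 1_000_000 + i))
    removed_jobs = cleanup_sketch(base, keep_count=2)
    remaining_jobs = list_job_dirs(base)
```

In this demo the two newest directories survive and the IDs of the three older ones are returned.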

online_retail_simulator.clear_enrichment_registry() → None

Clear all registered enrichment functions.

online_retail_simulator.create_job(config: Dict, config_path: str, job_id: str | None = None) → JobInfo

Create a new job directory with config.

Parameters:

config – Configuration dictionary (expects STORAGE.PATH)
config_path – Path to original config file
job_id – Optional job ID, auto-generated if not provided

Returns:

Information about the created job

Return type:

JobInfo

online_retail_simulator.enrich(config_path: str, job_info: JobInfo) → JobInfo

Apply enrichment to metrics data using a config file.

Saves enriched results to the same job directory.

Parameters:

config_path – Path to enrichment config (YAML or JSON)
job_info – JobInfo object to load metrics data from

Returns:

Same job, now also containing enriched.csv and optionally potential_outcomes.csv

Return type:

JobInfo

online_retail_simulator.get_simulation_function(func_type: str, name: str) → Callable

Get a registered simulation function.

Parameters:

func_type – Type of function (‘products’ or ‘metrics’)
name – Name of the function

Returns:

The registered function

online_retail_simulator.list_enrichment_functions() → List[str]

List all registered enrichment functions.

online_retail_simulator.list_jobs(storage_path: str = '.') → List[str]

List all available job IDs in a storage path.

Parameters:

storage_path – Base path where job directories are stored

Returns:

List of job IDs sorted by creation time (newest first)

online_retail_simulator.list_simulation_functions() → Dict[str, List[str]]

List all registered simulation functions.

online_retail_simulator.load_job_metadata(job_info: JobInfo) → Dict

Load metadata for a job.

Parameters:

job_info – JobInfo containing job details

Returns:

Job metadata

Return type:

Dict

Raises:

FileNotFoundError – If job directory or metadata file doesn’t exist

online_retail_simulator.load_job_results(job_info: JobInfo) → Dict[str, DataFrame]

Load simulation results for a job.

Parameters:

job_info – JobInfo containing job details

Returns:

Dict with the available DataFrames, keyed by ‘products’, ‘metrics’, and ‘enriched’

Return type:

Dict[str, DataFrame]

Raises:

FileNotFoundError – If job directory doesn’t exist

online_retail_simulator.register_enrichment_function(name: str, func: Callable) → None

Register an enrichment function.

online_retail_simulator.register_enrichment_module(module_name: str) → None

Register all compatible functions from a module.

online_retail_simulator.register_metrics_function(name: str, func: Callable) → None

Register a metrics generation function.

online_retail_simulator.register_products_function(name: str, func: Callable) → None

Register a products generation function.

online_retail_simulator.register_simulation_module(module_name: str, prefix: str = '') → None

Register all compatible functions from a module.

Functions are automatically detected based on their signatures:

- Products functions: must have a ‘config’ parameter
- Metrics functions: must have ‘products’ and ‘config’ parameters
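The signature-based detection rule above can be illustrated with `inspect.signature`. This is a sketch of the rule as stated, not the package's code; `classify_by_signature` and the two example functions are hypothetical names. It checks the metrics rule first, because a metrics function also has a ‘config’ parameter.

```python
import inspect
from typing import Callable, Optional


def classify_by_signature(func: Callable) -> Optional[str]:
    """Classify a function as 'metrics', 'products', or None by parameter names."""
    params = inspect.signature(func).parameters
    # Metrics functions take both 'products' and 'config', so test them first.
    if "products" in params and "config" in params:
        return "metrics"
    if "config" in params:
        return "products"
    return None


def my_products(config):
    """Hypothetical products generator."""
    return []


def my_metrics(products, config):
    """Hypothetical metrics generator."""
    return []


def helper(value):
    """Not compatible with either rule."""
    return value


kinds = [classify_by_signature(f) for f in (my_products, my_metrics, helper)]
```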

online_retail_simulator.simulate(config_path: str, products_df: DataFrame | None = None) → JobInfo

Runs simulate_products (or uses provided products), optionally simulate_product_details, and simulate_metrics.

All results are automatically saved to a job-based directory structure under the configured storage path.

Parameters:

config_path – Path to configuration file
products_df – Optional DataFrame of existing products. If provided, skips product generation and uses this DataFrame instead. Expected columns: product_identifier, category, price

Returns:

Information about the saved job

Return type:

JobInfo
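The control flow described for simulate (generate products unless they are supplied, optionally add details, then generate metrics) can be sketched with stand-in callables. Everything here is illustrative: `run_pipeline_sketch`, `fake_products`, and `fake_metrics` are hypothetical, plain lists of dicts replace DataFrames, and nothing is written to a job directory.

```python
from typing import Callable, Dict, List, Optional

Rows = List[Dict]


def run_pipeline_sketch(
    make_products: Callable[[], Rows],
    make_metrics: Callable[[Rows], Rows],
    add_details: Optional[Callable[[Rows], Rows]] = None,
    products: Optional[Rows] = None,
) -> Dict[str, object]:
    """Run the three stages in order and record which ones executed."""
    steps: List[str] = []
    if products is None:
        # No products supplied, so generate them.
        products = make_products()
        steps.append("products")
    if add_details is not None:
        # The details stage is optional.
        products = add_details(products)
        steps.append("product_details")
    metrics = make_metrics(products)
    steps.append("metrics")
    return {"products": products, "metrics": metrics, "steps": steps}


def fake_products() -> Rows:
    return [{"product_identifier": "B000000001", "category": "Toys", "price": 9.99}]


def fake_metrics(products: Rows) -> Rows:
    return [{"product_identifier": p["product_identifier"], "ordered_units": 0} for p in products]


generated = run_pipeline_sketch(fake_products, fake_metrics)
supplied = run_pipeline_sketch(
    fake_products,
    fake_metrics,
    products=[{"product_identifier": "B000000002", "category": "Books", "price": 4.5}],
)
```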

online_retail_simulator.simulate_metrics(job_info, config_path: str)

Simulate product metrics using the backend specified in config.

Parameters:

job_info – JobInfo containing products.csv
config_path – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

online_retail_simulator.simulate_product_details(job_info: JobInfo, config_path: str) → JobInfo

Simulate product details using configured backend.

Loads existing products, enriches with title/description/brand/features, and saves back to the same job.

Config example:

PRODUCT_DETAILS:
  FUNCTION: simulate_product_details_mock  # or simulate_product_details_ollama

Parameters:

job_info – Job containing products.csv
config_path – Path to configuration file

Returns:

Same job with updated products.csv

Return type:

JobInfo

online_retail_simulator.simulate_products(config_path: str)

Simulate products using the backend specified in config.

Parameters:

config_path – Path to configuration file

Returns:

Job containing products.csv

Return type:

JobInfo

Simulation Module

Simulation module for generating synthetic retail data.

online_retail_simulator.simulate.get_simulation_function(func_type: str, name: str) → Callable

Get a registered simulation function.

Parameters:

func_type – Type of function (‘products’ or ‘metrics’)
name – Name of the function

Returns:

The registered function

online_retail_simulator.simulate.list_simulation_functions() → Dict[str, List[str]]

List all registered simulation functions.

online_retail_simulator.simulate.register_metrics_function(name: str, func: Callable) → None

Register a metrics generation function.

online_retail_simulator.simulate.register_products_function(name: str, func: Callable) → None

Register a products generation function.

online_retail_simulator.simulate.register_simulation_module(module_name: str, prefix: str = '') → None

Register all compatible functions from a module.

Functions are automatically detected based on their signatures:

- Products functions: must have a ‘config’ parameter
- Metrics functions: must have ‘products’ and ‘config’ parameters

online_retail_simulator.simulate.simulate(config_path: str, products_df: DataFrame | None = None) → JobInfo

Runs simulate_products (or uses provided products), optionally simulate_product_details, and simulate_metrics.

All results are automatically saved to a job-based directory structure under the configured storage path.

Parameters:

config_path – Path to configuration file
products_df – Optional DataFrame of existing products. If provided, skips product generation and uses this DataFrame instead. Expected columns: product_identifier, category, price

Returns:

Information about the saved job

Return type:

JobInfo

online_retail_simulator.simulate.simulate_metrics(job_info, config_path: str)

Simulate product metrics using the backend specified in config.

Parameters:

job_info – JobInfo containing products.csv
config_path – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

online_retail_simulator.simulate.simulate_product_details(job_info: JobInfo, config_path: str) → JobInfo

Simulate product details using configured backend.

Loads existing products, enriches with title/description/brand/features, and saves back to the same job.

Config example:

PRODUCT_DETAILS:
  FUNCTION: simulate_product_details_mock  # or simulate_product_details_ollama

Parameters:

job_info – Job containing products.csv
config_path – Path to configuration file

Returns:

Same job with updated products.csv

Return type:

JobInfo

online_retail_simulator.simulate.simulate_products(config_path: str)

Simulate products using the backend specified in config.

Parameters:

config_path – Path to configuration file

Returns:

Job containing products.csv

Return type:

JobInfo

Products Generation

Interface for simulating products. Dispatches to appropriate backend based on config.

online_retail_simulator.simulate.products.simulate_products(config_path: str)

Simulate products using the backend specified in config.

Parameters:

config_path – Path to configuration file

Returns:

Job containing products.csv

Return type:

JobInfo

Rule-based product simulation.

online_retail_simulator.simulate.products_rule_based.generate_random_product_identifier(rng: Generator, prefix: str = 'B') → str

Generate a random product identifier.

- 10 characters total
- Alphanumeric
- Defaults to starting with ‘B’
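An identifier with the three properties listed for generate_random_product_identifier could be produced as follows. This is a sketch under assumptions: it uses the standard library's `random.Random` in place of the `Generator` the real function accepts, and it draws from uppercase letters and digits, which the documentation does not specify. `random_identifier` is a hypothetical name.

```python
import random
import string

# Assumed alphabet: uppercase letters and digits.
ALPHANUMERIC = string.ascii_uppercase + string.digits


def random_identifier(rng: random.Random, prefix: str = "B", length: int = 10) -> str:
    """Return `prefix` followed by random alphanumerics, `length` characters in total."""
    tail = "".join(rng.choice(ALPHANUMERIC) for _ in range(length - len(prefix)))
    return prefix + tail


# A seeded generator makes the identifier reproducible.
example_id = random_identifier(random.Random(0))
same_id = random_identifier(random.Random(0))
```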

online_retail_simulator.simulate.products_rule_based.simulate_products_rule_based(config: Dict) → DataFrame

Generate synthetic products (rule-based).

Parameters:

config – Complete configuration dictionary

Returns:

DataFrame of products

Synthesizer-based product simulation. Reads a DataFrame from the path specified in config[‘SYNTHESIZER’][‘dataframe_path’]. No error handling, hard failures only.

online_retail_simulator.simulate.products_synthesizer_based.simulate_products_synthesizer_based(config: Dict) → DataFrame

Generate synthetic products using Gaussian Copula synthesizer.

Parameters:

config – Complete configuration dictionary

Returns:

DataFrame of synthetic products

Metrics Generation

Interface for simulating product metrics. Dispatches to appropriate backend based on config.

online_retail_simulator.simulate.metrics.simulate_metrics(job_info, config_path: str)

Simulate product metrics using the backend specified in config.

Parameters:

job_info – JobInfo containing products.csv
config_path – Path to configuration file

Returns:

Same job, now also containing metrics.csv

Return type:

JobInfo

Rule-based product metrics simulation (minimal skeleton).

online_retail_simulator.simulate.metrics_rule_based.simulate_metrics_rule_based(products: DataFrame, config: Dict) → DataFrame

Generate synthetic product metrics with customer journey funnel (rule-based).

Simulates a realistic conversion funnel: impressions → visits → cart adds → orders.

Parameters:

products – DataFrame of products
config – Complete configuration dictionary

Returns:

DataFrame of product metrics (one row per product per time period). Columns: product_identifier, category, price, date, impressions, visits, cart_adds, ordered_units, revenue.
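To make the funnel concrete, the sketch below draws one metrics row in which each stage is a random subset of the previous one. It is not the package's algorithm: the conversion rates, the independent per-event draws, and the name `funnel_row` are all assumptions chosen for illustration. Only the column names come from the list above.

```python
import random
from typing import Dict


def funnel_row(
    rng: random.Random,
    price: float,
    impressions: int,
    visit_rate: float = 0.1,   # assumed share of impressions that become visits
    cart_rate: float = 0.3,    # assumed share of visits that add to cart
    order_rate: float = 0.5,   # assumed share of cart adds that convert
) -> Dict[str, float]:
    """Simulate impressions -> visits -> cart adds -> orders for one product-day."""
    visits = sum(rng.random() < visit_rate for _ in range(impressions))
    cart_adds = sum(rng.random() < cart_rate for _ in range(visits))
    ordered_units = sum(rng.random() < order_rate for _ in range(cart_adds))
    return {
        "impressions": impressions,
        "visits": visits,
        "cart_adds": cart_adds,
        "ordered_units": ordered_units,
        "revenue": round(ordered_units * price, 2),
    }


row = funnel_row(random.Random(42), price=19.99, impressions=1000)
```

Because each stage samples only from the events of the stage before it, the counts can never increase along the funnel.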

Synthesizer-based simulation backend for metrics. Takes products DataFrame and config path. No error handling, hard failures only.

online_retail_simulator.simulate.metrics_synthesizer_based.simulate_metrics_synthesizer_based(products: DataFrame, config: Dict) → DataFrame

Generate synthetic product metrics using Gaussian Copula synthesizer.

Parameters:

products – DataFrame of products (unused in current implementation)
config – Complete configuration dictionary

Returns:

DataFrame of synthetic metrics

Enrichment Module

Enrichment module for applying treatments to sales data.

online_retail_simulator.enrich.clear_enrichment_registry() → None

Clear all registered enrichment functions.

online_retail_simulator.enrich.enrich(config_path: str, job_info: JobInfo) → JobInfo

Apply enrichment to metrics data using a config file.

Saves enriched results to the same job directory.

Parameters:

config_path – Path to enrichment config (YAML or JSON)
job_info – JobInfo object to load metrics data from

Returns:

Same job, now also containing enriched.csv and optionally potential_outcomes.csv

Return type:

JobInfo

online_retail_simulator.enrich.list_enrichment_functions() → List[str]

List all registered enrichment functions.

online_retail_simulator.enrich.register_enrichment_function(name: str, func: Callable) → None

Register an enrichment function.

online_retail_simulator.enrich.register_enrichment_module(module_name: str) → None

Register all compatible functions from a module.

Interface for applying enrichment treatments to metrics data. Dispatches to impact-based implementation based on config.

online_retail_simulator.enrich.enrichment.apply_enrichment_to_metrics(metrics: List[Dict], enriched_products: List[Dict], enrichment_start: str, effect_function: Callable, **kwargs) → List[Dict]

Apply enrichment treatment effect to metrics data.

Parameters:

metrics – List of metric record dictionaries
enriched_products – List of products with ‘enriched’ field
enrichment_start – Start date of enrichment (YYYY-MM-DD)
effect_function – Treatment effect function to apply
**kwargs – Additional parameters to pass to effect function

Returns:

List of modified metrics with treatment effect applied
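One way to read this contract: records are passed through the effect function only when their product is flagged as enriched and their date falls on or after the start date. The sketch below follows that reading. The per-record effect signature, the hypothetical `double_units` effect, and the name `apply_effect_sketch` are assumptions, not the package's behaviour.

```python
from typing import Callable, Dict, List


def apply_effect_sketch(
    metrics: List[Dict],
    enriched_products: List[Dict],
    enrichment_start: str,
    effect_function: Callable[..., Dict],
    **kwargs,
) -> List[Dict]:
    """Apply `effect_function` to records of enriched products from the start date on."""
    enriched_ids = {p["product_identifier"] for p in enriched_products if p.get("enriched")}
    result = []
    for record in metrics:
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        treated = record["product_identifier"] in enriched_ids and record["date"] >= enrichment_start
        copy = dict(record)  # leave the caller's records untouched
        result.append(effect_function(copy, **kwargs) if treated else copy)
    return result


def double_units(record: Dict, factor: int = 2) -> Dict:
    """Hypothetical effect: multiply ordered units by `factor`."""
    record["ordered_units"] *= factor
    return record


products = [
    {"product_identifier": "B1", "enriched": True},
    {"product_identifier": "B2", "enriched": False},
]
metrics = [
    {"product_identifier": "B1", "date": "2024-11-14", "ordered_units": 3},
    {"product_identifier": "B1", "date": "2024-11-15", "ordered_units": 3},
    {"product_identifier": "B2", "date": "2024-11-15", "ordered_units": 3},
]
treated_units = [
    r["ordered_units"] for r in apply_effect_sketch(metrics, products, "2024-11-15", double_units)
]
```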

online_retail_simulator.enrich.enrichment.assign_enrichment(products: List[Dict], fraction: float, seed: int = None) → List[Dict]

Assign enrichment treatment to a fraction of products.

Parameters:

products – List of product dictionaries
fraction – Fraction of products to enrich (0.0 to 1.0)
seed – Random seed for reproducibility

Returns:

List of products with added ‘enriched’ boolean field
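A seeded random assignment of this kind might look like the sketch below. It assumes the number of treated products is the rounded product of the list length and the fraction; the real rounding rule is not documented here. `assign_enrichment_sketch` is a hypothetical name.

```python
import random
from typing import Dict, List, Optional


def assign_enrichment_sketch(
    products: List[Dict], fraction: float, seed: Optional[int] = None
) -> List[Dict]:
    """Return copies of `products` with an 'enriched' flag set on a random fraction."""
    rng = random.Random(seed)
    n_enriched = round(len(products) * fraction)  # assumed rounding rule
    chosen = set(rng.sample(range(len(products)), n_enriched))
    return [{**p, "enriched": i in chosen} for i, p in enumerate(products)]


catalog = [{"product_identifier": f"B{i:09d}"} for i in range(10)]
first = assign_enrichment_sketch(catalog, fraction=0.3, seed=42)
second = assign_enrichment_sketch(catalog, fraction=0.3, seed=42)
n_treated = sum(p["enriched"] for p in first)
```

Reusing the same seed yields the same treated set, which is what makes an experiment repeatable.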

online_retail_simulator.enrich.enrichment.enrich(config_path: str, df: DataFrame, job_info=None, products_df=None) → tuple

Apply enrichment to a DataFrame using a config file.

Parameters:

config_path – Path to enrichment config (YAML or JSON, local or S3)
df – DataFrame with metrics data (must include product_identifier)
job_info – Optional JobInfo for product-aware enrichment functions
products_df – Optional products DataFrame for product-aware enrichment functions

Returns:

enriched_df: DataFrame with enrichment applied (factual version)
potential_outcomes_df: DataFrame with Y0/Y1 for all products, or None if not provided

Return type:

Tuple of (enriched_df, potential_outcomes_df)

online_retail_simulator.enrich.enrichment.parse_impact_spec(impact_spec: Dict) → Tuple[str, str, Dict[str, Any]]

Parse IMPACT specification into module, function, and params.

Supports dict format with capitalized keys:

{"FUNCTION": "product_detail_boost", "PARAMS": {"effect_size": 0.5, "ramp_days": 7}}
{"MODULE": "my_module", "FUNCTION": "my_func", "PARAMS": {…}}  # MODULE ignored, kept for compatibility

Parameters:

impact_spec – IMPACT specification from config (must be dict)

Returns:

Tuple of (module_name, function_name, params_dict)
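Parsing a specification of this shape could be sketched as follows. The error types, the empty-string default for a missing MODULE, and the empty-dict default for missing PARAMS are assumptions, and `parse_impact_spec_sketch` is a hypothetical name rather than the package function.

```python
from typing import Any, Dict, Tuple


def parse_impact_spec_sketch(impact_spec: Dict) -> Tuple[str, str, Dict[str, Any]]:
    """Split an IMPACT dict into (module_name, function_name, params)."""
    if not isinstance(impact_spec, dict):
        raise ValueError("IMPACT specification must be a dict")
    if "FUNCTION" not in impact_spec:
        raise ValueError("IMPACT specification needs a FUNCTION key")
    module_name = impact_spec.get("MODULE", "")    # assumed default when absent
    function_name = impact_spec["FUNCTION"]
    params = dict(impact_spec.get("PARAMS", {}))   # copy so callers can mutate freely
    return module_name, function_name, params


parsed = parse_impact_spec_sketch(
    {"FUNCTION": "product_detail_boost", "PARAMS": {"effect_size": 0.5, "ramp_days": 7}}
)
```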

Library of predefined treatment effect functions for catalog enrichment.

online_retail_simulator.enrich.enrichment_library.probability_boost(metrics: list, **kwargs) → tuple

Boost sale probability (simulated by ordered units increase as proxy).

Parameters:

metrics – List of metric record dictionaries
**kwargs – Same parameters as quantity_boost

Returns:

Tuple of (treated_metrics, potential_outcomes_df) - same as quantity_boost

online_retail_simulator.enrich.enrichment_library.product_detail_boost(metrics: list, **kwargs) → tuple

Product detail regeneration and metrics boost for enrichment experiments.

Selects a fraction of products for treatment, regenerates their product details (title, description, features) while preserving brand/category/price, and applies metrics boost effect.

Parameters:

metrics – List of metric record dictionaries
**kwargs – Parameters including:

- job_info: JobInfo for saving product artifacts (required for saving)
- products: List of product dictionaries (required for product details)
- effect_size: Percentage increase in ordered units (default: 0.5)
- ramp_days: Number of days for ramp-up period (default: 7)
- enrichment_fraction: Fraction of products to enrich (default: 0.3)
- enrichment_start: Start date of enrichment (default: “2024-11-15”)
- seed: Random seed for product selection (default: 42)
- prompt_path: Path to custom prompt template file (optional)
- backend: Backend to use for regeneration (“mock” or “ollama”, default: “mock”)

Returns:

treated_metrics: List of modified metric dictionaries with treatment applied
potential_outcomes_df: DataFrame with Y0_revenue and Y1_revenue for all products

Return type:

Tuple of (treated_metrics, potential_outcomes_df)
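The documentation does not say what shape the ramp-up takes. As one plausible reading, the sketch below scales the effect linearly from zero at enrichment_start to the full effect_size after ramp_days. The linear form and the name `ramped_effect` are assumptions.

```python
from datetime import date


def ramped_effect(day: date, start: date, effect_size: float = 0.5, ramp_days: int = 7) -> float:
    """Effect on `day`, growing linearly to `effect_size` over `ramp_days` (assumed shape)."""
    elapsed = (day - start).days
    if elapsed < 0:
        return 0.0           # no effect before enrichment starts
    if ramp_days <= 0:
        return effect_size   # no ramp: full effect immediately
    return effect_size * min(1.0, elapsed / ramp_days)


start = date(2024, 11, 15)
before = ramped_effect(date(2024, 11, 14), start)
midway = ramped_effect(date(2024, 11, 18), start)  # 3 of 7 ramp days elapsed
full = ramped_effect(date(2024, 11, 22), start)    # ramp complete
```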

online_retail_simulator.enrich.enrichment_library.quantity_boost(metrics: list, **kwargs) → tuple

Boost ordered units by a percentage for enriched products.

Parameters:

metrics – List of metric record dictionaries
**kwargs – Parameters including:

- effect_size: Percentage increase in ordered units (default: 0.5 for 50% boost)
- enrichment_fraction: Fraction of products to enrich (default: 0.3)
- enrichment_start: Start date of enrichment (default: “2024-11-15”)
- seed: Random seed for product selection (default: 42)
- min_units: Minimum units for enriched products with zero sales (default: 1)

Returns:

treated_metrics: List of modified metric dictionaries with treatment applied
potential_outcomes_df: DataFrame with Y0_revenue and Y1_revenue for all products

Return type:

Tuple of (treated_metrics, potential_outcomes_df)
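The relationship between effect_size, min_units, and the two potential outcomes can be sketched for a single record. This is an illustration under assumptions: boosted units are rounded to the nearest integer, revenue is units times price, and a plain dict stands in for a row of the potential outcomes DataFrame. `potential_outcomes` is a hypothetical name.

```python
from typing import Dict


def potential_outcomes(record: Dict, effect_size: float = 0.5, min_units: int = 1) -> Dict:
    """Return untreated (Y0) and treated (Y1) revenue for one metrics record."""
    units = record["ordered_units"]
    if units == 0:
        boosted = min_units  # zero-sales products get a floor instead of 0 * boost
    else:
        boosted = round(units * (1 + effect_size))  # assumed rounding
    return {
        "product_identifier": record["product_identifier"],
        "Y0_revenue": units * record["price"],
        "Y1_revenue": boosted * record["price"],
    }


selling = potential_outcomes({"product_identifier": "B000000001", "price": 20.0, "ordered_units": 10})
idle = potential_outcomes({"product_identifier": "B000000002", "price": 5.0, "ordered_units": 0})
```

With the default 50% boost, ten units become fifteen, and a product with no sales is lifted to the one-unit floor.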

Impact-based enrichment registry for custom user-defined enrichment functions.

This module provides a registration system that allows users to register their own impact-based enrichment functions.

online_retail_simulator.enrich.enrichment_registry.clear_enrichment_registry() → None

Clear all registered enrichment functions.

online_retail_simulator.enrich.enrichment_registry.list_enrichment_functions() → List[str]

List all registered enrichment functions.

online_retail_simulator.enrich.enrichment_registry.load_effect_function(module_name: str, function_name: str) → Callable

Load treatment effect function from registry.

Parameters:

module_name – Name of module (ignored, kept for backward compatibility)
function_name – Name of function in registry

Returns:

Treatment effect function

online_retail_simulator.enrich.enrichment_registry.register_enrichment_function(name: str, func: Callable) → None

Register an enrichment function.

online_retail_simulator.enrich.enrichment_registry.register_enrichment_module(module_name: str) → None

Register all compatible functions from a module.
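A name-to-function registry with the operations listed in this module can be sketched in a few lines. This is a generic pattern, not the package's code: the module-level dict, the error types, and all function names below are assumptions.

```python
from typing import Callable, Dict, List

_REGISTRY: Dict[str, Callable] = {}  # hypothetical module-level store


def register(name: str, func: Callable) -> None:
    """Store `func` under `name`, replacing any earlier entry."""
    if not callable(func):
        raise TypeError(f"{name!r} is not callable")
    _REGISTRY[name] = func


def list_functions() -> List[str]:
    """Return registered names in sorted order."""
    return sorted(_REGISTRY)


def load(module_name: str, function_name: str) -> Callable:
    """Look up by function name; `module_name` is accepted but ignored."""
    try:
        return _REGISTRY[function_name]
    except KeyError:
        raise KeyError(f"No enrichment function named {function_name!r}") from None


def clear() -> None:
    """Drop every registered function."""
    _REGISTRY.clear()


def my_boost(metrics: list, **kwargs) -> tuple:
    """Hypothetical effect function with the (metrics, **kwargs) -> tuple shape."""
    return metrics, None


register("my_boost", my_boost)
names_after_register = list_functions()
loaded = load("ignored_module", "my_boost")
clear()
names_after_clear = list_functions()
```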

Configuration Module

Configuration processing with defaults and validation.

online_retail_simulator.config_processor.deep_merge(base: Dict, override: Dict) → Dict

Deep merge two dictionaries, with override values taking precedence.

Parameters:

base – Base dictionary (defaults)
override – Override dictionary (user config)

Returns:

Merged dictionary
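A recursive merge with override precedence is commonly written as below. This is a sketch of the general technique, not necessarily how the package implements it. `deep_merge_sketch` is a hypothetical name, and the SEED and FORMAT keys in the example are invented for illustration.

```python
import copy
from typing import Dict


def deep_merge_sketch(base: Dict, override: Dict) -> Dict:
    """Merge `override` into a copy of `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing wholesale.
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"STORAGE": {"PATH": ".", "FORMAT": "csv"}, "SEED": 42}
user_config = {"STORAGE": {"PATH": "/tmp/jobs"}}
merged_config = deep_merge_sketch(defaults, user_config)
```

The user's STORAGE.PATH wins, while sibling defaults that the user did not mention are preserved.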

online_retail_simulator.config_processor.get_impact_defaults(function_name: str) → Dict[str, Any]

Get default parameters for an IMPACT enrichment function.

Parameters:

function_name – Name of the enrichment function (e.g., “product_detail_boost”)

Returns:

Dictionary of default parameters for the function, or empty dict if not found

online_retail_simulator.config_processor.load_defaults() → Dict[str, Any]

Load default configuration from package.

online_retail_simulator.config_processor.process_config(config_path: str) → Dict[str, Any]

Load, merge with defaults, and validate configuration.

Parameters:

config_path – Path to user configuration file (local or S3)

Returns:

Complete validated configuration

Raises:

FileNotFoundError – If config file doesn’t exist
ValueError – If configuration is invalid

online_retail_simulator.config_processor.validate_config(config: Dict[str, Any]) → None

Validate configuration has required fields and valid parameters.