API
This section provides detailed documentation for all public APIs in the Online Retail Simulator.
Core Functions
Online Retail Simulator - Generate synthetic retail data for experimentation.
- class online_retail_simulator.JobInfo(job_id: str, storage_path: str)
Bases: object
Information about a simulation job and its storage location.
- online_retail_simulator.cleanup_old_jobs(storage_path: str = '.', keep_count: int = 10) → List[str]
Clean up old job directories, keeping only the most recent ones.
- Parameters:
storage_path – Base path where job directories are stored
keep_count – Number of recent jobs to keep
- Returns:
List of removed job IDs
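The retention rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name `cleanup_old_dirs` is hypothetical, and it assumes that every subdirectory of the storage path is a job directory and that "most recent" means latest modification time.

```python
import os
import shutil
import tempfile
from typing import List


def cleanup_old_dirs(storage_path: str = ".", keep_count: int = 10) -> List[str]:
    """Remove all but the `keep_count` most recently modified subdirectories."""
    job_dirs = [entry for entry in os.scandir(storage_path) if entry.is_dir()]
    # Newest first, ordered by modification time (an assumption of this sketch).
    job_dirs.sort(key=lambda entry: entry.stat().st_mtime, reverse=True)
    removed = []
    for entry in job_dirs[keep_count:]:
        shutil.rmtree(entry.path)
        removed.append(entry.name)
    return removed
```

Sorting newest first and slicing from `keep_count` onward means the function removes nothing when fewer than `keep_count` directories exist.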
- online_retail_simulator.clear_enrichment_registry() → None
Clear all registered enrichment functions.
- online_retail_simulator.create_job(config: Dict, config_path: str, job_id: str | None = None) → JobInfo
Create a new job directory with config.
- Parameters:
config – Configuration dictionary (expects STORAGE.PATH)
config_path – Path to original config file
job_id – Optional job ID, auto-generated if not provided
- Returns:
Information about the created job
- Return type:
JobInfo
- online_retail_simulator.enrich(config_path: str, job_info: JobInfo) → JobInfo
Apply enrichment to metrics data using a config file.
Saves enriched results to the same job directory.
- Parameters:
config_path – Path to enrichment config (YAML or JSON)
job_info – JobInfo object to load metrics data from
- Returns:
Same job, now also containing enriched.csv and optionally potential_outcomes.csv
- Return type:
JobInfo
- online_retail_simulator.get_simulation_function(func_type: str, name: str) → Callable
Get a registered simulation function.
- Parameters:
func_type – Type of function (‘products’ or ‘metrics’)
name – Name of the function
- Returns:
The registered function
- online_retail_simulator.list_enrichment_functions() → List[str]
List all registered enrichment functions.
- online_retail_simulator.list_jobs(storage_path: str = '.') → List[str]
List all available job IDs in a storage path.
- Parameters:
storage_path – Base path where job directories are stored
- Returns:
List of job IDs sorted by creation time (newest first)
- online_retail_simulator.list_simulation_functions() → Dict[str, List[str]]
List all registered simulation functions.
- online_retail_simulator.load_job_metadata(job_info: JobInfo) → Dict
Load metadata for a job.
- Parameters:
job_info – JobInfo containing job details
- Returns:
Job metadata
- Return type:
Dict
- Raises:
FileNotFoundError – If job directory or metadata file doesn’t exist
- online_retail_simulator.load_job_results(job_info: JobInfo) → Dict[str, DataFrame]
Load simulation results for a job.
- Parameters:
job_info – JobInfo containing job details
- Returns:
Dict with the available DataFrames, keyed by ‘products’, ‘metrics’, ‘enriched’
- Return type:
Dict[str, DataFrame]
- Raises:
FileNotFoundError – If job directory doesn’t exist
- online_retail_simulator.register_enrichment_function(name: str, func: Callable) → None
Register an enrichment function.
- online_retail_simulator.register_enrichment_module(module_name: str) → None
Register all compatible functions from a module.
- online_retail_simulator.register_metrics_function(name: str, func: Callable) → None
Register a metrics generation function.
- online_retail_simulator.register_products_function(name: str, func: Callable) → None
Register a products generation function.
- online_retail_simulator.register_simulation_module(module_name: str, prefix: str = '') → None
Register all compatible functions from a module.
Functions are automatically detected based on their signatures:
- Products functions: must have ‘config’ parameter
- Metrics functions: must have ‘products’ and ‘config’ parameters
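Signature-based detection of this kind can be sketched with the standard `inspect` module. The helper `classify_by_signature` and the three sample functions are hypothetical; the package's actual detection logic may differ, for example in how it treats functions that match neither rule.

```python
import inspect
from typing import Callable, Optional


def classify_by_signature(func: Callable) -> Optional[str]:
    """Return 'metrics', 'products', or None based on parameter names."""
    params = inspect.signature(func).parameters
    # Check the stricter rule first: metrics functions also take 'config'.
    if "products" in params and "config" in params:
        return "metrics"
    if "config" in params:
        return "products"
    return None


def my_products(config):
    return []


def my_metrics(products, config):
    return []


def helper(value):
    return value


print(classify_by_signature(my_products))  # products
print(classify_by_signature(my_metrics))   # metrics
print(classify_by_signature(helper))       # None
```

The metrics rule has to be tested before the products rule, because any function that accepts both ‘products’ and ‘config’ would otherwise also satisfy the products rule.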
- online_retail_simulator.simulate(config_path: str, products_df: DataFrame | None = None) → JobInfo
Runs simulate_products (or uses provided products), optionally simulate_product_details, and simulate_metrics.
All results are automatically saved to a job-based directory structure under the configured storage path.
- Parameters:
config_path – Path to configuration file
products_df – Optional DataFrame of existing products. If provided, skips product generation and uses this DataFrame instead. Expected columns: product_identifier, category, price
- Returns:
Information about the saved job
- Return type:
JobInfo
- online_retail_simulator.simulate_metrics(job_info, config_path: str)
Simulate product metrics using the backend specified in config.
- Parameters:
job_info – JobInfo containing products.csv
config_path – Path to configuration file
- Returns:
Same job, now also containing metrics.csv
- Return type:
JobInfo
- online_retail_simulator.simulate_product_details(job_info: JobInfo, config_path: str) → JobInfo
Simulate product details using configured backend.
Loads existing products, enriches with title/description/brand/features, and saves back to the same job.
- Config example:
PRODUCT_DETAILS:
  FUNCTION: simulate_product_details_mock  # or simulate_product_details_ollama
- Parameters:
job_info – Job containing products.csv
config_path – Path to configuration file
- Returns:
Same job with updated products.csv
- Return type:
JobInfo
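The core functions above are designed to be chained through the JobInfo they return. The following sketch shows one way they might be combined, using only the signatures documented in this section. The wrapper name `run_pipeline` and the two config file names in the usage note are hypothetical.

```python
def run_pipeline(simulation_config: str, enrichment_config: str):
    """Simulate a job, enrich it, and return the loaded result tables."""
    # Imported here so the sketch can be defined without the package installed.
    from online_retail_simulator import enrich, load_job_results, simulate

    job_info = simulate(simulation_config)          # products and metrics
    job_info = enrich(enrichment_config, job_info)  # adds enriched.csv
    return load_job_results(job_info)               # dict of DataFrames
```

A call such as `run_pipeline("simulation.yaml", "enrichment.yaml")` would then return a dictionary from which the ‘products’, ‘metrics’, and ‘enriched’ DataFrames can be read, provided both config files exist and are valid.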
Simulation Module
Simulation module for generating synthetic retail data.
- online_retail_simulator.simulate.get_simulation_function(func_type: str, name: str) → Callable
Get a registered simulation function.
- Parameters:
func_type – Type of function (‘products’ or ‘metrics’)
name – Name of the function
- Returns:
The registered function
- online_retail_simulator.simulate.list_simulation_functions() → Dict[str, List[str]]
List all registered simulation functions.
- online_retail_simulator.simulate.register_metrics_function(name: str, func: Callable) → None
Register a metrics generation function.
- online_retail_simulator.simulate.register_products_function(name: str, func: Callable) → None
Register a products generation function.
- online_retail_simulator.simulate.register_simulation_module(module_name: str, prefix: str = '') → None
Register all compatible functions from a module.
Functions are automatically detected based on their signatures:
- Products functions: must have ‘config’ parameter
- Metrics functions: must have ‘products’ and ‘config’ parameters
- online_retail_simulator.simulate.simulate(config_path: str, products_df: DataFrame | None = None) → JobInfo
Runs simulate_products (or uses provided products), optionally simulate_product_details, and simulate_metrics.
All results are automatically saved to a job-based directory structure under the configured storage path.
- Parameters:
config_path – Path to configuration file
products_df – Optional DataFrame of existing products. If provided, skips product generation and uses this DataFrame instead. Expected columns: product_identifier, category, price
- Returns:
Information about the saved job
- Return type:
JobInfo
- online_retail_simulator.simulate.simulate_metrics(job_info, config_path: str)
Simulate product metrics using the backend specified in config.
- Parameters:
job_info – JobInfo containing products.csv
config_path – Path to configuration file
- Returns:
Same job, now also containing metrics.csv
- Return type:
JobInfo
- online_retail_simulator.simulate.simulate_product_details(job_info: JobInfo, config_path: str) → JobInfo
Simulate product details using configured backend.
Loads existing products, enriches with title/description/brand/features, and saves back to the same job.
- Config example:
PRODUCT_DETAILS:
  FUNCTION: simulate_product_details_mock  # or simulate_product_details_ollama
- Parameters:
job_info – Job containing products.csv
config_path – Path to configuration file
- Returns:
Same job with updated products.csv
- Return type:
JobInfo
- online_retail_simulator.simulate.simulate_products(config_path: str)
Simulate products using the backend specified in config.
- Parameters:
config_path – Path to configuration file
- Returns:
Job containing products.csv
- Return type:
JobInfo
Products Generation
Interface for simulating products. Dispatches to appropriate backend based on config.
- online_retail_simulator.simulate.products.simulate_products(config_path: str)
Simulate products using the backend specified in config.
- Parameters:
config_path – Path to configuration file
- Returns:
Job containing products.csv
- Return type:
JobInfo
Rule-based product simulation.
- online_retail_simulator.simulate.products_rule_based.generate_random_product_identifier(rng: Generator, prefix: str = 'B') → str
Generate a random product identifier.
- 10 characters total
- Alphanumeric
- Defaults to starting with ‘B’
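An identifier with these three properties could be produced as sketched below. Two choices here are assumptions rather than documented behavior: the sketch uses the standard library's `random.Random` in place of the `Generator` in the documented signature, and it draws from uppercase letters and digits. The name `random_identifier` is hypothetical.

```python
import random
import string

# Uppercase letters plus digits; the real character set is an assumption.
ALPHABET = string.ascii_uppercase + string.digits


def random_identifier(rng: random.Random, prefix: str = "B", length: int = 10) -> str:
    """Return `prefix` followed by random alphanumerics, `length` characters total."""
    suffix = "".join(rng.choice(ALPHABET) for _ in range(length - len(prefix)))
    return prefix + suffix
```

Passing the random generator in explicitly, as the documented signature does, keeps identifier generation reproducible: two generators created with the same seed yield the same identifier.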
- online_retail_simulator.simulate.products_rule_based.simulate_products_rule_based(config: Dict) → DataFrame
Generate synthetic products (rule-based).
- Parameters:
config – Complete configuration dictionary
- Returns:
DataFrame of products
Synthesizer-based product simulation. Reads a DataFrame from the path specified in config[‘SYNTHESIZER’][‘dataframe_path’]. No error handling, hard failures only.
Metrics Generation
Interface for simulating product metrics. Dispatches to appropriate backend based on config.
- online_retail_simulator.simulate.metrics.simulate_metrics(job_info, config_path: str)
Simulate product metrics using the backend specified in config.
- Parameters:
job_info – JobInfo containing products.csv
config_path – Path to configuration file
- Returns:
Same job, now also containing metrics.csv
- Return type:
JobInfo
Rule-based product metrics simulation (minimal skeleton).
- online_retail_simulator.simulate.metrics_rule_based.simulate_metrics_rule_based(products: DataFrame, config: Dict) → DataFrame
Generate synthetic product metrics with customer journey funnel (rule-based).
Simulates a realistic conversion funnel: impressions → visits → cart adds → orders.
- Parameters:
products – DataFrame of products
config – Complete configuration dictionary
- Returns:
DataFrame of product metrics (one row per product per time period). Columns: product_identifier, category, price, date, impressions, visits, cart_adds, ordered_units, revenue.
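The funnel structure can be illustrated with a small standard-library sketch in which each stage keeps a random subset of the previous stage. The function name `simulate_funnel` and the three conversion rates are hypothetical placeholders, not the package's defaults, and the real backend works on DataFrames rather than single rows.

```python
import random
from typing import Dict


def simulate_funnel(
    rng: random.Random,
    impressions: int,
    price: float,
    visit_rate: float = 0.10,
    cart_rate: float = 0.30,
    order_rate: float = 0.50,
) -> Dict[str, float]:
    """Draw one day of funnel counts for a single product."""

    def thin(count: int, rate: float) -> int:
        # Binomial thinning: each unit survives the stage with probability `rate`.
        return sum(1 for _ in range(count) if rng.random() < rate)

    visits = thin(impressions, visit_rate)
    cart_adds = thin(visits, cart_rate)
    ordered_units = thin(cart_adds, order_rate)
    return {
        "impressions": impressions,
        "visits": visits,
        "cart_adds": cart_adds,
        "ordered_units": ordered_units,
        "revenue": ordered_units * price,
    }
```

Because every stage is drawn from the one before it, the counts can never increase along the funnel, which is the property that makes the simulated data look like a real conversion path.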
Synthesizer-based simulation backend for metrics. Takes products DataFrame and config path. No error handling, hard failures only.
- online_retail_simulator.simulate.metrics_synthesizer_based.simulate_metrics_synthesizer_based(products: DataFrame, config: Dict) → DataFrame
Generate synthetic product metrics using Gaussian Copula synthesizer.
- Parameters:
products – DataFrame of products (unused in current implementation)
config – Complete configuration dictionary
- Returns:
DataFrame of synthetic metrics
Enrichment Module
Enrichment module for applying treatments to sales data.
- online_retail_simulator.enrich.clear_enrichment_registry() → None
Clear all registered enrichment functions.
- online_retail_simulator.enrich.enrich(config_path: str, job_info: JobInfo) → JobInfo
Apply enrichment to metrics data using a config file.
Saves enriched results to the same job directory.
- Parameters:
config_path – Path to enrichment config (YAML or JSON)
job_info – JobInfo object to load metrics data from
- Returns:
Same job, now also containing enriched.csv and optionally potential_outcomes.csv
- Return type:
JobInfo
- online_retail_simulator.enrich.list_enrichment_functions() → List[str]
List all registered enrichment functions.
- online_retail_simulator.enrich.register_enrichment_function(name: str, func: Callable) → None
Register an enrichment function.
- online_retail_simulator.enrich.register_enrichment_module(module_name: str) → None
Register all compatible functions from a module.
Interface for applying enrichment treatments to metrics data. Dispatches to impact-based implementation based on config.
- online_retail_simulator.enrich.enrichment.apply_enrichment_to_metrics(metrics: List[Dict], enriched_products: List[Dict], enrichment_start: str, effect_function: Callable, **kwargs) → List[Dict]
Apply enrichment treatment effect to metrics data.
- Parameters:
metrics – List of metric record dictionaries
enriched_products – List of products with ‘enriched’ field
enrichment_start – Start date of enrichment (YYYY-MM-DD)
effect_function – Treatment effect function to apply
**kwargs – Additional parameters to pass to effect function
- Returns:
List of modified metrics with treatment effect applied
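A plausible shape for this step is sketched below. It assumes that each metric record carries ‘product_identifier’ and ‘date’ keys, and that the effect function takes one record plus keyword arguments and returns the modified record. Those calling conventions, and the names `apply_effect` and `double_units`, are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def apply_effect(
    metrics: List[Dict],
    enriched_products: List[Dict],
    enrichment_start: str,
    effect_function: Callable,
    **kwargs,
) -> List[Dict]:
    """Apply `effect_function` to records of enriched products from the start date on."""
    enriched_ids = {
        p["product_identifier"] for p in enriched_products if p.get("enriched")
    }
    result = []
    for record in metrics:
        treated = dict(record)  # copy so the input list is left untouched
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if treated["product_identifier"] in enriched_ids and treated["date"] >= enrichment_start:
            treated = effect_function(treated, **kwargs)
        result.append(treated)
    return result


def double_units(record: Dict, factor: int = 2) -> Dict:
    """Hypothetical effect function: multiply ordered units by `factor`."""
    record["ordered_units"] *= factor
    return record
```

Records for untreated products, and records dated before the enrichment start, pass through unchanged, so the output has the same length and order as the input.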
- online_retail_simulator.enrich.enrichment.assign_enrichment(products: List[Dict], fraction: float, seed: int = None) → List[Dict]
Assign enrichment treatment to a fraction of products.
- Parameters:
products – List of product dictionaries
fraction – Fraction of products to enrich (0.0 to 1.0)
seed – Random seed for reproducibility
- Returns:
List of products with added ‘enriched’ boolean field
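Seeded assignment of a treatment flag can be sketched as follows. The helper name `assign_fraction` is hypothetical, and rounding the treated count to the nearest integer is an assumption; the package may round differently.

```python
import random
from typing import Dict, List, Optional


def assign_fraction(
    products: List[Dict], fraction: float, seed: Optional[int] = None
) -> List[Dict]:
    """Return copies of `products` with a boolean 'enriched' field added."""
    rng = random.Random(seed)
    # Rounding to the nearest whole product is an assumption of this sketch.
    n_enriched = round(len(products) * fraction)
    chosen = set(rng.sample(range(len(products)), n_enriched))
    return [{**product, "enriched": i in chosen} for i, product in enumerate(products)]
```

Sampling indices rather than flipping a coin per product guarantees that exactly the requested share is treated, and a fixed seed makes the selection repeatable across runs.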
- online_retail_simulator.enrich.enrichment.enrich(config_path: str, df: DataFrame, job_info=None, products_df=None) → tuple
Apply enrichment to a DataFrame using a config file.
- Parameters:
config_path – Path to enrichment config (YAML or JSON, local or S3)
df – DataFrame with metrics data (must include product_identifier)
job_info – Optional JobInfo for product-aware enrichment functions
products_df – Optional products DataFrame for product-aware enrichment functions
- Returns:
enriched_df: DataFrame with enrichment applied (factual version)
potential_outcomes_df: DataFrame with Y0/Y1 for all products, or None if not provided
- Return type:
Tuple of (enriched_df, potential_outcomes_df)
- online_retail_simulator.enrich.enrichment.parse_impact_spec(impact_spec: Dict) → Tuple[str, str, Dict[str, Any]]
Parse IMPACT specification into module, function, and params.
Supports dict format with capitalized keys:
{"FUNCTION": "product_detail_boost", "PARAMS": {"effect_size": 0.5, "ramp_days": 7}}
{"MODULE": "my_module", "FUNCTION": "my_func", "PARAMS": {…}} # MODULE ignored, kept for compatibility
- Parameters:
impact_spec – IMPACT specification from config (must be dict)
- Returns:
Tuple of (module_name, function_name, params_dict)
Library of predefined treatment effect functions for catalog enrichment.
- online_retail_simulator.enrich.enrichment_library.probability_boost(metrics: list, **kwargs) → tuple
Boost sale probability (simulated by ordered units increase as proxy).
- Parameters:
metrics – List of metric record dictionaries
**kwargs – Same parameters as quantity_boost
- Returns:
Tuple of (treated_metrics, potential_outcomes_df) - same as quantity_boost
- online_retail_simulator.enrich.enrichment_library.product_detail_boost(metrics: list, **kwargs) → tuple
Product detail regeneration and metrics boost for enrichment experiments.
Selects a fraction of products for treatment, regenerates their product details (title, description, features) while preserving brand/category/price, and applies metrics boost effect.
- Parameters:
metrics – List of metric record dictionaries
**kwargs – Parameters including:
- job_info: JobInfo for saving product artifacts (required for saving)
- products: List of product dictionaries (required for product details)
- effect_size: Percentage increase in ordered units (default: 0.5)
- ramp_days: Number of days for ramp-up period (default: 7)
- enrichment_fraction: Fraction of products to enrich (default: 0.3)
- enrichment_start: Start date of enrichment (default: “2024-11-15”)
- seed: Random seed for product selection (default: 42)
- prompt_path: Path to custom prompt template file (optional)
- backend: Backend to use for regeneration (“mock” or “ollama”, default: “mock”)
- Returns:
treated_metrics: List of modified metric dictionaries with treatment applied
potential_outcomes_df: DataFrame with Y0_revenue and Y1_revenue for all products
- Return type:
Tuple of (treated_metrics, potential_outcomes_df)
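The documentation does not state how the effect builds up over `ramp_days`. One simple possibility is a linear ramp, sketched below purely as an illustration; the function name `ramped_effect` and the linear shape are assumptions, not the library's confirmed behavior.

```python
from datetime import date


def ramped_effect(
    current: date, start: date, effect_size: float = 0.5, ramp_days: int = 7
) -> float:
    """Return the effect in force on `current`, ramping linearly from `start`."""
    days_since_start = (current - start).days
    if days_since_start < 0:
        return 0.0  # treatment has not begun
    if ramp_days <= 0:
        return effect_size  # no ramp: full effect immediately
    # Linear ramp (assumed shape), capped at the full effect size.
    return effect_size * min(1.0, days_since_start / ramp_days)
```

Under this assumption, with the documented defaults the effect would be zero on the start date and reach the full 0.5 seven days later.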
- online_retail_simulator.enrich.enrichment_library.quantity_boost(metrics: list, **kwargs) → tuple
Boost ordered units by a percentage for enriched products.
- Parameters:
metrics – List of metric record dictionaries
**kwargs – Parameters including:
- effect_size: Percentage increase in ordered units (default: 0.5 for 50% boost)
- enrichment_fraction: Fraction of products to enrich (default: 0.3)
- enrichment_start: Start date of enrichment (default: “2024-11-15”)
- seed: Random seed for product selection (default: 42)
- min_units: Minimum units for enriched products with zero sales (default: 1)
- Returns:
treated_metrics: List of modified metric dictionaries with treatment applied
potential_outcomes_df: DataFrame with Y0_revenue and Y1_revenue for all products
- Return type:
Tuple of (treated_metrics, potential_outcomes_df)
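The relationship between `effect_size`, `min_units`, and the two potential outcomes can be illustrated for a single record. This sketch assumes each record has ‘ordered_units’ and ‘price’ keys and that boosted units are truncated to whole numbers; the helper name `potential_outcomes` is hypothetical, and the library itself returns a DataFrame rather than a dict.

```python
from typing import Dict


def potential_outcomes(
    record: Dict, effect_size: float = 0.5, min_units: int = 1
) -> Dict:
    """Return untreated (Y0) and treated (Y1) revenue for one metric record."""
    y0_units = record["ordered_units"]
    # Truncating to whole units is an assumption of this sketch.
    y1_units = int(y0_units * (1 + effect_size))
    if y0_units == 0:
        # A percentage boost of zero is still zero, so apply the floor instead.
        y1_units = min_units
    price = record["price"]
    return {
        "product_identifier": record["product_identifier"],
        "Y0_revenue": y0_units * price,
        "Y1_revenue": y1_units * price,
    }
```

Computing both outcomes for every product, treated or not, is what makes the true treatment effect recoverable later: the factual data shows only one of the two columns per product.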
Impact-based enrichment registry for custom user-defined enrichment functions.
This module provides a registration system that allows users to register their own impact-based enrichment functions.
- online_retail_simulator.enrich.enrichment_registry.clear_enrichment_registry() → None
Clear all registered enrichment functions.
- online_retail_simulator.enrich.enrichment_registry.list_enrichment_functions() → List[str]
List all registered enrichment functions.
- online_retail_simulator.enrich.enrichment_registry.load_effect_function(module_name: str, function_name: str) → Callable
Load treatment effect function from registry.
- Parameters:
module_name – Name of module (ignored, kept for backward compatibility)
function_name – Name of function in registry
- Returns:
Treatment effect function
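A name-to-function registry of this kind is commonly built on a module-level dictionary. The sketch below shows the general pattern with hypothetical function names; the package's internal storage and its error behavior for unknown names are not documented here and may differ.

```python
from typing import Callable, Dict, List

_REGISTRY: Dict[str, Callable] = {}


def register_function(name: str, func: Callable) -> None:
    """Store `func` under `name`, replacing any earlier entry."""
    if not callable(func):
        raise TypeError(f"{name!r} is not callable")
    _REGISTRY[name] = func


def list_functions() -> List[str]:
    """Return registered names in sorted order."""
    return sorted(_REGISTRY)


def load_function(module_name: str, function_name: str) -> Callable:
    """Look up a function by name; `module_name` is accepted but ignored."""
    if function_name not in _REGISTRY:
        raise KeyError(f"No function registered as {function_name!r}")
    return _REGISTRY[function_name]


def clear_registry() -> None:
    """Remove every registered function."""
    _REGISTRY.clear()
```

Keeping an unused `module_name` argument, as the documented `load_effect_function` does, lets older configs that still specify a module keep working while lookups go through the registry alone.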
Configuration Module
Configuration processing with defaults and validation.
- online_retail_simulator.config_processor.deep_merge(base: Dict, override: Dict) → Dict
Deep merge two dictionaries, with override values taking precedence.
- Parameters:
base – Base dictionary (defaults)
override – Override dictionary (user config)
- Returns:
Merged dictionary
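A recursive merge with override precedence can be sketched as follows. The function name `deep_merge_sketch` is hypothetical, and apart from STORAGE.PATH the config keys in the example are invented for illustration. Whether the package copies its inputs, as this sketch does, is not documented.

```python
import copy
from typing import Dict


def deep_merge_sketch(base: Dict, override: Dict) -> Dict:
    """Merge `override` into a copy of `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"STORAGE": {"PATH": ".", "FORMAT": "csv"}, "SEED": 42}
user = {"STORAGE": {"PATH": "/tmp/jobs"}}
print(deep_merge_sketch(defaults, user))
# {'STORAGE': {'PATH': '/tmp/jobs', 'FORMAT': 'csv'}, 'SEED': 42}
```

The difference from a shallow `{**base, **override}` is visible in the example: a shallow merge would replace the whole STORAGE section and lose the default FORMAT entry.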
- online_retail_simulator.config_processor.get_impact_defaults(function_name: str) → Dict[str, Any]
Get default parameters for an IMPACT enrichment function.
- Parameters:
function_name – Name of the enrichment function (e.g., “product_detail_boost”)
- Returns:
Dictionary of default parameters for the function, or empty dict if not found
- online_retail_simulator.config_processor.load_defaults() → Dict[str, Any]
Load default configuration from package.
- online_retail_simulator.config_processor.process_config(config_path: str) → Dict[str, Any]
Load, merge with defaults, and validate configuration.
- Parameters:
config_path – Path to user configuration file (local or S3)
- Returns:
Complete validated configuration
- Raises:
FileNotFoundError – If config file doesn’t exist
ValueError – If configuration is invalid