API Reference

Top-level Exports

Confidence scoring and artifact review for the impact engine pipeline.

class impact_engine_evaluate.EvaluateResult(initiative_id, confidence, confidence_range, strategy, report='')[source]

Output of the EVALUATE pipeline stage (both strategies).

Parameters:
  • initiative_id (str) – Initiative identifier.

  • confidence (float) – Confidence score between 0.0 and 1.0.

  • confidence_range (tuple[float, float]) – (lower, upper) bounds from the method reviewer.

  • strategy (str) – Strategy that produced this result ("score" or "review").

  • report (str | ReviewResult) – Descriptive string for the score strategy; full ReviewResult for the review strategy.

initiative_id: str
confidence: float
confidence_range: tuple[float, float]
strategy: str
report: str | ReviewResult = ''
__init__(initiative_id, confidence, confidence_range, strategy, report='')
Parameters:
  • initiative_id (str)

  • confidence (float)

  • confidence_range (tuple[float, float])

  • strategy (str)

  • report (str | ReviewResult)

Return type:

None

impact_engine_evaluate.evaluate_confidence(config, job_dir, *, cost_to_scale=None)[source]

Evaluate the confidence of a job directory.

Reads the job directory, dispatches on evaluate_strategy from the manifest, and returns an EvaluateResult. Both the deterministic score path and the LLM review path are supported — the manifest controls which is used.

Parameters:
  • config (str | Path | dict | None) – Backend configuration for the review strategy (path to a YAML file or an inline dict). Ignored for the score strategy.

  • job_dir (str | Path) – Path to the job directory containing manifest.json and upstream artifacts.

  • cost_to_scale (float | None) – Optional override for the initiative’s cost-to-scale value. When provided, replaces the value stored in the job directory artifacts before scoring.

Returns:

Confidence score, strategy used, and a strategy-specific report. Also writes evaluate_result.json (and score_result.json for the score strategy) to job_dir.

Return type:

EvaluateResult

Examples

>>> result = evaluate_confidence("review_config.yaml", "path/to/rct_job/")
>>> print(result.confidence)
0.75
class impact_engine_evaluate.MethodReviewer[source]

Base class for methodology-specific artifact reviewers.

Each method reviewer bundles its own prompt template, knowledge base content, and artifact loading logic. The default load_artifact reads all files listed in the manifest and attempts to extract sample_size from the first JSON file. Subclasses may override if they need method-specific loading.

name

Registry key (e.g. "experiment").

Type:

str

prompt_name

Filename stem of the prompt template YAML.

Type:

str

description

Human-readable description of the methodology.

Type:

str

confidence_range

(lower, upper) bounds for deterministic confidence scoring.

Type:

tuple[float, float]

name: str = ''
prompt_name: str = ''
description: str = ''
confidence_range: tuple[float, float] = (0.0, 0.0)
load_artifact(manifest, job_dir)[source]

Read artifact files per manifest and return a payload.

The default implementation reads every file entry in manifest, concatenates their contents, and extracts sample_size from the first JSON file that contains one. Subclasses may override for method-specific loading.

Parameters:
  • manifest (Manifest) – Parsed job manifest.

  • job_dir (Path) – Path to the job directory.

Return type:

ArtifactPayload

Raises:

ValueError – If the manifest contains no file entries.

prompt_template_dir()[source]

Directory containing this reviewer’s YAML prompt files.

Returns:

None means no method-specific prompts.

Return type:

Path | None

knowledge_content_dir()[source]

Directory containing this reviewer’s knowledge files.

Returns:

None means no method-specific knowledge.

Return type:

Path | None

class impact_engine_evaluate.MethodReviewerRegistry[source]

Discover and instantiate registered method reviewers.

classmethod register(name)[source]

Class decorator that registers a method reviewer under name.

Parameters:

name (str) – Lookup key (typically the model_type value from manifests).

Returns:

The original class, unmodified.

Return type:

Callable

classmethod create(name, **kwargs)[source]

Instantiate a registered method reviewer.

Parameters:
  • name (str) – Registered method name.

  • **kwargs – Forwarded to the reviewer constructor.

Return type:

MethodReviewer

Raises:

KeyError – If name is not registered.

classmethod available()[source]

Return sorted list of registered method names.

Return type:

list[str]

classmethod confidence_map()[source]

Return {name: confidence_range} for all registered methods.

Return type:

dict[str, tuple[float, float]]
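
The registry pattern above can be sketched with stand-in classes (this is an illustration of the documented contract, not the real `impact_engine_evaluate` implementation; the method name "diff_in_diff" and its confidence range are hypothetical):

```python
# Stand-ins mirroring the documented MethodReviewer / MethodReviewerRegistry API.
class MethodReviewer:
    name = ""
    prompt_name = ""
    description = ""
    confidence_range = (0.0, 0.0)

class MethodReviewerRegistry:
    _registry = {}

    @classmethod
    def register(cls, name):
        def decorator(klass):
            cls._registry[name] = klass
            return klass              # original class, unmodified
        return decorator

    @classmethod
    def create(cls, name, **kwargs):
        return cls._registry[name](**kwargs)   # raises KeyError if unregistered

    @classmethod
    def available(cls):
        return sorted(cls._registry)

@MethodReviewerRegistry.register("diff_in_diff")   # hypothetical method name
class DiffInDiffReviewer(MethodReviewer):
    name = "diff_in_diff"
    prompt_name = "diff_in_diff_review"
    description = "Difference-in-differences studies."
    confidence_range = (0.4, 0.7)

reviewer = MethodReviewerRegistry.create("diff_in_diff")
print(MethodReviewerRegistry.available())   # ['diff_in_diff']
print(reviewer.confidence_range)            # (0.4, 0.7)
```

Registering via a class decorator keeps reviewer discovery declarative: importing a reviewer module is enough to make its `model_type` resolvable.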

class impact_engine_evaluate.KnowledgeBase[source]

Abstract base class for knowledge base implementations.

Subclass and implement load() to provide custom knowledge base content from any source (filesystem, database, API, etc.).

abstractmethod load()[source]

Return the knowledge base content as a string.

Returns:

Combined knowledge content.

Return type:

str

class impact_engine_evaluate.DirectoryKnowledgeBase(directory)[source]

Knowledge base backed by a directory of .md and .txt files.

Parameters:

directory (str | Path) – Path to a directory containing .md or .txt knowledge files.

__init__(directory)[source]
Parameters:

directory (str | Path)

Return type:

None

load()[source]

Return concatenated content of all .md and .txt files in the directory.

Returns:

Combined content separated by section dividers.

Return type:

str
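
The loading behaviour can be sketched with a stand-in function (a simplification of the documented `DirectoryKnowledgeBase.load`; the real section-divider string is an assumption):

```python
import tempfile
from pathlib import Path

def load_directory_knowledge(directory):
    """Concatenate .md/.txt files in a directory (stand-in sketch)."""
    parts = []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix in {".md", ".txt"}:
            parts.append(path.read_text())
    return "\n---\n".join(parts)   # divider format is an assumption

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "a_rct.md").write_text("Randomized trials establish causality.")
    (Path(d) / "b_notes.txt").write_text("Field notes on attrition.")
    (Path(d) / "data.csv").write_text("ignored")   # not .md/.txt, so skipped
    combined = load_directory_knowledge(d)

print(combined)
```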

class impact_engine_evaluate.KnowledgeBaseRegistry[source]

Registry mapping names to KnowledgeBase instances.

__init__()[source]
Return type:

None

register(name, knowledge_base)[source]

Register a knowledge base under name.

Parameters:
  • name (str) – Registry key used to look up this knowledge base.

  • knowledge_base (KnowledgeBase) – Knowledge base instance to register.

Return type:

None

load(name)[source]

Load content from the knowledge base registered under name.

Parameters:

name (str) – Registered knowledge base name.

Returns:

Combined knowledge content.

Return type:

str

Raises:

KeyError – If name is not registered.

list()[source]

Return sorted list of registered knowledge base names.

Return type:

list[str]

clear()[source]

Reset the registry and defaults flag.

Intended for use in tests to ensure a clean state.

Return type:

None
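
A custom knowledge source plugs in by subclassing `KnowledgeBase` and registering an instance. The sketch below uses stand-in implementations of the documented contract; `StaticKnowledgeBase` and the name "rct_notes" are hypothetical:

```python
from abc import ABC, abstractmethod

class KnowledgeBase(ABC):                 # stand-in for the documented ABC
    @abstractmethod
    def load(self) -> str: ...

class StaticKnowledgeBase(KnowledgeBase):  # hypothetical subclass
    def __init__(self, content):
        self.content = content
    def load(self):
        return self.content

class KnowledgeBaseRegistry:               # stand-in registry
    def __init__(self):
        self._bases = {}
    def register(self, name, kb):
        self._bases[name] = kb
    def load(self, name):
        return self._bases[name].load()    # raises KeyError if unregistered
    def list(self):
        return sorted(self._bases)

registry = KnowledgeBaseRegistry()
registry.register("rct_notes", StaticKnowledgeBase("RCTs need balance checks."))
print(registry.list())             # ['rct_notes']
print(registry.load("rct_notes"))  # RCTs need balance checks.
```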

class impact_engine_evaluate.Prompt[source]

Abstract base class for prompt template implementations.

Subclass and implement load() to provide custom prompt content from any source (filesystem, database, generated dynamically, etc.).

abstractmethod load()[source]

Return the prompt specification.

Return type:

PromptSpec

class impact_engine_evaluate.FilePrompt(path)[source]

Prompt backed by a YAML template file.

Parameters:

path (str | Path) – Path to a YAML prompt template file.

__init__(path)[source]
Parameters:

path (str | Path)

Return type:

None

load()[source]

Load and return the PromptSpec from the YAML file.

Return type:

PromptSpec

Raises:

FileNotFoundError – If the path does not exist.
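
A prompt template file might look like the following. The field names mirror the documented `PromptSpec` attributes; the exact on-disk schema, file name, and values here are illustrative assumptions:

```yaml
# experiment_review.yaml (hypothetical)
name: experiment_review
version: "1.0.0"
description: Review prompt for randomized experiments.
dimensions:
  - internal_validity
  - statistical_power
system_template: |
  You are reviewing a {{ model_type }} study.
user_template: |
  Artifact:
  {{ artifact_text }}
```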

class impact_engine_evaluate.PromptRegistry[source]

Registry mapping names to Prompt instances.

__init__()[source]
Return type:

None

register(name, prompt)[source]

Register a prompt under name.

Parameters:
  • name (str) – Registry key used to look up this prompt.

  • prompt (Prompt) – Prompt instance to register.

Return type:

None

load(name)[source]

Load the PromptSpec registered under name.

Parameters:

name (str) – Registered prompt name.

Return type:

PromptSpec

Raises:

KeyError – If name is not registered.

list()[source]

Return sorted list of registered prompt names.

Return type:

list[str]

clear()[source]

Reset the registry and defaults flag.

Intended for use in tests to ensure a clean state.

Return type:

None

Scorer

Job Reader

Shared job directory reader: build scorer events from job artifacts.

impact_engine_evaluate.job_reader.load_scorer_event(manifest, job_dir, overrides=None)[source]

Build a scorer event dict from a job directory’s impact_results.json.

Parameters:
  • manifest (Manifest) – Parsed job manifest.

  • job_dir (str | Path) – Path to the job directory.

  • overrides (dict[str, Any] | None) – Optional overrides (e.g. cost_to_scale from the orchestrator event).

Returns:

Flat dict with keys initiative_id, model_type, ci_upper, effect_estimate, ci_lower, cost_to_scale, and sample_size.

Return type:

dict[str, Any]

Raises:

FileNotFoundError – If impact_results.json is not found in the job directory.
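
The returned event is a flat dict with exactly the documented keys; the values below are illustrative:

```python
# Shape of the dict load_scorer_event builds (values are made up).
event = {
    "initiative_id": "init-042",
    "model_type": "experiment",
    "effect_estimate": 0.12,
    "ci_lower": 0.05,
    "ci_upper": 0.19,
    "cost_to_scale": 1500.0,
    "sample_size": 4800,
}
```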

Adapter

Configuration

Unified configuration for the review subsystem.

class impact_engine_evaluate.config.BackendConfig(model='claude-sonnet-4-5-20250929', temperature=0.0, max_tokens=4096, extra=<factory>)[source]

Bases: object

LLM backend configuration.

Parameters:
  • model (str) – Model identifier passed to litellm.completion().

  • temperature (float) – Sampling temperature.

  • max_tokens (int) – Maximum tokens per completion.

  • extra (dict) – Additional kwargs forwarded to litellm.completion().

__init__(model='claude-sonnet-4-5-20250929', temperature=0.0, max_tokens=4096, extra=<factory>)
Parameters:
  • model (str)

  • temperature (float)

  • max_tokens (int)

  • extra (dict)

Return type:

None

class impact_engine_evaluate.config.MethodConfig(prompt='', knowledge_base='')[source]

Bases: object

Per-method prompt and knowledge base selection.

Parameters:
  • prompt (str) – Name of a registered prompt spec. Empty string uses the reviewer’s default prompt_template_dir().

  • knowledge_base (str) – Name of a registered knowledge base. Empty string uses the reviewer’s default knowledge_content_dir().

__init__(prompt='', knowledge_base='')
Parameters:
  • prompt (str)

  • knowledge_base (str)

Return type:

None

class impact_engine_evaluate.config.ReviewConfig(backend=<factory>, methods=<factory>)[source]

Bases: object

Top-level configuration for the review subsystem.

Parameters:
  • backend (BackendConfig) – LLM backend configuration.

  • methods (dict[str, MethodConfig]) – Per-method prompt and knowledge base selection, keyed by method name.

__init__(backend=<factory>, methods=<factory>)
Parameters:
  • backend (BackendConfig)

  • methods (dict[str, MethodConfig])

Return type:

None

impact_engine_evaluate.config.load_config(source=None)[source]

Load review configuration from a YAML file, dict, or environment variables.

Parameters:

source (str | Path | dict | None) – A path to a YAML file, a raw dict, or None to use only environment variable overrides on defaults.

Returns:

Fully validated configuration dictionary.

Return type:

dict
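
A YAML source for load_config might look like the following. The keys mirror the documented `BackendConfig` and `MethodConfig` fields; the exact schema and the names used here ("experiment", "experiment_review", "rct_notes") are illustrative assumptions:

```yaml
# review_config.yaml (hypothetical)
backend:
  model: claude-sonnet-4-5-20250929
  temperature: 0.0
  max_tokens: 4096
  extra: {}
methods:
  experiment:
    prompt: experiment_review    # registered prompt name; "" uses the reviewer default
    knowledge_base: rct_notes    # registered knowledge base; "" uses the reviewer default
```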

Review API

Public API: review a job directory.

impact_engine_evaluate.review.api.compute_review(job_dir, *, config=None)[source]

Compute a review of a job directory without writing results.

Suitable for evaluation loops and batch processing where writing back to the job directory is unwanted.

Prompt and knowledge base are resolved in this order:

  1. If config.methods[model_type].prompt is set, the named prompt is loaded from the prompt registry.

  2. Otherwise the reviewer’s default prompt_template_dir() is used.

The same precedence applies to knowledge_base.

Parameters:
  • job_dir (str | Path) – Path to the job directory containing manifest.json.

  • config (ReviewConfig | dict | str | None) – Backend and method configuration. A ReviewConfig, a dict, a YAML file path, or None for defaults.

Return type:

ReviewResult

Raises:
  • FileNotFoundError – If the manifest or prompt template is missing.

  • KeyError – If the manifest’s model_type has no registered method reviewer, or a configured prompt / knowledge base name is not registered.

impact_engine_evaluate.review.api.review(job_dir, *, config=None)[source]

Review a job directory and write results back.

Calls compute_review() then writes review_result.json to the job directory.

Parameters:
  • job_dir (str | Path) – Path to the job directory containing manifest.json.

  • config (ReviewConfig | dict | str | None) – Backend and method configuration. A ReviewConfig, a dict, a YAML file path, or None for defaults.

Return type:

ReviewResult

Raises:
  • FileNotFoundError – If the manifest or prompt template is missing.

  • KeyError – If the manifest’s model_type has no registered method reviewer, or a configured prompt / knowledge base name is not registered.

Review Engine

ReviewEngine: orchestrates a single artifact review.

class impact_engine_evaluate.review.engine.PromptBuilder[source]

Bases: object

Load prompt specs and knowledge, then render chat messages.

Encapsulates the Jinja2 template rendering and knowledge loading steps shared across all method reviewers. This is the shared entry layer inside the Evaluation Engine that runs before any LLM call.

load_spec(path)[source]

Load a PromptSpec from a YAML file.

Parameters:

path (Path) – Path to a YAML prompt template file.

Return type:

PromptSpec

Raises:

FileNotFoundError – If path does not exist.

load_knowledge(directory)[source]

Concatenate all .md and .txt files in a directory.

Parameters:

directory (Path) – Directory containing knowledge files.

Returns:

Combined content separated by section dividers.

Return type:

str

build(spec, variables)[source]

Render a prompt spec into chat messages.

Parameters:
  • spec (PromptSpec) – The prompt template to render.

  • variables (dict[str, Any]) – Template variables.

Returns:

Chat messages suitable for LLM completion.

Return type:

list[dict[str, str]]

class impact_engine_evaluate.review.engine.ResultsBuilder[source]

Bases: object

Parse structured LLM output into a ReviewResult.

Translates the raw Pydantic ReviewResponse from LiteLLM into the ReviewResult dataclass used downstream. This is the shared exit layer inside the Evaluation Engine that runs after every LLM call.

parse(artifact, spec, model, response)[source]

Parse a LiteLLM structured response into a ReviewResult.

Parameters:
  • artifact (ArtifactPayload) – The artifact that was reviewed.

  • spec (PromptSpec) – Prompt spec used for the review.

  • model (str) – Model identifier that produced the response.

  • response (Any) – Raw LiteLLM completion response. Either choices[0].message.parsed (OpenAI-style structured output) or choices[0].message.content (JSON string, as returned by ollama and other backends) is accepted.

Return type:

ReviewResult

class impact_engine_evaluate.review.engine.ReviewEngine(*, default_model='claude-sonnet-4-5-20250929', default_temperature=0.0, default_max_tokens=4096, litellm_extra=None)[source]

Bases: object

Execute an artifact review via LiteLLM.

Parameters:
  • default_model (str) – Default model identifier for completions.

  • default_temperature (float) – Default temperature for completions.

  • default_max_tokens (int) – Default max tokens for completions.

  • litellm_extra (dict[str, Any] | None) – Additional kwargs forwarded to litellm.completion().

__init__(*, default_model='claude-sonnet-4-5-20250929', default_temperature=0.0, default_max_tokens=4096, litellm_extra=None)[source]
Parameters:
  • default_model (str)

  • default_temperature (float)

  • default_max_tokens (int)

  • litellm_extra (dict[str, Any] | None)

Return type:

None

classmethod from_config(config=None)[source]

Construct a ReviewEngine from a config dict or raw source.

Parameters:

config (dict | str | None) – A config dict, a YAML file path, or None for defaults.

Return type:

ReviewEngine

review(artifact, spec, knowledge_context='', *, model=None, temperature=None, max_tokens=None)[source]

Execute a review of the given artifact.

Parameters:
  • artifact (ArtifactPayload) – The artifact to review.

  • spec (PromptSpec) – Prompt template specification.

  • knowledge_context (str) – Pre-loaded domain knowledge text.

  • model (str | None) – Model override for this call.

  • temperature (float | None) – Temperature override for this call.

  • max_tokens (int | None) – Max tokens override for this call.

Return type:

ReviewResult

impact_engine_evaluate.review.engine.load_prompt_spec(path)[source]

Load a PromptSpec from a YAML file.

Parameters:

path (Path) – Path to a YAML prompt template file.

Return type:

PromptSpec

impact_engine_evaluate.review.engine.render(spec, variables)[source]

Render a prompt spec into chat messages.

Parameters:
  • spec (PromptSpec) – The prompt template to render.

  • variables (dict[str, Any]) – Template variables.

Return type:

list[dict[str, str]]
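
The render contract can be sketched with a minimal stand-in. The real implementation uses Jinja2 templates; plain `str.format` here is a deliberate simplification, and the variable values are made up:

```python
def render_messages(system_template, user_template, variables):
    """Render two templates into chat messages (stand-in for render())."""
    return [
        {"role": "system", "content": system_template.format(**variables)},
        {"role": "user", "content": user_template.format(**variables)},
    ]

messages = render_messages(
    "You review {model_type} studies.",
    "Artifact:\n{artifact_text}",
    {"model_type": "experiment", "artifact_text": "Effect 0.12, n=4800"},
)
print(messages[0]["content"])   # You review experiment studies.
```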

impact_engine_evaluate.review.engine.load_knowledge(directory)[source]

Concatenate all .md and .txt files in a directory.

Parameters:

directory (Path) – Directory containing knowledge files.

Return type:

str

Review Models

Data models for artifact review.

class impact_engine_evaluate.review.models.DimensionResponse(*, name, score, justification)[source]

Bases: BaseModel

Single dimension in a structured review response.

Parameters:
  • name (str) – Dimension identifier.

  • score (float) – Score between 0.0 and 1.0.

  • justification (str) – Free-text explanation of the score.

model_config = {}

Configuration for the model; a dictionary conforming to pydantic's ConfigDict.

class impact_engine_evaluate.review.models.ReviewResponse(*, dimensions, overall)[source]

Bases: BaseModel

Structured response schema for LLM review output.

Parameters:
  • dimensions (list[DimensionResponse]) – Per-dimension review output.

  • overall (float) – Overall score across dimensions.

model_config = {}

Configuration for the model; a dictionary conforming to pydantic's ConfigDict.

class impact_engine_evaluate.review.models.ReviewDimension(name, score, justification)[source]

Bases: object

A single scored dimension of an artifact review.

Parameters:
  • name (str) – Dimension identifier (e.g. "internal_validity").

  • score (float) – Score between 0.0 and 1.0.

  • justification (str) – Free-text explanation of the score.

__init__(name, score, justification)
Parameters:
  • name (str)

  • score (float)

  • justification (str)

Return type:

None

class impact_engine_evaluate.review.models.ReviewResult(initiative_id, prompt_name, prompt_version, backend_name, model, dimensions=<factory>, overall_score=0.0, raw_response='', timestamp='')[source]

Bases: object

Complete result of an artifact review.

Parameters:
  • initiative_id (str) – Identifier of the reviewed initiative.

  • prompt_name (str) – Name of the prompt template used.

  • prompt_version (str) – Version string of the prompt template.

  • backend_name (str) – Registered name of the LLM backend.

  • model (str) – Model identifier used for completion.

  • dimensions (list[ReviewDimension]) – Per-dimension scores and justifications.

  • overall_score (float) – Aggregated score across dimensions (mean).

  • raw_response (str) – Full LLM output retained for audit.

  • timestamp (str) – ISO-8601 timestamp of the review.

__init__(initiative_id, prompt_name, prompt_version, backend_name, model, dimensions=<factory>, overall_score=0.0, raw_response='', timestamp='')
Parameters:
  • initiative_id (str)

  • prompt_name (str)

  • prompt_version (str)

  • backend_name (str)

  • model (str)

  • dimensions (list[ReviewDimension])

  • overall_score (float)

  • raw_response (str)

  • timestamp (str)

Return type:

None
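
The documented aggregation is a plain mean over dimension scores; the dimension names and scores below are illustrative:

```python
# overall_score is the mean of the per-dimension scores.
scores = [0.8, 0.6, 0.7]   # e.g. internal_validity, power, reporting
overall = sum(scores) / len(scores)
print(round(overall, 3))   # 0.7
```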

class impact_engine_evaluate.review.models.ArtifactPayload(initiative_id, artifact_text, model_type='', sample_size=0, metadata=<factory>)[source]

Bases: object

Typed input envelope for an artifact to review.

Parameters:
  • initiative_id (str) – Unique initiative identifier.

  • artifact_text (str) – The artifact content to review.

  • model_type (str) – Causal inference methodology label.

  • sample_size (int) – Sample size of the study.

  • metadata (dict) – Additional key-value pairs forwarded to the prompt template.

classmethod from_event(event)[source]

Construct a payload from a pipeline event dict.

Parameters:

event (dict) – Pipeline event. Must contain initiative_id and artifact_text. All other keys are passed through as metadata.

Return type:

ArtifactPayload

__init__(initiative_id, artifact_text, model_type='', sample_size=0, metadata=<factory>)
Parameters:
  • initiative_id (str)

  • artifact_text (str)

  • model_type (str)

  • sample_size (int)

  • metadata (dict)

Return type:

None
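
The from_event contract can be sketched with a stand-in dataclass. Per the documentation, `initiative_id` and `artifact_text` are required and every other key passes through as metadata; the event values here are made up:

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactPayload:                     # stand-in for the documented class
    initiative_id: str
    artifact_text: str
    model_type: str = ""
    sample_size: int = 0
    metadata: dict = field(default_factory=dict)

    @classmethod
    def from_event(cls, event):
        event = dict(event)                # don't mutate the caller's dict
        return cls(
            initiative_id=event.pop("initiative_id"),
            artifact_text=event.pop("artifact_text"),
            metadata=event,                # all other keys pass through
        )

payload = ArtifactPayload.from_event({
    "initiative_id": "init-042",
    "artifact_text": "ATE = 0.12 (95% CI 0.05-0.19)",
    "region": "east",                      # extra key lands in metadata
})
print(payload.metadata)   # {'region': 'east'}
```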

class impact_engine_evaluate.review.models.PromptSpec(name, version, description, dimensions=<factory>, system_template='', user_template='')[source]

Bases: object

Metadata and template content for a review prompt.

Parameters:
  • name (str) – Unique prompt identifier.

  • version (str) – Semver-style version string.

  • description (str) – Human-readable description.

  • dimensions (list[str]) – Names of scoring dimensions this prompt expects.

  • system_template (str) – Jinja2 template for the system message.

  • user_template (str) – Jinja2 template for the user message.

__init__(name, version, description, dimensions=<factory>, system_template='', user_template='')
Parameters:
  • name (str)

  • version (str)

  • description (str)

  • dimensions (list[str])

  • system_template (str)

  • user_template (str)

Return type:

None

Manifest

Job directory manifest: load, validate, and update.

class impact_engine_evaluate.review.manifest.FileEntry(path, format)[source]

Bases: object

A single file reference within a manifest.

Parameters:
  • path (str) – Relative path to the file within the job directory.

  • format (str) – File format identifier (e.g. "json", "yaml", "csv").

__init__(path, format)
Parameters:
  • path (str)

  • format (str)

Return type:

None

class impact_engine_evaluate.review.manifest.Manifest(model_type, created_at='', files=<factory>, initiative_id='', evaluate_strategy='review')[source]

Bases: object

Parsed manifest for a job directory.

Parameters:
  • model_type (str) – Causal inference methodology label.

  • created_at (str) – ISO-8601 creation timestamp.

  • files (dict[str, FileEntry]) – Mapping of logical names to file entries.

  • initiative_id (str) – Initiative identifier. Defaults to the job directory name.

  • evaluate_strategy (str) – Evaluation strategy: "review" (LLM review) or "score" (deterministic confidence). Defaults to "review".

__init__(model_type, created_at='', files=<factory>, initiative_id='', evaluate_strategy='review')
Parameters:
  • model_type (str)

  • created_at (str)

  • files (dict[str, FileEntry])

  • initiative_id (str)

  • evaluate_strategy (str)

Return type:

None
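
A manifest.json might look like the following. Field names follow the documented `Manifest` and `FileEntry` attributes; the logical file name, values, and exact on-disk schema are illustrative assumptions:

```json
{
  "model_type": "experiment",
  "created_at": "2025-01-15T12:00:00Z",
  "initiative_id": "init-042",
  "evaluate_strategy": "review",
  "files": {
    "impact_results": {"path": "impact_results.json", "format": "json"}
  }
}
```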

impact_engine_evaluate.review.manifest.load_manifest(job_dir)[source]

Load and validate a manifest from a job directory.

Parameters:

job_dir (str | Path) – Path to the job directory containing manifest.json.

Return type:

Manifest

Raises:

FileNotFoundError – If manifest.json is not found in the job directory.

Method Reviewer Base

Abstract method reviewer and registry.

class impact_engine_evaluate.review.methods.base.MethodReviewer[source]

Bases: ABC

Base class for methodology-specific artifact reviewers.

Each method reviewer bundles its own prompt template, knowledge base content, and artifact loading logic. The default load_artifact reads all files listed in the manifest and attempts to extract sample_size from the first JSON file. Subclasses may override if they need method-specific loading.

name

Registry key (e.g. "experiment").

Type:

str

prompt_name

Filename stem of the prompt template YAML.

Type:

str

description

Human-readable description of the methodology.

Type:

str

confidence_range

(lower, upper) bounds for deterministic confidence scoring.

Type:

tuple[float, float]

load_artifact(manifest, job_dir)[source]

Read artifact files per manifest and return a payload.

The default implementation reads every file entry in manifest, concatenates their contents, and extracts sample_size from the first JSON file that contains one. Subclasses may override for method-specific loading.

Parameters:
  • manifest (Manifest) – Parsed job manifest.

  • job_dir (Path) – Path to the job directory.

Return type:

ArtifactPayload

Raises:

ValueError – If the manifest contains no file entries.

prompt_template_dir()[source]

Directory containing this reviewer’s YAML prompt files.

Returns:

None means no method-specific prompts.

Return type:

Path | None

knowledge_content_dir()[source]

Directory containing this reviewer’s knowledge files.

Returns:

None means no method-specific knowledge.

Return type:

Path | None

class impact_engine_evaluate.review.methods.base.MethodReviewerRegistry[source]

Bases: object

Discover and instantiate registered method reviewers.

classmethod register(name)[source]

Class decorator that registers a method reviewer under name.

Parameters:

name (str) – Lookup key (typically the model_type value from manifests).

Returns:

The original class, unmodified.

Return type:

Callable

classmethod create(name, **kwargs)[source]

Instantiate a registered method reviewer.

Parameters:
  • name (str) – Registered method name.

  • **kwargs – Forwarded to the reviewer constructor.

Return type:

MethodReviewer

Raises:

KeyError – If name is not registered.

classmethod available()[source]

Return sorted list of registered method names.

Return type:

list[str]

classmethod confidence_map()[source]

Return {name: confidence_range} for all registered methods.

Return type:

dict[str, tuple[float, float]]

Backend Base