{ "cells": [ { "cell_type": "markdown", "id": "cell-0", "metadata": {}, "source": "# Local LLM Review with Ollama\n\nThis tutorial demonstrates the review evaluation path using a locally hosted\nmodel via [Ollama](https://ollama.com). No API key or internet connection is\nrequired \u2014 the model runs entirely on your machine.\n\n```{note}\nThis notebook requires Ollama to be running locally and is not executed\nduring the docs build. Code cells include pre-computed output.\n```\n\n## Workflow overview\n\n1. Inspect the job directory \u2014 a synthetic RCT with realistic artifacts\n2. Configure the backend to call `ollama_chat/llama3.2`\n3. Run `evaluate_confidence()`\n4. Inspect the `EvaluateResult`\n5. Examine the output file" }, { "cell_type": "markdown", "id": "cell-1", "metadata": {}, "source": "## Initial Setup\n\nInstall and start Ollama, then pull a model:\n\n```bash\nollama pull llama3.2\nollama serve # already running if the desktop app is open\n```\n\nNo extra Python dependencies are needed beyond the base install:\n\n```bash\npip install impact-engine-evaluate\n```" }, { "cell_type": "markdown", "id": "cell-2", "metadata": {}, "source": "## Step 1 \u2014 Inspect the job directory\n\nThe `rct_job/` directory alongside this notebook is a synthetic early-literacy\nRCT. 
It contains:\n\n- `manifest.json` \u2014 metadata, file references, and evaluation strategy\n- `impact_results.json` \u2014 summary statistics (effect estimate, CI, sample size)\n- `regression_output.json` \u2014 full OLS output with balance check and attrition data" }, { "cell_type": "code", "execution_count": 1, "id": "cell-3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Job directory: rct_job\n", "Files: ['impact_results.json', 'manifest.json', 'regression_output.json', 'review_result.json']\n", "\n", "manifest.json:\n", "{\n", " \"schema_version\": \"2.0\",\n", " \"model_type\": \"experiment\",\n", " \"evaluate_strategy\": \"review\",\n", " \"initiative_id\": \"literacy-rct-2024\",\n", " \"created_at\": \"2025-03-15T09:00:00+00:00\",\n", " \"files\": {\n", " \"impact_results\": {\"path\": \"impact_results.json\", \"format\": \"json\"},\n", " \"regression_output\": {\"path\": \"regression_output.json\", \"format\": \"json\"}\n", " }\n", "}\n" ] } ], "source": [ "import json\n", "from pathlib import Path\n", "\n", "JOB_DIR = Path(\"rct_job\")\n", "\n", "print(f\"Job directory: {JOB_DIR}\")\n", "print(f\"Files: {sorted(p.name for p in JOB_DIR.iterdir())}\")\n", "print()\n", "print(\"manifest.json:\")\n", "manifest = json.loads((JOB_DIR / \"manifest.json\").read_text())\n", "print(json.dumps(manifest, indent=2))" ] }, { "cell_type": "markdown", "id": "cell-4", "metadata": {}, "source": "## Step 2 \u2014 Configure the backend\n\nCreate a `review_config.yaml` file alongside this notebook to specify the\nmodel and backend parameters. 
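A minimal sketch of what such a config might contain (the exact schema comes from the package; the `temperature` and `max_tokens` values here are illustrative assumptions, not the provided file's contents):\n\n```yaml\nbackend:\n  model: ollama_chat/llama3.2\n  temperature: 0.0\n  max_tokens: 2048\n```\n\n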
A copy is provided \u2014 inspect it now:" }, { "cell_type": "code", "execution_count": null, "id": "cell-5", "metadata": {}, "outputs": [], "source": "from impact_engine_evaluate.config import load_config\n\nCONFIG_FILE = Path(\"review_config.yaml\")\nprint(CONFIG_FILE.read_text())\n\nconfig = load_config(CONFIG_FILE)\nprint(f\"Backend : {config.backend.model}\")\nprint(f\"Settings: temperature={config.backend.temperature}, max_tokens={config.backend.max_tokens}\")" }, { "cell_type": "markdown", "id": "cell-6", "metadata": {}, "source": "## Step 3 \u2014 Run `evaluate_confidence()`\n\n`evaluate_confidence()` is the package-level entry point, symmetric with\n`evaluate_impact()` in the measure component:\n\n1. Reads `manifest.json` and loads the registered `ExperimentReviewer`\n2. Concatenates all artifact files into a single text payload\n3. Renders the prompt with domain knowledge from `knowledge/`\n4. Calls the model via litellm and parses the structured JSON response\n5. Writes `review_result.json` to the job directory" }, { "cell_type": "code", "execution_count": null, "id": "cell-7", "metadata": {}, "outputs": [], "source": "from impact_engine_evaluate import evaluate_confidence\n\nresult = evaluate_confidence(CONFIG_FILE, JOB_DIR)\nprint(f\"Review complete. 
Overall score: {result.confidence:.2f}\")" }, { "cell_type": "markdown", "id": "cell-8", "metadata": {}, "source": "## Step 4 \u2014 Inspect the EvaluateResult\n\nThe result contains the confidence score, strategy used, and a per-dimension\nbreakdown accessible via `result.report`:" }, { "cell_type": "code", "execution_count": null, "id": "cell-9", "metadata": {}, "outputs": [], "source": "print(f\"Initiative : {result.initiative_id}\")\nprint(f\"Strategy : {result.strategy}\")\nprint(f\"Confidence : {result.confidence:.3f}\")\nprint(f\"Range : [{result.confidence_range[0]:.2f}, {result.confidence_range[1]:.2f}]\")\nprint()\nprint(\"Dimensions (from result.report):\")\nfor dim in result.report[\"dimensions\"]:\n bar = \"#\" * int(dim[\"score\"] * 20)\n print(f\" {dim['name']:<30} {dim['score']:.3f} |{bar:<20}|\")\n print(f\" {dim['justification']}\")\n print()" }, { "cell_type": "markdown", "id": "cell-10", "metadata": {}, "source": "The experiment reviewer evaluates five dimensions:\n\n| Dimension | What it checks |\n|-----------|---------------|\n| `randomization_integrity` | Attrition, balance, differential dropout |\n| `specification_adequacy` | OLS formula, covariates, robust SEs |\n| `statistical_inference` | CIs, p-values, F-statistic, multiple testing |\n| `threats_to_validity` | Spillover, non-compliance, SUTVA, Hawthorne |\n| `effect_size_plausibility` | Whether the treatment effect is realistic |" }, { "cell_type": "markdown", "id": "cell-11", "metadata": {}, "source": "## Step 5 \u2014 Examine the output file\n\n`evaluate_confidence()` writes `review_result.json` to the job directory alongside the\noriginal artifacts. The manifest is treated as read-only." 
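}, { "cell_type": "markdown", "id": "cell-11b", "metadata": {}, "source": "As a rough sketch, the file has this shape \u2014 only a subset of its keys is shown, and the values below are illustrative placeholders, not actual model output:\n\n```json\n{\n  \"initiative_id\": \"literacy-rct-2024\",\n  \"model\": \"ollama_chat/llama3.2\",\n  \"overall_score\": 0.75,\n  \"dimensions\": [\n    {\"name\": \"randomization_integrity\", \"score\": 0.8, \"justification\": \"...\"}\n  ]\n}\n```"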
}, { "cell_type": "code", "execution_count": 5, "id": "cell-12", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Job directory contents: ['impact_results.json', 'manifest.json', 'regression_output.json', 'review_result.json']\n", "\n", "review_result.json keys: ['initiative_id', 'prompt_name', 'prompt_version', 'backend_name', 'model', 'dimensions', 'overall_score', 'raw_response', 'timestamp']\n", "Overall score : 0.75\n", "Dimensions : 5\n" ] } ], "source": [ "print(f\"Job directory contents: {sorted(p.name for p in JOB_DIR.iterdir())}\")\n", "print()\n", "review_data = json.loads((JOB_DIR / \"review_result.json\").read_text())\n", "print(f\"review_result.json keys: {list(review_data.keys())}\")\n", "print(f\"Overall score : {review_data['overall_score']}\")\n", "print(f\"Dimensions : {len(review_data['dimensions'])}\")" ] }, { "cell_type": "markdown", "id": "cell-13", "metadata": {}, "source": "The job directory now contains:\n\n```\nrct_job/\n\u251c\u2500\u2500 manifest.json # read-only (created by the producer)\n\u251c\u2500\u2500 impact_results.json # summary statistics\n\u251c\u2500\u2500 regression_output.json # full OLS output\n\u2514\u2500\u2500 review_result.json # structured review written by evaluate\n```" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.12.0" }, "nbsphinx": { "execute": "never" } }, "nbformat": 4, "nbformat_minor": 5 }