{ "cells": [ { "cell_type": "markdown", "id": "cell-0", "metadata": {}, "source": "# Local LLM Review with Ollama\n\nThis tutorial demonstrates the review evaluation path using a locally hosted\nmodel via [Ollama](https://ollama.com). No API key or internet connection is\nrequired \u2014 the model runs entirely on your machine.\n\n```{note}\nThis notebook requires Ollama to be running locally and is not executed\nduring the docs build. Code cells include pre-computed output.\n```\n\n## Workflow overview\n\n1. Inspect the job directory \u2014 a synthetic RCT with realistic artifacts\n2. Configure the backend to call `ollama_chat/llama3.2`\n3. Run `evaluate_confidence()`\n4. Inspect the `EvaluateResult`\n5. Examine the output file" }, { "cell_type": "markdown", "id": "cell-1", "metadata": {}, "source": "## Initial Setup\n\nInstall and start Ollama, then pull a model:\n\n```bash\nollama pull llama3.2\nollama serve # already running if the desktop app is open\n```\n\nNo extra Python dependencies are needed beyond the base install:\n\n```bash\npip install impact-engine-evaluate\n```" }, { "cell_type": "markdown", "id": "cell-2", "metadata": {}, "source": "## Step 1 \u2014 Inspect the job directory\n\nThe `rct_job/` directory alongside this notebook is a synthetic early-literacy\nRCT. 
It contains:\n\n- `manifest.json` \u2014 metadata, file references, and evaluation strategy\n- `impact_results.json` \u2014 summary statistics (effect estimate, CI, sample size)\n- `regression_output.json` \u2014 full OLS output with balance check and attrition data" }, { "cell_type": "code", "execution_count": 1, "id": "cell-3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Job directory: rct_job\n", "Files: ['impact_results.json', 'manifest.json', 'regression_output.json', 'review_result.json']\n", "\n", "manifest.json:\n", "{\n", " \"schema_version\": \"2.0\",\n", " \"model_type\": \"experiment\",\n", " \"evaluate_strategy\": \"review\",\n", " \"initiative_id\": \"literacy-rct-2024\",\n", " \"created_at\": \"2025-03-15T09:00:00+00:00\",\n", " \"files\": {\n", " \"impact_results\": {\"path\": \"impact_results.json\", \"format\": \"json\"},\n", " \"regression_output\": {\"path\": \"regression_output.json\", \"format\": \"json\"}\n", " }\n", "}\n" ] } ], "source": [ "import json\n", "from pathlib import Path\n", "\n", "JOB_DIR = Path(\"rct_job\")\n", "\n", "print(f\"Job directory: {JOB_DIR}\")\n", "print(f\"Files: {sorted(p.name for p in JOB_DIR.iterdir())}\")\n", "print()\n", "print(\"manifest.json:\")\n", "manifest = json.loads((JOB_DIR / \"manifest.json\").read_text())\n", "print(json.dumps(manifest, indent=2))" ] }, { "cell_type": "markdown", "id": "cell-4", "metadata": {}, "source": "## Step 2 \u2014 Configure the backend\n\nCreate a `review_config.yaml` file alongside this notebook to specify the\nmodel and backend parameters. 
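A minimal sketch of what such a config might contain (the exact schema comes from the package; the `temperature` and `max_tokens` values here are illustrative assumptions, not the provided file's contents):\n\n```yaml\nbackend:\n  model: ollama_chat/llama3.2\n  temperature: 0.0\n  max_tokens: 2048\n```\n\n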
A copy is provided \u2014 inspect it now:" }, { "cell_type": "code", "execution_count": null, "id": "cell-5", "metadata": {}, "outputs": [], "source": "from impact_engine_evaluate.config import load_config\n\nCONFIG_FILE = Path(\"review_config.yaml\")\nprint(CONFIG_FILE.read_text())\n\nconfig = load_config(CONFIG_FILE)\nprint(f\"Backend : {config.backend.model}\")\nprint(f\"Settings: temperature={config.backend.temperature}, max_tokens={config.backend.max_tokens}\")" }, { "cell_type": "markdown", "id": "cell-6", "metadata": {}, "source": "## Step 3 \u2014 Run `evaluate_confidence()`\n\n`evaluate_confidence()` is the package-level entry point, symmetric with\n`evaluate_impact()` in the measure component:\n\n1. Reads `manifest.json` and loads the registered `ExperimentReviewer`\n2. Concatenates all artifact files into a single text payload\n3. Renders the prompt with domain knowledge from `knowledge/`\n4. Calls the model via litellm and parses the structured JSON response\n5. Writes `review_result.json` to the job directory" }, { "cell_type": "code", "execution_count": null, "id": "cell-7", "metadata": {}, "outputs": [], "source": "from impact_engine_evaluate import evaluate_confidence\n\nresult = evaluate_confidence(CONFIG_FILE, JOB_DIR)\nprint(f\"Review complete. 
Overall score: {result.confidence:.2f}\")" }, { "cell_type": "markdown", "id": "cell-8", "metadata": {}, "source": "## Step 4 \u2014 Inspect the EvaluateResult\n\nThe result contains the confidence score, strategy used, and a per-dimension\nbreakdown accessible via `result.report`:" }, { "cell_type": "code", "execution_count": null, "id": "cell-9", "metadata": {}, "outputs": [], "source": "print(f\"Initiative : {result.initiative_id}\")\nprint(f\"Strategy : {result.strategy}\")\nprint(f\"Confidence : {result.confidence:.3f}\")\nprint(f\"Range : [{result.confidence_range[0]:.2f}, {result.confidence_range[1]:.2f}]\")\nprint()\nprint(\"Dimensions (from result.report):\")\nfor dim in result.report[\"dimensions\"]:\n bar = \"#\" * int(dim[\"score\"] * 20)\n print(f\" {dim['name']:<30} {dim['score']:.3f} |{bar:<20}|\")\n print(f\" {dim['justification']}\")\n print()" }, { "cell_type": "markdown", "id": "cell-10", "metadata": {}, "source": "The experiment reviewer evaluates five dimensions:\n\n| Dimension | What it checks |\n|-----------|---------------|\n| `randomization_integrity` | Attrition, balance, differential dropout |\n| `specification_adequacy` | OLS formula, covariates, robust SEs |\n| `statistical_inference` | CIs, p-values, F-statistic, multiple testing |\n| `threats_to_validity` | Spillover, non-compliance, SUTVA, Hawthorne |\n| `effect_size_plausibility` | Whether the treatment effect is realistic |" }, { "cell_type": "markdown", "id": "cell-11", "metadata": {}, "source": "## Step 5 \u2014 Examine the output file\n\n`evaluate_confidence()` writes `review_result.json` to the job directory alongside the\noriginal artifacts. The manifest is treated as read-only." 
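}, { "cell_type": "markdown", "id": "cell-11b", "metadata": {}, "source": "As a rough sketch, the file has this shape \u2014 only a subset of its keys is shown, and the values below are illustrative placeholders, not actual model output:\n\n```json\n{\n  \"initiative_id\": \"literacy-rct-2024\",\n  \"model\": \"ollama_chat/llama3.2\",\n  \"overall_score\": 0.75,\n  \"dimensions\": [\n    {\"name\": \"randomization_integrity\", \"score\": 0.8, \"justification\": \"...\"}\n  ]\n}\n```"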
}, { "cell_type": "code", "execution_count": 5, "id": "cell-12", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Job directory contents: ['impact_results.json', 'manifest.json', 'regression_output.json', 'review_result.json']\n", "\n", "review_result.json keys: ['initiative_id', 'prompt_name', 'prompt_version', 'backend_name', 'model', 'dimensions', 'overall_score', 'raw_response', 'timestamp']\n", "Overall score : 0.75\n", "Dimensions : 5\n" ] } ], "source": [ "print(f\"Job directory contents: {sorted(p.name for p in JOB_DIR.iterdir())}\")\n", "print()\n", "review_data = json.loads((JOB_DIR / \"review_result.json\").read_text())\n", "print(f\"review_result.json keys: {list(review_data.keys())}\")\n", "print(f\"Overall score : {review_data['overall_score']}\")\n", "print(f\"Dimensions : {len(review_data['dimensions'])}\")" ] }, { "cell_type": "markdown", "id": "cell-13", "metadata": {}, "source": "The job directory now contains:\n\n```\nrct_job/\n\u251c\u2500\u2500 manifest.json # read-only (created by the producer)\n\u251c\u2500\u2500 impact_results.json # summary statistics\n\u251c\u2500\u2500 regression_output.json # full OLS output\n\u2514\u2500\u2500 review_result.json # structured review written by evaluate\n```" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.12.0" }, "nbsphinx": { "execute": "never" } }, "nbformat": 4, "nbformat_minor": 5 }