diff --git a/AI_GUIDE.md b/AI_GUIDE.md
index 6f042ce..79ff6d5 100644
--- a/AI_GUIDE.md
+++ b/AI_GUIDE.md
@@ -348,6 +348,33 @@ During training (90%+ of time), the agent does NOT call the LLM. It only does:
 - **Writing Agent**: generates reports (3 tools)
 - Only 1 worker active at a time, others cost $0
 
+### Tool-Use Protocol (provider-agnostic)
+
+Workers do not use each provider's native SDK tool-use protocol. Instead the
+framework injects a plain-text schema into the system prompt and the worker
+emits tool calls as `<tool_call>{...}</tool_call>` blocks. The dispatcher
+parses the blocks, runs each through `ToolRegistry.execute_tool`, and feeds
+results back as `<tool_result name="...">...</tool_result>` in the next user
+turn. The loop runs until the worker produces a response with no tool calls
+(the final answer) or `max_turns` is reached.
+
+Why this design:
+
+- **One protocol, four providers** — the Anthropic and OpenAI SDK paths use
+  the same text protocol as `claude_cli` and `codex_cli`. No per-provider
+  branching in the execution loop.
+- **Authoritative PID / log_file** — the EXECUTE → MONITOR handoff reads
+  `pid` and `log_file` directly from the `launch_experiment` tool's JSON
+  result, not from regex-scraping the model's prose.
+- **Provider-lock-down** — for `claude_cli` the framework passes
+  `--tools ""` so the CLI cannot bypass the protocol with its own built-in
+  tools. `codex_cli` has no equivalent flag and will silently ignore the
+  protocol; a runtime warning is emitted when it is used as a worker, and
+  users should pick one of the other three providers for worker dispatches.
+- **Fence stripping** — tool-call blocks inside triple-backtick code fences
+  are ignored, so a model's illustrative example in its prose is never
+  accidentally executed.
+
 ### Safety
 - Mandatory dry-run before every real training
 - Protected files can't be overwritten
diff --git a/CLAUDE.md b/CLAUDE.md
index a5f74f3..8bda4ba 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -396,6 +396,30 @@ During training (90%+ of time), the agent does NOT call the LLM. It only does:
 - **Writing Agent**: generates reports (3 tools)
 - Only 1 worker active at a time, others cost $0
 
+### Tool-Use Protocol (provider-agnostic)
+
+Workers do not use each provider's native SDK tool-use protocol. Instead the
+framework injects a plain-text schema into the system prompt and the worker
+emits tool calls as `<tool_call>{...}</tool_call>` blocks. The dispatcher
+parses the blocks, runs each through `ToolRegistry.execute_tool`, and feeds
+results back as `<tool_result name="...">...</tool_result>` in the next user
+turn. The loop runs until the worker produces a response with no tool calls
+(the final answer) or `max_turns` is reached.
+
+Key properties:
+
+- One text protocol works identically across all four providers — no
+  per-provider branching in the execution loop.
+- `launch_experiment` PID and log_file come authoritatively from the tool
+  result's JSON, not from regex-scraping the model's prose.
+- For `claude_cli` the framework passes `--tools ""` so the CLI cannot
+  bypass the protocol with its own built-in tools. `codex_cli` has no
+  equivalent flag and may silently ignore the protocol; a runtime warning
+  is emitted when it is used as a worker, and users should pick one of the
+  other three providers for worker dispatches.
+- Tool-call blocks inside triple-backtick code fences are ignored, so a
+  model's illustrative example in prose is never accidentally executed.
+
 ### Safety
 - Mandatory dry-run before every real training
 - Protected files can't be overwritten
diff --git a/README.md b/README.md
index 0b4666d..a3a1a0e 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,14 @@
 
 ## Recent Updates
 
+**2026-04-19**
+- Workers now execute tools through a real multi-turn tool-use loop. The dispatcher injects the tool schema into the system prompt, parses `<tool_call>` blocks from the LLM response, runs each through `ToolRegistry.execute_tool`, feeds results back as `<tool_result>` in the next turn, and iterates until the worker produces a response with no tool calls or `max_turns` is hit. Previously the `tools` argument was accepted and silently dropped, and worker output was regex-scraped for PIDs — closes the gap raised in issue #13.
+- `launch_experiment` PIDs and log file paths are now surfaced directly from the tool result (authoritative), with the old free-text regex retained only as a fallback for pre-protocol responses.
+- `claude_cli` is forced into pure-text mode via `claude -p --tools ""`, so its responses reliably go through the framework's protocol.
+- `codex_cli` cannot be forced into pure-text mode by any current flag; when used as a worker provider the framework now emits a clear warning (see the updated compatibility table in *Supported LLM Providers*).
+- Tool-call blocks inside triple-backtick code fences are stripped before parsing, so illustrative examples in the LLM's prose are no longer accidentally executed.
+- Dead parameters (`tools`, `max_turns`) removed from `_call_llm`. They were never forwarded to the SDK; this aligns the code with what it actually does.
+
 **2026-04-18**
 - Added two new `provider` modes that reuse existing flat-rate subscriptions instead of per-token API billing: `claude_cli` (via the local `claude -p` CLI) and `codex_cli` (via the local `codex exec` CLI). Much cheaper when running multiple 24/7 agents in parallel. See the updated *Supported LLM Providers* section for the full API-vs-subscription trade-off table.
 - Provider validation added at dispatcher construction; unknown provider values now fail fast with a clear error instead of silently falling through.
@@ -863,18 +871,21 @@ Works with **both Anthropic and OpenAI** out of the box, and can run on a
 
 ### Authentication mode: API key vs. subscription
 
-| Mode | `provider` value | Billing | Requires |
-|------|------------------|---------|----------|
-| API — Anthropic | `anthropic` | Per-token, via `ANTHROPIC_API_KEY` | `pip install anthropic` |
-| API — OpenAI | `openai` | Per-token, via `OPENAI_API_KEY` | `pip install openai` |
-| **Subscription — Claude** | `claude_cli` | Flat-rate, uses your Claude Code / Pro / Max plan | `claude` CLI installed and logged in |
-| **Subscription — ChatGPT** | `codex_cli` | Flat-rate, uses your ChatGPT Plus / Pro plan | `codex` CLI installed and logged in |
-
-The `*_cli` modes shell out to the headless CLI (`claude -p` / `codex exec`) and
-share your existing subscription quota. This is much cheaper than per-token
-billing when you run multiple agents in parallel or do heavy Think/Reflect
-cycles. Trade-off: no native prompt caching or tool-use protocol — the CLI is
-used as a plain text-in / text-out oracle.
+| Mode | `provider` value | Billing | Requires | Tool-use support |
+|------|------------------|---------|----------|------------------|
+| API — Anthropic | `anthropic` | Per-token, via `ANTHROPIC_API_KEY` | `pip install anthropic` | ✅ Full |
+| API — OpenAI | `openai` | Per-token, via `OPENAI_API_KEY` | `pip install openai` | ✅ Full |
+| **Subscription — Claude** | `claude_cli` | Flat-rate, uses your Claude Code / Pro / Max plan | `claude` CLI installed and logged in | ✅ Full |
+| **Subscription — ChatGPT** | `codex_cli` | Flat-rate, uses your ChatGPT Plus / Pro plan | `codex` CLI installed and logged in | ⚠️ Leader only |
+
+Tool execution is driven by a text-based `<tool_call>` protocol injected
+into the worker's system prompt. All three "Full" providers can be forced
+into pure text-oracle mode so they honor the protocol (for `claude_cli`
+the framework passes `--tools ""` to disable built-in CLI tools). The
+`codex` CLI currently offers no equivalent flag — its internal agentic
+loop will bypass the protocol and the framework cannot recover PIDs from
+experiments it launches. Use `codex_cli` only for the leader/think path
+where no tools are needed.
 
 Switch provider in `config.yaml`:
 ```yaml
diff --git a/config.yaml b/config.yaml
index 45949de..36e6918 100644
--- a/config.yaml
+++ b/config.yaml
@@ -8,14 +8,23 @@ project:
 
 agent:
   # Provider:
-  #   "anthropic"  — Anthropic SDK, per-token API billing (ANTHROPIC_API_KEY)
-  #   "openai"     — OpenAI SDK, per-token API billing (OPENAI_API_KEY)
-  #   "claude_cli" — `claude -p` subprocess, reuses Claude Code / Pro / Max subscription
-  #   "codex_cli"  — `codex exec` subprocess, reuses ChatGPT Plus / Pro subscription
+  #   "anthropic"  — SDK, per-token API billing (ANTHROPIC_API_KEY)
+  #   "openai"     — SDK, per-token API billing (OPENAI_API_KEY)
+  #   "claude_cli" — subprocess, reuses Claude Code / Pro / Max subscription
+  #   "codex_cli"  — subprocess, reuses ChatGPT Plus / Pro subscription
   #
-  # The *_cli options are much cheaper when running many agents in parallel,
-  # because they share your existing subscription quota instead of per-token billing.
-  # They require the corresponding CLI to be installed and logged in once on this host.
+  # The *_cli options are much cheaper when running many agents in parallel
+  # because they share your existing subscription quota instead of per-token
+  # billing. They require the corresponding CLI to be installed and logged
+  # in once on this host.
+  #
+  # Tool-use compatibility: the worker path drives tools through a text-based
+  # <tool_call> protocol. Compatible providers: "anthropic", "openai",
+  # "claude_cli" (we force `--tools ""` so the CLI cannot bypass it).
+  # "codex_cli" cannot be forced into pure-text mode by any current CLI flag,
+  # so it will silently bypass the ToolRegistry and lose PID tracking. Use it
+  # for the leader/think path only; keep one of the three other providers for
+  # worker dispatches.
   provider: "anthropic"
 
   # Model selection per provider:
diff --git a/core/agents.py b/core/agents.py
index d0fe267..11f38aa 100644
--- a/core/agents.py
+++ b/core/agents.py
@@ -6,10 +6,20 @@
 - Workers: Specialized agents (idea/code/writing), spawned on demand
 
 Only ONE worker runs at a time. Others idle at zero token cost.
+
+Tool use is implemented via a provider-agnostic text protocol. The LLM
+emits <tool_call>{...}</tool_call> blocks, the dispatcher executes each
+call through the ToolRegistry, and results are fed back as
+<tool_result name="...">...</tool_result> blocks in the next user turn.
+The loop runs until the worker produces a response with no tool calls
+(the final answer) or max_turns is exceeded. This works uniformly
+across all four providers — the API SDKs don't use their native
+tool-use protocol, and the CLI providers are simply text oracles.
 """
 
 import json
 import logging
+import re
 from pathlib import Path
 from typing import Optional
 
@@ -20,6 +30,14 @@
 AGENTS_DIR = Path(__file__).parent.parent / "agents"
 
 
+# Tool-use text protocol
+_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
+# Triple-backtick fenced blocks are stripped before parsing so that LLMs can
+# illustrate the protocol inside code fences without triggering real tool
+# execution. Matches ``` with an optional language tag through the next ```.
+_FENCED_BLOCK_RE = re.compile(r"```[^\n]*\n.*?```", re.DOTALL)
+
+
 class AgentDispatcher:
     """Dispatches tasks to specialized agents.
 
@@ -96,47 +114,109 @@ def dispatch_leader(self, task: str, context: dict) -> dict:
             "content": self._format_leader_input(task, context),
         })
 
-        response = self._call_llm(
-            system=system_prompt,
-            messages=messages,
-            max_turns=10,
-        )
+        response = self._call_llm(system=system_prompt, messages=messages)
 
         # Persist conversation for within-cycle coherence
         self._leader_history = messages + [{"role": "assistant", "content": response}]
 
         return self._parse_leader_response(response)
 
-    def dispatch_worker(self, agent_type: str, task: str, tools: list) -> dict:
-        """Dispatch a task to a worker agent.
+    def dispatch_worker(self, agent_type: str, task: str, tool_registry) -> dict:
+        """Dispatch a task to a worker agent and run its tool-use loop.
 
-        Workers are stateless — each dispatch is independent.
-        This keeps token costs predictable.
+        Workers are stateless across dispatches — each call starts with a
+        fresh conversation. Within a single dispatch the conversation is
+        multi-turn: the worker may emit tool calls, receive results, and
+        continue reasoning until it produces a final answer (a response
+        containing no <tool_call> blocks).
 
         Args:
-            agent_type: "idea", "code", or "writing"
-            task: Task description from the Leader
-            tools: Tool definitions to provide
+            agent_type: "idea", "code", or "writing".
+            task: Task description from the Leader.
+            tool_registry: ToolRegistry that provides `get_tools_for` and
+                `execute_tool`. The registry itself is passed in so this
+                module does not have a hard import dependency on tools.py.
 
         Returns:
-            Worker's result as a dict
+            Dict with at minimum `agent` and `response`. If the worker
+            called `launch_experiment`, the PID and log_file from that
+            tool result are also surfaced at the top level so the loop's
+            EXECUTE → MONITOR handoff keeps working.
         """
         if agent_type not in self.WORKER_CONFIGS:
             raise ValueError(f"Unknown agent type: {agent_type}")
+        if tool_registry is None:
+            raise TypeError(
+                "dispatch_worker requires a tool_registry with "
+                "`get_tools_for(agent_type)` and `execute_tool(name, args)` "
+                "methods. Pass a ToolRegistry configured with an empty tool "
+                "list if you want a tool-less worker."
+            )
 
         config = self.WORKER_CONFIGS[agent_type]
-        system_prompt = self._load_prompt(config["prompt_file"])
+        base_prompt = self._load_prompt(config["prompt_file"])
+        tool_defs = tool_registry.get_tools_for(agent_type)
+        system_prompt = base_prompt + "\n\n" + self._render_tools_section(tool_defs)
+        max_turns = config["max_turns"]
+
+        # codex_cli hard-codes its own agentic tool loop; it will ignore the
+        # <tool_call> protocol and silently act on its own. That breaks the
+        # EXECUTE → MONITOR handoff (no PID, no log_file from ToolRegistry).
+        # Leader/think dispatches are fine (they do not use tools) but worker
+        # dispatches will likely return a non-authoritative summary. Warn once
+        # per dispatch so users see it in the log without it becoming noise.
+        if self.provider == "codex_cli" and tool_defs:
+            logger.warning(
+                "codex_cli is being used as a worker provider; its CLI does "
+                "not support disabling built-in tools, so it may bypass the "
+                "ToolRegistry and the resulting PID/log_file cannot be "
+                "recovered. For worker dispatches prefer claude_cli, "
+                "anthropic, or openai."
+            )
 
         logger.info(f"Dispatching {agent_type} agent: {task[:100]}...")
 
-        response = self._call_llm(
-            system=system_prompt,
-            messages=[{"role": "user", "content": task}],
-            tools=tools,
-            max_turns=config["max_turns"],
-        )
+        messages = [{"role": "user", "content": task}]
+        last_response = ""
+        tool_results_log: list[dict] = []
+
+        for turn in range(1, max_turns + 1):
+            last_response = self._call_llm(system=system_prompt, messages=messages)
+
+            tool_calls = self._parse_tool_calls(last_response)
+            if not tool_calls:
+                # No tool calls → worker has produced its final answer.
+                break
+
+            # Echo the assistant turn so the next LLM call sees the history.
+            messages.append({"role": "assistant", "content": last_response})
+
+            # Execute each call and build a single user turn with all results.
+            result_blocks = []
+            for call in tool_calls:
+                name = call.get("name", "")
+                args = call.get("args", {}) or {}
+                if not isinstance(args, dict):
+                    tool_output = json.dumps({"error": "`args` must be a JSON object"})
+                else:
+                    tool_output = tool_registry.execute_tool(name, args)
+                tool_results_log.append({"name": name, "args": args, "output": tool_output})
+                result_blocks.append(
+                    f'<tool_result name="{name}">\n{tool_output}\n</tool_result>'
+                )
+
+            messages.append({
+                "role": "user",
+                "content": "\n\n".join(result_blocks),
+            })
+        else:
+            # for/else: executed only when the loop exhausts max_turns without break.
+            logger.warning(
+                f"Worker {agent_type} hit max_turns={max_turns} "
+                f"with tool calls still pending; returning last response."
+            )
 
-        result = self._parse_worker_response(response, agent_type)
+        result = self._parse_worker_response(last_response, agent_type, tool_results_log)
         logger.info(f"Worker {agent_type} completed: {str(result)[:200]}")
         return result
 
@@ -144,7 +224,88 @@ def reset_leader_history(self):
         """Clear leader conversation history between cycles."""
         self._leader_history = []
 
-    def _call_llm(self, system: str, messages: list, tools: list = None, max_turns: int = 10) -> str:
+    @staticmethod
+    def _parse_tool_calls(text: str) -> list[dict]:
+        """Extract <tool_call>{...}</tool_call> blocks from an LLM response.
+
+        Silently skips blocks whose JSON body is malformed. An empty list
+        means the response is a final answer (no tool calls requested).
+
+        Tool-call blocks inside triple-backtick code fences are deliberately
+        ignored: LLMs routinely illustrate the protocol inside fenced blocks
+        when explaining what they are about to do, and executing those
+        illustrations as real side-effectful calls has caused accidental
+        writes in practice.
+        """
+        stripped = _FENCED_BLOCK_RE.sub("", text or "")
+        calls: list[dict] = []
+        for match in _TOOL_CALL_RE.finditer(stripped):
+            body = match.group(1)
+            try:
+                parsed = json.loads(body)
+            except json.JSONDecodeError as exc:
+                logger.warning(f"Skipping malformed tool_call block: {exc}")
+                continue
+            if isinstance(parsed, dict) and parsed.get("name"):
+                calls.append(parsed)
+            else:
+                logger.warning(
+                    "Skipping tool_call without a string `name` field: "
+                    f"{str(parsed)[:120]}"
+                )
+        return calls
+
+    @staticmethod
+    def _render_tools_section(tool_defs: list[dict]) -> str:
+        """Render tool schemas as a plain-text block appended to the system prompt.
+
+        The worker's own prompt already has a short 'Tools Available' list;
+        this auto-generated section provides the exact machine-readable
+        schemas and protocol instructions so the LLM emits calls in the
+        format the dispatcher can parse.
+        """
+        if not tool_defs:
+            return ""
+
+        lines = [
+            "## Tool-Use Protocol",
+            "",
+            "You have NO direct access to the filesystem, shell, or network.",
+            "To act on the environment you MUST emit `<tool_call>` blocks and",
+            "wait for the framework to return `<tool_result>` blocks in the",
+            "next user turn. Example:",
+            "",
+            "    <tool_call>",
+            '    {"name": "read_file", "args": {"path": "config.yaml"}}',
+            "    </tool_call>",
+            "",
+            "You may emit multiple `<tool_call>` blocks in one message; each",
+            "will be executed and its result returned. When you are finished,",
+            "produce a plain-text message with NO `<tool_call>` blocks — that",
+            "is how the framework knows you are done.",
+            "",
+            "Emit `<tool_call>` blocks at the top level of the message. Do NOT",
+            "wrap them in triple-backtick code fences — fenced blocks are",
+            "treated as illustration, not as real calls.",
+            "",
+            "### Available tools",
+            "",
+        ]
+        for tool in tool_defs:
+            name = tool.get("name", "<unnamed>")
+            desc = tool.get("description", "")
+            schema = tool.get("input_schema", {})
+            lines.append(f"- `{name}` — {desc}")
+            props = schema.get("properties", {}) or {}
+            required = set(schema.get("required", []) or [])
+            for pname, pspec in props.items():
+                ptype = pspec.get("type", "any")
+                pdesc = pspec.get("description", "")
+                flag = "required" if pname in required else "optional"
+                lines.append(f"    - `{pname}` ({ptype}, {flag}): {pdesc}")
+        return "\n".join(lines)
+
+    def _call_llm(self, system: str, messages: list) -> str:
         """Call the LLM. Four providers are supported.
 
         - "anthropic":  Claude SDK, per-token API billing
@@ -154,8 +315,9 @@ def _call_llm(self, system: str, messages: list, tools: list = None, max_turns:
 
         CLI providers let you reuse existing subscriptions instead of paying per-token,
         which is much cheaper when running many agents in parallel or doing heavy
-        Think/Reflect cycles. Trade-off: no native prompt caching, no tool-use protocol —
-        the CLI is used as a plain text-in / text-out oracle.
+        Think/Reflect cycles. Trade-off: no native prompt caching, no native tool-use
+        protocol — the LLM is driven purely as a text-in / text-out oracle, and tool
+        use is layered on top via the <tool_call> text protocol (see dispatch_worker).
         """
         if self.provider == "claude_cli":
             return self._call_claude_cli(system, messages)
@@ -289,12 +451,20 @@ def _run_cli(self, argv: list, prompt: str, tool_label: str, install_hint: str,
         return (result.stdout or "").strip()
 
     def _call_claude_cli(self, system: str, messages: list) -> str:
-        """Headless dispatch via the `claude` CLI, billed against a Pro / Max subscription."""
+        """Headless dispatch via the `claude` CLI, billed against a Pro / Max subscription.
+
+        `--tools ""` disables every built-in tool so the CLI degrades to a
+        pure text oracle. This is required for our <tool_call> protocol
+        to work: the CLI must be unable to act on its own, otherwise it
+        will bypass our ToolRegistry (and the loop loses visibility over
+        what actually happened, especially for launch_experiment PIDs).
+
+        The prompt is piped via stdin to sidestep argv-size limits on
+        large conversation histories.
+        """
         prompt = self._flatten_for_cli(system, messages)
-        # claude -p reads the prompt from stdin when no argument is given,
-        # which avoids argv size limits for long memory contexts.
         return self._run_cli(
-            argv=["claude", "-p", "--output-format", "text"],
+            argv=["claude", "-p", "--output-format", "text", "--tools", ""],
             prompt=prompt,
             tool_label="claude",
             install_hint="npm i -g @anthropic-ai/claude-code && run `claude` once to sign in",
@@ -302,15 +472,74 @@ def _call_claude_cli(self, system: str, messages: list) -> str:
         )
 
     def _call_codex_cli(self, system: str, messages: list) -> str:
-        """Headless dispatch via the `codex` CLI, billed against a ChatGPT subscription."""
+        """Headless dispatch via the `codex` CLI, billed against a ChatGPT subscription.
+
+        Unlike `claude -p`, `codex exec` is fully agentic by default — it runs
+        its own internal tool-use loop and there is no CLI flag to disable
+        the built-in tools. That means the framework's <tool_call> protocol
+        is unreliable under this provider: codex will often act on its own
+        and return a final summary. Workers that need to launch experiments
+        (and recover a PID from the ToolRegistry) should therefore prefer
+        claude_cli / anthropic / openai; codex_cli is best kept for the
+        leader/think path where we only need free-text output.
+
+        Flags:
+          - `-o <tempfile>`       captures only the final assistant message
+                                  instead of the full agentic trace,
+          - `--skip-git-repo-check` allows codex to run in arbitrary dirs
+                                    (the workspace is typically not a repo).
+        """
+        import subprocess
+        import tempfile
+
         prompt = self._flatten_for_cli(system, messages)
-        return self._run_cli(
-            argv=["codex", "exec"],
-            prompt=prompt,
-            tool_label="codex",
-            install_hint="brew install codex (or see upstream project) then run `codex login`",
-            use_stdin=False,
-        )
+
+        try:
+            with tempfile.NamedTemporaryFile("w+", suffix=".txt", delete=False) as out:
+                out_path = out.name
+            try:
+                result = subprocess.run(
+                    [
+                        "codex", "exec",
+                        "--skip-git-repo-check",
+                        "-o", out_path,
+                        prompt,
+                    ],
+                    capture_output=True,
+                    text=True,
+                    timeout=600,
+                    check=False,
+                )
+            except FileNotFoundError:
+                logger.warning(
+                    "codex CLI not found on PATH. "
+                    "Install: brew install codex (or see upstream) then `codex login`. "
+                    "Falling back to mock response."
+                )
+                return json.dumps({"action": "wait", "reason": "codex CLI missing"})
+            except subprocess.TimeoutExpired:
+                logger.error("codex CLI timed out after 600s")
+                return json.dumps({"action": "wait", "reason": "codex CLI timeout"})
+
+            if result.returncode != 0:
+                stderr_tail = (result.stderr or "").strip().splitlines()[-5:]
+                logger.error(
+                    f"codex CLI exited {result.returncode}. "
+                    f"Stderr tail: {' | '.join(stderr_tail)}"
+                )
+                return json.dumps({"action": "wait", "reason": "codex CLI error"})
+
+            try:
+                with open(out_path, "r") as f:
+                    return f.read().strip()
+            except OSError:
+                # Fall back to stdout if --output-last-message didn't produce a file.
+                return (result.stdout or "").strip()
+        finally:
+            try:
+                Path(out_path).unlink(missing_ok=True)
+            except OSError:
+                pass
 
     def _load_prompt(self, filename: str) -> str:
         """Load agent prompt from agents/ directory."""
@@ -358,16 +587,42 @@ def _parse_leader_response(self, response: str) -> dict:
             "task": response,
         }
 
-    def _parse_worker_response(self, response: str, agent_type: str) -> dict:
-        """Parse worker response into structured result."""
+    def _parse_worker_response(self, response: str, agent_type: str,
+                               tool_results: Optional[list] = None) -> dict:
+        """Parse worker response into a structured result dict.
+
+        When the worker used the `launch_experiment` tool, the PID and
+        log_file come directly from that tool's JSON result — this is
+        authoritative. The regex-on-free-text path is retained as a
+        fallback for responses that report an experiment launch purely
+        in prose (or for older prompts that predate the tool-use loop).
+        """
         result = {"agent": agent_type, "response": response}
+        if tool_results:
+            result["tool_calls"] = len(tool_results)
 
-        # Check for experiment launch indicators
         if agent_type == "code":
-            if "PID" in response or "launched" in response.lower():
+            # Prefer authoritative tool-result data over text parsing.
+            launch_result = None
+            if tool_results:
+                for entry in reversed(tool_results):
+                    if entry.get("name") == "launch_experiment":
+                        launch_result = entry
+                        break
+            if launch_result is not None:
+                try:
+                    payload = json.loads(launch_result.get("output", "{}"))
+                except (json.JSONDecodeError, TypeError):
+                    payload = {}
+                if isinstance(payload, dict) and payload.get("pid") is not None:
+                    result["experiment_launched"] = True
+                    result["pid"] = int(payload["pid"])
+                    if payload.get("log_file"):
+                        result["log_file"] = payload["log_file"]
+
+            # Fallback: scrape PID from free-text response.
+            if "pid" not in result and ("PID" in response or "launched" in response.lower()):
                 result["experiment_launched"] = True
-                # Try to extract PID
-                import re
                 pid_match = re.search(r"PID[=:\s]+(\d+)", response)
                 if pid_match:
                     result["pid"] = int(pid_match.group(1))
diff --git a/core/loop.py b/core/loop.py
index 7131a57..e0d323c 100644
--- a/core/loop.py
+++ b/core/loop.py
@@ -208,7 +208,7 @@ def _execute(self, plan: dict) -> dict:
         result = self.dispatcher.dispatch_worker(
             agent_type=agent_type,
             task=task_description,
-            tools=self.tools.get_tools_for(agent_type),
+            tool_registry=self.tools,
         )
 
         return result
diff --git a/docs/architecture.md b/docs/architecture.md
index 4234b81..e5e934e 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -96,7 +96,45 @@ No LLM API calls until training completes.
 - 4 tools = 800 extra tokens per call
 - Over 100 API calls/day, that's 220K tokens saved
 
-### 6. GPU Utilities (`gpu/`)
+### 6. Tool-Use Protocol (`core/agents.py::dispatch_worker`)
+
+Workers drive tool calls through a provider-agnostic text protocol rather
+than each SDK's native tool-use API:
+
+1. The dispatcher renders the worker's tool schemas as a plain-text
+   `## Tool-Use Protocol` section and appends it to the system prompt.
+2. The worker emits zero or more `<tool_call>{"name": "...", "args": {...}}</tool_call>`
+   blocks in its response.
+3. For each block, the dispatcher calls `ToolRegistry.execute_tool` and
+   packages the JSON result into a `<tool_result name="...">...</tool_result>`
+   block appended to the next user turn.
+4. The loop iterates until the worker returns a message with no tool calls
+   (the final answer) or `max_turns` is reached.
+
+Design rationale:
+
+- **Uniform behaviour across four providers.** The same protocol works
+  whether the LLM is reached via the Anthropic SDK, the OpenAI SDK, the
+  `claude` CLI, or the `codex` CLI. The execution loop contains no
+  per-provider branching.
+- **Authoritative experiment hand-off.** `pid` and `log_file` flow from
+  the `launch_experiment` tool result (structured JSON) to
+  `_parse_worker_response`, which promotes them onto the top-level result
+  dict read by `loop._monitor_experiment`. Regex-on-prose remains as a
+  fallback only.
+- **CLI lock-down.** `claude_cli` is invoked with `--tools ""` so the
+  Claude Code CLI cannot bypass the protocol using its built-in tools.
+  `codex_cli` has no equivalent flag, so it may silently act on its own;
+  `dispatch_worker` logs a warning when `codex_cli` is used as a worker
+  provider, and the README compatibility table flags it accordingly.
+- **Fence stripping.** Tool-call blocks inside triple-backtick code fences
+  are removed before parsing so that models illustrating the protocol in
+  their prose do not trigger real side-effectful tool execution.
+- **Bounded execution.** `max_turns` is configured per-worker
+  (`idea=12`, `code=40`, `writing=30`); on overflow the loop exits cleanly
+  and the last response is returned with a warning.
+
+### 7. GPU Utilities (`gpu/`)
 
 - **detect.py**: Auto-detect GPUs, check availability, reserve last GPU
 - **keeper.py**: Keep cloud instances alive with minimal GPU activity
diff --git a/tests/integration_cli_tool_use.py b/tests/integration_cli_tool_use.py
new file mode 100644
index 0000000..2de0f86
--- /dev/null
+++ b/tests/integration_cli_tool_use.py
@@ -0,0 +1,58 @@
+"""Live integration check: drive a real CLI-provider worker end-to-end.
+
+Run manually:  python -m tests.integration_cli_tool_use
+
+Burns one subscription round-trip per provider. Skipped automatically
+if the CLI is not on PATH. Not wired into the normal unittest suite.
+"""
+
+import shutil
+import tempfile
+from pathlib import Path
+
+from core.agents import AgentDispatcher
+from core.tools import ToolRegistry
+
+
+TASK = (
+    "Your one job: create a file named hello.txt in the workspace "
+    "containing exactly the three-word sentence 'integration test ok', "
+    "then confirm by listing the files. Once done, reply with a short "
+    "success message and no further tool calls."
+)
+
+
+def _run(provider: str) -> dict:
+    binary = {"claude_cli": "claude", "codex_cli": "codex"}[provider]
+    if shutil.which(binary) is None:
+        return {"provider": provider, "skipped": f"{binary} not on PATH"}
+
+    dispatcher = AgentDispatcher(provider=provider)
+    with tempfile.TemporaryDirectory() as tmp:
+        workspace = Path(tmp)
+        registry = ToolRegistry(workspace)
+        try:
+            result = dispatcher.dispatch_worker("writing", TASK, registry)
+        except Exception as exc:
+            return {"provider": provider, "error": repr(exc)}
+
+        hello = workspace / "hello.txt"
+        return {
+            "provider": provider,
+            "tool_calls": result.get("tool_calls", 0),
+            "file_created": hello.exists(),
+            "file_content": hello.read_text() if hello.exists() else None,
+            "response_tail": (result.get("response", "") or "")[-200:],
+        }
+
+
+def main():
+    for provider in ("claude_cli", "codex_cli"):
+        print(f"\n=== {provider} ===")
+        outcome = _run(provider)
+        for k, v in outcome.items():
+            print(f"  {k}: {v}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/test_tool_use_loop.py b/tests/test_tool_use_loop.py
new file mode 100644
index 0000000..b6c295f
--- /dev/null
+++ b/tests/test_tool_use_loop.py
@@ -0,0 +1,295 @@
+"""Tests for the worker tool-use loop in core/agents.py.
+
+The loop must:
+  - run multiple turns until the model stops emitting <tool_call> blocks,
+  - feed each tool's output back as a <tool_result> in the next user turn,
+  - respect max_turns as a hard ceiling,
+  - surface PID/log_file from a launch_experiment tool call so the EXECUTE
+    → MONITOR handoff in core/loop.py still works,
+  - ignore malformed <tool_call> JSON bodies without crashing.
+"""
+
+import json
+import tempfile
+import unittest
+from pathlib import Path
+from unittest.mock import patch
+
+from core.agents import AgentDispatcher
+from core.tools import ToolRegistry
+
+
+def _make_dispatcher():
+    # Provider choice is irrelevant because we stub _call_llm everywhere.
+    return AgentDispatcher(provider="anthropic")
+
+
+class ParseToolCallsTests(unittest.TestCase):
+    def test_extracts_multiple_blocks_in_order(self):
+        text = """
+        I will take two actions.
+        <tool_call>
+        {"name": "read_file", "args": {"path": "a.txt"}}
+        </tool_call>
+        Some prose in between.
+        <tool_call>
+        {"name": "write_file", "args": {"path": "b.txt", "content": "hi"}}
+        </tool_call>
+        """
+        calls = AgentDispatcher._parse_tool_calls(text)
+        self.assertEqual([c["name"] for c in calls], ["read_file", "write_file"])
+        self.assertEqual(calls[1]["args"]["path"], "b.txt")
+
+    def test_empty_when_no_blocks(self):
+        self.assertEqual(AgentDispatcher._parse_tool_calls("final answer"), [])
+
+    def test_skips_malformed_json(self):
+        text = """
+        <tool_call>{"name": "ok", "args": {}}</tool_call>
+        <tool_call>{not valid json</tool_call>
+        <tool_call>"just a string"</tool_call>
+        """
+        calls = AgentDispatcher._parse_tool_calls(text)
+        self.assertEqual(len(calls), 1)
+        self.assertEqual(calls[0]["name"], "ok")
+
+    def test_ignores_tool_calls_inside_code_fences(self):
+        """LLMs often illustrate the protocol inside fenced blocks when
+        explaining themselves; those illustrations must NOT execute."""
+        text = '''
+        Here is how you would call the tool:
+
+        ```
+        <tool_call>
+        {"name": "write_file", "args": {"path": "DANGER.txt", "content": "pwned"}}
+        </tool_call>
+        ```
+
+        But for this task I do not need any tools.
+        '''
+        self.assertEqual(AgentDispatcher._parse_tool_calls(text), [])
+
+    def test_mix_of_fenced_illustration_and_real_call(self):
+        """A real top-level call must still be picked up even if the message
+        also contains an illustrative fenced example."""
+        text = '''
+        For reference, the general form looks like:
+
+        ```
+        <tool_call>
+        {"name": "write_file", "args": {"path": "EXAMPLE.txt", "content": "x"}}
+        </tool_call>
+        ```
+
+        Now I will do the real call:
+
+        <tool_call>
+        {"name": "read_file", "args": {"path": "actual.txt"}}
+        </tool_call>
+        '''
+        calls = AgentDispatcher._parse_tool_calls(text)
+        self.assertEqual(len(calls), 1)
+        self.assertEqual(calls[0]["name"], "read_file")
+        self.assertEqual(calls[0]["args"]["path"], "actual.txt")
+
+    def test_tolerates_missing_args_key(self):
+        """A tool_call without an `args` field is valid; args default to {}."""
+        text = '<tool_call>{"name": "list_files"}</tool_call>'
+        calls = AgentDispatcher._parse_tool_calls(text)
+        self.assertEqual(len(calls), 1)
+        self.assertEqual(calls[0]["name"], "list_files")
+
+    def test_rejects_args_that_is_not_a_dict(self):
+        """If the LLM emits args as a string or list, the dispatcher must
+        surface a structured error rather than crashing on **kwargs."""
+        dispatcher = _make_dispatcher()
+        registry = _FakeRegistry(
+            tools=[{"name": "read_file", "description": "", "input_schema": {}}],
+            outputs={},
+        )
+        turns = [
+            '<tool_call>{"name": "read_file", "args": "not-a-dict"}</tool_call>',
+            "giving up",
+        ]
+        with patch.object(dispatcher, "_call_llm", side_effect=turns):
+            dispatcher.dispatch_worker("writing", "t", registry)
+        # The registry must NOT have been called with a non-dict args payload.
+        self.assertEqual(registry.calls, [])
+
+
+class RenderToolsSectionTests(unittest.TestCase):
+    def test_renders_schema_properties(self):
+        tool = {
+            "name": "search_papers",
+            "description": "Search Semantic Scholar.",
+            "input_schema": {
+                "type": "object",
+                "properties": {
+                    "query": {"type": "string", "description": "Search query"},
+                    "limit": {"type": "integer", "description": "Max results"},
+                },
+                "required": ["query"],
+            },
+        }
+        rendered = AgentDispatcher._render_tools_section([tool])
+        self.assertIn("search_papers", rendered)
+        self.assertIn("<tool_call>", rendered)
+        self.assertIn("query", rendered)
+        self.assertIn("required", rendered)
+        self.assertIn("Max results", rendered)
+
+    def test_empty_when_no_tools(self):
+        self.assertEqual(AgentDispatcher._render_tools_section([]), "")
+
+
+class _FakeRegistry:
+    """Minimal ToolRegistry stub that records calls and returns canned outputs."""
+
+    def __init__(self, tools, outputs):
+        self._tools = tools
+        self._outputs = outputs  # dict: tool_name -> json string
+        self.calls: list[tuple[str, dict]] = []
+
+    def get_tools_for(self, agent_type):
+        return self._tools
+
+    def execute_tool(self, name, args):
+        self.calls.append((name, args))
+        return self._outputs.get(name, json.dumps({"ok": True}))
+
+
+class DispatchWorkerLoopTests(unittest.TestCase):
+    def test_none_registry_raises_clear_typeerror(self):
+        """Passing tool_registry=None must fail at the boundary with a
+        clear TypeError, not with a cryptic AttributeError deep in the
+        loop. External callers of dispatch_worker will hit this edge."""
+        dispatcher = _make_dispatcher()
+        with self.assertRaises(TypeError) as ctx:
+            dispatcher.dispatch_worker("writing", "task", None)
+        self.assertIn("tool_registry", str(ctx.exception))
+        self.assertIn("get_tools_for", str(ctx.exception))
+
+    def test_unknown_agent_type_raises_before_touching_registry(self):
+        """Agent-type validation should happen first so that a bad
+        agent_type produces ValueError regardless of registry state."""
+        dispatcher = _make_dispatcher()
+        with self.assertRaises(ValueError):
+            dispatcher.dispatch_worker("bogus_agent", "task", None)
+
+    def test_terminates_when_response_has_no_tool_calls(self):
+        dispatcher = _make_dispatcher()
+        registry = _FakeRegistry(tools=[], outputs={})
+
+        with patch.object(dispatcher, "_call_llm", return_value="all done, no tools"):
+            result = dispatcher.dispatch_worker("writing", "task", registry)
+
+        self.assertEqual(registry.calls, [])
+        self.assertEqual(result["agent"], "writing")
+        self.assertIn("all done", result["response"])
+
+    def test_executes_tools_and_feeds_results_back(self):
+        dispatcher = _make_dispatcher()
+        fake_tools = [{"name": "read_file", "description": "read",
+                       "input_schema": {"type": "object", "properties": {}}}]
+        registry = _FakeRegistry(
+            tools=fake_tools,
+            outputs={"read_file": json.dumps({"content": "file body"})},
+        )
+
+        turns = [
+            '<tool_call>{"name": "read_file", "args": {"path": "a.txt"}}</tool_call>',
+            "Done reading, here is my summary.",
+        ]
+        call_log: list[list] = []
+
+        def fake_call(system, messages):
+            call_log.append(list(messages))
+            return turns.pop(0)
+
+        with patch.object(dispatcher, "_call_llm", side_effect=fake_call):
+            result = dispatcher.dispatch_worker("writing", "task", registry)
+
+        # One tool call was executed with the right args.
+        self.assertEqual(registry.calls, [("read_file", {"path": "a.txt"})])
+        # Second LLM turn saw the assistant's tool_call and a user tool_result.
+        self.assertEqual(len(call_log), 2)
+        second_turn_messages = call_log[1]
+        self.assertEqual(second_turn_messages[0]["role"], "user")  # original task
+        self.assertEqual(second_turn_messages[1]["role"], "assistant")  # tool call echo
+        self.assertEqual(second_turn_messages[2]["role"], "user")  # tool_result block
+        self.assertIn("<tool_result", second_turn_messages[2]["content"])
+        self.assertIn("file body", second_turn_messages[2]["content"])
+        # Final result captures the summary response and tool-call count.
+        self.assertEqual(result["tool_calls"], 1)
+        self.assertIn("summary", result["response"])
+
+    def test_max_turns_hard_ceiling(self):
+        dispatcher = _make_dispatcher()
+        registry = _FakeRegistry(
+            tools=[{"name": "read_file", "description": "", "input_schema": {}}],
+            outputs={},
+        )
+
+        # Every turn keeps requesting another tool call → would loop forever.
+        infinite = '<tool_call>{"name": "read_file", "args": {"path": "x"}}</tool_call>'
+
+        with patch.object(dispatcher, "_call_llm", return_value=infinite):
+            # Override max_turns on the 'writing' worker to something small for the test.
+            with patch.dict(AgentDispatcher.WORKER_CONFIGS["writing"], {"max_turns": 3}):
+                result = dispatcher.dispatch_worker("writing", "task", registry)
+
+        # Exactly 3 tool executions, no more.
+        self.assertEqual(len(registry.calls), 3)
+        self.assertEqual(result["response"], infinite)
+
+    def test_surfaces_pid_from_launch_experiment_tool_result(self):
+        """The EXECUTE → MONITOR handoff in core/loop.py reads result['pid']
+        and result['log_file']. These must come from the tool result (which
+        is authoritative) rather than regex-scraping the model's prose."""
+        dispatcher = _make_dispatcher()
+        launch_output = json.dumps({"pid": 4321, "log_file": "/tmp/exp.log", "status": "launched"})
+        registry = _FakeRegistry(
+            tools=[{"name": "launch_experiment", "description": "", "input_schema": {}}],
+            outputs={"launch_experiment": launch_output},
+        )
+
+        turns = [
+            '<tool_call>{"name": "launch_experiment", '
+            '"args": {"command": "python train.py", "log_file": "exp.log"}}'
+            '</tool_call>',
+            # Deliberately give a prose reply that lies about the PID — tool result should win.
+            "Training started, PID=99999 (this number is wrong).",
+        ]
+
+        with patch.object(dispatcher, "_call_llm", side_effect=turns):
+            result = dispatcher.dispatch_worker("code", "launch it", registry)
+
+        self.assertTrue(result["experiment_launched"])
+        self.assertEqual(result["pid"], 4321)
+        self.assertEqual(result["log_file"], "/tmp/exp.log")
+
+    def test_end_to_end_with_real_registry(self):
+        """Smoke test with a real ToolRegistry and a temp workspace."""
+        dispatcher = _make_dispatcher()
+        with tempfile.TemporaryDirectory() as tmp:
+            workspace = Path(tmp)
+            registry = ToolRegistry(workspace)
+
+            turns = [
+                '<tool_call>{"name": "write_file", '
+                '"args": {"path": "note.txt", "content": "hello"}}</tool_call>',
+                '<tool_call>{"name": "read_file", '
+                '"args": {"path": "note.txt"}}</tool_call>',
+                "I wrote and read the file successfully.",
+            ]
+
+            with patch.object(dispatcher, "_call_llm", side_effect=turns):
+                result = dispatcher.dispatch_worker("writing", "do it", registry)
+
+            self.assertEqual(result["tool_calls"], 2)
+            self.assertTrue((workspace / "note.txt").exists())
+            self.assertEqual((workspace / "note.txt").read_text(), "hello")
+
+
+if __name__ == "__main__":
+    unittest.main()