Implement real tool-use loop for worker dispatches (closes #13)#14
Merged
Conversation
Rewrite dispatch_worker from a single-shot text call into a true
iterative tool-use loop. The worker's tool schemas are rendered as
plain text and injected into the system prompt; worker responses are
scanned for <tool_call>{...}</tool_call> blocks which are executed
through ToolRegistry.execute_tool, and results are fed back as
<tool_result> blocks in the next user turn. The loop runs until the
worker produces a final answer with no tool calls, or max_turns is
hit. This works identically across all four `provider` values —
the two SDK paths and the two `*_cli` subprocess paths.
Before this change, the `tools` argument threaded through
dispatch_worker was silently dropped by the backend implementations,
ToolRegistry.execute_tool was only reachable from tests, and PIDs
were recovered by regex-scraping the worker's prose rather than by
reading the launch_experiment tool result. All three gaps are now
closed:
- dispatch_worker is a multi-turn loop; each turn sends the growing
conversation through _call_llm and acts on emitted tool_call blocks.
- launch_experiment's structured JSON result is the authoritative
source of pid/log_file, surfaced onto the top-level EXECUTE result
dict that core/loop.py passes to the monitor. Prose regex is kept
only as a fallback for pre-protocol responses.
- The dead `tools` and `max_turns` parameters on _call_llm have been
removed so the signature matches what the backends actually use.
- Tool-call blocks inside triple-backtick code fences are stripped
before parsing, so illustrative examples in the model's prose
cannot trigger real side-effectful execution.
- dispatch_worker raises TypeError with a clear message when called
with tool_registry=None, instead of a cryptic NoneType attribute
error deep in the loop.
- For the `claude_cli` provider the subprocess is launched with
`--tools ""` to disable built-in tool-use so the CLI is forced
into pure text-oracle mode and honors the text protocol. The
`codex_cli` provider has no equivalent flag and may bypass the
protocol; a runtime warning is emitted when it is used as a worker
provider, and the docs now steer users toward the other three
providers for worker dispatches.
Documentation:
- README.md compatibility table gets a "Tool-use support" column
- docs/architecture.md gains a new section describing the protocol
- AI_GUIDE.md and the in-repo guide add matching subsections
- config.yaml comments now explain the tool-use compatibility matrix
Tests:
- 12 new unit tests covering parse, render, loop termination,
max_turns ceiling, pid extraction authority, fence stripping,
mixed fenced-and-real calls, missing args, non-dict args,
none-registry guard, and unknown-agent-type guard
- A real-CLI integration harness at tests/integration_cli_tool_use.py
that is skipped by default
- Existing test_tools_security.py and test_loop_fallback.py continue
to pass with zero regressions (18 -> 24 total)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Rewrite
dispatch_workerinto a real iterative tool-use loop that works identically across all four provider modes. Closes #13.Before: the
toolsargument was silently dropped,ToolRegistry.execute_toolwas only reached from tests, and PIDs were recovered by regex-scraping the worker's prose.After:
dispatch_workerdrives a multi-turn loop,execute_toolis the authoritative execution path, and PIDs flow from structured JSON tool results.For #13 specifically
xiexiexxsidentified two real design gaps:_call_llmaccepted atoolsparameter that was never forwarded to any SDK backend — correct, removed.ToolRegistry.execute_toolwas only reachable fromtests/test_tools_security.py, never from the loop — correct, now reached through the main worker path.Design
## Tool-Use Protocolsection describing the text protocol and rendering each tool's schema.<tool_call>{"name": "...", "args": {...}}</tool_call>blocks.ToolRegistry.execute_tool, wraps results as<tool_result name="...">...</tool_result>, appends to the next user turn.max_turnsis hit.launch_experiment's JSON result authoritatively populatespid/log_fileon the top-level EXECUTE result dict.Hardening
argsvalues that are not dicts produce a structured error without crashing on**kwargs.tool_registry=NoneraisesTypeErrorwith a clear message at the boundary.agent_typeraisesValueErrorbefore touching the registry.claude_clithe subprocess is launched with--tools ""to disable built-in CLI tool-use so the protocol is honored.codex_clihas no equivalent flag; when used as a worker provider, a runtime warning is emitted and docs steer users to the other three providers.Breaking changes
dispatch_workersignature:tools: list→tool_registry. Only one internal caller (core/loop.py::_execute), updated in the same commit._call_llmdrops the unusedtoolsandmax_turnsparameters.Test plan
tests/integration_cli_tool_use.py, not wired into default suite):claude_clicleanly emits 2 tool calls and creates the target file with exact content;codex_clitriggers the documented warning as expected.write_fileside effect verified on disk, mockedlaunch_experimentreturns PID=55555), MONITOR receives the authoritative PID.Documentation
README.md— compatibility table adds a "Tool-use support" column; Recent Updates entry for 2026-04-19.docs/architecture.md— new section "6. Tool-Use Protocol" with full design rationale.AI_GUIDE.md/ in-repoCLAUDE.md— matching "Tool-Use Protocol (provider-agnostic)" subsection.config.yaml— provider comments explain the tool-use compatibility matrix.