Skip to content

Implement real tool-use loop for worker dispatches (closes #13)#14

Merged
Xiangyue-Zhang merged 1 commit into
mainfrom
feat/tool-use-protocol-loop
Apr 19, 2026
Merged

Implement real tool-use loop for worker dispatches (closes #13)#14
Xiangyue-Zhang merged 1 commit into
mainfrom
feat/tool-use-protocol-loop

Conversation

@Xiangyue-Zhang
Copy link
Copy Markdown
Owner

Summary

Rewrite dispatch_worker into a real iterative tool-use loop that works identically across all four provider modes. Closes #13.

Before: the tools argument was silently dropped, ToolRegistry.execute_tool was only reached from tests, and PIDs were recovered by regex-scraping the worker's prose.

After: dispatch_worker drives a multi-turn loop, execute_tool is the authoritative execution path, and PIDs flow from structured JSON tool results.

For #13 specifically

xiexiexxs identified two real design gaps:

  1. _call_llm accepted a tools parameter that was never forwarded to any SDK backend — correct, removed.
  2. ToolRegistry.execute_tool was only reachable from tests/test_tools_security.py, never from the loop — correct, now reached through the main worker path.

Design

  • System prompt gets an auto-injected ## Tool-Use Protocol section describing the text protocol and rendering each tool's schema.
  • Worker emits <tool_call>{"name": "...", "args": {...}}</tool_call> blocks.
  • Dispatcher parses them, runs each via ToolRegistry.execute_tool, wraps results as <tool_result name="...">...</tool_result>, appends to the next user turn.
  • Loop terminates when the response has zero tool calls (final answer) or max_turns is hit.
  • launch_experiment's JSON result authoritatively populates pid / log_file on the top-level EXECUTE result dict.

Hardening

  • Triple-backtick fenced blocks are stripped before parsing so LLMs illustrating the protocol in prose cannot trigger real side effects.
  • args values that are not dicts produce a structured error without crashing on **kwargs.
  • tool_registry=None raises TypeError with a clear message at the boundary.
  • Invalid agent_type raises ValueError before touching the registry.
  • For claude_cli the subprocess is launched with --tools "" to disable built-in CLI tool-use so the protocol is honored.
  • codex_cli has no equivalent flag; when used as a worker provider, a runtime warning is emitted and docs steer users to the other three providers.

Breaking changes

  • dispatch_worker signature: tools: listtool_registry. Only one internal caller (core/loop.py::_execute), updated in the same commit.
  • _call_llm drops the unused tools and max_turns parameters.

Test plan

  • 24 unit tests pass (12 new + 12 existing), including fence stripping, max_turns ceiling, PID authority, None-registry guard, malformed-JSON resilience.
  • Real-CLI integration harness (tests/integration_cli_tool_use.py, not wired into default suite): claude_cli cleanly emits 2 tool calls and creates the target file with exact content; codex_cli triggers the documented warning as expected.
  • Full-loop smoke test: THINK returns an experiment plan, EXECUTE runs 2 tool calls (real write_file side effect verified on disk, mocked launch_experiment returns PID=55555), MONITOR receives the authoritative PID.
  • Guard workflow check passes on this PR.

Documentation

  • README.md — compatibility table adds a "Tool-use support" column; Recent Updates entry for 2026-04-19.
  • docs/architecture.md — new section "6. Tool-Use Protocol" with full design rationale.
  • AI_GUIDE.md / in-repo CLAUDE.md — matching "Tool-Use Protocol (provider-agnostic)" subsection.
  • config.yaml — provider comments explain the tool-use compatibility matrix.

Rewrite dispatch_worker from a single-shot text call into a true
iterative tool-use loop. The worker's tool schemas are rendered as
plain text and injected into the system prompt; worker responses are
scanned for <tool_call>{...}</tool_call> blocks which are executed
through ToolRegistry.execute_tool, and results are fed back as
<tool_result> blocks in the next user turn. The loop runs until the
worker produces a final answer with no tool calls, or max_turns is
hit. This works identically across all four `provider` values —
the two SDK paths and the two `*_cli` subprocess paths.

Before this change, the `tools` argument threaded through
dispatch_worker was silently dropped by the backend implementations,
ToolRegistry.execute_tool was only reachable from tests, and PIDs
were recovered by regex-scraping the worker's prose rather than by
reading the launch_experiment tool result. All three gaps are now
closed:

- dispatch_worker is a multi-turn loop; each turn sends the growing
  conversation through _call_llm and acts on emitted tool_call blocks.
- launch_experiment's structured JSON result is the authoritative
  source of pid/log_file, surfaced onto the top-level EXECUTE result
  dict that core/loop.py passes to the monitor. Prose regex is kept
  only as a fallback for pre-protocol responses.
- The dead `tools` and `max_turns` parameters on _call_llm have been
  removed so the signature matches what the backends actually use.
- Tool-call blocks inside triple-backtick code fences are stripped
  before parsing, so illustrative examples in the model's prose
  cannot trigger real side-effectful execution.
- dispatch_worker raises TypeError with a clear message when called
  with tool_registry=None, instead of a cryptic NoneType attribute
  error deep in the loop.
- For the `claude_cli` provider the subprocess is launched with
  `--tools ""` to disable built-in tool-use so the CLI is forced
  into pure text-oracle mode and honors the text protocol. The
  `codex_cli` provider has no equivalent flag and may bypass the
  protocol; a runtime warning is emitted when it is used as a worker
  provider, and the docs now steer users toward the other three
  providers for worker dispatches.

Documentation:
- README.md compatibility table gets a "Tool-use support" column
- docs/architecture.md gains a new section describing the protocol
- AI_GUIDE.md and the in-repo guide add matching subsections
- config.yaml comments now explain the tool-use compatibility matrix

Tests:
- 12 new unit tests covering parse, render, loop termination,
  max_turns ceiling, pid extraction authority, fence stripping,
  mixed fenced-and-real calls, missing args, non-dict args,
  none-registry guard, and unknown-agent-type guard
- A real-CLI integration harness at tests/integration_cli_tool_use.py
  that is skipped by default
- Existing test_tools_security.py and test_loop_fallback.py continue
  to pass with zero regressions (18 -> 24 total)
@Xiangyue-Zhang Xiangyue-Zhang merged commit d1f357b into main Apr 19, 2026
1 check passed
@Xiangyue-Zhang Xiangyue-Zhang deleted the feat/tool-use-protocol-loop branch April 19, 2026 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

关于工具调用_call_llm和_execute_tool的疑问

1 participant