Implement real tool-use loop for worker dispatches (closes #13) by Xiangyue-Zhang · Pull Request #14 · Xiangyue-Zhang/auto-deep-researcher-24x7

Xiangyue-Zhang · 2026-04-19T06:17:24Z

Summary

Rewrite dispatch_worker into a real iterative tool-use loop that works identically across all four provider modes. Closes #13.

Before: the tools argument was silently dropped, ToolRegistry.execute_tool was only reached from tests, and PIDs were recovered by regex-scraping the worker's prose.

After: dispatch_worker drives a multi-turn loop, execute_tool is the authoritative execution path, and PIDs flow from structured JSON tool results.

For #13 specifically

xiexiexxs identified two real design gaps:

_call_llm accepted a tools parameter that was never forwarded to any SDK backend — correct, removed.
ToolRegistry.execute_tool was only reachable from tests/test_tools_security.py, never from the loop — correct, now reached through the main worker path.

Design

System prompt gets an auto-injected ## Tool-Use Protocol section describing the text protocol and rendering each tool's schema.
Worker emits <tool_call>{"name": "...", "args": {...}}</tool_call> blocks.
Dispatcher parses them, runs each via ToolRegistry.execute_tool, wraps results as <tool_result name="...">...</tool_result>, appends to the next user turn.
Loop terminates when the response has zero tool calls (final answer) or max_turns is hit.
launch_experiment's JSON result authoritatively populates pid / log_file on the top-level EXECUTE result dict.

Hardening

Triple-backtick fenced blocks are stripped before parsing so LLMs illustrating the protocol in prose cannot trigger real side effects.
args values that are not dicts produce a structured error without crashing on **kwargs.
tool_registry=None raises TypeError with a clear message at the boundary.
Invalid agent_type raises ValueError before touching the registry.
For claude_cli the subprocess is launched with --tools "" to disable built-in CLI tool-use so the protocol is honored.
codex_cli has no equivalent flag; when used as a worker provider, a runtime warning is emitted and docs steer users to the other three providers.

Breaking changes

dispatch_worker signature: tools: list → tool_registry. Only one internal caller (core/loop.py::_execute), updated in the same commit.
_call_llm drops the unused tools and max_turns parameters.

Test plan

24 unit tests pass (12 new + 12 existing), including fence stripping, max_turns ceiling, PID authority, None-registry guard, malformed-JSON resilience.
Real-CLI integration harness (tests/integration_cli_tool_use.py, not wired into default suite): claude_cli cleanly emits 2 tool calls and creates the target file with exact content; codex_cli triggers the documented warning as expected.
Full-loop smoke test: THINK returns an experiment plan, EXECUTE runs 2 tool calls (real write_file side effect verified on disk, mocked launch_experiment returns PID=55555), MONITOR receives the authoritative PID.
Guard workflow check passes on this PR.

Documentation

README.md — compatibility table adds a "Tool-use support" column; Recent Updates entry for 2026-04-19.
docs/architecture.md — new section "6. Tool-Use Protocol" with full design rationale.
AI_GUIDE.md / in-repo CLAUDE.md — matching "Tool-Use Protocol (provider-agnostic)" subsection.
config.yaml — provider comments explain the tool-use compatibility matrix.

Rewrite dispatch_worker from a single-shot text call into a true iterative tool-use loop. The worker's tool schemas are rendered as plain text and injected into the system prompt; worker responses are scanned for <tool_call>{...}</tool_call> blocks which are executed through ToolRegistry.execute_tool, and results are fed back as <tool_result> blocks in the next user turn. The loop runs until the worker produces a final answer with no tool calls, or max_turns is hit. This works identically across all four `provider` values — the two SDK paths and the two `*_cli` subprocess paths. Before this change, the `tools` argument threaded through dispatch_worker was silently dropped by the backend implementations, ToolRegistry.execute_tool was only reachable from tests, and PIDs were recovered by regex-scraping the worker's prose rather than by reading the launch_experiment tool result. All three gaps are now closed: - dispatch_worker is a multi-turn loop; each turn sends the growing conversation through _call_llm and acts on emitted tool_call blocks. - launch_experiment's structured JSON result is the authoritative source of pid/log_file, surfaced onto the top-level EXECUTE result dict that core/loop.py passes to the monitor. Prose regex is kept only as a fallback for pre-protocol responses. - The dead `tools` and `max_turns` parameters on _call_llm have been removed so the signature matches what the backends actually use. - Tool-call blocks inside triple-backtick code fences are stripped before parsing, so illustrative examples in the model's prose cannot trigger real side-effectful execution. - dispatch_worker raises TypeError with a clear message when called with tool_registry=None, instead of a cryptic NoneType attribute error deep in the loop. - For the `claude_cli` provider the subprocess is launched with `--tools ""` to disable built-in tool-use so the CLI is forced into pure text-oracle mode and honors the text protocol. The `codex_cli` provider has no equivalent flag and may bypass the protocol; a runtime warning is emitted when it is used as a worker provider, and the docs now steer users toward the other three providers for worker dispatches. Documentation: - README.md compatibility table gets a "Tool-use support" column - docs/architecture.md gains a new section describing the protocol - AI_GUIDE.md and the in-repo guide add matching subsections - config.yaml comments now explain the tool-use compatibility matrix Tests: - 12 new unit tests covering parse, render, loop termination, max_turns ceiling, pid extraction authority, fence stripping, mixed fenced-and-real calls, missing args, non-dict args, none-registry guard, and unknown-agent-type guard - A real-CLI integration harness at tests/integration_cli_tool_use.py that is skipped by default - Existing test_tools_security.py and test_loop_fallback.py continue to pass with zero regressions (18 -> 24 total)

Xiangyue-Zhang merged commit d1f357b into main Apr 19, 2026
1 check passed

Xiangyue-Zhang deleted the feat/tool-use-protocol-loop branch April 19, 2026 06:17

Xiangyue-Zhang mentioned this pull request Apr 19, 2026

关于工具调用_call_llm和_execute_tool的疑问 #13

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement real tool-use loop for worker dispatches (closes #13)#14

Implement real tool-use loop for worker dispatches (closes #13)#14
Xiangyue-Zhang merged 1 commit into
mainfrom
feat/tool-use-protocol-loop

Xiangyue-Zhang commented Apr 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Xiangyue-Zhang commented Apr 19, 2026

Summary

For #13 specifically

Design

Hardening

Breaking changes

Test plan

Documentation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant