doc: add design spec for azd ai agent routine commands#8200
Conversation
📋 Prioritization NoteThanks for the contribution! The linked issue isn't in the current milestone yet. |
There was a problem hiding this comment.
Pull request overview
Adds a design specification for the proposed azd ai agent routine command subtree in the azure.ai.agents extension, covering command behavior, endpoint resolution, wire mappings, telemetry, and tests.
Changes:
- Adds the routine command design spec and v1 command surface.
- Documents endpoint resolution, API routes, output shapes, telemetry, and test plan.
- Adds a cspell override for terminology in the new design doc.
Reviewed changes
Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
cli/azd/docs/design/ai-routine-design-spec.md |
New design spec for routine commands. |
cli/azd/.vscode/cspell.yaml |
Adds a file-specific spelling override for the new doc. |
lindazqli
left a comment
There was a problem hiding this comment.
Reviewed against TypeSpec PR #43186 (adding routines, now merged into feature/foundry-release). Six issues found — four are blockers before shipping v1:
- [Line 56] TypeSpec PR reference is stale (#42779 → should be #43186).
- [Line 165]
enable/disableshould call the dedicated:enable/:disableroutes that already exist in the TypeSpec, not GET-then-PUT. - [Line 173] No sync
:dispatchroute exists in TypeSpec — onlyPOST :dispatch_async. The "sync by default" contract has no API backing. - [Line 249]
--agent-id/agent_idshould be--agent-name/agent_nameto match the TypeSpec field name. - [Line 201]
--orderbyhas noorderByquery param inListRoutineRunsParameters. - [Line 239] GitHub trigger:
--ownerserializes toownerbut TypeSpec field isassignee;--event-actionhas no TypeSpec counterpart.
lindazqli
left a comment
There was a problem hiding this comment.
Correction: the previous review was submitted as REQUEST_CHANGES by mistake — please treat all inline comments as informational notes for discussion, not as blockers. The comments remain valid observations to align the spec with TypeSpec PR #43186.
jongio
left a comment
There was a problem hiding this comment.
Spec follows the extension's existing patterns well for endpoint resolution, flag naming, and prompt/no-prompt behavior. The TypeSpec alignment feedback from existing reviews is the main thing to address before implementation.
One gap: error behavior isn't defined for any of the 9 commands. See inline comment.
|
Thanks everyone for the careful review! Here's what changed to address all the feedback: TypeSpec alignment (Linda's feedback against PR #43186)
Error behavior +
|
The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.
wbreza
left a comment
There was a problem hiding this comment.
Code Review — Design Spec Quality Pass
Thanks for the thorough iteration on this spec @huimiu — the prior round of feedback (TypeSpec alignment, error section, --file re-scoping) is well incorporated. The findings below are net-new spec-quality items intended to reduce implementation ambiguity. The PR is approvable as-is; these are suggestions to tighten the spec before it becomes the implementation source of truth.
Note on review state: Posting as
COMMENTrather thanREQUEST_CHANGESsince @therealjohn has already approved and many items are clarifications, not blockers.
🔴 Critical — recommend addressing before merge or in a fast follow-up
-
github-issueenum out of sync across three locations.- §1 summary (L68) and §4.1 enum (L105) advertise
--trigger <recurring|timer|github-issue>as v1. - §5.1 trigger table marks
github-issueas "Deferred — pending workspace connection model." - §9 reference summary correctly omits it.
- Fix: Remove
github-issuefrom §1 and §4.1 enum; align with §9.
- §1 summary (L68) and §4.1 enum (L105) advertise
-
§4.9 error table lacks HTTP status codes.
The Type/Code columns are documented but implementers can't map service responses without explicit status codes. Notably,CodeRoutineAlreadyExists(line ~276) doesn't state409.
Fix: Add anHTTP Statuscolumn (400/404/409/429/503 as applicable). -
§9 command summary misrepresents conditional requirements.
Lines 508-514 show[--cron "0 8 * * *"]and[--at "..."]both as optional, but §5.1 makes each required per-trigger-type. The summary isn't copy-pasteable.
Fix: Split into separate recurring/timer examples, or add inline comments noting "--cronfor recurring,--atfor timer." -
Orphan
### Output shapes for state-changing verbsheading (between §4.8 and §4.9).
Unnumbered; §8 references it informally as[§4 table].
Fix: Number it (e.g., §4.8.1) and update the §8 anchor. -
dispatchstreaming/failure contract under-specified (§4.6).- Streaming protocol unspecified (SSE? chunked? line-buffered?).
- JSON output shape ambiguous when streaming aggregates.
- Polling timeout, max attempts, and reconnection semantics undefined.
- Mid-stream failure: is partial output flushed? what exit code?
- The leading
GET /routines/{name}implementation note doesn't specify what happens if that GET fails.
Fix: Add a dedicated subsection (or expand §4.6) covering the streaming contract and failure modes; reference an existing streaming command (e.g.,agent invoke) if one exists.
-
Broken/awkward anchor links across the doc.
Several internal links break across line wraps or include double-dashes from pipe-to-anchor conversion:- L105-108:
(#51-trigger\nr-flags…)— wrap artifact - L264-265: TypeSpec PR URL split with
#\n#diff-… - L200-201, 207, 241, 288, 348, 391: double-dash anchor IDs from
|chars in headings
Fix: Rewrap URLs onto single lines; verify all(#…)anchors resolve against actual heading IDs.
- L105-108:
-
Exit codes not specified for any verb.
Per AGENTS.md convention, errors should produce specific exit codes. Scripting consumers can't reliably distinguish validation errors from service errors.
Fix: Add a §4.9.1 (or table column) listing exit codes per scenario. -
--filepath resolution & validation under-specified.- Resolved against cwd or project root?
- Path traversal/symlink/absolute-path handling?
- Max file size?
- §8 lacks negative tests for malformed YAML/JSON.
Fix: Document path resolution rules in §4.1.1; add negative test cases in §8.
🟡 Important
-
--crondialect unspecified. Quartz vs Unix 5-field? Validated client-side or server-side? Error message format for invalid expressions? (§5.1) -
--atISO 8601 edge cases. Past time → error or schedule next occurrence?"2026-04-24T15:00:00"withoutZtreated as local or UTC? DST transitions? (§5.1) -
Concurrency races undefined. What happens for:
updateduring activedispatch?deletewhile routine isenabled? Concurrentenable/disable? At minimum, document expected server-side guarantees. (§4.2, §4.4, §4.5, §4.6) -
429/503 retry semantics + network timeouts not covered in §4.9. Does the CLI retry with backoff, or surface immediately?
-
403 RBAC vs 401 auth conflated under
CodeAuthFailed/CodeNotLoggedIn(L283). Different remediation paths — clarify mapping. -
create --forcewith conflicting immutable fields (different--triggeror--action) — replace silently, or fail? (§4.2) -
Telemetry "no PII" not defined (§6). Are agent names, conversation IDs,
--assigneeusernames, repository names captured? Enumerate explicitly. -
run listoutput formats (table/JSON) not specified (§4.7). Mirrorroutine listshape? -
Test plan gaps (§8): no explicit coverage for interactive prompts (
AZD_FORCE_TTY=false), wire-mapping table per trigger×action, dispatch streaming/backpressure, all 9 telemetry events,--filenegative cases,timertrigger in E2E smoke. -
--file+ per-trigger override flags (§4.1.1 L134-142). Spec says--filemutually exclusive with--triggerbut cooperates with--cron/--at— yet those are per-trigger-type. Clarify with valid/invalid examples.
🟢 Nice to have
-
--asynchelp text should explicitly say "returns immediately; useroutine run listfor status" — non-obvious otherwise. (§4.6) -
runsubgroup contains onlylistin v1. Empty-group pattern — considerroutine list-runsfor v1 and defer the fullrunsubgroup. (§1, §4.7-4.8) -
--inputguidance. Recommend documenting a size cap and a "don't pass secrets via--input" note. (§4.6) -
TOCTOU undefined on first use (L191). Add "(Time-of-Check-Time-of-Use)".
-
Endpoint cascade phrasing — described as "5-level" but step 5 is the structured error. Relabel as "4-level cascade with structured-error fallback." (§3 L77-82)
-
Missing newline in §1 (L4):
infra.Commandsrenders as one word. -
§9 reference summary inconsistency (L454):
[--enabled=false]shows a literal value while other flags use placeholders.
Reviewed against TypeSpec PR #43186 references in the spec and existing patterns in cli/azd/extensions/azure.ai.agents/ and sibling design docs.
Summary
Adds the design spec for the new
routinecommand subtree (cli/azd/docs/design/ai-routine-design-spec.md). A routine pairs one trigger (when) with one action (what) on a Foundry project, for example "every weekday at 8 AM UTC, invoke the daily report agent", without standing up Logic Apps, Functions, or cron infra.Related issues:
azd ai routine create/update/show/list/delete(plusenable,disable,dispatch, andrunsubcommands) to manage and run Foundry Routines from any directory #8159--fileforroutine create)Key points
azure.ai.agentsextension. Today the commands surface asazd ai agent routine …; they becomeazd ai routine …after the planned extension split (registration only). Noazure.yamlorregistry.jsonimpact, no new persistent state.create,update,show,list,delete,enable,disable,dispatch, andrun list.run showandrun deleteare deferred until the data plane lands them.connection,toolbox, andskill. TheRoutines=V1Previewopt in header is sent on every call.createfails on existing (or upserts with--force).updateis GET then PUT preserving untouched fields, with type switching of--trigger/--actionrejected client side.dispatchis sync by default and streams the agent response;--asyncswitches to the async route.recurring,agent-response,agent-invoke) read naturally and absorb upstream API renames via a single mapping table.