doc: add design spec for `azd ai agent routine` commands by huimiu · Pull Request #8200 · Azure/azure-dev

huimiu · 2026-05-15T06:23:53Z

Summary

Adds the design spec for the new routine command subtree (cli/azd/docs/design/ai-routine-design-spec.md). A routine pairs one trigger (when) with one action (what) on a Foundry project, for example "every weekday at 8 AM UTC, invoke the daily report agent", without standing up Logic Apps, Functions, or cron infra.

Related issues:

Feature: Add azd ai routine create/update/show/list/delete (plus enable, disable, dispatch, and run subcommands) to manage and run Foundry Routines from any directory #8159
Follow up: Support creating routines from a file #8187 (--file for routine create)

Key points

Placement: lives in the existing azure.ai.agents extension. Today the commands surface as azd ai agent routine …; they become azd ai routine … after the planned extension split (registration only). No azure.yaml or registry.json impact, no new persistent state.
v1 surface: create, update, show, list, delete, enable, disable, dispatch, and run list. run show and run delete are deferred until the data plane lands them.
Endpoint resolution: reuses the standard 5 level cascade and works standalone outside an azd project, matching connection, toolbox, and skill. The Routines=V1Preview opt in header is sent on every call.
Create vs. update: the data plane is a single idempotent PUT; the CLI splits it for usability. create fails on existing (or upserts with --force). update is GET then PUT preserving untouched fields, with type switching of --trigger / --action rejected client side.
enable / disable / dispatch: dedicated idempotent verbs hide the wire format. dispatch is sync by default and streams the agent response; --async switches to the async route.
Wire mapping: kebab case CLI aliases (recurring, agent-response, agent-invoke) read naturally and absorb upstream API renames via a single mapping table.
Telemetry and tests: one event per command (no PII), plus unit coverage for flag to wire mapping, update merge semantics, and dispatch route selection, with an E2E smoke covering the create through delete loop.

…hange

github-actions · 2026-05-15T07:06:26Z

📋 Prioritization Note

Thanks for the contribution! The linked issue isn't in the current milestone yet.
Review may take a bit longer — reach out to @rajeshkamal5050 or @kristenwomack if you'd like to discuss prioritization.

Copilot

Pull request overview

Adds a design specification for the proposed azd ai agent routine command subtree in the azure.ai.agents extension, covering command behavior, endpoint resolution, wire mappings, telemetry, and tests.

Changes:

Adds the routine command design spec and v1 command surface.
Documents endpoint resolution, API routes, output shapes, telemetry, and test plan.
Adds a cspell override for terminology in the new design doc.

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.

File	Description
`cli/azd/docs/design/ai-routine-design-spec.md`	New design spec for routine commands.
`cli/azd/.vscode/cspell.yaml`	Adds a file-specific spelling override for the new doc.

lindazqli

Reviewed against TypeSpec PR #43186 (adding routines, now merged into feature/foundry-release). Six issues found — four are blockers before shipping v1:

[Line 56] TypeSpec PR reference is stale (#42779 → should be #43186).
[Line 165] enable/disable should call the dedicated :enable / :disable routes that already exist in the TypeSpec, not GET-then-PUT.
[Line 173] No sync :dispatch route exists in TypeSpec — only POST :dispatch_async. The "sync by default" contract has no API backing.
[Line 249] --agent-id / agent_id should be --agent-name / agent_name to match the TypeSpec field name.
[Line 201] --orderby has no orderBy query param in ListRoutineRunsParameters.
[Line 239] GitHub trigger: --owner serializes to owner but TypeSpec field is assignee; --event-action has no TypeSpec counterpart.

lindazqli

Correction: the previous review was submitted as REQUEST_CHANGES by mistake — please treat all inline comments as informational notes for discussion, not as blockers. The comments remain valid observations to align the spec with TypeSpec PR #43186.

jongio

Spec follows the extension's existing patterns well for endpoint resolution, flag naming, and prompt/no-prompt behavior. The TypeSpec alignment feedback from existing reviews is the main thing to address before implementation.

One gap: error behavior isn't defined for any of the 9 commands. See inline comment.

huimiu · 2026-05-18T12:24:20Z

Thanks everyone for the careful review! Here's what changed to address all the feedback:

TypeSpec alignment (Linda's feedback against PR #43186)

Updated all stale #42779 refs to #43186.
enable / disable now call the dedicated :enable / :disable routes directly — no more GET-then-PUT (avoids the TOCTOU race you flagged).
dispatch (both default and --async) now hits :dispatch_async. There's no sync route in TypeSpec, so the flag just controls client-side waiting/streaming.
Renamed --agent-id → --agent-name everywhere (spec body, command summary, telemetry) to match the agent_name TypeSpec field.
Dropped --orderby from v1 — no orderBy query param exists yet.
GitHub trigger: --owner → --assignee, removed --event-action.
Added a heads-up note in §5.1 for the upcoming generic + strong-typed triggers.
Clarified --input is plain text (not parsed as JSON), and added an inline pointer that --trigger is an enum, not free-form.
Noted that the run show API now exists in #43186 — keeping registration deferred for v1 but called out in §4.8.

Error behavior + --file (John and Jon's feedback)

@jongio: added a new §4.9 Error Behavior section with a scenario → type/code/suggestion table covering 409 (create without --force), 404s (show/delete/dispatch/enable/disable on missing routine), the update GET-succeeds-then-PUT-404 race, type-switch, auth, endpoint resolution, and 5xx. Reuses the existing connection/toolbox typed-error pattern. Test plan updated with negative cases.
@therealjohn: pulled --file back into v1 scope. New §4.1.1 covers the file shape, mutual exclusion with --trigger, flag-level override semantics, and points to Support creating routines from a file #8187 for the schema details. update --file is supported too. Command summary, telemetry (hasFile), and test plan updated to match.

The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.

wbreza

Code Review — Design Spec Quality Pass

Thanks for the thorough iteration on this spec @huimiu — the prior round of feedback (TypeSpec alignment, error section, --file re-scoping) is well incorporated. The findings below are net-new spec-quality items intended to reduce implementation ambiguity. The PR is approvable as-is; these are suggestions to tighten the spec before it becomes the implementation source of truth.

Note on review state: Posting as COMMENT rather than REQUEST_CHANGES since @therealjohn has already approved and many items are clarifications, not blockers.

🔴 Critical — recommend addressing before merge or in a fast follow-up

github-issue enum out of sync across three locations.
- §1 summary (L68) and §4.1 enum (L105) advertise --trigger <recurring|timer|github-issue> as v1.
- §5.1 trigger table marks github-issue as "Deferred — pending workspace connection model."
- §9 reference summary correctly omits it.
- Fix: Remove github-issue from §1 and §4.1 enum; align with §9.
§4.9 error table lacks HTTP status codes.
The Type/Code columns are documented but implementers can't map service responses without explicit status codes. Notably, CodeRoutineAlreadyExists (line ~276) doesn't state 409.
Fix: Add an HTTP Status column (400/404/409/429/503 as applicable).
§9 command summary misrepresents conditional requirements.
Lines 508-514 show [--cron "0 8 * * *"] and [--at "..."] both as optional, but §5.1 makes each required per-trigger-type. The summary isn't copy-pasteable.
Fix: Split into separate recurring/timer examples, or add inline comments noting "--cron for recurring, --at for timer."
Orphan ### Output shapes for state-changing verbs heading (between §4.8 and §4.9).
Unnumbered; §8 references it informally as [§4 table].
Fix: Number it (e.g., §4.8.1) and update the §8 anchor.
dispatch streaming/failure contract under-specified (§4.6).
- Streaming protocol unspecified (SSE? chunked? line-buffered?).
- JSON output shape ambiguous when streaming aggregates.
- Polling timeout, max attempts, and reconnection semantics undefined.
- Mid-stream failure: is partial output flushed? what exit code?
- The leading GET /routines/{name} implementation note doesn't specify what happens if that GET fails.
  Fix: Add a dedicated subsection (or expand §4.6) covering the streaming contract and failure modes; reference an existing streaming command (e.g., agent invoke) if one exists.
Broken/awkward anchor links across the doc.
Several internal links break across line wraps or include double-dashes from pipe-to-anchor conversion:
- L105-108: (#51-trigger\nr-flags…) — wrap artifact
- L264-265: TypeSpec PR URL split with #\n#diff-…
- L200-201, 207, 241, 288, 348, 391: double-dash anchor IDs from | chars in headings
  Fix: Rewrap URLs onto single lines; verify all (#…) anchors resolve against actual heading IDs.
Exit codes not specified for any verb.
Per AGENTS.md convention, errors should produce specific exit codes. Scripting consumers can't reliably distinguish validation errors from service errors.
Fix: Add a §4.9.1 (or table column) listing exit codes per scenario.
--file path resolution & validation under-specified.
- Resolved against cwd or project root?
- Path traversal/symlink/absolute-path handling?
- Max file size?
- §8 lacks negative tests for malformed YAML/JSON.
  Fix: Document path resolution rules in §4.1.1; add negative test cases in §8.

🟡 Important

--cron dialect unspecified. Quartz vs Unix 5-field? Validated client-side or server-side? Error message format for invalid expressions? (§5.1)
--at ISO 8601 edge cases. Past time → error or schedule next occurrence? "2026-04-24T15:00:00" without Z treated as local or UTC? DST transitions? (§5.1)
Concurrency races undefined. What happens for: update during active dispatch? delete while routine is enabled? Concurrent enable/disable? At minimum, document expected server-side guarantees. (§4.2, §4.4, §4.5, §4.6)
429/503 retry semantics + network timeouts not covered in §4.9. Does the CLI retry with backoff, or surface immediately?
403 RBAC vs 401 auth conflated under CodeAuthFailed/CodeNotLoggedIn (L283). Different remediation paths — clarify mapping.
create --force with conflicting immutable fields (different --trigger or --action) — replace silently, or fail? (§4.2)
Telemetry "no PII" not defined (§6). Are agent names, conversation IDs, --assignee usernames, repository names captured? Enumerate explicitly.
run list output formats (table/JSON) not specified (§4.7). Mirror routine list shape?
Test plan gaps (§8): no explicit coverage for interactive prompts (AZD_FORCE_TTY=false), wire-mapping table per trigger×action, dispatch streaming/backpressure, all 9 telemetry events, --file negative cases, timer trigger in E2E smoke.
--file + per-trigger override flags (§4.1.1 L134-142). Spec says --file mutually exclusive with --trigger but cooperates with --cron/--at — yet those are per-trigger-type. Clarify with valid/invalid examples.

🟢 Nice to have

--async help text should explicitly say "returns immediately; use routine run list for status" — non-obvious otherwise. (§4.6)
run subgroup contains only list in v1. Empty-group pattern — consider routine list-runs for v1 and defer the full run subgroup. (§1, §4.7-4.8)
--input guidance. Recommend documenting a size cap and a "don't pass secrets via --input" note. (§4.6)
TOCTOU undefined on first use (L191). Add "(Time-of-Check-Time-of-Use)".
Endpoint cascade phrasing — described as "5-level" but step 5 is the structured error. Relabel as "4-level cascade with structured-error fallback." (§3 L77-82)
Missing newline in §1 (L4): infra.Commands renders as one word.
§9 reference summary inconsistency (L454): [--enabled=false] shows a literal value while other flags use placeholders.

Reviewed against TypeSpec PR #43186 references in the spec and existing patterns in cli/azd/extensions/azure.ai.agents/ and sibling design docs.

doc: add design spec for azd ai agent routine commands

6feb5c3

microsoft-github-policy-service Bot assigned huimiu May 15, 2026

huimiu added 2 commits May 15, 2026 14:30

doc: allow 'exterrors' in routine design spec cspell

eaf8cac

doc: drop routine run show/delete stub question from open questions

4ad815f

huimiu marked this pull request as ready for review May 15, 2026 07:02

Copilot AI review requested due to automatic review settings May 15, 2026 07:02

huimiu requested review from JeffreyCA, hemarina, jongio, rajeshkamal5050, tg-msft, vhvb1989, wbreza and weikanglim as code owners May 15, 2026 07:02

doc: avoid 'exterrors' tokens in routine design spec, revert cspell c…

4ec8bb4

…hange

Copilot started reviewing on behalf of huimiu May 15, 2026 07:04 View session

huimiu linked an issue May 15, 2026 that may be closed by this pull request

Add azd ai routine create/update/show/list/delete (plus enable, disable, dispatch, and run subcommands) to manage and run Foundry Routines from any directory #8159

Open

Copilot AI reviewed May 15, 2026

View reviewed changes

doc: address PR review comments on routine design spec

cac4b4a

therealjohn requested changes May 15, 2026

View reviewed changes

Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated