feat(routines): implement azd ai routine commands#8241
Conversation
Add the full v1 routine command subtree to the azure.ai.routines extension as specified in the design spec (PR #8200). Commands implemented: - routine create, update, show, list, delete - routine enable, disable (dedicated idempotent action routes) - routine dispatch (calls dispatch_async, --async flag for client-side wait) - routine run list (auto-paging, --top, --filter) New packages: - internal/exterrors/ -- structured error codes and helpers - internal/pkg/routines/ -- data-plane HTTP client and models - internal/cmd/endpoint.go -- 5-level project endpoint resolver Wire format: trigger/action as Record with 'default' key. All calls include x-ms-foundry-features-opt-in: Routines=V1Preview header. Also adds the design spec at cli/azd/docs/design/ai-routine-design-spec.md.
…dels - Add readAzdProjectSourcesFunc seam to endpoint.go for daemon isolation - endpoint_test.go: isFoundryHost, validateProjectEndpoint, full cascade tests - routine_create_test.go: buildTrigger and buildAction table tests - routine_manifest_test.go: readRoutineManifest (JSON/YAML), mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction - models_test.go: TriggerCLIToWire and ActionCLIToWire completeness - Add yaml struct tags to models.go for YAML manifest support
- Extract stubAzdProjectSources() helper (mirrors stubAzdHostedSources in agents) - isolateFromAzdDaemon now also clears AZD_SERVER env var - Add t.Parallel() to all pure-function tests (isFoundryHost, validateProjectEndpoint, buildTrigger, buildAction, mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction, TriggerCLIToWire/ActionCLIToWire map checks)
- cspell.yaml: add exterrors, sess, routineName, azdProjectSources to word list - endpoint.go: remove unused projectEndpointPathPrefix constant - routine_create.go: wrap long buildAction() call (line >125 chars) - routine_update.go: wrap long --file flag help text - routine_manifest_test.go: expand inline map literals to multi-line - client.go: wrap ListRoutineRuns signature to fit 125-char limit - Run gofmt -w and go fix on all files (codes.go, client.go, models.go formatting)
golangci-lint flagged ptrBool as unused. The function had no call sites; the //go:fix inline directive does not exempt it from the unused linter.
The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.
…pagination - Extract getPage helper so resp.Body.Close runs per iteration in ListRoutines and ListRoutineRuns (defer-in-loop leaked FDs). - Preserve the filter query param across pages in ListRoutineRuns; previously page 2+ only carried pageToken and dropped the filter. - Correct dispatch command help text: the service always runs routines asynchronously, so the old 'waits and streams' wording was wrong.
b1c629f to
8e0d723
Compare
…ine' The extension namespace 'ai.routine' already mounts the extension under 'azd ai routine'. Adding a 'routine' subcommand group on top of that produced the redundant 'azd ai routine routine <cmd>' path. Move --project-endpoint persistent flag and all subcommands directly onto rootCmd so the correct usage is 'azd ai routine <cmd>'.
📋 Prioritization NoteThanks for the contribution! The linked issue isn't in the current milestone yet. |
There was a problem hiding this comment.
Pull request overview
Adds the initial azd ai routine command surface for the new azure.ai.routines extension, including a Foundry Routines data-plane client, models, endpoint resolution, and CLI commands for routine CRUD/dispatch/run history.
Changes:
- Introduces
internal/pkg/routines(models + HTTP client) andinternal/exterrorsstructured error helpers. - Implements routine commands (
create/update/show/list/delete/enable/disable/dispatch, plusrun list) with table/JSON output. - Adds a 5-level project endpoint resolver with unit tests, plus additional unit tests for manifest parsing and flag→wire mapping.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/extensions/azure.ai.routines/internal/pkg/routines/models.go | Routine/trigger/action/run models + CLI→wire mappings/constants. |
| cli/azd/extensions/azure.ai.routines/internal/pkg/routines/models_test.go | Unit tests for CLI→wire mappings and default keys. |
| cli/azd/extensions/azure.ai.routines/internal/pkg/routines/client.go | Data-plane HTTP client for routines operations (CRUD, enable/disable, dispatch, run listing). |
| cli/azd/extensions/azure.ai.routines/internal/exterrors/errors.go | Structured LocalError/ServiceError helpers and wrappers. |
| cli/azd/extensions/azure.ai.routines/internal/exterrors/codes.go | Error/operation code constants used across commands. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/root.go | Extension root command wiring (registers routine subcommands + persistent endpoint flag). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint.go | Project endpoint validation + 5-level endpoint resolution cascade (flag/env/config/envvar/error). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint_test.go | Unit tests for endpoint validation and cascade precedence. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine.go | (Currently unused) routine subcommand group definition. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create.go | routine create implementation (flags/manifest support, trigger/action building). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create_test.go | Unit tests for create flag→wire trigger/action builders. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_update.go | routine update implementation (GET-then-PUT merge semantics, type-switch guard). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_show.go | routine show implementation (table/JSON). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_list.go | routine list implementation (auto-pagination, table/JSON). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_delete.go | routine delete implementation (confirmation prompt, no-prompt/force guard). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_enable.go | routine enable implementation. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_disable.go | routine disable implementation. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_dispatch.go | routine dispatch implementation (dispatch_async call + output handling). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_run.go | routine run list implementation (run history listing with top/filter). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_helpers.go | Shared helpers (client creation, JSON printing, tabwriter formatting). |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest.go | Manifest parsing and create/update merge helpers. |
| cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest_test.go | Unit tests for manifest parsing/merge/update-flag application helpers. |
| cli/azd/extensions/azure.ai.routines/go.mod | Extension module dependencies updated. |
| cli/azd/extensions/azure.ai.routines/go.sum | Dependency checksum updates. |
| cli/azd/extensions/azure.ai.routines/cspell.yaml | Adds extension-local cspell words. |
| "provide an https:// URL", | ||
| ) | ||
| } | ||
|
|
| var body routines.Routine | ||
| body.Name = flags.name | ||
| body.Enabled = new(flags.enabled) | ||
| if flags.description != "" { | ||
| body.Description = flags.description |
| wireType, ok := routines.TriggerCLIToWire[flags.trigger] | ||
| if !ok { | ||
| return routines.RoutineTrigger{}, exterrors.Validation( | ||
| exterrors.CodeInvalidParameter, | ||
| fmt.Sprintf("unknown trigger type %q", flags.trigger), |
| // boolStr returns "true"/"false" for a *bool pointer. | ||
| func boolStr(b *bool) string { | ||
| if b == nil { | ||
| return "true" |
| if cronChanged { | ||
| if trigger == nil { | ||
| return 0, fmt.Errorf("cannot set --cron: routine has no default trigger") | ||
| } | ||
| trigger.Cron = cron |
|
|
||
| import ( | ||
| "github.com/azure/azure-dev/cli/azd/pkg/azdext" | ||
| "github.com/spf13/cobra" | ||
| ) | ||
|
|
||
| // newRoutineCommand creates the "routine" subcommand group. | ||
| func newRoutineCommand(extCtx *azdext.ExtensionContext) *cobra.Command { | ||
| cmd := &cobra.Command{ | ||
| Use: "routine <command> [options]", | ||
| Short: "Manage Microsoft Foundry Routines. (Preview)", | ||
| Long: `Manage Microsoft Foundry Routines from your terminal. | ||
|
|
||
| A routine pairs one trigger (when) with one action (what) on a Foundry project. | ||
| For example: "every weekday at 8 AM UTC, invoke the daily-report agent".`, | ||
| } | ||
|
|
||
| // -p / --project-endpoint is a persistent flag so all subcommands inherit it. | ||
| cmd.PersistentFlags().StringP("project-endpoint", "p", "", | ||
| "Foundry project endpoint URL (overrides env var and config)") | ||
|
|
||
| cmd.AddCommand(newRoutineCreateCommand(extCtx)) | ||
| cmd.AddCommand(newRoutineUpdateCommand(extCtx)) | ||
| cmd.AddCommand(newRoutineShowCommand(extCtx)) | ||
| cmd.AddCommand(newRoutineListCommand(extCtx)) | ||
| cmd.AddCommand(newRoutineDeleteCommand(extCtx)) | ||
| cmd.AddCommand(newRoutineEnableCommand(extCtx)) | ||
| cmd.AddCommand(newRoutineDisableCommand(extCtx)) | ||
| cmd.AddCommand(newRoutineDispatchCommand(extCtx)) | ||
| cmd.AddCommand(newRoutineRunCommand(extCtx)) | ||
|
|
||
| return cmd | ||
| } |
The first cut of the Routines client used header/route/field shapes that did not match the TypeSpec being merged in azure-rest-api-specs#42779. Realign the extension with the spec so requests round-trip cleanly: * Preview header renamed from 'x-ms-foundry-features-opt-in' to 'Foundry-Features' (the value 'Routines=V1Preview' was already correct). * Async dispatch route renamed ':dispatch_async' -> ':dispatchAsync'; the action segment is case-sensitive per spec. * Dropped the non-existent ':enable' / ':disable' action routes; enable/disable now read the routine and PUT it back with 'enabled' flipped (idempotent: no-op if already at the target value). * DispatchRoutineRequest wraps a discriminated 'payload' object whose 'type' must match the routine's action type; --conversation-id was removed from dispatch (the spec does not expose it). * Routine.Action is now a single discriminated object (not an 'actions' map keyed by name). * RoutineAction.AgentName -> AgentID; the CLI flag is renamed to --agent-id accordingly. * RoutineTrigger.Cron -> CronExpression to match the TypeSpec field. * PagedRoutine pagination follows the absolute 'nextLink' URL from Azure.Core.Page<Routine> instead of re-deriving a continuation query. * RoutineRun gains the additional fields documented in the spec (phase, trigger_type, attempt_source, action_type, triggered_at, dispatch_id, action_correlation_id, response_id, error_type, error_message); 'run list' now prints phase alongside status. * EventRoutineTrigger fields aligned to the spec: connection_id, owner, repository, actions[]; removed 'assignee'. * DispatchRoutineResponse drops the unused 'status' field. Tests, mock manifests, and the E2E driver were updated to the new contract (--agent-id, agent_id, cron_expression, single 'action'). Note: the live Foundry Routines preview endpoint still returns HTTP 500 on /routines?api-version=v1 even with the correct request shape; that is an upstream service bug tracked separately.
wbreza
left a comment
There was a problem hiding this comment.
Code Review (automated, supplements Copilot pre-review)
TL;DR
First v1 of the azd ai routine command surface in a new extension. Implements create/update/show/list/delete/enable/disable/dispatch plus run list, a 5-level endpoint resolver, structured exterrors, and a Foundry data-plane HTTP client. Code is well-organized and the design is sound. However: the HTTP client has zero tests, enable/disable are GET-then-PUT with no concurrency guard, and the PR description contradicts the actual implementation in two places (enable route, preview header name). Findings below are net-new — they do not duplicate Copilot's 6 pre-review comments (port stripping, --enabled masking manifest, github-issue trigger accepted, boolStr nil→"true", applyUpdateFlags using fmt.Errorf, unused newRoutineCommand), all of which look valid.
🔴 Critical
C1 — PR description contradicts the implementation in two places (internal/pkg/routines/client.go)
| PR description claims | Code actually does |
|---|---|
"routine enable/disable — Calls dedicated :enable route (idempotent)" |
setEnabled does GetRoutine → mutate Enabled → PutRoutine. The code comment itself states: "The Foundry Routines API does not expose a dedicated :enable route; the client mutates the routine resource directly." |
"Every request includes x-ms-foundry-features-opt-in: Routines=V1Preview" |
Header constant is routinesPreviewHeader = "Foundry-Features", so the actual header sent is Foundry-Features: Routines=V1Preview. |
Two consequences:
- Reviewers, docs writers, and future maintainers will form an incorrect mental model from the description.
- If the Foundry service actually expects the
x-ms-foundry-features-opt-inheader name, the preview opt-in is silently broken and the server is ignoring it. Please verify the server-side header name against the Foundry TypeSpec and fix either the code or the description.
🟠 High
H1 — setEnabled has a lost-update race and no If-Match guard (client.go, EnableRoutine/DisableRoutine/setEnabled)
Because enable/disable is GET-then-PUT against the full routine resource, any concurrent writer (another CLI invocation, another user, a workspace tool) that mutates the routine between the GET and the PUT will be silently clobbered. There is no If-Match/ETag conditional request to detect the race.
Fix options: (a) capture ETag from the GET response and send If-Match on the PUT, returning a structured "conflict" error on 412; (b) document this race in user-facing help text and emit a warning when the GET observes Enabled != desired; (c) work with the Foundry team to add a true :enable/:disable endpoint (preferred long-term, matches the PR description).
H2 — HTTP client (internal/pkg/routines/client.go) has zero tests
The data-plane integration boundary — every CRUD/enable/disable/dispatch/list call — has no unit coverage. This means:
- Pagination loop termination (
nextLinkfollow inListRoutines,pageTokeninListRoutineRuns) is unverified. - Same-origin checks on
nextLinkare unverified. - Error envelope parsing for 401 / 403 / 404 / 409 / 5xx is unverified.
- Request body marshaling for trigger/action records is unverified end-to-end.
- The client-side
Topcap in run-listing is not exercised.
The PR description acknowledges tests are "tracked as follow-up work." Given this is the wire boundary of a Preview API, at minimum httptest.Server-based happy paths for each method plus pagination termination should ship in v1. The current command-level tests only exercise pure helpers (buildTrigger, buildAction, applyUpdateFlags) and don't reach the client.
H3 — routine update may report "no changes" when only --file is used (routine_update.go, routine_manifest.go applyUpdateFlags)
applyUpdateFlags only increments its changed counter when individual CLI flags are set; the --file merge path does not contribute to that counter. The no-op early-return checks changed == 0, so routine update <name> --file manifest.yaml (no other flags) likely takes the no-op branch even though the manifest legitimately changed fields. Either snapshot/diff the routine before vs after the merge, or count merge-introduced changes.
🟡 Medium
M1 — Credential ignores tenant for guest / multi-tenant users (internal/cmd/routine_helpers.go)
newRoutineClient constructs azidentity.NewAzureDeveloperCLICredential(&AzureDeveloperCLICredentialOptions{}) with no TenantID. Per .github/instructions/extensions.instructions.md, extensions must use the user-access tenant (Subscription.UserTenantId, not the resource tenant). For multi-tenant or guest users the default tenant from azd auth may not match the Foundry project's tenant, causing AADSTS errors. Plumb a tenant hint (e.g., from azd context or by extracting from the endpoint/project metadata) into the credential options.
M2 — Pagination has no max-page safety cap (client.go, ListRoutines / ListRoutineRuns)
Both methods loop while NextLink/PageToken is non-empty. A misbehaving or compromised service that returns a self-referential or cycling token will cause an unbounded loop that consumes CPU and memory (each appended page grows the result slice). Add a defensive page-count cap (e.g., 1000) and return a clear "pagination exceeded N pages" error.
M3 — Unbounded io.ReadAll on response bodies (client.go, JSON decode helper)
The decode helper reads the entire response body into memory without io.LimitReader. A misbehaving server returning a multi-GB body will OOM the process. Wrap with a sane cap (e.g., io.LimitReader(body, 32<<20)) before decoding.
M4 — Token scope https://ai.azure.com/.default should be verified (client.go, NewClient)
The Bearer token scope is hardcoded. If the Foundry Routines data plane requires a project-specific or different audience (services.ai.azure.com, a regional audience, or a Foundry-specific resource ID), token acquisition will succeed but the server may reject with 401/403. Verify against the Foundry TypeSpec and add a code comment justifying the chosen scope.
M5 — assert.Error instead of require.Error in negative-path tests (routine_create_test.go)
Several "expect error" tests use assert.Error followed by further dereferences of the result. If the implementation regresses and returns no error, the test logs a failure but continues executing on a nil/zero value, potentially panicking or producing a confusing secondary failure that hides the real cause. Switch to require.Error.
🟢 Low
- L1 —
routine dispatch --asynchelp text ("Suppress descriptive output; useful for scripting") is ambiguous. Clarify that it prints only the dispatch ID and exits without polling. - L2 — Manifest file-format detection falls back to JSON for empty/unknown extensions, and the error message recommends only JSON, leaving YAML users guessing. Mention both formats in the error.
- L3 —
routine runsubcommand help doesn't mention thatrun show/run deleteare intentionally deferred. Add a one-line note inLong. - L4 —
printJSON/ table writers write directly toos.Stdout, making future output-assertion tests impossible without monkey-patching. Accept anio.Writerparameter.
Notes (not findings)
- This PR lives in an extension with its own
go.mod, so core-azdpatterns (IoC container,ActionDescriptor,ErrorWithSuggestion) do not apply — the extension uses cobra directly. new(expr)is valid in Go 1.26+ (seecli/azd/AGENTS.md).- Adding
DisallowUnknownFieldsto the JSON decoder is not recommended — it would break forward-compat with non-breaking Foundry API additions.
Generated review · supplements Copilot inline comments · posted as non-blocking COMMENT.
- endpoint: reject project endpoints with an explicit port so the normalized URL cannot silently strip a non-default port - routine create: only set Enabled from --enabled when the user explicitly passes the flag, so a manifest's enabled value is honored; default to enabled=true if neither source provides one - routine create: explicitly reject --trigger github-issue (deferred for v1) instead of producing an incomplete github_issue trigger - routine_helpers: boolStr now returns "unknown" for a nil pointer to avoid displaying "true" when the field is absent from the service response - routine_manifest: surface applyUpdateFlags user-input errors as exterrors.Validation (CodeInvalidParameter) for consistent CLI error shapes
wbreza
left a comment
There was a problem hiding this comment.
Re-review: commit 1ff54876 (fix(routines): address PR review feedback)
Scope: one new commit since my previous review — 6 files, +82/-9.
✅ Copilot's pre-review findings — all addressed correctly
| # | Issue | Fix |
|---|---|---|
| 1 | endpoint.go silently strips explicit ports |
Explicit port now rejected with structured error + 2 new tests |
| 2 | --enabled default masks manifest value |
Uses cmd.Flags().Changed("enabled"); default applied only after manifest merge |
| 3 | --trigger github-issue accepted but incomplete |
Explicit rejection in buildTrigger with deferred-v1 message |
| 4 | boolStr returns "true" for nil pointer |
Now returns "unknown" |
| 5 | applyUpdateFlags uses plain fmt.Errorf |
All 6 sites converted to exterrors.Validation(CodeInvalidParameter, …) |
(Copilot's 6th finding — dead newRoutineCommand — was already resolved in earlier commit b7d375b8.)
❌ Still open from my prior review
These weren't in the scope of this commit, just noting so they don't get lost:
- C1 — PR description still claims a dedicated
:enableroute andx-ms-foundry-features-opt-inheader. The actual code does GET-then-PUT and sendsFoundry-Features. Please reconcile description vs implementation (and verify the preview header name with the Foundry team — if the server expectsx-ms-foundry-features-opt-in, opt-in is silently broken). - H1 — Lost-update race in
setEnabled(noIf-Match/ETag). - H2 —
internal/pkg/routines/client.gostill has zero tests. - H3 —
routine update --filemay falsely report "no changes" because file-merge doesn't increment thechangedcounter. - M1 —
AzureDeveloperCLICredentialconstructed with noTenantID; will break for guest / multi-tenant users.
No new blocking issues introduced by this commit. Nice clean turnaround on the Copilot items.
Summary
Implements the
routinecommand subtree in theazure.ai.routinesextension, as specified in the design spec (PR #8200).A routine pairs one trigger (when) with one action (what) on a Foundry project — e.g. "every weekday at 8 AM UTC, invoke the
daily-reportagent" — without standing up Logic Apps, Functions, or cron infrastructure.Changes
New packages
internal/exterrors/Validation,Dependency,Auth,ServiceFromAzure) with routine-specific error codesinternal/pkg/routines/Routine,RoutineTrigger,RoutineAction,RoutineRunmodelsinternal/cmd/endpoint.goFOUNDRY_PROJECT_ENDPOINT→ error)internal/cmd/routine*.goCommands implemented
routine create <name>--filemanifest;--forcefor upsert; validates--trigger/--actionrequirementsroutine update <name>--trigger/--action;--filemergeroutine show <name>routine listcontinuationToken; table / JSON outputroutine delete <name>--forceto skip;--no-promptrequires--forceroutine enable <name>:enableroute (idempotent)routine disable <name>:disableroute (idempotent)routine dispatch <name>:dispatch_async;--asynccontrols client-side waiting;--input,--conversation-idroutine run list <routine>--topcap;--filterforwarded as query paramTrigger / action wire mapping
--triggertyperecurringschedule--cron,--time-zonetimertimer--at,--time-zone--actiontypeagent-response(default)invoke_agent_responses_api--agent-id/--agent-endpoint-idagent-invokeinvoke_agent_invocations_api--agent-endpoint-idTriggers and actions are emitted as
{ "default": { ... } }records (TypeSpecRecord<RoutineTrigger>shape). Every request includesx-ms-foundry-features-opt-in: Routines=V1Preview.What is deferred (per spec)
routine run show/routine run delete— API not yet stablegithub-issuetrigger — pending workspace connection modelroutine.yaml+azd provisionintegrationTesting
go build ./...andgo vet ./...pass cleanlyCloses #8159