Skip to content

feat(routines): implement azd ai routine commands#8241

Open
huimiu wants to merge 11 commits into
mainfrom
hui/add-routine-command
Open

feat(routines): implement azd ai routine commands#8241
huimiu wants to merge 11 commits into
mainfrom
hui/add-routine-command

Conversation

@huimiu
Copy link
Copy Markdown
Member

@huimiu huimiu commented May 19, 2026

Summary

Implements the routine command subtree in the azure.ai.routines extension, as specified in the design spec (PR #8200).

A routine pairs one trigger (when) with one action (what) on a Foundry project — e.g. "every weekday at 8 AM UTC, invoke the daily-report agent" — without standing up Logic Apps, Functions, or cron infrastructure.

Changes

New packages

Package Description
internal/exterrors/ Structured error helpers (Validation, Dependency, Auth, ServiceFromAzure) with routine-specific error codes
internal/pkg/routines/ Data-plane HTTP client + Routine, RoutineTrigger, RoutineAction, RoutineRun models
internal/cmd/endpoint.go 5-level project endpoint resolver (flag → azd env → global config → FOUNDRY_PROJECT_ENDPOINT → error)
internal/cmd/routine*.go All command implementations

Commands implemented

Command Notes
routine create <name> Flag-based or --file manifest; --force for upsert; validates --trigger/--action requirements
routine update <name> GET-then-PUT, only named flags change; type-switch guard for --trigger/--action; --file merge
routine show <name> Table / JSON output
routine list Auto-paging via continuationToken; table / JSON output
routine delete <name> Confirmation prompt; --force to skip; --no-prompt requires --force
routine enable <name> Calls dedicated :enable route (idempotent)
routine disable <name> Calls dedicated :disable route (idempotent)
routine dispatch <name> Calls :dispatch_async; --async controls client-side waiting; --input, --conversation-id
routine run list <routine> Auto-paging; --top cap; --filter forwarded as query param

Trigger / action wire mapping

CLI --trigger Wire type Required flags
recurring schedule --cron, --time-zone
timer timer --at, --time-zone
CLI --action Wire type Required flags
agent-response (default) invoke_agent_responses_api one of --agent-id / --agent-endpoint-id
agent-invoke invoke_agent_invocations_api --agent-endpoint-id

Triggers and actions are emitted as { "default": { ... } } records (TypeSpec Record<RoutineTrigger> shape). Every request includes x-ms-foundry-features-opt-in: Routines=V1Preview.

What is deferred (per spec)

  • routine run show / routine run delete — API not yet stable
  • github-issue trigger — pending workspace connection model
  • Declarative routine.yaml + azd provision integration

Testing

  • go build ./... and go vet ./... pass cleanly
  • Unit tests for flag→wire mapping, update merge semantics, and error shapes are tracked as follow-up work

Closes #8159

Add the full v1 routine command subtree to the azure.ai.routines
extension as specified in the design spec (PR #8200).

Commands implemented:
- routine create, update, show, list, delete
- routine enable, disable (dedicated idempotent action routes)
- routine dispatch (calls dispatch_async, --async flag for client-side wait)
- routine run list (auto-paging, --top, --filter)

New packages:
- internal/exterrors/ -- structured error codes and helpers
- internal/pkg/routines/ -- data-plane HTTP client and models
- internal/cmd/endpoint.go -- 5-level project endpoint resolver

Wire format: trigger/action as Record with 'default' key.
All calls include x-ms-foundry-features-opt-in: Routines=V1Preview header.

Also adds the design spec at cli/azd/docs/design/ai-routine-design-spec.md.
huimiu added 6 commits May 19, 2026 15:18
…dels

- Add readAzdProjectSourcesFunc seam to endpoint.go for daemon isolation
- endpoint_test.go: isFoundryHost, validateProjectEndpoint, full cascade tests
- routine_create_test.go: buildTrigger and buildAction table tests
- routine_manifest_test.go: readRoutineManifest (JSON/YAML), mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction
- models_test.go: TriggerCLIToWire and ActionCLIToWire completeness
- Add yaml struct tags to models.go for YAML manifest support
- Extract stubAzdProjectSources() helper (mirrors stubAzdHostedSources in agents)
- isolateFromAzdDaemon now also clears AZD_SERVER env var
- Add t.Parallel() to all pure-function tests (isFoundryHost, validateProjectEndpoint,
  buildTrigger, buildAction, mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction,
  TriggerCLIToWire/ActionCLIToWire map checks)
- cspell.yaml: add exterrors, sess, routineName, azdProjectSources to word list
- endpoint.go: remove unused projectEndpointPathPrefix constant
- routine_create.go: wrap long buildAction() call (line >125 chars)
- routine_update.go: wrap long --file flag help text
- routine_manifest_test.go: expand inline map literals to multi-line
- client.go: wrap ListRoutineRuns signature to fit 125-char limit
- Run gofmt -w and go fix on all files (codes.go, client.go, models.go formatting)
golangci-lint flagged ptrBool as unused. The function had no call sites; the //go:fix inline directive does not exempt it from the unused linter.
The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.
…pagination

- Extract getPage helper so resp.Body.Close runs per iteration in
  ListRoutines and ListRoutineRuns (defer-in-loop leaked FDs).
- Preserve the filter query param across pages in ListRoutineRuns;
  previously page 2+ only carried pageToken and dropped the filter.
- Correct dispatch command help text: the service always runs routines
  asynchronously, so the old 'waits and streams' wording was wrong.
@huimiu huimiu force-pushed the hui/add-routine-command branch from b1c629f to 8e0d723 Compare May 19, 2026 09:00
…ine'

The extension namespace 'ai.routine' already mounts the extension under
'azd ai routine'. Adding a 'routine' subcommand group on top of that
produced the redundant 'azd ai routine routine <cmd>' path.

Move --project-endpoint persistent flag and all subcommands directly onto
rootCmd so the correct usage is 'azd ai routine <cmd>'.
@huimiu huimiu marked this pull request as ready for review May 19, 2026 09:46
Copilot AI review requested due to automatic review settings May 19, 2026 09:46
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

📋 Prioritization Note

Thanks for the contribution! The linked issue isn't in the current milestone yet.
Review may take a bit longer — reach out to @rajeshkamal5050 or @kristenwomack if you'd like to discuss prioritization.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the initial azd ai routine command surface for the new azure.ai.routines extension, including a Foundry Routines data-plane client, models, endpoint resolution, and CLI commands for routine CRUD/dispatch/run history.

Changes:

  • Introduces internal/pkg/routines (models + HTTP client) and internal/exterrors structured error helpers.
  • Implements routine commands (create/update/show/list/delete/enable/disable/dispatch, plus run list) with table/JSON output.
  • Adds a 5-level project endpoint resolver with unit tests, plus additional unit tests for manifest parsing and flag→wire mapping.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
cli/azd/extensions/azure.ai.routines/internal/pkg/routines/models.go Routine/trigger/action/run models + CLI→wire mappings/constants.
cli/azd/extensions/azure.ai.routines/internal/pkg/routines/models_test.go Unit tests for CLI→wire mappings and default keys.
cli/azd/extensions/azure.ai.routines/internal/pkg/routines/client.go Data-plane HTTP client for routines operations (CRUD, enable/disable, dispatch, run listing).
cli/azd/extensions/azure.ai.routines/internal/exterrors/errors.go Structured LocalError/ServiceError helpers and wrappers.
cli/azd/extensions/azure.ai.routines/internal/exterrors/codes.go Error/operation code constants used across commands.
cli/azd/extensions/azure.ai.routines/internal/cmd/root.go Extension root command wiring (registers routine subcommands + persistent endpoint flag).
cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint.go Project endpoint validation + 5-level endpoint resolution cascade (flag/env/config/envvar/error).
cli/azd/extensions/azure.ai.routines/internal/cmd/endpoint_test.go Unit tests for endpoint validation and cascade precedence.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine.go (Currently unused) routine subcommand group definition.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create.go routine create implementation (flags/manifest support, trigger/action building).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_create_test.go Unit tests for create flag→wire trigger/action builders.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_update.go routine update implementation (GET-then-PUT merge semantics, type-switch guard).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_show.go routine show implementation (table/JSON).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_list.go routine list implementation (auto-pagination, table/JSON).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_delete.go routine delete implementation (confirmation prompt, no-prompt/force guard).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_enable.go routine enable implementation.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_disable.go routine disable implementation.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_dispatch.go routine dispatch implementation (dispatch_async call + output handling).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_run.go routine run list implementation (run history listing with top/filter).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_helpers.go Shared helpers (client creation, JSON printing, tabwriter formatting).
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest.go Manifest parsing and create/update merge helpers.
cli/azd/extensions/azure.ai.routines/internal/cmd/routine_manifest_test.go Unit tests for manifest parsing/merge/update-flag application helpers.
cli/azd/extensions/azure.ai.routines/go.mod Extension module dependencies updated.
cli/azd/extensions/azure.ai.routines/go.sum Dependency checksum updates.
cli/azd/extensions/azure.ai.routines/cspell.yaml Adds extension-local cspell words.

"provide an https:// URL",
)
}

Comment on lines +101 to +105
var body routines.Routine
body.Name = flags.name
body.Enabled = new(flags.enabled)
if flags.description != "" {
body.Description = flags.description
Comment on lines +187 to +191
wireType, ok := routines.TriggerCLIToWire[flags.trigger]
if !ok {
return routines.RoutineTrigger{}, exterrors.Validation(
exterrors.CodeInvalidParameter,
fmt.Sprintf("unknown trigger type %q", flags.trigger),
Comment on lines +58 to +61
// boolStr returns "true"/"false" for a *bool pointer.
func boolStr(b *bool) string {
if b == nil {
return "true"
Comment on lines +94 to +98
if cronChanged {
if trigger == nil {
return 0, fmt.Errorf("cannot set --cron: routine has no default trigger")
}
trigger.Cron = cron
Comment on lines +5 to +37

import (
"github.com/azure/azure-dev/cli/azd/pkg/azdext"
"github.com/spf13/cobra"
)

// newRoutineCommand creates the "routine" subcommand group.
func newRoutineCommand(extCtx *azdext.ExtensionContext) *cobra.Command {
cmd := &cobra.Command{
Use: "routine <command> [options]",
Short: "Manage Microsoft Foundry Routines. (Preview)",
Long: `Manage Microsoft Foundry Routines from your terminal.

A routine pairs one trigger (when) with one action (what) on a Foundry project.
For example: "every weekday at 8 AM UTC, invoke the daily-report agent".`,
}

// -p / --project-endpoint is a persistent flag so all subcommands inherit it.
cmd.PersistentFlags().StringP("project-endpoint", "p", "",
"Foundry project endpoint URL (overrides env var and config)")

cmd.AddCommand(newRoutineCreateCommand(extCtx))
cmd.AddCommand(newRoutineUpdateCommand(extCtx))
cmd.AddCommand(newRoutineShowCommand(extCtx))
cmd.AddCommand(newRoutineListCommand(extCtx))
cmd.AddCommand(newRoutineDeleteCommand(extCtx))
cmd.AddCommand(newRoutineEnableCommand(extCtx))
cmd.AddCommand(newRoutineDisableCommand(extCtx))
cmd.AddCommand(newRoutineDispatchCommand(extCtx))
cmd.AddCommand(newRoutineRunCommand(extCtx))

return cmd
}
The first cut of the Routines client used header/route/field shapes that
did not match the TypeSpec being merged in azure-rest-api-specs#42779.
Realign the extension with the spec so requests round-trip cleanly:

* Preview header renamed from 'x-ms-foundry-features-opt-in' to
  'Foundry-Features' (the value 'Routines=V1Preview' was already correct).
* Async dispatch route renamed ':dispatch_async' -> ':dispatchAsync';
  the action segment is case-sensitive per spec.
* Dropped the non-existent ':enable' / ':disable' action routes;
  enable/disable now read the routine and PUT it back with 'enabled'
  flipped (idempotent: no-op if already at the target value).
* DispatchRoutineRequest wraps a discriminated 'payload' object whose
  'type' must match the routine's action type; --conversation-id was
  removed from dispatch (the spec does not expose it).
* Routine.Action is now a single discriminated object (not an 'actions'
  map keyed by name).
* RoutineAction.AgentName -> AgentID; the CLI flag is renamed to
  --agent-id accordingly.
* RoutineTrigger.Cron -> CronExpression to match the TypeSpec field.
* PagedRoutine pagination follows the absolute 'nextLink' URL from
  Azure.Core.Page<Routine> instead of re-deriving a continuation query.
* RoutineRun gains the additional fields documented in the spec
  (phase, trigger_type, attempt_source, action_type, triggered_at,
  dispatch_id, action_correlation_id, response_id, error_type,
  error_message); 'run list' now prints phase alongside status.
* EventRoutineTrigger fields aligned to the spec: connection_id, owner,
  repository, actions[]; removed 'assignee'.
* DispatchRoutineResponse drops the unused 'status' field.

Tests, mock manifests, and the E2E driver were updated to the new
contract (--agent-id, agent_id, cron_expression, single 'action').

Note: the live Foundry Routines preview endpoint still returns HTTP 500
on /routines?api-version=v1 even with the correct request shape; that is
an upstream service bug tracked separately.
@huimiu huimiu changed the title feat(routines): implement azd ai routine commands (v1) feat(routines): implement azd ai routine commands May 19, 2026
Copy link
Copy Markdown
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review (automated, supplements Copilot pre-review)

TL;DR

First v1 of the azd ai routine command surface in a new extension. Implements create/update/show/list/delete/enable/disable/dispatch plus run list, a 5-level endpoint resolver, structured exterrors, and a Foundry data-plane HTTP client. Code is well-organized and the design is sound. However: the HTTP client has zero tests, enable/disable are GET-then-PUT with no concurrency guard, and the PR description contradicts the actual implementation in two places (enable route, preview header name). Findings below are net-new — they do not duplicate Copilot's 6 pre-review comments (port stripping, --enabled masking manifest, github-issue trigger accepted, boolStr nil→"true", applyUpdateFlags using fmt.Errorf, unused newRoutineCommand), all of which look valid.


🔴 Critical

C1 — PR description contradicts the implementation in two places (internal/pkg/routines/client.go)

PR description claims Code actually does
"routine enable/disable — Calls dedicated :enable route (idempotent)" setEnabled does GetRoutine → mutate EnabledPutRoutine. The code comment itself states: "The Foundry Routines API does not expose a dedicated :enable route; the client mutates the routine resource directly."
"Every request includes x-ms-foundry-features-opt-in: Routines=V1Preview" Header constant is routinesPreviewHeader = "Foundry-Features", so the actual header sent is Foundry-Features: Routines=V1Preview.

Two consequences:

  1. Reviewers, docs writers, and future maintainers will form an incorrect mental model from the description.
  2. If the Foundry service actually expects the x-ms-foundry-features-opt-in header name, the preview opt-in is silently broken and the server is ignoring it. Please verify the server-side header name against the Foundry TypeSpec and fix either the code or the description.

🟠 High

H1 — setEnabled has a lost-update race and no If-Match guard (client.go, EnableRoutine/DisableRoutine/setEnabled)

Because enable/disable is GET-then-PUT against the full routine resource, any concurrent writer (another CLI invocation, another user, a workspace tool) that mutates the routine between the GET and the PUT will be silently clobbered. There is no If-Match/ETag conditional request to detect the race.

Fix options: (a) capture ETag from the GET response and send If-Match on the PUT, returning a structured "conflict" error on 412; (b) document this race in user-facing help text and emit a warning when the GET observes Enabled != desired; (c) work with the Foundry team to add a true :enable/:disable endpoint (preferred long-term, matches the PR description).

H2 — HTTP client (internal/pkg/routines/client.go) has zero tests

The data-plane integration boundary — every CRUD/enable/disable/dispatch/list call — has no unit coverage. This means:

  • Pagination loop termination (nextLink follow in ListRoutines, pageToken in ListRoutineRuns) is unverified.
  • Same-origin checks on nextLink are unverified.
  • Error envelope parsing for 401 / 403 / 404 / 409 / 5xx is unverified.
  • Request body marshaling for trigger/action records is unverified end-to-end.
  • The client-side Top cap in run-listing is not exercised.

The PR description acknowledges tests are "tracked as follow-up work." Given this is the wire boundary of a Preview API, at minimum httptest.Server-based happy paths for each method plus pagination termination should ship in v1. The current command-level tests only exercise pure helpers (buildTrigger, buildAction, applyUpdateFlags) and don't reach the client.

H3 — routine update may report "no changes" when only --file is used (routine_update.go, routine_manifest.go applyUpdateFlags)

applyUpdateFlags only increments its changed counter when individual CLI flags are set; the --file merge path does not contribute to that counter. The no-op early-return checks changed == 0, so routine update <name> --file manifest.yaml (no other flags) likely takes the no-op branch even though the manifest legitimately changed fields. Either snapshot/diff the routine before vs after the merge, or count merge-introduced changes.


🟡 Medium

M1 — Credential ignores tenant for guest / multi-tenant users (internal/cmd/routine_helpers.go)

newRoutineClient constructs azidentity.NewAzureDeveloperCLICredential(&AzureDeveloperCLICredentialOptions{}) with no TenantID. Per .github/instructions/extensions.instructions.md, extensions must use the user-access tenant (Subscription.UserTenantId, not the resource tenant). For multi-tenant or guest users the default tenant from azd auth may not match the Foundry project's tenant, causing AADSTS errors. Plumb a tenant hint (e.g., from azd context or by extracting from the endpoint/project metadata) into the credential options.

M2 — Pagination has no max-page safety cap (client.go, ListRoutines / ListRoutineRuns)

Both methods loop while NextLink/PageToken is non-empty. A misbehaving or compromised service that returns a self-referential or cycling token will cause an unbounded loop that consumes CPU and memory (each appended page grows the result slice). Add a defensive page-count cap (e.g., 1000) and return a clear "pagination exceeded N pages" error.

M3 — Unbounded io.ReadAll on response bodies (client.go, JSON decode helper)

The decode helper reads the entire response body into memory without io.LimitReader. A misbehaving server returning a multi-GB body will OOM the process. Wrap with a sane cap (e.g., io.LimitReader(body, 32<<20)) before decoding.

M4 — Token scope https://ai.azure.com/.default should be verified (client.go, NewClient)

The Bearer token scope is hardcoded. If the Foundry Routines data plane requires a project-specific or different audience (services.ai.azure.com, a regional audience, or a Foundry-specific resource ID), token acquisition will succeed but the server may reject with 401/403. Verify against the Foundry TypeSpec and add a code comment justifying the chosen scope.

M5 — assert.Error instead of require.Error in negative-path tests (routine_create_test.go)

Several "expect error" tests use assert.Error followed by further dereferences of the result. If the implementation regresses and returns no error, the test logs a failure but continues executing on a nil/zero value, potentially panicking or producing a confusing secondary failure that hides the real cause. Switch to require.Error.


🟢 Low

  • L1routine dispatch --async help text ("Suppress descriptive output; useful for scripting") is ambiguous. Clarify that it prints only the dispatch ID and exits without polling.
  • L2 — Manifest file-format detection falls back to JSON for empty/unknown extensions, and the error message recommends only JSON, leaving YAML users guessing. Mention both formats in the error.
  • L3routine run subcommand help doesn't mention that run show / run delete are intentionally deferred. Add a one-line note in Long.
  • L4printJSON / table writers write directly to os.Stdout, making future output-assertion tests impossible without monkey-patching. Accept an io.Writer parameter.

Notes (not findings)

  • This PR lives in an extension with its own go.mod, so core-azd patterns (IoC container, ActionDescriptor, ErrorWithSuggestion) do not apply — the extension uses cobra directly.
  • new(expr) is valid in Go 1.26+ (see cli/azd/AGENTS.md).
  • Adding DisallowUnknownFields to the JSON decoder is not recommended — it would break forward-compat with non-breaking Foundry API additions.

Generated review · supplements Copilot inline comments · posted as non-blocking COMMENT.

- endpoint: reject project endpoints with an explicit port so the
  normalized URL cannot silently strip a non-default port
- routine create: only set Enabled from --enabled when the user
  explicitly passes the flag, so a manifest's enabled value is
  honored; default to enabled=true if neither source provides one
- routine create: explicitly reject --trigger github-issue (deferred
  for v1) instead of producing an incomplete github_issue trigger
- routine_helpers: boolStr now returns "unknown" for a nil pointer
  to avoid displaying "true" when the field is absent from the
  service response
- routine_manifest: surface applyUpdateFlags user-input errors as
  exterrors.Validation (CodeInvalidParameter) for consistent CLI
  error shapes
Copy link
Copy Markdown
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: commit 1ff54876 (fix(routines): address PR review feedback)

Scope: one new commit since my previous review — 6 files, +82/-9.

✅ Copilot's pre-review findings — all addressed correctly

# Issue Fix
1 endpoint.go silently strips explicit ports Explicit port now rejected with structured error + 2 new tests
2 --enabled default masks manifest value Uses cmd.Flags().Changed("enabled"); default applied only after manifest merge
3 --trigger github-issue accepted but incomplete Explicit rejection in buildTrigger with deferred-v1 message
4 boolStr returns "true" for nil pointer Now returns "unknown"
5 applyUpdateFlags uses plain fmt.Errorf All 6 sites converted to exterrors.Validation(CodeInvalidParameter, …)

(Copilot's 6th finding — dead newRoutineCommand — was already resolved in earlier commit b7d375b8.)

❌ Still open from my prior review

These weren't in the scope of this commit, just noting so they don't get lost:

  • C1 — PR description still claims a dedicated :enable route and x-ms-foundry-features-opt-in header. The actual code does GET-then-PUT and sends Foundry-Features. Please reconcile description vs implementation (and verify the preview header name with the Foundry team — if the server expects x-ms-foundry-features-opt-in, opt-in is silently broken).
  • H1 — Lost-update race in setEnabled (no If-Match/ETag).
  • H2internal/pkg/routines/client.go still has zero tests.
  • H3routine update --file may falsely report "no changes" because file-merge doesn't increment the changed counter.
  • M1AzureDeveloperCLICredential constructed with no TenantID; will break for guest / multi-tenant users.

No new blocking issues introduced by this commit. Nice clean turnaround on the Copilot items.

@JeffreyCA JeffreyCA added the ext-agents azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions label May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ext-agents azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions

Projects

None yet

4 participants