diff --git a/.vscode/cspell.misc.yaml b/.vscode/cspell.misc.yaml index 043e51e8def..117342f3954 100644 --- a/.vscode/cspell.misc.yaml +++ b/.vscode/cspell.misc.yaml @@ -80,11 +80,16 @@ overrides: - filename: docs/architecture/**/*.md words: - azapi + - appinsights - Errorf - grpcbroker + - Pseudonymized + - Thhmmss + - unhashed - vsrpc - filename: docs/guides/**/*.md words: + - azdext - errorlint - Errorf - gofmt @@ -92,14 +97,37 @@ overrides: - gosec - jaegertracing - mycommand + - myfeature - mypackage - pflag + - Pseudonymized - staticcheck - stdlib + - unhashed - vsrpc - filename: docs/reference/**/*.md words: + - appinit - appservice - Buildpacks - containerapp + - dcount + - devdeviceid + - Entra + - Errorf + - exegraph + - oneauth + - postprovision + - preprovision + - postdeploy + - pseudonymized + - remotebuild + - resourcegroup + - springapp + - startswith - staticwebapp + - toreal + - tostring + - unhashed + - vsrpc + - whatif diff --git a/docs/README.md b/docs/README.md index f60a383ff37..a36a0eff4fe 100644 --- a/docs/README.md +++ b/docs/README.md @@ -22,6 +22,8 @@ Task-oriented how-tos for common contributor workflows. - [Adding a New Command](guides/adding-a-new-command.md) — End-to-end walkthrough for new CLI commands - [Creating an Extension](guides/creating-an-extension.md) — How to build and publish an azd extension - [Observability and Tracing](guides/observability.md) — Adding telemetry, traces, and debugging +- [Feature Telemetry Guide](guides/feature-telemetry.md) — End-to-end guide for instrumenting telemetry in new features +- [Telemetry Overview](guides/telemetry-overview.md) — Product-facing overview of azd telemetry metrics ## Reference @@ -30,6 +32,7 @@ Schemas, flags, environment variables, and configuration details. - [Environment Variables](reference/environment-variables.md) — All environment variables that configure azd behavior - [azure.yaml Schema](reference/azure-yaml-schema.md) — Project configuration file reference - [Feature Status](reference/feature-status.md) — Current maturity status of all features +- [Telemetry Data Reference](reference/telemetry-data.md) — Complete schema of all telemetry events and fields ## Architecture @@ -39,6 +42,7 @@ System overviews, design context, and decision records. - [Command Execution Model](architecture/command-execution-model.md) — How commands are registered, resolved, and run - [Extension Framework](architecture/extension-framework.md) — gRPC-based extension system architecture - [Provisioning Pipeline](architecture/provisioning-pipeline.md) — How infrastructure provisioning works +- [Telemetry Architecture](architecture/telemetry.md) — How azd collects and exports telemetry - [ADR Template](architecture/adr-template.md) — Template for lightweight architecture decision records --- diff --git a/docs/architecture/telemetry.md b/docs/architecture/telemetry.md new file mode 100644 index 00000000000..0cfec741830 --- /dev/null +++ b/docs/architecture/telemetry.md @@ -0,0 +1,237 @@ +# Azure Developer CLI — Telemetry Architecture + +> How azd collects, exports, and transmits telemetry data. + +## Overview + +azd emits OpenTelemetry spans for every command execution. Telemetry flows through a local pipeline: + +1. **Instrumentation** — CLI, VS Code extension, and extensions emit OTel spans +2. **Export** — Spans are converted to Application Insights envelopes and queued to disk +3. **Upload** — A background process transmits envelopes to Application Insights + +> Microsoft-internal dashboards, data pipelines, and reporting infrastructure are documented separately for internal maintainers. + +## Data Flow + +```mermaid +flowchart TB + subgraph Instrumentation ["Instrumentation"] + CLI["azd CLI
(Go + OpenTelemetry)"] + VSC["VS Code Extension
(@microsoft/vscode-azext-utils)"] + EXT["Extensions
(structured error reporting)"] + end + + subgraph Export ["CLI Export Pipeline"] + MW["Command Middleware
cli/azd/cmd/middleware/telemetry.go"] + OTel["OTel TracerProvider"] + AIExp["App Insights Exporter
SpanToEnvelope()"] + DiskQ["Disk Queue
~/.azd/telemetry/*.trn"] + Upload["azd telemetry upload
(background / deferred)"] + end + + subgraph Ingestion ["Azure Monitor"] + AppInsights["Application Insights"] + end + + CLI --> MW --> OTel --> AIExp --> DiskQ --> Upload --> AppInsights + VSC -->|VS Code telemetry framework| AppInsights + EXT -->|structured errors via host| MW +``` + +## CLI Telemetry Pipeline (Detail) + +### 1. Initialization + +**File:** `cli/azd/internal/telemetry/telemetry.go` + +When `azd` starts, the telemetry subsystem: + +1. Checks `AZURE_DEV_COLLECT_TELEMETRY` — if set to `"no"`, telemetry is disabled entirely +2. In Cloud Shell, shows a first-run consent notice (creates `~/.azd/first-run` marker) +3. Creates a `StorageQueue` backed by the filesystem at `~/.azd/telemetry/` +4. Initializes the **App Insights Exporter** — a custom OTel `SpanExporter` that converts spans to Application Insights envelopes +5. Optionally adds: + - Stdout trace exporter (via `--trace-log-file`) + - OTLP HTTP exporter (via `--trace-log-url`) +6. Creates an OTel `TracerProvider` with the configured exporters +7. Registers the provider globally via `otel.SetTracerProvider(tp)` + +### 2. Command Execution → Span Creation + +**File:** `cli/azd/cmd/middleware/telemetry.go` + +Every azd command is wrapped by the telemetry middleware: + +``` +user runs `azd deploy` + → middleware.Run() + → tracing.Start(ctx, "cmd.deploy") // creates OTel span + → records attributes: + cmd.entry, cmd.flags, cmd.args.count, + platform.type, installed extensions (id@version) + → runs the actual command action + → on completion: + adds usage attributes (from baggage) + adds perf.interact_time + maps errors via cmd.MapError() + → span.End() +``` + +For **extension commands**, the event name switches to `ext.run` and records `ext.id`, `ext.version`. + +### 3. Span Export → Disk Queue + +**File:** `cli/azd/internal/telemetry/storage_exporter.go` + +The custom exporter: + +1. Receives completed OTel spans +2. Converts each to an Application Insights `Envelope` with `RequestData` (via `SpanToEnvelope()`) +3. Serializes as NDJSON +4. Enqueues to the disk queue (`~/.azd/telemetry/YYYYMMDDThhmmss_retry_random.trn`) +5. Retries enqueue up to 3 times on failure + +### 4. Disk Queue → Upload + +**Files:** `cli/azd/internal/telemetry/storage.go`, `uploader.go` + +The disk queue is a FIFO queue implemented as timestamped files: + +- `Peek()` picks the oldest ready item (not older than `itemFileMaxTimeKept`) +- `Cleanup()` removes stale `.tmp` files and expired items + +Upload happens via `azd telemetry upload` (triggered as a background subprocess): + +1. Acquires `upload.lock` (file lock) +2. Loops: `Peek → Transmit → Remove` +3. Retries up to `maxRetryCount=3` with backoff +4. Handles partial success and `Retry-After` headers from App Insights ingestion + +### 5. App Insights Envelope Format + +**File:** `cli/azd/internal/telemetry/appinsights-exporter/span_to_envelope.go` + +Each span becomes a `contracts.Envelope` containing `RequestData`: + +| Envelope Field | Source | +|---------------|--------| +| `IKey` | Instrumentation key from connection string | +| `Tags[ai.application.ver]` | `service.version` resource attribute | +| `Tags[ai.user.*]` | `UserAccountId`, `UserAuthUserId`, `UserId`, `SessionId` | +| `Properties` | String/bool span attributes | +| `Measurements` | Int64/float64 span attributes | +| `Name` | Span name (e.g., `cmd.deploy`) | +| `Duration` | Span duration (App Insights format) | +| `ResponseCode` | Span status / result code | +| `Success` | Span status == OK | + +Slice attributes are JSON-serialized into `Properties`. + +## VS Code Extension Telemetry + +**Files:** `ext/vscode/src/telemetry/` + +The VS Code extension uses a **separate telemetry path** from the CLI: + +```mermaid +flowchart LR + VSExt["VS Code Extension"] --> VSFw["VS Code Telemetry Framework
(@microsoft/vscode-azext-utils)"] + VSFw --> AppInsights["Application Insights"] +``` + +**Key differences from CLI:** + +| Aspect | CLI | VS Code Extension | +|--------|-----|-------------------| +| Framework | Go + OpenTelemetry | TypeScript + vscode-azext-utils | +| Export | Custom App Insights exporter + disk queue | VS Code telemetry framework (direct) | +| Opt-out | `AZURE_DEV_COLLECT_TELEMETRY=no` | `telemetry.telemetryLevel=off` in VS Code settings | +| Events | `cmd.*`, `ext.*`, `mcp.*`, etc. | `azure-dev.*` (activate, deactivate, tasks, surveys) | + +**Extension events** (`ext/vscode/src/telemetry/telemetryId.ts`): + +- Lifecycle: `azure-dev.activate`, `azure-dev.deactivate` +- CLI command tasks: `deploy`, `provision`, `up`, `down`, `init`, `login`, `restore`, `package` +- Environment/extension actions +- Survey tracking: `azure-dev.survey-check`, `azure-dev.survey-prompt-response` +- Activity statistics: tracks `totalActiveDays` via VS Code Memento storage + +## Extension Framework Telemetry + +**File:** `cli/azd/cmd/middleware/telemetry.go` (host side) + +Extensions run as separate processes and report back to the azd host: + +```mermaid +flowchart LR + Ext["Extension Process"] -->|structured error| Host["azd Host"] + Host -->|maps to span attributes| MW["Telemetry Middleware"] + MW --> Span["OTel Span
(event: ext.run)"] +``` + +- Extension commands emit `ext.run` events with `ext.id` and `ext.version` +- Extensions can report **structured errors** back to the host via `ExtensionService.ReportError` +- Error result codes follow conventions: + - Service errors: `ext.service..` + - Validation: `ext.validation.*` + - Auth: `ext.auth.*` + - Dependency: `ext.dependency.*` +- Extension lifecycle events: `ext.install`, `ext.upgrade`, `ext.promote` + +## Consent & Privacy + +### Opt-Out + +| Surface | Mechanism | +|---------|-----------| +| CLI | Set `AZURE_DEV_COLLECT_TELEMETRY=no` | +| VS Code | Set `telemetry.telemetryLevel` to `off` in VS Code settings | +| Cloud Shell | First-run notice shown; opt-out instructions provided | + +### PII Protection + +- **Hashed fields:** `project.template.id`, `project.template.version`, `project.name`, `env.name` are SHA-256 hashed (case-insensitive) before emission +- **Data classifications** are annotated on every field: + - `PublicPersonalData` + - `SystemMetadata` + - `CallstackOrException` + - `CustomerContent` + - `EndUserPseudonymizedInformation` + - `OrganizationalIdentifiableInformation` +- **Privacy review required** for: new telemetry fields, classification changes, any unhashed PII + +## Key Files + +``` +cli/azd/ +├── cmd/middleware/telemetry.go # Command-level span middleware +├── internal/ +│ ├── telemetry/ +│ │ ├── telemetry.go # Pipeline init, env vars, consent +│ │ ├── storage.go # Disk queue (FIFO) +│ │ ├── storage_exporter.go # OTel exporter → disk queue +│ │ ├── uploader.go # Queue → App Insights upload +│ │ ├── notice.go # First-run consent notice +│ │ └── appinsights-exporter/ +│ │ ├── span_to_envelope.go # Span → App Insights envelope +│ │ ├── transmitter.go # HTTP POST to ingestion +│ │ ├── endpoint_config.go # Connection string parsing +│ │ └── transmit_payload.go # NDJSON serialization +│ └── tracing/ +│ ├── tracing.go # Global tracer +│ ├── attributes.go # Global/usage baggage +│ ├── events/events.go # All event name constants +│ └── fields/ +│ ├── fields.go # All field keys + classifications +│ └── key.go # SHA-256 hashing helpers +ext/vscode/src/telemetry/ +├── telemetryId.ts # Extension event IDs +└── activityStatisticsService.ts # Active days tracking +``` + +## See Also + +- [Feature Telemetry Guide](../guides/feature-telemetry.md) — How to add telemetry for new features +- [Telemetry Data Reference](../reference/telemetry-data.md) — Schema, events, fields, query patterns +- [Telemetry Overview](../guides/telemetry-overview.md) — For product managers and leadership diff --git a/docs/guides/feature-telemetry.md b/docs/guides/feature-telemetry.md new file mode 100644 index 00000000000..9c46d876c13 --- /dev/null +++ b/docs/guides/feature-telemetry.md @@ -0,0 +1,204 @@ +# Feature Telemetry Guide — Adding Telemetry to New Features + +> End-to-end guide for instrumenting telemetry when building new azd features. +> Ensures telemetry is designed alongside the feature, not bolted on after. + + +## Why This Matters + +Telemetry is not a separate system — it's part of the feature. When you ship a feature without telemetry: +- Product can't measure adoption or success +- Engineering can't diagnose failures in production +- Dashboards show gaps that require scrambling to fill later + +This guide covers the instrumentation side — defining events, fields, and wiring them into your code. + +> Microsoft-internal data pipelines (GDPR classification, Kusto functions, Power BI reports) are documented separately for internal maintainers. + +## Step 1: Define Your Events + +**File:** `cli/azd/internal/tracing/events/events.go` + +Add a constant for your feature's event name(s). Follow the naming conventions: + +| Pattern | When to Use | Example | +|---------|------------|---------| +| `cmd.` | Automatically generated for commands | `cmd.deploy` (via `GetCommandEventName`) | +| `.` | Non-command operations | `container.publish`, `hooks.exec` | +| `.` | Scoped events | `arm.deploy.subscription` | + +```go +// In events.go — add your event constant +const ( + // MyFeatureEvent tracks the execution of the my-feature operation. + MyFeatureEvent = "myfeature.execute" +) +``` + +> **Note:** Command events (`cmd.*`) are created automatically by the telemetry middleware via +> `events.GetCommandEventName(...)`. You only need to define explicit event constants for +> non-command operations (sub-spans, background work, etc.). + +## Step 2: Define Your Fields + +**File:** `cli/azd/internal/tracing/fields/fields.go` + +Add `AttributeKey` variables for any new properties your feature emits. Every field must have: + +1. **A key name** — descriptive, dot-separated, lowercase +2. **A classification** — what kind of data is this (see [Data Classifications](#data-classifications)) +3. **A purpose** — why are we collecting it (see [Purposes](#purposes)) +4. **`IsMeasurement: true`** if the value is numeric (goes to `Measurements` column, not `Properties`) + +```go +// In fields.go — add your field keys +var ( + // The strategy used by my feature. + MyFeatureStrategyKey = AttributeKey{ + Key: attribute.Key("myfeature.strategy"), + Classification: SystemMetadata, + Purpose: FeatureInsight, + } + + // The number of items processed. + MyFeatureItemCountKey = AttributeKey{ + Key: attribute.Key("myfeature.item.count"), + Classification: SystemMetadata, + Purpose: PerformanceAndHealth, + IsMeasurement: true, + } +) +``` + +### Data Classifications + +| Classification | Use When | +|----------------|----------| +| `SystemMetadata` | Non-personal system/config data (most common) | +| `EndUserPseudonymizedInformation` | User identifiers that are hashed (e.g., machine ID) | +| `OrganizationalIdentifiableInformation` | Org identifiers (subscription ID, tenant ID) | +| `PublicPersonalData` | Data the user made public | +| `CallstackOrException` | Stack traces or error details | +| `CustomerContent` | User-created content — highest sensitivity, avoid if possible | + +### Purposes + +| Purpose | Use When | +|---------|----------| +| `FeatureInsight` | Understanding feature adoption and usage patterns | +| `BusinessInsight` | Business metrics (users, orgs, growth) | +| `PerformanceAndHealth` | Performance, errors, reliability | + +### Hashing Sensitive Values + +If your field contains user-generated names or identifiers, **hash it**: + +```go +// In your code, use StringHashed instead of regular attribute setting +tracing.SetUsageAttributes( + fields.StringHashed(fields.MyFeatureNameKey, userProvidedName), +) +``` + +## Step 3: Instrument Your Code + +### For Command Actions + +The telemetry middleware (`cmd/middleware/telemetry.go`) automatically creates a span for every command. You just need to add your feature-specific attributes: + +```go +func (a *myAction) Run(ctx context.Context) (*actions.ActionResult, error) { + // Set usage attributes — these get attached to the command span + tracing.SetUsageAttributes( + fields.MyFeatureStrategyKey.String("incremental"), + fields.MyFeatureItemCountKey.Int(len(items)), + ) + + // ... do work ... + + return &actions.ActionResult{...}, nil +} +``` + +### For Sub-Operations (Child Spans) + +If your feature has distinct sub-operations worth tracking separately: + +```go +func (s *myService) ProcessItems(ctx context.Context, items []Item) error { + ctx, span := tracing.Start(ctx, events.MyFeatureEvent) + defer span.End() + + // Set attributes on this span + span.SetAttributes( + fields.MyFeatureItemCountKey.Int(len(items)), + ) + + // ... do work ... + + if err != nil { + span.SetStatus(codes.Error, err.Error()) + return err + } + + return nil +} +``` + +### For Extension Commands + +Extension commands automatically get `ext.run` events. To add structured error reporting: + +```go +// Extensions report structured errors back to the host +return &azdext.ServiceError{ + Message: fmt.Sprintf("deployment failed: %s", resp.Error.Message), + ErrorCode: fmt.Sprintf("deploy.%s", resp.Error.Code), + StatusCode: resp.StatusCode, + ServiceName: "arm", +} +``` + +This maps to result codes like `ext.service.arm.500` in telemetry. See `cli/azd/pkg/azdext/extension_error.go` for the full structured error types. + +## Step 4: Update the Telemetry Schema Doc + +**File:** `docs/specs/metrics-audit/telemetry-schema.md` + +Add your new events and fields to the canonical schema reference. This document is the source of truth for what telemetry azd collects and is reviewed during privacy audits. + +## Step 5: Privacy Review + +Every new field must have correct `Classification` and `Purpose` values: + +- If your field has sensitivity higher than `SystemMetadata`, consult the [Privacy Review Checklist](../specs/metrics-audit/privacy-review-checklist.md) +- If you're adding `CustomerContent` or unhashed PII, a formal privacy review is required before merge + +## Step 6: Product Review + +Before merging your feature PR: + +1. **Share the telemetry spec** with your PM — explain what events/fields you're adding and why +2. **Show sample queries** — demonstrate how the data answers product questions +3. **Update the data reference** — add your feature's fields to `docs/reference/telemetry-data.md` + +This ensures product can provide feedback during development, not scramble after launch. + +## Quick Reference: Where Things Live + +| What | File/Path | +|------|-----------| +| Event name constants | `cli/azd/internal/tracing/events/events.go` | +| Field key definitions | `cli/azd/internal/tracing/fields/fields.go` | +| Hashing helpers | `cli/azd/internal/tracing/fields/key.go` | +| Telemetry middleware | `cli/azd/cmd/middleware/telemetry.go` | +| Telemetry pipeline init | `cli/azd/internal/telemetry/telemetry.go` | +| Error classification | `cli/azd/internal/cmd/errors.go` (MapError) | +| Canonical schema | `docs/specs/metrics-audit/telemetry-schema.md` | +| Privacy review checklist | `docs/specs/metrics-audit/privacy-review-checklist.md` | + +## See Also + +- [Architecture](../architecture/telemetry.md) — How the telemetry system works end-to-end +- [Data Reference](../reference/telemetry-data.md) — Complete schema, events, fields +- [Privacy Review Checklist](../specs/metrics-audit/privacy-review-checklist.md) — When to do privacy reviews diff --git a/docs/guides/telemetry-overview.md b/docs/guides/telemetry-overview.md new file mode 100644 index 00000000000..c9dbd7c12d1 --- /dev/null +++ b/docs/guides/telemetry-overview.md @@ -0,0 +1,81 @@ +# azd Telemetry — Product Overview + +> What azd telemetry tells us, where to find it, and how to work with it. + + +## What azd Telemetry Captures + +Azure Developer CLI (azd) collects anonymous usage telemetry to understand how developers use the tool, measure feature adoption, and diagnose issues at scale. Users can opt out at any time. + +### What We Collect + +| Category | Examples | +|----------|---------| +| **Commands** | Which commands are run (`init`, `deploy`, `provision`, `up`), success/failure, duration | +| **Features** | Feature-specific properties (template used, packaging format, auth method, target Azure services) | +| **Errors** | Error codes and categories (ARM errors, auth failures, build failures), **not** user content | +| **Environment** | OS, azd version, execution environment (GitHub Actions, Azure Pipelines, VS Code, etc.) | +| **Extensions** | Which extensions are installed and invoked, extension errors | +| **Performance** | Operation duration, time spent waiting for user input vs. executing | + +### What We Don't Collect + +- No source code or project file contents +- No Azure credentials, tokens, or connection strings +- No personal names, emails, or IP addresses +- Project names and template names are **hashed** (one-way) — we can count unique projects but can't see what they're called +- Users opt out via `AZURE_DEV_COLLECT_TELEMETRY=no` + +## Key Metrics + +| Metric | What It Measures | +|--------|-----------------| +| **MAU** (Monthly Active Users) | Unique users per month (by hashed machine ID) | +| **MEU** (Monthly Engaged Users) | Users who run engagement commands (provision, deploy, up) | +| **MDU** (Monthly Dev Users) | Users in local dev environments (not CI/CD) | +| **Success Rate** | % of command executions that succeed | +| **Error Rate by Category** | Top error categories (ARM, auth, build, network) | +| **Template Adoption** | Which templates are used, by how many users | +| **Funnel Completion** | % of users completing init → provision → deploy | +| **Retention** | Users returning week-over-week / month-over-month | +| **New Users** | First-time users per time period | +| **Provision/Deploy Duration** | P50/P90 operation time by template | + +> Microsoft-internal dashboards and reporting tools are documented separately for internal maintainers. + +## How to Request New Telemetry + +### For a New Feature + +Work with the feature engineer during development: + +1. **During design** — Discuss what questions you want to answer about the feature +2. **During implementation** — Engineer instruments telemetry following the [Feature Telemetry Guide](feature-telemetry.md) +3. **During PR review** — Review the telemetry fields to ensure they answer your product questions +4. **After launch** — Verify data is flowing and dashboards are updated + +### For Additional Metrics on an Existing Feature + +1. File an issue describing: + - What question you want to answer + - What data you think is needed + - Which feature/commands it relates to +2. Engineering evaluates whether: + - The data already exists and just needs a query/dashboard + - New instrumentation is required (code change) + +## Privacy and Compliance + +- Telemetry contains no direct PII — sensitive identifiers are one-way hashed or classified +- Identifiers (machine ID, project name, etc.) are one-way hashed +- Users can opt out at any time +- All telemetry fields are classified for privacy compliance +- Data retention follows Microsoft standard telemetry retention policies + +## Further Reading + +| Document | What It Covers | Audience | +|----------|---------------|----------| +| [Architecture](../architecture/telemetry.md) | End-to-end system architecture with diagrams | Engineering | +| [Data Reference](../reference/telemetry-data.md) | Complete schema, all events and fields | Engineering, Product | +| [Feature Telemetry Guide](feature-telemetry.md) | How to add telemetry to new features | Engineering | diff --git a/docs/reference/telemetry-data.md b/docs/reference/telemetry-data.md new file mode 100644 index 00000000000..a8d39bde223 --- /dev/null +++ b/docs/reference/telemetry-data.md @@ -0,0 +1,562 @@ +# Telemetry Data Reference — Understanding azd Telemetry + +> Schema reference for all azd telemetry events, fields, and data shapes. +> Use this to understand what data azd emits and how to work with it. + + +## Data Shape + +All azd telemetry is emitted as Application Insights `RequestData` envelopes. Each command execution produces one top-level span, with optional child spans for sub-operations. + +### Core Columns + +| Column | Type | Description | +|--------|------|-------------| +| `TimeGenerated` | datetime | When the event was recorded | +| `Name` | string | Event/span name (e.g., `cmd.deploy`, `ext.run`) | +| `DurationMs` | real | Total span duration in milliseconds | +| `Success` | bool | Whether the operation succeeded | +| `ResultCode` | string | Error classification code (e.g., `Success`, `service.arm.500`, `internal.unclassified`) | +| `OperationId` | string | Unique ID for the top-level command invocation | +| `Properties` | dynamic | String/bool span attributes (JSON bag) | +| `Measurements` | dynamic | Numeric span attributes (JSON bag) | +| `AppVersion` | string | azd CLI version | + +### Accessing Properties and Measurements + +```kql +// String properties +| extend TemplateId = tostring(Properties['project.template.id']) + +// Numeric measurements +| extend InteractTimeMs = toreal(Measurements['perf.interact_time']) + +// Computed execution time (excludes user interaction) +| extend ExecutionTimeMs = DurationMs - toreal(Measurements['perf.interact_time']) +``` + +## Events Reference + +Events are defined in `cli/azd/internal/tracing/events/events.go`. Each event becomes a span `Name`. + +### Core Command Events (`cmd.*`) + +Commands follow the pattern `cmd.` where spaces become dots. + +| Event Pattern | Example | Description | +|--------------|---------|-------------| +| `cmd.` | `cmd.init`, `cmd.up`, `cmd.deploy` | Top-level command execution | +| `cmd..` | `cmd.auth.login`, `cmd.env.new` | Subcommand execution | +| `cmd...` | `cmd.pipeline.config` | Deeper subcommands | + +**Common command events:** +- `cmd.init` — project initialization +- `cmd.up` — full provision + deploy cycle +- `cmd.provision` — infrastructure provisioning +- `cmd.deploy` — application deployment +- `cmd.package` — application packaging +- `cmd.down` — resource teardown +- `cmd.auth.login` — authentication +- `cmd.env.new` / `cmd.env.select` — environment management +- `cmd.pipeline.config` — CI/CD pipeline setup +- `cmd.monitor` — monitoring +- `cmd.restore` — dependency restoration + +### Extension Events (`ext.*`) + +| Event | Description | +|-------|-------------| +| `ext.run` | Extension command execution | +| `ext.install` | Extension installation | +| `ext.upgrade` | Extension upgrade attempt | +| `ext.promote` | Registry promotion (e.g., dev → main) | + +### Agent & Copilot Events + +| Event | Description | +|-------|-------------| +| `agent.troubleshoot` | Agent troubleshooting session | +| `copilot.initialize` | Copilot agent initialization | +| `copilot.session` | Copilot session creation/resumption | + +### MCP Events (`mcp.*`) + +| Event Pattern | Description | +|--------------|-------------| +| `mcp.` | MCP tool invocation | + +### Infrastructure Events (`arm.*`) + +| Event | Description | +|-------|-------------| +| `arm.deploy.subscription` | ARM deployment at subscription scope | +| `arm.deploy.resourcegroup` | ARM deployment at resource group scope | +| `arm.stack.deploy.subscription` | ARM deployment stack at subscription scope | +| `arm.stack.deploy.resourcegroup` | ARM deployment stack at resource group scope | +| `arm.whatif.subscription` | ARM what-if at subscription scope | +| `arm.whatif.resourcegroup` | ARM what-if at resource group scope | +| `arm.validate.subscription` | ARM validation at subscription scope | +| `arm.validate.resourcegroup` | ARM validation at resource group scope | + +### Other Events + +| Event | Description | +|-------|-------------| +| `tools.pack.build` | Cloud Native Buildpacks build | +| `validation.preflight` | Local preflight validation | +| `hooks.exec` | Lifecycle hook execution | +| `aks.postprovision.skip` | AKS postprovision hook skipped | +| `deploy.appservice.zip` | App Service zip deployment | +| `container.credentials` | Container registry credential retrieval | +| `container.publish` | Container image publish | +| `container.remotebuild` | Remote container build | +| `exegraph.run` | Execution graph run (parallel operations) | +| `exegraph.step` | Single step within execution graph | + +### VS Code Extension Events (`azure-dev.*`) + +These are emitted by the VS Code extension via the VS Code telemetry framework (separate from CLI telemetry). + +| Event | Description | +|-------|-------------| +| `azure-dev.activate` | Extension activated | +| `azure-dev.deactivate` | Extension deactivated | +| `azure-dev.tasks.dotenv` | Dotenv task executed | +| `azure-dev.commands.` | CLI command tasks (deploy, provision, up, down, init, login, restore, package) | +| `azure-dev.survey-check` | Survey eligibility check | +| `azure-dev.survey-prompt-response` | Survey prompt user response | + +### VS RPC Events (`vsrpc.*`) + +JSON-RPC events for VS Code ↔ azd communication. Follow the pattern `vsrpc.`. + +## Fields Reference + +Fields appear as `Properties` (strings/bools) or `Measurements` (numbers). + +### Application-Level Fields (Every Event) + +These are set once at process startup and attached to **every** span. + +| Field Key | Type | Description | Example Values | +|-----------|------|-------------|----------------| +| `service.name` | string | Always `"azd"` | `azd` | +| `service.version` | string | CLI version | `1.23.5` | +| `os.type` | string | Operating system | `linux`, `windows`, `darwin` | +| `os.version` | string | OS version | `10.0.22621`, `14.5` | +| `host.arch` | string | CPU architecture | `amd64`, `arm64` | +| `process.runtime.version` | string | Go runtime version | `go1.26.0` | +| `machine.id` | string | MAC address hash (pseudonymized) | SHA-256 hash | +| `machine.devdeviceid` | string | SQM device ID | UUID string | +| `execution.environment` | string | Where azd is running | See [Execution Environments](#execution-environments) | +| `service.installer` | string | How azd was installed | `msi`, `brew`, `choco`, `rpm`, `deb` | + +### Identity & Account Fields + +| Field Key | Type | Description | +|-----------|------|-------------| +| `user_AuthenticatedId` | string | Entra ID Object ID | +| `ad.tenant.id` | string | Entra ID Tenant ID | +| `ad.account.type` | string | `User` or `Service Principal` | +| `ad.subscription.id` | string | Azure Subscription ID | + +### Project Context Fields + +| Field Key | Type | Hashed? | Description | +|-----------|------|---------|-------------| +| `project.template.id` | string | ✅ SHA-256 | Template identifier from `azure.yaml` metadata | +| `project.template.version` | string | ✅ SHA-256 | Template version | +| `project.name` | string | ✅ SHA-256 | Project name | +| `project.service.hosts` | string[] | ❌ | Host types — see [Service Targets](#service-targets) | +| `project.service.targets` | string[] | ❌ | Resolved deployment targets — see [Service Targets](#service-targets) | +| `project.service.languages` | string[] | ❌ | Languages across all services — see [Service Languages](#service-languages) | +| `project.service.language` | string | ❌ | Language of specific service being executed — see [Service Languages](#service-languages) | +| `platform.type` | string | ❌ | Platform integration (e.g., `aca`, `aks`) | + +#### Service Targets + +Valid values for `project.service.hosts` and `project.service.targets`: + +| Value | Description | +|-------|-------------| +| `appservice` | Azure App Service | +| `containerapp` | Azure Container Apps | +| `containerapp-dotnet` | Azure Container Apps (.NET Aspire) | +| `function` | Azure Functions | +| `staticwebapp` | Azure Static Web Apps | +| `springapp` | Azure Spring Apps | +| `aks` | Azure Kubernetes Service | +| `ai.endpoint` | Azure AI endpoint | + +#### Service Languages + +Valid values for `project.service.languages` and `project.service.language`: + +| Value | Description | +|-------|-------------| +| `dotnet` | .NET | +| `csharp` | C# | +| `fsharp` | F# | +| `python` | Python | +| `js` | JavaScript | +| `ts` | TypeScript | +| `java` | Java | +| `docker` | Docker (containerized) | +| `swa` | Static Web App | +| `custom` | Custom framework | + +#### Other Project Fields + +| Field Key | Type | Hashed? | Description | +|-----------|------|---------|-------------| +| `env.name` | string | ✅ SHA-256 | Environment name | + +> **Joining with template names:** Template IDs are hashed. To resolve to human-readable names, +> join with a template lookup table using the hashed `project.template.id`. + +### Command Entry-Point Fields + +| Field Key | Type | Description | +|-----------|------|-------------| +| `cmd.flags` | string[] | Flag names that were set (values not recorded) | +| `cmd.args.count` | measurement | Number of positional arguments | +| `cmd.entry` | string | How the command was invoked (formatted as event name) | + +### Error Fields + +| Field Key | Type | Description | +|-----------|------|-------------| +| `error.category` | string | High-level error category | +| `error.code` | string | Specific error code | +| `error.type` | string | Same as `ResultCode` — the classified error type | +| `error.chain.types` | string[] | Full Go error type chain, outermost first | + +#### Error Classification (ResultCode Taxonomy) + +The `ResultCode` field classifies errors into categories. Understanding this taxonomy is essential for querying failures. + +| Pattern | Category | Example | +|---------|----------|---------| +| `Success` | No error | — | +| `user.canceled` | User cancelled the operation | — | +| `service.arm.` | ARM service error | `service.arm.500`, `service.arm.409` | +| `service.aad.` | Entra ID (AAD) error | `service.aad.failed` | +| `service..` | Other Azure service error | `service.graph.403` | +| `tool..` | External tool error | `tool.docker.1` | +| `ext.service..` | Extension service error | `ext.service.arm.500` | +| `ext.validation.*` | Extension validation error | `ext.validation.config` | +| `ext.auth.*` | Extension auth error | `ext.auth.expired` | +| `ext.dependency.*` | Extension dependency error | `ext.dependency.missing` | +| `internal.unclassified` | Catch-all for unclassified errors | — | +| `internal.errors_errorString` | Legacy catch-all (being replaced by `internal.unclassified`) | — | + +> **⚠️ Known gap:** Many errors historically fall into `internal.errors_errorString` / `internal.unclassified` +> because the error classifier only inspects the leaf error type. The `error.chain.types` field improves this +> by capturing the full error type chain. + +### Service Attributes (Azure API Calls) + +| Field Key | Type | Description | +|-----------|------|-------------| +| `service.host` | string | Azure service host | +| `service.name` | string | Azure service name (on service call spans) | +| `service.statusCode` | measurement | HTTP status code | +| `service.method` | string | HTTP method | +| `service.errorCode` | measurement | Service-specific error code | +| `service.correlationId` | string | Azure correlation ID | + +### Performance Fields + +| Field Key | Type | Description | +|-----------|------|-------------| +| `perf.interact_time` | measurement | Time (ms) spent waiting for user input | + +> **Computing execution time:** `ExecutionTimeMs = DurationMs - Measurements['perf.interact_time']` +> This gives you the actual processing time, excluding user interaction (prompts, confirmations). + +### Feature-Specific Fields + +
+Authentication + +| Field Key | Type | Values | +|-----------|------|--------| +| `auth.method` | string | `browser`, `device-code`, `service-principal-secret`, `service-principal-certificate`, `federated-github`, `federated-azure-pipelines`, `federated-oidc`, `managed-identity`, `external`, `oneauth`, `check-status` | +
+ +
+Init / App Init + +| Field Key | Type | Description | +|-----------|------|-------------| +| `init.method` | string | `template`, `app`, `project`, `environment`, `copilot` | +| `appinit.detected.databases` | string[] | Databases detected during init | +| `appinit.detected.services` | string[] | Services detected during init | +| `appinit.confirmed.databases` | string[] | Databases confirmed by user | +| `appinit.confirmed.services` | string[] | Services confirmed by user | +| `appinit.modify_add.count` | measurement | Services added during modification | +| `appinit.modify_remove.count` | measurement | Services removed during modification | +| `appinit.lastStep` | string | Last init step reached | +
+ +
+Hooks + +| Field Key | Type | Description | +|-----------|------|-------------| +| `hooks.name` | string | Hook name (e.g., `preprovision`, `postdeploy`). Custom hooks are SHA-256 hashed. | +| `hooks.type` | string | Scope: `project`, `service`, or `layer` | +| `hooks.kind` | string | Executor: `sh`, `pwsh`, `python`, `js`, `ts`, `dotnet` | +
+ +
+Pipeline Config + +| Field Key | Type | Description | +|-----------|------|-------------| +| `pipeline.provider` | string | `github`, `azdo`, `auto` | +| `pipeline.auth` | string | `federated`, `client-credentials`, `auto` | +
+ +
+Infrastructure + +| Field Key | Type | Description | +|-----------|------|-------------| +| `infra.provider` | string | `bicep`, `terraform`, `auto` | +
+ +
+Deployment + +| Field Key | Type | Description | +|-----------|------|-------------| +| `deploy.appservice.attempt` | measurement | Retry attempt number for App Service zip deploy | +| `deploy.appservice.linux` | string | Whether deploying to Linux App Service | +
+ +
+Preflight Validation + +| Field Key | Type | Description | +|-----------|------|-------------| +| `validation.preflight.outcome` | string | `passed`, `warnings_accepted`, `aborted_by_errors`, `aborted_by_user`, `skipped`, `error` | +| `validation.preflight.diagnostics` | string[] | Diagnostic IDs emitted | +| `validation.preflight.rules` | string[] | Rule IDs executed | +| `validation.preflight.warning.count` | measurement | Number of warnings | +| `validation.preflight.error.count` | measurement | Number of errors | +
+ +
+Provision Cancellation + +| Field Key | Type | Description | +|-----------|------|-------------| +| `provision.cancellation` | string | `none`, `leave_running`, `canceled`, `cancel_timed_out`, `cancel_timed_out_nested`, `cancel_raced_succeeded`, `cancel_raced_failed`, `cancel_raced_deleted`, `cancel_too_late`, `cancel_failed` | +
+ +
+Copilot + +| Field Key | Type | Description | +|-----------|------|-------------| +| `copilot.session.id` | string | Session identifier | +| `copilot.session.isNew` | string | Whether this is a new session | +| `copilot.session.messageCount` | measurement | Messages in session | +| `copilot.init.isFirstRun` | string | First copilot run | +| `copilot.init.reasoningEffort` | string | Reasoning effort level | +| `copilot.init.model` | string | Model used | +| `copilot.init.consentScope` | string | Consent scope | +| `copilot.mode` | string | Copilot mode | +| `copilot.message.model` | string | Model for specific message | +| `copilot.message.inputTokens` | measurement | Input token count | +| `copilot.message.outputTokens` | measurement | Output token count | +| `copilot.message.billingRate` | measurement | Billing rate | +| `copilot.message.premiumRequests` | measurement | Premium request count | +| `copilot.message.durationMs` | measurement | Message duration | +| `copilot.consent.approvedCount` | measurement | Approved consent actions | +| `copilot.consent.deniedCount` | measurement | Denied consent actions | +
+ +
+Extensions + +| Field Key | Type | Description | +|-----------|------|-------------| +| `extension.id` | string | Extension identifier | +| `extension.version` | string | Extension version | +| `extension.installed` | string[] | List of installed extensions (`id@version`) | +
+ +
+MCP + +| Field Key | Type | Description | +|-----------|------|-------------| +| `mcp.client.name` | string | MCP client name | +| `mcp.client.version` | string | MCP client version | +
+ +
+Execution Graph + +| Field Key | Type | Description | +|-----------|------|-------------| +| `exegraph.step.count` | measurement | Total steps in graph | +| `exegraph.max_concurrency` | string | Effective concurrency limit | +| `exegraph.error_policy` | string | `fail_fast` or `continue_on_error` | +| `exegraph.step.name` | string | Step name | +| `exegraph.step.deps` | string[] | Step dependencies | +| `exegraph.step.tags` | string[] | Step tags | +
+ +
+Pack (Buildpacks) + +| Field Key | Type | Description | +|-----------|------|-------------| +| `pack.builder.image` | string | Builder image name | +| `pack.builder.tag` | string | Builder image tag | +
+ +
+Update + +| Field Key | Type | Description | +|-----------|------|-------------| +| `update.channel` | string | Update channel | +| `update.installMethod` | string | Installation method | +| `update.fromVersion` | string | Version before update | +| `update.toVersion` | string | Version after update | +| `update.result` | string | Update outcome | +
+ +
+JSON-RPC + +| Field Key | Type | Description | +|-----------|------|-------------| +| `rpc.method` | string | RPC method name | +| `rpc.jsonrpc.request_id` | string | Request ID | +| `rpc.jsonrpc.error_code` | measurement | Error code | +
+ +
+Agent + +| Field Key | Type | Description | +|-----------|------|-------------| +| `agent.fix.attempts` | string | Number of fix attempts | +
+ +### Execution Environments + +The `execution.environment` field identifies where azd is running. Format: `[;;...]` + +| Value | Description | +|-------|-------------| +| `Desktop` | Direct terminal usage | +| `Visual Studio` | VS integration | +| `Visual Studio Code` | VS Code integration | +| `VS Code Azure GitHub Copilot` | Azure Copilot in VS Code | +| `Azure CloudShell` | Azure Cloud Shell | +| `Claude Code` | Claude Code AI agent | +| `GitHub Copilot CLI` | GitHub Copilot CLI | +| `Gemini` | Gemini AI agent | +| `OpenCode` | OpenCode AI agent | +| `GitHub Actions` | GitHub Actions CI | +| `Azure Pipelines` | Azure Pipelines CI | +| `GitHub Codespaces` | GitHub Codespaces | +| Other CI systems | `AppVeyor`, `Bamboo`, `BitBucket Pipelines`, `Travis CI`, `Circle CI`, `GitLab CI`, `Jenkins`, `AWS CodeBuild`, `Google Cloud Build`, `TeamCity`, `JetBrains Space` | + +**Modifier:** `Azure App Spaces Portal` may be appended as a modifier (`;` separated). + +## Data Nuances & Gotchas + +Important things to know when working with azd telemetry data. These are sourced from real investigations and issues. + +### OperationId Reuse in Retry/Troubleshoot Flows + +When `cmd.up` triggers `agent.troubleshoot` after a failure, the troubleshoot agent may retry the failed operation (e.g., `cmd.deploy`). These retries share the **same OperationId** as the parent `cmd.up` span. + +This means you may see multiple rows with the same `OperationId` and `Name` (e.g., two `cmd.deploy` rows). These are **not duplicate events** — they are retry attempts within a single user session. + +**Example pattern:** +``` +OperationId: 28ce1f2898a4fec84522107e36c22038 + cmd.up (511s, FAIL) + ├── cmd.package ✅ + ├── cmd.provision ✅ + ├── cmd.deploy ❌ (service.arm.500) ← attempt 1 + ├── agent.troubleshoot ✅ (471s) + │ ├── cmd.mcp.start + │ ├── cmd.package ✅ → cmd.provision ✅ ← retry + ├── cmd.deploy ❌ (service.aad.failed) ← attempt 2 + └── cmd.deploy ❌ (service.aad.failed) ← attempt 3 +``` + +**Impact on queries:** +```kql +// ❌ WRONG — counts retries as separate invocations +| where Name == 'cmd.deploy' | summarize count() + +// ✅ CORRECT — count distinct OperationIds to get unique invocations +| where Name == 'cmd.deploy' | summarize dcount(OperationId) + +// ✅ Or be explicit about only first attempts +| where Name == 'cmd.deploy' +| summarize arg_min(TimeGenerated, *) by OperationId +``` + +### The `internal.unclassified` / `internal.errors_errorString` Catch-All + +Many failed commands produce the catch-all result code `internal.errors_errorString` (being renamed to `internal.unclassified`). This happens because the error classifier inspects only the leaf error type, and `errors.New()` / `fmt.Errorf()` without `%w` produce `*errors.errorString`, which has no domain meaning. + +**To investigate these errors:** +1. Check `error.chain.types` (if available) for the full error type chain +2. Correlate with `service.errorCode` or `service.statusCode` for Azure API failures +3. Look at surrounding span context (same `OperationId`) for additional detail + +### Hashed Fields and Template Joins + +Fields like `project.template.id`, `project.name`, `env.name` are **SHA-256 hashed** before emission to protect privacy. You cannot reverse them. + +To resolve template IDs to human-readable names, join with a template lookup table using the hashed ID. + +### Execution Time vs Duration + +`DurationMs` includes time the user spent at prompts (confirmations, selections). To compute actual execution time: +```kql +| extend ExecutionTimeMs = DurationMs - toreal(Measurements['perf.interact_time']) +``` + +## Feature → Telemetry Mapping + +How to find telemetry for a given feature area. Start here if you know the feature and want to know what to query. + +| Feature Area | Key Events | Key Fields / Filters | What You Can Measure | +|-------------|------------|---------------------|---------------------| +| **Core Workflows (init/up/deploy/provision/down)** | `cmd.init`, `cmd.up`, `cmd.deploy`, `cmd.provision`, `cmd.down` | `cmd.entry`, `cmd.flags` | Adoption, success rate, duration, error patterns | +| **Deployment Targets** | `cmd.deploy`, `cmd.package` | `project.service.targets` (`appservice`, `containerapp`, `aks`, etc.) | Usage by target, success rate per target | +| **Container Apps (.NET / Aspire)** | `cmd.deploy`, `cmd.provision` | `project.service.targets` = `containerapp-dotnet`, `platform.type` = `aca` | Aspire-specific adoption and success | +| **Language Support** | `cmd.deploy`, `cmd.package`, `cmd.restore` | `project.service.languages`, `project.service.language` | Usage by language | +| **Templates** | `cmd.init`, `cmd.up` | `project.template.id` (hashed — join with template lookup to resolve) | Template adoption, success by template | +| **Provisioning (IaC)** | `cmd.provision`, `arm.deploy.*`, `arm.validate.*` | `infra.provider` (`bicep`, `terraform`) | Provision success, ARM errors, duration | +| **Authentication** | `cmd.auth.login` | `auth.method` | Auth method usage, failure rates | +| **CI/CD Pipelines** | `cmd.pipeline.config` | `pipeline.provider` | Pipeline setup adoption | +| **Extensions** | `ext.run`, `ext.install`, `ext.upgrade` | `extension.id`, `extension.version`, `extension.installed` | Extension adoption, errors | +| **MCP** | `mcp.` | `mcp.client.name`, `mcp.client.version` | Tool usage by client | +| **Agentic (Copilot)** | `copilot.initialize`, `copilot.session` | `copilot.mode`, `copilot.init.model`, `copilot.message.*` | Session counts, token usage | +| **Agent Troubleshooting** | `agent.troubleshoot` | `agent.fix.attempts` | Auto-fix adoption, retry counts | +| **VS Code Extension** | `azure-dev.*` | `azure-dev.commands.` | VS Code usage, command usage | +| **Execution Environment** | All events | `execution.environment` | Usage by environment, CI vs local | +| **Self-Update** | `cmd.update` | `update.installMethod`, `update.fromVersion` | Update adoption | +| **Hooks** | `hooks.exec` | `hooks.name`, `hooks.type`, `hooks.kind` | Hook usage by type | +| **Container Build** | `container.publish`, `container.remotebuild`, `tools.pack.build` | `pack.builder.image` | Build method usage, success rates | + +## See Also + +- [Architecture](../architecture/telemetry.md) — End-to-end telemetry flow +- [Feature Telemetry Guide](../guides/feature-telemetry.md) — How to add telemetry for new features +- [Telemetry Schema (canonical)](../../specs/metrics-audit/telemetry-schema.md) — Source-of-truth schema in the codebase +- [Privacy Review Checklist](../../specs/metrics-audit/privacy-review-checklist.md) — When and how to do privacy reviews