Skip to content

Add ELM tasks to RequiredForTelemetry allowlist#5543

Open
aasim wants to merge 2 commits into
microsoft:masterfrom
aasim:users/aamallad/ms-owned-task-elm
Open

Add ELM tasks to RequiredForTelemetry allowlist#5543
aasim wants to merge 2 commits into
microsoft:masterfrom
aasim:users/aamallad/ms-owned-task-elm

Conversation

@aasim
Copy link
Copy Markdown
Member

@aasim aasim commented Apr 29, 2026

Context

The 1ES Enterprise Live Migration (ELM) feature orchestrates Azure DevOps → Azure DevOps repository migrations. It runs two first-party Microsoft tasks (ElmPrecheckTask and ElmRepoSyncTask) inside a dynamic pipeline. These tasks emit ##vso[telemetry.publish] events for client-side policy results (RepoSizePolicy, PushPackSizePolicy, MaxFileSizePolicy, RefNamePolicy), per-migration outcomes, and SLA-relevant timing data.

Today none of that telemetry reaches Customer Intelligence. The PublishTelemetryCommand in TelemetryCommandExtension.cs silently drops events from any task whose TaskStep.IsServerOwned is not true and whose ID is not in WellKnownTasks.RequiredForTelemetry. The Advanced Security task family solved this exact problem by adding their task IDs to the same allowlist (see MicrosoftExtensionTaskIds.AdvancedSecurity* entries).

ADO Work Item: AB#


Description

Adds the two ELM task GUIDs to WellKnownTasks.MicrosoftExtensionTaskIds and appends them to the RequiredForTelemetry list in Constants.cs:

  • ElmPrecheckTask95f6d7c1-6cc0-47c2-944c-6172489de134
  • ElmRepoSyncTask3a0763a8-1b93-4553-a749-947a21c45b3f

Publisher: enterpriselivemigration (Microsoft-signed first-party extension).


Risk Assessment (Low)

Two GUID entries appended to a hardcoded allowlist. No control flow changes, no new code paths, no behavior change for any task already on the list or for tasks not on it. Identical pattern to the prior Advanced Security additions, which have been in production for multiple agent releases without incident. Backward compatible — older agents simply continue to drop ELM telemetry as they do today.


Unit Tests Added or Updated (No)

Existing tests in TelemetryCommandExtensionTests.cs (PublishTelemetryCommandForInBoxTask, PublishTelemetryCommandForCustomerTask) cover the gating logic generically. The Advanced Security additions also did not add per-task tests for the same reason — the behavior under test is the allowlist mechanism, not the contents of the list.


Additional Testing Performed

  • Verified locally that git diff produces only the intended additions in Constants.cs.
  • Will validate end-to-end after agent rollout by running a Kusto query against cluster('vso').database('VSO').ClientTrace filtered on Area == "EnterpriseLiveMigration" and confirming ElmPrecheckTask/ElmRepoSyncTask events appear with the expected payload (repositoryId, migrationId, policy pass/fail counts).
  • Cross-checked GUIDs against tasks/ElmPrecheckTask/task.json and tasks/ElmRepoSyncTask/task.json in the enterprise-live-migration-tasks repo.

Change Behind Feature Flag (No)

Allowlist entries are static Guid values consumed at agent startup. The existing entries (Google Play, App Store, Advanced Security) are also not flagged. Adding a feature flag would require restructuring WellKnownTasks.RequiredForTelemetry from a static list to a runtime-evaluated collection — out of scope and inconsistent with established convention.


Tech Design / Approach

  • Design has been written and reviewed: follows the precedent set by the Advanced Security task additions (same file, same list, same justification: first-party Microsoft tasks needing telemetry for service-health monitoring).
  • Trade-off considered and rejected: setting TaskStep.IsServerOwned = true from the AzureDevOps server-side dynamic-pipeline generator. That path requires an mseng change, server deployment, and ties the gate to internal orchestration code. The allowlist approach is public, reviewable, and auditable.
  • Alternative considered and rejected: relaying telemetry through the ELM server API. Would duplicate plumbing that the agent already provides and require server-side schema changes for every new field.

Documentation Changes Required (No)

The RequiredForTelemetry list is not part of any external API or user-facing surface. The two added GUIDs match what is already documented in the ELM task definitions.


Logging Added/Updated (No)

No log statements added or modified. The agent already logs telemetry publish attempts at trace level via the existing PublishTelemetryCommand plumbing.


Telemetry Added/Updated (Yes)

This change is the enabler for telemetry. The actual telemetry events are emitted by the ELM tasks themselves (in the enterprise-live-migration-tasks repo) under area EnterpriseLiveMigration with features ElmPrecheck and ElmRepoSync. Payload includes:

  • repositoryId, migrationId (correlation keys)
  • Per-policy status and message
  • totalPolicies, passedPolicies, failedPolicies counters

Events will be visible in ClientTrace and CustomerIntelligence tables on the vso Kusto cluster after agent rollout. Validation query is prepared and will be run against staging migrations first.


Rollback Scenario and Process (Yes)

Rollback is a single-commit revert that removes the two GUIDs from the list. No data migration, no schema change, no dependency to unwind. Agents on the unrolled-back version simply revert to dropping ELM telemetry — same behavior as today. Rollback can be cherry-picked into a hotfix release if needed.


Dependency Impact Assessed and Regression Tested (Yes)

Single-file change in Microsoft.VisualStudio.Services.Agent. No other agent component reads RequiredForTelemetry for any purpose other than the telemetry gate at TaskRunner.IsTelemetryPublishRequired() and the PublishTelemetryCommand.Execute early-return. No third-party dependencies touched. Existing CI builds and L0 test suite cover the surrounding code.

@aasim aasim requested review from a team as code owners April 29, 2026 16:22
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines:
1 pipeline(s) require an authorized user to comment /azp run to run.

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines:
1 pipeline(s) require an authorized user to comment /azp run to run.

@aasim
Copy link
Copy Markdown
Member Author

aasim commented Apr 30, 2026

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines:
Successfully started running 1 pipeline(s).

@aasim aasim removed the internal label Apr 30, 2026
@aasim aasim enabled auto-merge (squash) April 30, 2026 13:54
@tarunramsinghani
Copy link
Copy Markdown
Contributor

Merged - #5547

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants