
fix: add timeout to apt update to prevent indefinite hang on network issues #5553

Open
Bruce6X wants to merge 1 commit into microsoft:master from Bruce6X:patch-1

Bruce6X commented May 6, 2026

Context

installdependencies.sh is executed by the Azure VM Extension on every VMSS reimage. It calls apt update && apt install ... with no timeout. When any apt source is unreachable, apt update can hang indefinitely with its connections stuck in the CLOSE_WAIT state. Because of the &&, apt install never runs and the agent never starts.

Observed in production:

  • 2026-05-05: connections to Canonical's archive.ubuntu.com servers hung
  • 2026-05-06: connections hung to packages.microsoft.com (IP 13.107.246.67, port 443)

In both cases, ss -tnp showed CLOSE_WAIT connections held by the /usr/lib/apt/methods/https subprocess, which had no timeout.
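The ss check described above can be wrapped in a small filter so the same diagnosis is quick to repeat on another VM. This is a minimal sketch, assuming the stuck method process appears under the name https in the users: column of ss -p output (its command name when launched as /usr/lib/apt/methods/https) and that it is run as root, since apt methods run as the _apt user. The names filter_apt_https and apt_close_wait_conns are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: list TCP connections in CLOSE_WAIT owned by apt's https method.
# Assumption: the method's process name is "https" in the ss users: column.
# Run as root so `ss -p` can show processes owned by the _apt user.

# Reads `ss -tnp` style output on stdin and keeps lines whose process
# column names "https". `|| true` makes "no matches" a success, not an error.
filter_apt_https() {
  grep -F '"https",pid=' || true
}

# Hypothetical helper: query live sockets in CLOSE_WAIT and filter them.
apt_close_wait_conns() {
  ss -tnp state close-wait 2>/dev/null | filter_apt_https
}
```

During a hang, apt_close_wait_conns should print one line per stuck connection; empty output means none were found. The filter only relies on the users: field, so it should tolerate differences in ss column layout between versions.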


Description

Add a 120-second timeout to the apt update and apt-get update calls, and treat failure as non-fatal (print a warning and continue) so that apt install can still run against the VM's locally cached package index.
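A non-fatal, time-limited update step might look like the following. This is a minimal sketch, not the PR's actual diff: the function name run_update_nonfatal and the APT_UPDATE_TIMEOUT variable are hypothetical. It relies on GNU coreutils timeout, which exits with status 124 when the time limit is reached, so the warning can distinguish a timeout from an ordinary failure.

```shell
#!/usr/bin/env bash
# Sketch only: run_update_nonfatal and APT_UPDATE_TIMEOUT are hypothetical names.

APT_UPDATE_TIMEOUT="${APT_UPDATE_TIMEOUT:-120}"

# Usage: run_update_nonfatal <command> [args...]
# Runs the command under `timeout`, prints a warning on failure, and always
# returns 0 so the caller continues to the install step.
run_update_nonfatal() {
  local rc=0
  timeout "$APT_UPDATE_TIMEOUT" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # GNU timeout exits with 124 when the command hit the time limit.
    echo "WARNING: '$*' timed out after ${APT_UPDATE_TIMEOUT}s; continuing with cached package index" >&2
  elif [ "$rc" -ne 0 ]; then
    echo "WARNING: '$*' failed with exit code $rc; continuing with cached package index" >&2
  fi
  return 0
}
```

Usage would be run_update_nonfatal apt update (or apt-get update). If a hung method ever ignored SIGTERM, GNU timeout also accepts -k/--kill-after to follow up with SIGKILL; the PR does not indicate that this is needed.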


Risk Assessment (Low / Medium / High)

Low. The change only affects behavior when apt update fails or times out; in the normal case (network available), behavior is unchanged. In most cases, the locally cached package index is sufficient for apt install to satisfy the required dependencies.
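Since the risk argument depends on a cached index being present, a script could check how old that cache is before relying on it. This is a minimal sketch, assuming the downloaded indexes live under /var/lib/apt/lists as files named *_Packages (possibly compressed, e.g. *_Packages.lz4) and that GNU find is available for -printf. The function name cached_index_age_hours is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: report the age, in whole hours, of the newest cached package
# index. Assumes files named *_Packages* under the lists directory.
# Hypothetical helper; requires GNU find for -printf.

cached_index_age_hours() {
  local dir="${1:-/var/lib/apt/lists}" newest now
  # %T@ prints each file's mtime as seconds since the epoch (with a fraction).
  newest=$(find "$dir" -maxdepth 1 -type f -name '*_Packages*' -printf '%T@\n' 2>/dev/null \
    | sort -n | tail -n 1)
  [ -n "$newest" ] || return 1   # no cached index found at all
  newest=${newest%.*}            # drop the fractional seconds
  now=$(date +%s)
  echo $(( (now - newest) / 3600 ))
}
```

A caller could include this value in the warning, or treat a non-zero return (no cached index) as a stronger signal. Neither behavior is part of this PR as described.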


Unit Tests Added or Updated (Yes / No)

No. This is a shell script change with no existing unit test framework.


Additional Testing Performed

Manually verified on Ubuntu 24.04 VMSS VMs:

  • Blocked all apt sources via iptables and confirmed that timeout 120 apt update exits once the 120-second limit is reached
  • Confirmed that apt install libkrb5-3 zlib1g debsums then completes successfully (exit code 0) using the cached index

Change Behind Feature Flag (Yes / No)

No. Feature flags are not applicable to agent installation scripts.


Tech Design / Approach

Replace apt update && apt install ... with two separate commands:

  1. timeout 120 apt update || echo "WARNING: ..." — non-fatal, continues on failure
  2. apt install ... — runs regardless of whether apt update succeeded
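The two-step sequence can be sketched as a function. This is not the PR's diff. The function name install_debian_deps and the APT_BIN override are hypothetical, and APT_BIN exists only so the sequence can be exercised with a stub. The non-interactive -y flag is assumed rather than shown in the PR, and the package list is the one from the manual test.

```shell
#!/usr/bin/env bash
# Sketch only: install_debian_deps and APT_BIN are hypothetical.
# -y (non-interactive install) is an assumption, not taken from the PR.

APT_BIN="${APT_BIN:-apt}"

install_debian_deps() {
  # Step 1: bounded, non-fatal index refresh. A hang is cut off after
  # 120 seconds and any failure only prints a warning.
  timeout 120 "$APT_BIN" update \
    || echo "WARNING: $APT_BIN update failed or timed out; using cached package index" >&2
  # Step 2: always attempted. Its exit status becomes the function's status.
  "$APT_BIN" install -y libkrb5-3 zlib1g debsums
}
```

Unlike the original apt update && apt install ..., a failed or timed-out update no longer prevents the install step. The overall result is decided by whether apt install itself succeeds.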

Documentation Changes Required (Yes/No)

No.


Logging Added/Updated (Yes/No)

Yes. A warning message is printed when apt update fails or times out, to aid diagnosis.


Telemetry Added/Updated (Yes/No)

No.


Rollback Scenario and Process (Yes/No)

Low risk. Revert the two changed lines to restore the original && behavior. No state changes or migrations are involved.


Dependency Impact Assessed and Regression Tested (Yes/No)

Yes. Change is limited to the apt/apt-get update step on Debian-based systems. All subsequent apt install calls are unchanged. No impact on Fedora, SUSE, Alpine, or Mariner code paths.
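The PR does not show how installdependencies.sh selects its Debian-based branch. One common way to gate an apt-only code path is to read /etc/os-release. This is a minimal sketch under that assumption, and the function name is_debian_based is hypothetical. os-release is shell-compatible KEY=value text, so it is sourced in a command-substitution subshell to keep its variables out of the caller.

```shell
#!/usr/bin/env bash
# Sketch only: is_debian_based is hypothetical; the real script's distro
# detection is not part of this PR.

is_debian_based() {
  local file="${1:-/etc/os-release}" ids
  [ -r "$file" ] || return 1
  # Source os-release in a subshell and emit ID plus ID_LIKE (which may hold
  # several space-separated names, e.g. "ubuntu debian").
  ids=$( . "$file" >/dev/null 2>&1; printf '%s %s' "${ID:-}" "${ID_LIKE:-}" )
  case " $ids " in
    *" debian "*|*" ubuntu "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Under this gating, Mariner (ID=mariner) or SUSE (ID_LIKE=suse) systems never reach the apt update step. That matches the PR's statement that those code paths are unaffected.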

Bruce6X requested review from a team as code owners May 6, 2026 08:07