feat: Replace static musl build on alpine with glibc-based build on gcr distroless #2919
lalitb left a comment:
Nice work on this. I authored #2214, and I think this is the better fix overall. #2214 fixed the immediate musl build issue. This PR avoids that problem completely by moving the image to glibc + distroless, which feels cleaner long term.
I left one small comment about the binary size workflow, but overall this direction looks good to me.
I ran a small experiment, and after this PR the build does indeed use far more memory than before. Now the question is why.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2919      +/-   ##
==========================================
- Coverage   85.93%   85.93%   -0.01%
==========================================
  Files         725      725
  Lines      275782   275782
==========================================
- Hits       236993   236988        -5
- Misses      38265    38270        +5
  Partials      524      524
My wild guess is that the final link step is more memory-hungry with glibc than with musl. We could probably validate this by adding temporary memory sampling around the …
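One way to do that kind of memory sampling, sketched in Python under stated assumptions: wrap the build command (assumed here to be a `cargo build` run inside the image's build stage) and read the peak resident set size the kernel records for terminated child processes. The function name `peak_child_rss_kib` is hypothetical, and this is not something the PR itself added.

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd: list[str]) -> int:
    """Run `cmd` to completion and return the peak RSS (KiB) of the
    largest terminated descendant of this process.

    RUSAGE_CHILDREN covers children that have terminated and been waited
    for, including grandchildren when every intermediate process waited on
    its own children (cargo -> rustc -> linker does). ru_maxrss reports the
    single largest descendant, not a sum over the process tree, and it never
    decreases within one parent process, so measure each build from a fresh
    interpreter.
    """
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in KiB; macOS reports it in bytes.
    if sys.platform == "darwin":
        peak //= 1024
    return peak
```

For example, calling `peak_child_rss_kib(["cargo", "build", "--release"])` once per image variant, each from a fresh interpreter, would give comparable peak numbers for the musl and glibc builds. Because the value is the largest single descendant, a spike here would point at one process (plausibly the linker, if the guess above is right) rather than at overall build parallelism.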
Removing the linker overrides in config.toml brought memory usage back down. I had originally kept them and changed them to …
…-otelcol nightly suite (open-telemetry#3032)

The nightly Pipeline Performance Tests run [26069747923](https://github.com/open-telemetry/otel-arrow/actions/runs/26069747923/job/76648381259) failed in the "Run syslog TCP performance test otelcol log suite" step because the backend (`df_engine`) container never became ready (the readiness check timed out after 10 attempts).

Root cause: PR open-telemetry#2919 switched the `df_engine` image from musl/alpine to glibc/distroless, which changed the in-container home directory from `/home/dataflow` to `/home/nonroot`, and it updated the volume mount path in every nightly docker yaml that existed at the time. The `syslog-tcp-otelcol-docker.yaml` suite (added concurrently in open-telemetry#2962) was missed, so it still mounted the backend config at `/home/dataflow/config.yaml`. That path doesn't exist in the new image, so the backend container failed to start. This one-line change brings it in line with the other nightly suites.
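A hypothetical guard against this kind of miss, sketched in Python: scan the docker yaml files for any remaining reference to the old home directory. The names `OLD_HOME`, `stale_lines`, and `find_stale_mounts`, and the idea of running this in CI, are assumptions for illustration; the directory to scan is left as a parameter because the repo layout isn't given here.

```python
from pathlib import Path

# Home directory of the old musl/alpine image; the distroless image uses /home/nonroot.
OLD_HOME = "/home/dataflow/"


def stale_lines(text: str) -> list[int]:
    """Return the 1-based line numbers in `text` that still mention OLD_HOME."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if OLD_HOME in line]


def find_stale_mounts(root: str) -> list[tuple[str, int]]:
    """Return (file, line number) for every .yaml/.yml line under `root`
    that still references the old home directory."""
    hits = []
    paths = sorted(set(Path(root).rglob("*.yaml")) | set(Path(root).rglob("*.yml")))
    for path in paths:
        for lineno in stale_lines(path.read_text(encoding="utf-8")):
            hits.append((str(path), lineno))
    return hits
```

Run over the nightly suite directory, a check like this would presumably have flagged `syslog-tcp-otelcol-docker.yaml`, since it still contained `/home/dataflow/config.yaml`. A plain substring match is deliberately simple; it will also flag the old path in comments, which is arguably desirable for a migration check.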
Change Summary
This PR changes our standard df_engine image from a static build based on musl + alpine to a glibc-based build on gcr distroless.
Part of this change requires updates to the mount paths for orchestrator config files, as the home directory is now `nonroot` instead of `dataflow`.
What issue does this PR close?
How are these changes tested?
To test this, I ran all the nightly, continuous, and comparison dashboard suites locally using the new image and the mount path changes.
I also did my best to test cross-compiling for the new ARM targets.
Are there any user-facing changes?
Yes. The runtime image in the repo's Dockerfile has changed, and with it the expected mount paths: the home directory is now `/home/nonroot` instead of `/home/dataflow`.