
[BugFix] Include underlying cause in HdfsFsManager copy error messages (backport #73414)#73479

wanpengfei-git merged 1 commit into branch-4.0 from mergify/bp/branch-4.0/pr-73414 on May 19, 2026

mergify (bot) commented May 19, 2026

Why I'm doing:

HdfsFsManager.copyFromLocal / copyToLocal wrap the underlying Hadoop exception in a StarRocksException but only put the source / destination paths into the new message, dropping the cause's message.

Callers that surface only StarRocksException#getMessage to the user can lose all useful diagnostic information. The motivating case is automated cluster snapshot upload to remote object storage:

In ClusterSnapshotCheckpointScheduler.java:
    errMsg = "upload image failed, err msg: " + e.getMessage();

When the underlying S3/OSS upload fails (AccessDenied, NoSuchBucket, SocketTimeoutException, SignatureDoesNotMatch, FileAlreadyExistsException, etc.), the user only sees the following in information_schema.cluster_snapshot_jobs.error_message:

upload image failed, err msg: Failed to copy local /opt/starrocks/fe/meta/image to s3://tenant-bucket/.../meta/image/automated_cluster_snapshot_xxx

…with no hint of the real cause. Operators currently have to grep fe.log for the "Exception while copy local …" stack trace to find out what actually went wrong.
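A minimal Java sketch of the pre-fix behavior, showing why the cause disappears at the call site. WrapperException is a hypothetical stand-in for StarRocksException, copyFromLocalOld and uploadErrMsg are hypothetical simplifications of HdfsFsManager.copyFromLocal and the scheduler's error handling, and the storage failure is simulated with a plain IOException rather than a real Hadoop call.

```java
import java.io.IOException;

// Hypothetical stand-in for StarRocksException: keeps a message and a cause.
class WrapperException extends Exception {
    WrapperException(String message, Throwable cause) {
        super(message, cause);
    }
}

class CauseDroppedDemo {
    // Mimics the pre-fix wrapping: only the paths go into the wrapper message,
    // even though the cause itself is still attached.
    static void copyFromLocalOld(String src, String dest, IOException simulatedFailure)
            throws WrapperException {
        try {
            if (simulatedFailure != null) {
                throw simulatedFailure; // stands in for the remote upload failing
            }
        } catch (Exception e) {
            throw new WrapperException("Failed to copy local " + src + " to " + dest, e);
        }
    }

    // Mimics a caller that only surfaces getMessage(), as the snapshot scheduler does.
    // Returns null when the copy succeeds.
    static String uploadErrMsg(String src, String dest, IOException simulatedFailure) {
        try {
            copyFromLocalOld(src, dest, simulatedFailure);
            return null;
        } catch (WrapperException e) {
            return "upload image failed, err msg: " + e.getMessage();
        }
    }
}
```

The cause is still reachable through getCause(); it simply never makes it into the string the user sees.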

What I'm doing:

  • Append e.getMessage() to the StarRocksException wrapper message in both copyFromLocal and copyToLocal (for both InterruptedIOException and the generic Exception paths), so the real cause is visible at the call site without losing the existing wrapper for log greppability.
  • Drive-by: fix the missing space before "to local" in copyToLocal's error string (it previously produced "Failed to copy /pathto local /dest").
  • Add unit tests testCopyToLocalIncludesCauseMessage and testCopyFromLocalIncludesCauseMessage covering both methods.
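The changes listed above can be sketched roughly as follows. This is an illustrative approximation, not the actual HdfsFsManager code: WrappedCopyException stands in for StarRocksException, CopyAction is a hypothetical interface standing in for the Hadoop FileSystem call, and the ": " separator before the cause message is an assumption about the exact format.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical stand-in for StarRocksException (a message + cause constructor is assumed).
class WrappedCopyException extends Exception {
    WrappedCopyException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical copy step; the real HdfsFsManager goes through a Hadoop FileSystem.
interface CopyAction {
    void run() throws IOException;
}

class CopyErrorMessages {
    static void copyFromLocal(String src, String dest, CopyAction action)
            throws WrappedCopyException {
        String prefix = "Failed to copy local " + src + " to " + dest;
        try {
            action.run();
        } catch (InterruptedIOException e) {
            // Separate branch mirrors the two paths named in the PR; same message shape here.
            throw new WrappedCopyException(prefix + ": " + e.getMessage(), e);
        } catch (Exception e) {
            throw new WrappedCopyException(prefix + ": " + e.getMessage(), e);
        }
    }

    static void copyToLocal(String src, String dest, CopyAction action)
            throws WrappedCopyException {
        // The space before "to local" is the drive-by fix; it used to be missing.
        String prefix = "Failed to copy " + src + " to local " + dest;
        try {
            action.run();
        } catch (InterruptedIOException e) {
            throw new WrappedCopyException(prefix + ": " + e.getMessage(), e);
        } catch (Exception e) {
            throw new WrappedCopyException(prefix + ": " + e.getMessage(), e);
        }
    }
}
```

SocketTimeoutException extends InterruptedIOException, so a socket timeout takes the first catch branch. Because Java string concatenation renders a null message as "null", a cause without a message would appear as "...: null"; the PR does not say whether the real code guards against that.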

The underlying cause is already attached to the StarRocksException and still gets logged with full stack trace via LOG.error — this change only adds the cause's message to the user-visible message string.
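To illustrate that appending the message does not replace the cause chain, the sketch below wraps an exception with the (message, cause) constructor and renders its stack trace. CopyFailure, wrap and renderTrace are hypothetical names, and printStackTrace into a StringWriter is used only as a stand-in for logging via LOG.error.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for StarRocksException.
class CopyFailure extends Exception {
    CopyFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

class CauseChainDemo {
    // Builds a post-fix style wrapper: paths plus the cause's message, with the cause attached.
    static CopyFailure wrap(String src, String dest, Exception cause) {
        return new CopyFailure(
            "Failed to copy local " + src + " to " + dest + ": " + cause.getMessage(), cause);
    }

    // Renders the full trace, including the "Caused by:" section, into a String.
    static String renderTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }
}
```

Throwable.printStackTrace emits a "Caused by:" section for the attached cause, so the full stack trace remains available in the log alongside the now more informative message.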

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

#73414)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 2d6a9dd)
wanpengfei-git merged commit fa93d5e into branch-4.0 on May 19, 2026
38 of 39 checks passed
wanpengfei-git deleted the mergify/bp/branch-4.0/pr-73414 branch on May 19, 2026 at 06:57