[BugFix] Include underlying cause in HdfsFsManager copy error messages #73414
xiangguangyxg wants to merge 3 commits into
copyFromLocal/copyToLocal wrap the underlying Hadoop exception in a StarRocksException but only put the raw paths into the new message, dropping the cause's message. Callers that surface only StarRocksException#getMessage (e.g. ClusterSnapshotJob.error_message shown via information_schema.cluster_snapshot_jobs) therefore see "Failed to copy local /opt/starrocks/fe/meta/image to s3://..." without any hint of the real reason (AccessDenied, NoSuchBucket, SocketTimeoutException, etc.), forcing operators to dig into fe.log to find the cause.

Append the cause's message to the wrapper message so the real reason is visible at the call site too. Also fix the missing space before "to local" in copyToLocal's error string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
Cover both copyToLocal and copyFromLocal: the wrapper StarRocksException's message must include the underlying cause's message so callers that only surface getMessage() still see the real failure reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fe Cover Checker flagged lines 1222 and 1238 (the InterruptedIOException catch blocks of copyToLocal/copyFromLocal) as uncovered. Add tests that inject InterruptedIOException through Mockito so the wrapper message formatting in those branches is exercised.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
Codex Review: Didn't find any major issues. Can't wait for the next one!
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 4 / 4 (100.00%)
[BE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
Why I'm doing:
`HdfsFsManager.copyFromLocal`/`copyToLocal` wrap the underlying Hadoop exception in a `StarRocksException` but only put the source / destination paths into the new message, dropping the cause's message.

Callers that surface only `StarRocksException#getMessage` to the user can lose all useful diagnostic information. The motivating case is automated cluster snapshot upload to remote object storage. When the underlying S3/OSS upload fails (AccessDenied, NoSuchBucket, SocketTimeoutException, SignatureDoesNotMatch, FileAlreadyExistsException, etc.), the user only sees the bare wrapper message (for example, `Failed to copy local /opt/starrocks/fe/meta/image to s3://...`) in `information_schema.cluster_snapshot_jobs.error_message`, with no hint of the real reason. Operators currently have to grep `fe.log` for the `Exception while copy local …` stack trace to find what actually went wrong.

What I'm doing:
- Append `e.getMessage()` to the `StarRocksException` wrapper message in both `copyFromLocal` and `copyToLocal` (for both `InterruptedIOException` and the generic `Exception` paths), so the real cause is visible at the call site without losing the existing wrapper for log greppability.
- Fix the missing space before `to local` in `copyToLocal`'s error string (previously produced `Failed to copy /pathto local /dest`).
- Add tests `testCopyToLocalIncludesCauseMessage` and `testCopyFromLocalIncludesCauseMessage` covering both methods.

The underlying cause is already attached to the `StarRocksException` and still gets logged with full stack trace via `LOG.error`; this change only adds the cause's message to the user-visible message string.

Fixes #issue