Closed by author #9375

Closed
yyswhsccc wants to merge 1 commit into modelscope:main from yyswhsccc:bounty-radar/issue-9349-qwen35-grpo-offload
Conversation


@yyswhsccc yyswhsccc commented May 18, 2026

No description provided.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to ensure that Qwen3.5 conv1d parameters are correctly gathered when using DeepSpeed ZeRO-3. It adds a helper function _gather_qwen3_5_conv1d_params_if_zero3 and wraps the sequence parallel forward logic within this context to prevent issues where causal-conv kernels might bypass DeepSpeed's standard hooks. Additionally, a new test suite is included to verify the parameter gathering behavior and the patched forward implementation. I have no feedback to provide as there were no review comments to evaluate.
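The gathering step described above can be sketched roughly as follows. This is not the PR's actual `_gather_qwen3_5_conv1d_params_if_zero3` implementation, which is not shown in this thread. The helper names, the choice to select parameters by the substring `conv1d`, and the use of the `ds_id` attribute to detect ZeRO-3 partitioned parameters are all assumptions made for illustration. `deepspeed.zero.GatheredParameters` is DeepSpeed's context manager for temporarily materializing partitioned parameters.

```python
from contextlib import contextmanager, nullcontext


def select_conv1d_params(module, name_fragment="conv1d"):
    """Return parameters whose qualified name contains `name_fragment`.

    Hypothetical helper: matching on the substring "conv1d" is an assumption.
    """
    return [p for name, p in module.named_parameters() if name_fragment in name]


def is_zero3_partitioned(param):
    """Assumption: ZeRO-3 partitioned parameters carry a `ds_id` attribute."""
    return hasattr(param, "ds_id")


@contextmanager
def gather_conv1d_params_if_zero3(module):
    """Gather partitioned conv1d parameters for the duration of the block.

    If no parameter is partitioned, this is a no-op and DeepSpeed is never
    imported.
    """
    params = [p for p in select_conv1d_params(module) if is_zero3_partitioned(p)]
    if params:
        import deepspeed  # imported lazily, only when ZeRO-3 params are present

        # modifier_rank=None: parameters are gathered read-only on every rank.
        ctx = deepspeed.zero.GatheredParameters(params, modifier_rank=None)
    else:
        ctx = nullcontext()
    with ctx:
        yield params
```

A forward pass that calls a fused causal-conv kernel directly on the weights would then be wrapped as `with gather_conv1d_params_if_zero3(layer): out = layer(hidden_states)`, so the full weights exist while the kernel reads them.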

@yyswhsccc yyswhsccc changed the title from "[Bug fix] Fix Qwen3.5 conv1d ZeRO-3 gather" to "Closed by author" May 18, 2026
@yyswhsccc yyswhsccc closed this May 18, 2026
@yyswhsccc yyswhsccc deleted the bounty-radar/issue-9349-qwen35-grpo-offload branch May 18, 2026 17:48