Skip to content

[megatron] megatron lora_llm support no_save_optim#9269

Open
Jintao-Huang wants to merge 1 commit into
modelscope:mainfrom
Jintao-Huang:megatron_lora_llm_support_no_save_optim
Open

[megatron] megatron lora_llm support no_save_optim#9269
Jintao-Huang wants to merge 1 commit into
modelscope:mainfrom
Jintao-Huang:megatron_lora_llm_support_no_save_optim

Conversation

@Jintao-Huang
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request relaxes validation constraints for the lora_llm tuner type and updates the trainer to handle lora_llm similarly to lora during weight loading and checkpointing. Review feedback highlights a potential issue where non-LoRA trainable parameters in multimodal models might be excluded from checkpoints when peft_format is enabled. It is also suggested to default the tuner type to lora_llm for multimodal models when adapters are present.

self.opt_param_scheduler,
iteration=iteration,
peft_format=args.tuner_type == 'lora',
peft_format=args.tuner_type in {'lora', 'lora_llm'},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Setting peft_format=True for lora_llm ensures that LoRA weights are saved, but it may exclude other trainable parameters in multimodal models. If freeze_vit or freeze_aligner are set to False, their weights will be filtered out of the Megatron checkpoint because they are not LoRA parameters and are not automatically added to modules_to_save. Consider ensuring that peft_format is only True when no other non-LoRA parameters are trainable, or that those modules are included in the filter.

Comment on lines 710 to 711
if self.tuner_type == 'full':
self.tuner_type = 'lora'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For multimodal models, when adapters are provided but tuner_type is not specified (defaulting to 'full'), it might be more appropriate to set tuner_type to 'lora_llm' instead of 'lora'. This ensures that the specialized multimodal LoRA logic is used consistently for multimodal models.

Suggested change
if self.tuner_type == 'full':
self.tuner_type = 'lora'
if self.tuner_type == 'full':
self.tuner_type = 'lora_llm' if self.is_multimodal else 'lora'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant