[bugfix] Skip fps append in vllm mode for Qwen2.5-VL video (#9357)#9373
yushuosun wants to merge 1 commit into
In vllm mode, Qwen2VLTemplate.replace_tag was passing the local fps probe (a list) through mm_processor_kwargs to the Qwen2_5_VLProcessor that vllm invokes. Under transformers v5 that processor validates fps as a scalar (int | float | None) and rejects the list with StrictDataclassFieldValidationError. The v3 branch immediately below already guards 'video_metadata' with 'if self.mode != "vllm":' for the same reason. Apply the same guard to the v2_5 fps append so that vllm computes fps itself from the video input. The non-vllm _encode path is unaffected: it still receives fps in mm_processor_kwargs to compute second_per_grid_ts.

Fixes modelscope#9357

Co-authored-by: Claude <noreply@anthropic.com>
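To make the failure mode concrete, here is a minimal sketch of a strict scalar check on `fps`. `VideoProcessorKwargs`, `FpsValidationError`, and `build_kwargs` are hypothetical stand-ins written for illustration; they are not the real transformers or huggingface_hub classes, which perform this validation through their own strict-dataclass machinery.

```python
from dataclasses import dataclass
from typing import Optional, Union


class FpsValidationError(TypeError):
    """Hypothetical stand-in for huggingface_hub's StrictDataclassFieldValidationError."""


@dataclass
class VideoProcessorKwargs:
    # Hypothetical kwargs container mirroring the scalar contract described
    # above: fps must be int | float | None.
    fps: Optional[Union[int, float]] = None

    def __post_init__(self) -> None:
        # Reject anything that is not a scalar number (e.g. a per-video list).
        if self.fps is not None and not isinstance(self.fps, (int, float)):
            raise FpsValidationError(
                f"fps must be int | float | None, got {type(self.fps).__name__}"
            )


def build_kwargs(mm_processor_kwargs: dict) -> VideoProcessorKwargs:
    """Validate a mm_processor_kwargs-style dict the way a strict processor would."""
    return VideoProcessorKwargs(**mm_processor_kwargs)


if __name__ == "__main__":
    print(build_kwargs({"fps": 2.0}))  # a scalar passes validation
    try:
        build_kwargs({"fps": [2.0]})  # a per-video list is rejected
    except FpsValidationError as exc:
        print("rejected:", exc)
```

Passing no `fps` at all also validates, which is the situation the guard produces in vllm mode: the processor is left to derive fps from the video input.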
Code Review
This pull request updates the Qwen template in swift/template/templates/qwen.py to ensure that video frame rate metadata is only appended to mm_processor_kwargs when the mode is not 'vllm' for version 2.5. This change ensures consistency with the logic used for version 3. There are no review comments to address, and I have no further feedback to provide.
Pull request overview
Fixes a vLLM-only crash when rendering prompts for Qwen2.5-VL with video inputs under newer transformers/huggingface_hub strict validation, by avoiding passing an incompatible fps structure via mm_processor_kwargs in vLLM mode.
Changes:
- Skip appending per-video `fps` data into `inputs.mm_processor_kwargs` when `self.mode == 'vllm'` for the `version == 'v2_5'` branch.
- Align `v2_5` behavior with the existing `v3` handling, which already guards vLLM from similar multimodal kwargs mutations.
I'm a bit concerned whether
Motivation
Fixes #9357.
Running `swift infer --infer_backend vllm` on `Qwen/Qwen2.5-VL-3B-Instruct` with a video dataset crashes during prompt rendering under transformers v5 / the latest `huggingface_hub`. The `tf` backend works; only `vllm` is broken.

Root cause
`swift/template/templates/qwen.py:347`: in `Qwen2VLTemplate.replace_tag`, the `version == 'v2_5'` path appends the per-video `video_kwargs` (a dict that includes `fps`) into `inputs.mm_processor_kwargs['fps']` unconditionally.
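A simplified sketch of the pre-fix shape, assuming a hypothetical `TemplateInputs` container and `append_fps_unguarded` helper (neither exists in swift). The PR describes each appended item as the per-video `video_kwargs`; a bare float stands in for it here. What matters is the result: because the append does not depend on the backend mode, `mm_processor_kwargs['fps']` always ends up as a list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TemplateInputs:
    # Hypothetical stand-in for the template's inputs object.
    mm_processor_kwargs: Dict[str, Any] = field(default_factory=dict)


def append_fps_unguarded(inputs: TemplateInputs, video_fps: float) -> None:
    # Pre-fix shape (simplified): every video appends to a per-sample list,
    # whichever backend later consumes mm_processor_kwargs.
    inputs.mm_processor_kwargs.setdefault("fps", []).append(video_fps)


if __name__ == "__main__":
    inputs = TemplateInputs()
    for fps in (2.0, 2.0):  # two videos in one sample
        append_fps_unguarded(inputs, fps)
    # A list, not the scalar a strict processor expects.
    print(inputs.mm_processor_kwargs)  # {'fps': [2.0, 2.0]}
```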
In vLLM mode this list of dicts is then forwarded to the new `huggingface_hub` strict dataclass validator inside the HF processor, which expects `fps` to be a scalar and rejects the list.

The neighbouring branches already special-case vLLM.

The `'v2_5'` branch was simply missing the same guard.

Modifications
`swift/template/templates/qwen.py`: wrap the `'v2_5'` branch's `mm_processor_kwargs['fps'].append(...)` in `if self.mode != 'vllm':`, matching the v3 / omni pattern (+2 / -1).
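A minimal sketch of the guarded shape, using a hypothetical `MiniQwenTemplate` (not swift's `Qwen2VLTemplate`) whose `add_video_kwargs` method stands in for the relevant part of `replace_tag`. The mode strings, and the assumption that the v3 branch stores `video_metadata` in `mm_processor_kwargs`, are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TemplateInputs:
    # Hypothetical stand-in for the template's inputs object.
    mm_processor_kwargs: Dict[str, Any] = field(default_factory=dict)


class MiniQwenTemplate:
    """Hypothetical, heavily simplified template used only to show the guard."""

    def __init__(self, mode: str, version: str) -> None:
        self.mode = mode        # 'vllm', or any non-vLLM mode string such as 'pt'
        self.version = version  # 'v2_5' or 'v3'

    def add_video_kwargs(self, inputs: TemplateInputs, fps: float, metadata: dict) -> None:
        if self.version == "v2_5":
            # The fix: only non-vLLM modes collect per-video fps;
            # under vLLM the processor derives fps from the video itself.
            if self.mode != "vllm":
                inputs.mm_processor_kwargs.setdefault("fps", []).append(fps)
        elif self.version == "v3":
            # Pre-existing v3 guard that the fix mirrors (simplified).
            if self.mode != "vllm":
                inputs.mm_processor_kwargs.setdefault("video_metadata", []).append(metadata)


if __name__ == "__main__":
    vllm_inputs = TemplateInputs()
    MiniQwenTemplate("vllm", "v2_5").add_video_kwargs(vllm_inputs, 2.0, {"fps": 2.0})
    print(vllm_inputs.mm_processor_kwargs)  # {}

    pt_inputs = TemplateInputs()
    MiniQwenTemplate("pt", "v2_5").add_video_kwargs(pt_inputs, 2.0, {"fps": 2.0})
    print(pt_inputs.mm_processor_kwargs)  # {'fps': [2.0]}
```

With this shape the vLLM path never sees a list-valued `fps`, while the non-vLLM path still collects one value per video.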
Net diff: +2 / -1 lines in one file. The `tf` backend path is unchanged.
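On the non-vLLM path, per-video `fps` is still used to compute `second_per_grid_ts`. Here is a minimal sketch of that arithmetic, assuming the Qwen2.5-VL convention that one temporal grid step spans `temporal_patch_size` sampled frames (2 by default in the Qwen2-VL image processor). The `second_per_grid_ts` helper is hypothetical and does not reproduce swift's `_encode`.

```python
from typing import List


def second_per_grid_ts(fps_per_video: List[float], temporal_patch_size: int = 2) -> List[float]:
    """Seconds covered by one temporal grid step, per video (hypothetical helper)."""
    # Each temporal grid step groups `temporal_patch_size` sampled frames and
    # frames are sampled at `fps`, so one step spans temporal_patch_size / fps seconds.
    if any(fps <= 0 for fps in fps_per_video):
        raise ValueError("fps must be positive")
    return [temporal_patch_size / fps for fps in fps_per_video]


if __name__ == "__main__":
    print(second_per_grid_ts([2.0, 1.0]))  # [1.0, 2.0]
```

Keeping `fps` as a per-video list suits this path because each video in a sample can be sampled at a different rate.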