[bugfix] Skip fps append in vllm mode for Qwen2.5-VL video (#9357) by yushuosun · Pull Request #9373 · modelscope/ms-swift

yushuosun · 2026-05-17T23:57:19Z

Motivation

Running swift infer --infer_backend vllm on Qwen/Qwen2.5-VL-3B-Instruct
with a video dataset crashes during prompt rendering under
transformers v5 / latest huggingface_hub:

File "huggingface_hub/dataclasses.py", line 144, in __strict_setattr__
    validator(value)
File "huggingface_hub/dataclasses.py", line 625, in validator
    type_validator(field.name, value, field.type)
File "huggingface_hub/dataclasses.py", line 482, in type_validator
    type_validator(name, value, args[0])
TypeError: ... fps must be a scalar, got list ...

The tf backend works; only the vllm backend is broken.
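To make the failure mode concrete, here is a minimal, self-contained sketch of what a strict field validator does with fps. VideoKwargsSketch, _is_scalar_or_none and validates are hypothetical names; this is not the real huggingface_hub or transformers class. It only mimics the TypeError in the traceback above: every assignment is checked, so a list is rejected where a scalar is expected.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class VideoKwargsSketch:
    """Hypothetical stand-in for a strict processor-kwargs dataclass.

    It mimics only the behaviour in the traceback: fps is validated on every
    assignment (including the one made by the generated __init__) and must
    be a scalar or None.
    """

    fps: Optional[Union[int, float]] = None

    def __setattr__(self, name, value):
        if name == "fps" and not _is_scalar_or_none(value):
            raise TypeError(f"fps must be a scalar, got {type(value).__name__}")
        super().__setattr__(name, value)


def _is_scalar_or_none(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return value is None or (
        isinstance(value, (int, float)) and not isinstance(value, bool)
    )


def validates(fps) -> bool:
    """Return True if VideoKwargsSketch accepts this fps value."""
    try:
        VideoKwargsSketch(fps=fps)
    except TypeError:
        return False
    return True


print(validates(2.0))             # True
print(validates([{"fps": 2.0}]))  # False: a list of per-video dicts is rejected
```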

Root cause

swift/template/templates/qwen.py:347: in the version == 'v2_5' path of
Qwen2VLTemplate.replace_tag, the per-video video_kwargs (a dict that
includes fps) is appended to inputs.mm_processor_kwargs['fps']
unconditionally:

inputs.mm_processor_kwargs.setdefault('fps', []).append(video_kwargs)

In vLLM mode this list of dicts is then forwarded to the new
huggingface_hub strict dataclass validator inside the HF processor,
which expects fps to be a scalar and rejects the list.
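The shape problem can be seen without swift or vLLM installed. The sketch below replays the pre-fix append for a prompt with two videos. SimpleNamespace stands in for swift's inputs object, and the per-video dict contents ({"fps": 2.0}) are assumed for illustration. After two video tags, mm_processor_kwargs['fps'] is a list of dicts, not a scalar.

```python
from types import SimpleNamespace

# Hypothetical stand-in for swift's per-sample inputs object; only the
# mm_processor_kwargs attribute matters for this illustration.
inputs = SimpleNamespace(mm_processor_kwargs={})

# Pre-fix v2_5 behaviour: one unconditional append per video tag.
# The per-video dict contents are assumed, not taken from swift.
for video_kwargs in ({"fps": 2.0}, {"fps": 2.0}):
    inputs.mm_processor_kwargs.setdefault("fps", []).append(video_kwargs)

print(inputs.mm_processor_kwargs)
# {'fps': [{'fps': 2.0}, {'fps': 2.0}]}
```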

The neighbouring branches already special-case vLLM:

elif self.version == 'v3':
    if self.mode != 'vllm':
        video, video_metadata = ...
elif self.version == 'omni':
    if self.mode != 'vllm':
        ...

The 'v2_5' branch was simply missing the same guard.

Modifications

swift/template/templates/qwen.py: wrap the 'v2_5' branch's
mm_processor_kwargs['fps'].append(...) call in an if self.mode != 'vllm':
guard, matching the v3 / omni pattern (+2 / -1):

 if self.version == 'v2_5':
-    inputs.mm_processor_kwargs.setdefault('fps', []).append(video_kwargs)
+    if self.mode != 'vllm':
+        inputs.mm_processor_kwargs.setdefault('fps', []).append(video_kwargs)
 elif self.version == 'v3':

Net diff: +2 / -1 lines in one file. The tf backend path is
unchanged.
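The patched branch can be distilled into a standalone function to check the guard in isolation. add_video_fps is a hypothetical name, not a swift API, and "pt" is only an assumed example of a non-vllm mode string; any mode other than 'vllm' takes the same path. In vllm mode the kwargs stay empty, so vLLM derives fps from the video itself.

```python
def add_video_fps(mm_processor_kwargs: dict, video_kwargs, mode: str, version: str) -> None:
    """Hypothetical distillation of the patched v2_5 branch of replace_tag."""
    if version == "v2_5":
        # Only non-vllm modes forward fps; vLLM computes it from the video input.
        if mode != "vllm":
            mm_processor_kwargs.setdefault("fps", []).append(video_kwargs)


vllm_kwargs: dict = {}
add_video_fps(vllm_kwargs, {"fps": 2.0}, mode="vllm", version="v2_5")
print(vllm_kwargs)  # {}

pt_kwargs: dict = {}  # "pt" is an assumed non-vllm mode name
add_video_fps(pt_kwargs, {"fps": 2.0}, mode="pt", version="v2_5")
print(pt_kwargs)  # {'fps': [{'fps': 2.0}]}
```

Keeping the non-vllm append unchanged is what leaves the tf path byte-for-byte identical; only the vllm path stops mutating mm_processor_kwargs.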

In vllm mode, Qwen2VLTemplate.replace_tag was passing the local fps probe (a list) through mm_processor_kwargs to vllm's Qwen2_5_VLProcessor, which under transformers v5 validates fps as a scalar (int|float|None) and rejects the list with StrictDataclassFieldValidationError. The v3 branch immediately below already guards 'video_metadata' with 'if self.mode != "vllm":' for the same reason. Apply the same guard to the v2_5 fps append so vllm computes fps itself from the video input.

The non-vllm _encode path is unaffected: it still receives fps in mm_processor_kwargs to compute second_per_grid_ts.

Fixes modelscope#9357

Co-authored-by: Claude <noreply@anthropic.com>
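For context on why the non-vllm path still needs fps: second_per_grid_ts is the number of seconds covered by one temporal grid step of each video. The sketch below is modeled after the computation in the Hugging Face Qwen2.5-VL processor (temporal_patch_size / fps, accepting either one scalar or one value per video); it is not swift's _encode code, and temporal_patch_size=2 is assumed to be the Qwen2.5-VL default.

```python
from typing import List, Sequence, Union


def second_per_grid_ts(
    fps: Union[int, float, Sequence[float]],
    num_videos: int,
    temporal_patch_size: int = 2,  # assumed Qwen2.5-VL default
) -> List[float]:
    """Seconds covered by one temporal grid step, one entry per video."""
    if isinstance(fps, (int, float)):
        # A single scalar fps applies to every video.
        return [temporal_patch_size / fps] * num_videos
    if len(fps) != num_videos:
        raise ValueError("expected one fps value per video")
    return [temporal_patch_size / f for f in fps]


print(second_per_grid_ts(2.0, num_videos=2))           # [1.0, 1.0]
print(second_per_grid_ts([2.0, 24.0], num_videos=2))   # second entry is 2 / 24, about 0.083
```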

gemini-code-assist

Code Review

This pull request updates the Qwen template in swift/template/templates/qwen.py to ensure that video frame rate metadata is only appended to mm_processor_kwargs when the mode is not 'vllm' for version 2.5. This change ensures consistency with the logic used for version 3. There are no review comments to address, and I have no further feedback to provide.

Copilot

Pull request overview

Fixes a vLLM-only crash when rendering prompts for Qwen2.5-VL with video inputs under newer transformers/huggingface_hub strict validation, by avoiding passing an incompatible fps structure via mm_processor_kwargs in vLLM mode.

Changes:

Skip appending per-video fps data into inputs.mm_processor_kwargs when self.mode == 'vllm' for the version == 'v2_5' branch.
Align v2_5 behavior with existing v3 handling that already guards vLLM from similar multimodal kwargs mutations.


Tohrusky · 2026-05-18T09:00:30Z

I'm a bit concerned about whether video_metadata is passed correctly in vllm mode. Could we print the RoPE-related values (or something similar) to verify that it propagates when FPS is set to 24?
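One way to act on this suggestion is to compute the temporal RoPE position ids you would expect for a given fps and compare them with what the vllm path actually produces. The sketch below is modeled after the temporal-index formula in the Hugging Face Qwen2.5-VL get_rope_index (frame index × second_per_grid_t × tokens_per_second, truncated to an integer). temporal_position_ids is a hypothetical helper, and temporal_patch_size=2 and tokens_per_second=2 are assumed config values. If the fps reaching the processor differs from the intended one, these ids differ visibly.

```python
def temporal_position_ids(
    grid_t: int,
    fps: float,
    temporal_patch_size: int = 2,  # assumed Qwen2.5-VL default
    tokens_per_second: int = 2,    # assumed vision_config value
) -> list:
    """Expected temporal RoPE index for each temporal grid step of one video."""
    second_per_grid_t = temporal_patch_size / fps
    # int() truncates toward zero, matching .long() for these non-negative values.
    return [int(t * second_per_grid_t * tokens_per_second) for t in range(grid_t)]


print(temporal_position_ids(grid_t=4, fps=2.0))   # [0, 2, 4, 6]
print(temporal_position_ids(grid_t=4, fps=24.0))  # [0, 0, 0, 0]
```

At fps=24 the first several grid steps share temporal index 0, so a run where vllm silently falls back to a different fps would show clearly different ids.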

Copilot AI review requested due to automatic review settings May 17, 2026 23:57

Copilot started reviewing on behalf of yushuosun May 17, 2026 23:57

gemini-code-assist Bot reviewed May 17, 2026


Copilot AI reviewed May 17, 2026


yushuosun wants to merge 1 commit into modelscope:main from yushuosun:claude/trusting-ptolemy-i0ALe
