Static dependency pyproject.toml proposal#9337
Conversation
To modernize pkg management workflow
and remove redundant dep entries
There was a problem hiding this comment.
Code Review
This pull request introduces a pyproject.toml file to define the build system, metadata, and dependencies for the ms_swift project. Feedback highlights several dependency issues, including a version conflict between datasets and the required Python version, the inclusion of the unmaintained attrdict package, and redundant or deprecated dependencies like modelscope and recommonmark. Additionally, improvements were suggested for dependency naming, environment markers for importlib-metadata, and adjusting restrictive Python version requirements for trl and Megatron-related packages.
| "charset_normalizer", | ||
| "cpm_kernels", | ||
| "dacite", | ||
| "datasets>=3.0,<4.8.5", |
There was a problem hiding this comment.
datasets>=3.0 requires Python 3.9 or higher. This conflicts with the project's declared requires-python = ">=3.8" on line 9. This will cause installation to fail on Python 3.8 environments. Consider lowering the minimum version of datasets to 2.0 or increasing the minimum Python version to 3.9.
| "datasets>=3.0,<4.8.5", | |
| "datasets>=2.0,<4.8.5", |
| "accelerate", | ||
| "addict", | ||
| "aiohttp", | ||
| "attrdict", |
There was a problem hiding this comment.
attrdict is unmaintained and incompatible with Python 3.10+ (it fails with AttributeError: module 'collections' has no attribute 'Mapping'). Since this project supports Python 3.10, 3.11, and 3.12, this dependency will cause runtime errors. It is recommended to remove it and use addict (already listed on line 28) instead.
| "einops", | ||
| "fastapi", | ||
| "gradio>=3.40.0,<6.0", | ||
| "importlib_metadata", |
There was a problem hiding this comment.
The canonical package name is importlib-metadata (with a hyphen). Additionally, since the project requires Python >= 3.8, the standard library importlib.metadata is available. If the backport is still needed for specific features (like the improved entry_points API in 3.10+), it should use an environment marker to avoid unnecessary installation on newer Python versions.
| "importlib_metadata", | |
| "importlib-metadata; python_version < '3.10'", |
| "tqdm", | ||
| "transformers>=4.33,<5.9.0", | ||
| "transformers_stream_generator", | ||
| "trl>=0.15,<1.0; python_version>='3.10'", |
There was a problem hiding this comment.
Including trl in the main dependencies with a python_version>='3.10' marker means it won't be installed on Python 3.8/3.9. If the codebase imports trl without runtime checks, it will crash. If trl is a core dependency, the project's minimum Python version should be bumped to 3.10. If it is optional, it should be moved to [project.optional-dependencies].
| "mcore-bridge>=1.2.0; python_version>='3.12'", | ||
| "megatron-core>=0.15; python_version>='3.12'", | ||
| "peft>=0.15; python_version>='3.12'", |
There was a problem hiding this comment.
The python_version>='3.12' markers for Megatron-related dependencies are very restrictive. megatron-core and mcore-bridge typically support Python 3.10+. This restriction prevents users on Python 3.10 and 3.11 from using the Megatron integration. Unless there is a specific technical reason for 3.12+, consider lowering this to 3.10.
| "expecttest", | ||
| "flake8", | ||
| "isort>=4.3.21", | ||
| "modelscope", |
| docs = [ | ||
| "docutils>=0.16.0", | ||
| "myst_parser", | ||
| "recommonmark", |
Do as gemini-code-assist said
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a pyproject.toml file to manage project metadata, dependencies, and build configuration for the ms_swift package. The review feedback suggests several improvements to dependency management, including lowering the Python version requirement for Megatron integration to increase compatibility, standardizing version bounds for datasets and transformers according to semantic versioning, and adding a fallback for the trl library on Python versions below 3.10. There is also a recommendation to consolidate dependency definitions to prevent potential synchronization issues between configuration files.
| "mcore-bridge>=1.2.0; python_version>='3.12'", | ||
| "megatron-core>=0.15; python_version>='3.12'", | ||
| "peft>=0.15; python_version>='3.12'", |
There was a problem hiding this comment.
The Python version restriction python_version>='3.12' for the megatron optional dependencies appears overly restrictive. megatron-core and its associated packages typically support Python 3.10 and 3.11. Restricting these to 3.12+ will prevent many users in common research environments (which often use Python 3.10 or 3.11) from utilizing the Megatron integration features.
| "mcore-bridge>=1.2.0; python_version>='3.12'", | |
| "megatron-core>=0.15; python_version>='3.12'", | |
| "peft>=0.15; python_version>='3.12'", | |
| "mcore-bridge>=1.2.0; python_version>='3.10'", | |
| "megatron-core>=0.15; python_version>='3.10'", | |
| "peft>=0.15; python_version>='3.10'", |
| "charset_normalizer", | ||
| "cpm_kernels", | ||
| "dacite", | ||
| "datasets>=3.0,<4.8.5; python_version>='3.9'", |
There was a problem hiding this comment.
The upper bound <4.8.5 for datasets seems arbitrary. Since datasets is currently on version 3.x, using a major version bound like <4.0.0 is more standard and avoids confusion with specific patch versions that do not yet exist.
| "datasets>=3.0,<4.8.5; python_version>='3.9'", | |
| "datasets>=3.0,<4.0.0; python_version>='3.9'", |
| "tensorboard", | ||
| "tiktoken", | ||
| "tqdm", | ||
| "transformers>=4.33,<5.9.0", |
There was a problem hiding this comment.
| "tqdm", | ||
| "transformers>=4.33,<5.9.0", | ||
| "transformers_stream_generator", | ||
| "trl>=0.15,<1.0; python_version>='3.10'", |
There was a problem hiding this comment.
The trl dependency is guarded by python_version>='3.10'. Since the project supports Python 3.8+, users on older versions will not have trl installed, which may lead to runtime errors if the package depends on it. Consider adding a fallback for Python < 3.10 using an older version of trl that still supports those environments.
| "trl>=0.15,<1.0; python_version>='3.10'", | |
| "trl>=0.15,<1.0; python_version>='3.10'", | |
| "trl<0.15; python_version<'3.10'", |
| dependencies = [ | ||
| "accelerate", | ||
| "addict; python_version>='3.10'", | ||
| "aiohttp", | ||
| "attrdict; python_version<'3.10'", | ||
| "binpacking", | ||
| "charset_normalizer", | ||
| "cpm_kernels", | ||
| "dacite", | ||
| "datasets>=3.0,<4.8.5; python_version>='3.9'", | ||
| "datasets>=2.0,<3; python_version<'3.9'", | ||
| "einops", | ||
| "fastapi", | ||
| "gradio>=3.40.0,<6.0", | ||
| "importlib-metadata; python_version<'3.10'", | ||
| "json_repair", | ||
| "matplotlib", | ||
| "modelscope>=1.23", | ||
| "nltk", | ||
| "numpy", | ||
| "openai", | ||
| "oss2", | ||
| "pandas", | ||
| "peft>=0.11,<0.20", | ||
| "pillow", | ||
| "PyYAML>=5.4", | ||
| "requests", | ||
| "rouge", | ||
| "safetensors", | ||
| "scipy", | ||
| "sentencepiece", | ||
| "simplejson>=3.3.0", | ||
| "sortedcontainers>=1.5.9", | ||
| "tensorboard", | ||
| "tiktoken", | ||
| "tqdm", | ||
| "transformers>=4.33,<5.9.0", | ||
| "transformers_stream_generator", | ||
| "trl>=0.15,<1.0; python_version>='3.10'", | ||
| "uvicorn", | ||
| "zstandard", | ||
| ] |
There was a problem hiding this comment.
Hardcoding a static list of dependencies in pyproject.toml while setup.py continues to parse them from requirements/*.txt introduces a maintenance risk where the two sources of truth can fall out of sync. It is recommended to consolidate dependency management into pyproject.toml and either remove setup.py or refactor it to read from the TOML file to maintain a single source of truth.
PR type
PR information
Add a secondary
pyproject.tomlproject description in addition to legacysetup.pyto makeuv syncwork forms-swiftmain repo.All
requirements/*.txthave been imported into thepyproject.tomlstatically, with python version guards addition for some package occasionally to allow the declarative dependency resolver happy.Experiment results
Just enough for
uv sync.