
[AMD][Build] Porting dockerfiles from the ROCm/vllm fork #11777

Merged: 14 commits into vllm-project:main on Jan 21, 2025

Conversation

@gshtras (Contributor) commented Jan 6, 2025

This is an attempt to unify the build process with how it is done in ROCm/vllm. The build process is split into:

  • Building the required dependency libraries using Dockerfile.rocm_base. This step is not needed by the end user; it is done by AMD.
  • Building vLLM itself on top of that base image.

In addition to not rebuilding the libraries each time, the new base image is much smaller (7 GB on Docker Hub vs. 18 GB previously), which makes the build process much faster (~10 minutes).
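For illustration, here is a minimal sketch of the resulting end-user workflow, assuming Dockerfile.rocm exposes a BASE_IMAGE build arg pointing at the prebuilt AMD base image (the arg name and tag shown are assumptions, not taken from this diff):

```shell
# Build vLLM on top of the prebuilt ROCm base image.
# BASE_IMAGE and its tag are placeholders; check Dockerfile.rocm
# in the repo for the actual defaults.
DOCKER_BUILDKIT=1 docker build \
    -f Dockerfile.rocm \
    --build-arg BASE_IMAGE=rocm/vllm-dev:base \
    -t vllm-rocm .
```

Only this second step runs on the user's machine; the base image itself is produced by AMD from Dockerfile.rocm_base.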


github-actions bot commented Jan 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which starts a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@mergify bot added the documentation and ci/build labels on Jan 6, 2025
@gshtras force-pushed the dockerfile_unification branch from 8016502 to a309129 on January 6, 2025 19:44
@mgoin self-requested a review on January 6, 2025 21:19
@DaBossCoda

nice

@mgoin added the ready label on Jan 12, 2025

mergify bot commented Jan 12, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @gshtras.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label on Jan 12, 2025
@mergify bot removed the needs-rebase label on Jan 13, 2025
@robertgshaw2-redhat (Collaborator)

@SageMoore - could you take a quick look through this?

@SageMoore (Contributor) left a comment:

Looks reasonable to me. I assume the configs are copied over from the rocm/vllm-dev container and are expected to be faster than what we have on main? Do you have any performance results that you can share?

Review thread on Dockerfile.rocm_base:
```dockerfile
ARG FA_BRANCH
ARG FA_REPO
RUN git clone ${PYTORCH_REPO} pytorch
RUN cd pytorch && git checkout ${PYTORCH_BRANCH} && \
```
@SageMoore (Contributor) commented:

Does this version of pytorch support ROCm 6.3? If not, would it make more sense to pull in the 6.2 wheel from PyPI instead of building from source?

@gshtras (Contributor, Author) replied:
This is the same base image, and therefore the same combination of versions, used in the AMD weekly dockers: rocm/vllm-dev:main (built out of ROCm/vllm.git).
The 6.2 wheel is an all-in-one whl that brings with it an entire ROCm's worth of .so files, including the older, less performant hipBLASLt.
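For context, the all-in-one wheel referred to here is the kind installed from the PyTorch ROCm wheel index, along these lines (a sketch of the standard PyTorch install command, not part of this PR):

```shell
# Installs torch built against ROCm 6.2; the wheel bundles the ROCm
# runtime .so files, including hipBLASLt, rather than using the
# libraries built from source in Dockerfile.rocm_base.
pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
```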

@gshtras (Contributor, Author) commented Jan 13, 2025

> Looks reasonable to me. I assume the configs are copied over from the rocm/vllm-dev container and are expected to be faster than what we have on main? Do you have any performance results that you can share?

Not sure which configs you mean. The change to the Triton tuning configs in this PR is due to the Triton version change: Triton 3.2+ deprecated the use of num_stages: 0 in favor of num_stages: 2.
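For illustration, this is the kind of change that implies in a Triton tuning config entry (the surrounding keys are representative of vLLM's fused-MoE config JSONs, not copied from this PR):

```diff
 {
     "BLOCK_SIZE_M": 64,
     "BLOCK_SIZE_N": 128,
     "num_warps": 8,
-    "num_stages": 0
+    "num_stages": 2
 }
```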

@mgoin (Member) commented Jan 14, 2025

It looks like the CI isn't able to find the container?

@gshtras (Contributor, Author) commented Jan 14, 2025

> It looks like the CI isn't able to find the container?

All GPU nodes of the AMD CI are currently down due to a network issue in the compute cluster used there.
It looks like the container build is the only job that succeeded, as it doesn't use GPU nodes.

@gshtras force-pushed the dockerfile_unification branch from 1092866 to 5c36cb8 on January 16, 2025 20:52
@gshtras (Contributor, Author) commented Jan 17, 2025

To unbreak the CI, we need vllm-project/buildkite-ci#57.


mergify bot commented Jan 18, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @gshtras.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label on Jan 18, 2025
@DarkLight1337 (Member)

Can you fix the merge conflicts?

@mgoin enabled auto-merge (squash) on January 21, 2025 01:42
@youkaichao disabled auto-merge on January 21, 2025 04:22
@youkaichao merged commit d4b62d4 into vllm-project:main on Jan 21, 2025
68 of 70 checks passed
@gshtras deleted the dockerfile_unification branch on January 22, 2025 18:49
abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025
abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025
kzawora-intel added a commit to HabanaAI/vllm-fork that referenced this pull request Jan 28, 2025
- **[Bugfix] Fix score api for missing max_model_len validation (vllm-project#12119)**
- **[Bugfix] Mistral tokenizer encode accept list of str (vllm-project#12149)**
- **[AMD][FP8] Using MI300 FP8 format on ROCm for block_quant (vllm-project#12134)**
- **[torch.compile] disable logging when cache is disabled (vllm-project#12043)**
- **[misc] fix cross-node TP (vllm-project#12166)**
- **[AMD][CI/Build][Bugfix] use pytorch stale wheel (vllm-project#12172)**
- **[core] further polish memory profiling (vllm-project#12126)**
- **[Docs] Fix broken link in SECURITY.md (vllm-project#12175)**
- **[Model] Port deepseek-vl2 processor, remove dependency (vllm-project#12169)**
- **[core] clean up executor class hierarchy between v1 and v0 (vllm-project#12171)**
- **[Misc] Support register quantization method out-of-tree (vllm-project#11969)**
- **[V1] Collect env var for usage stats (vllm-project#12115)**
- **[BUGFIX] Move scores to float32 in case of running xgrammar on cpu (vllm-project#12152)**
- **[Bugfix] Fix multi-modal processors for transformers 4.48 (vllm-project#12187)**
- **[torch.compile] store inductor compiled Python file (vllm-project#12182)**
- **benchmark_serving support --served-model-name param (vllm-project#12109)**
- **[Misc] Add BNB support to GLM4-V model (vllm-project#12184)**
- **[V1] Add V1 support of Qwen2-VL (vllm-project#12128)**
- **[Model] Support for fairseq2 Llama (vllm-project#11442)**
- **[Bugfix] Fix num_heads value for simple connector when tp enabled (vllm-project#12074)**
- **[torch.compile] fix sym_tensor_indices (vllm-project#12191)**
- **Move linting to `pre-commit` (vllm-project#11975)**
- **[DOC] Fix typo in docstring and assert message (vllm-project#12194)**
- **[DOC] Add missing docstring in LLMEngine.add_request() (vllm-project#12195)**
- **[Bugfix] Fix incorrect types in LayerwiseProfileResults (vllm-project#12196)**
- **[Model] Add Qwen2 PRM model support (vllm-project#12202)**
- **[Core] Interface for accessing model from `VllmRunner` (vllm-project#10353)**
- **[misc] add placeholder format.sh (vllm-project#12206)**
- **[CI/Build] Remove dummy CI steps (vllm-project#12208)**
- **[CI/Build] Make pre-commit faster (vllm-project#12212)**
- **[Model] Upgrade Aria to transformers 4.48 (vllm-project#12203)**
- **[misc] print a message to suggest how to bypass commit hooks (vllm-project#12217)**
- **[core][bugfix] configure env var during import vllm (vllm-project#12209)**
- **[V1] Remove `_get_cache_block_size` (vllm-project#12214)**
- **[Misc] Pass `attention` to impl backend (vllm-project#12218)**
- **[Bugfix] Fix `HfExampleModels.find_hf_info` (vllm-project#12223)**
- **[CI] Pass local python version explicitly to pre-commit mypy.sh (vllm-project#12224)**
- **[Misc] Update CODEOWNERS (vllm-project#12229)**
- **fix: update platform detection for M-series arm based MacBook processors (vllm-project#12227)**
- **[misc] add cuda runtime version to usage data (vllm-project#12190)**
- **[bugfix] catch xgrammar unsupported array constraints (vllm-project#12210)**
- **[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (vllm-project#12222)**
- **Add quantization and guided decoding CODEOWNERS (vllm-project#12228)**
- **[AMD][Build] Porting dockerfiles from the ROCm/vllm fork (vllm-project#11777)**
- **[BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 (vllm-project#12230)**
- **[ci/build] disable failed and flaky tests (vllm-project#12240)**
- **[Misc] Rename `MultiModalInputsV2 -> MultiModalInputs` (vllm-project#12244)**
- **[Misc]Add BNB quantization for PaliGemmaForConditionalGeneration (vllm-project#12237)**
- **[Misc] Remove redundant TypeVar from base model (vllm-project#12248)**
- **[Bugfix] Fix mm_limits access for merged multi-modal processor (vllm-project#12252)**

rasmith pushed a commit to rasmith/vllm that referenced this pull request Jan 30, 2025
Isotr0py pushed a commit to Isotr0py/vllm that referenced this pull request Feb 2, 2025
Labels: ci/build, documentation, ready, rocm

8 participants