-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM docker build with 24.12 pytorch fw image #581
Conversation
fcf8372
to
5918b45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests for this anywhere ATM? Do we need to be testing ARM platforms as part of our pre-merge testing generally? That would definitely need to be something we discuss with the azure runner folks if so
5918b45
to
1bc2f6d
Compare
Dockerfile.arm
Outdated
tensorstore==0.1.45 | ||
|
||
# For some reason, we do not need to do the tensorstore verson package hack on arm64 | ||
# RUN sed -i 's/^Version: 0\.0\.0$/Version: 0.1.45/' /usr/local/lib/python3.12/dist-packages/tensorstore-0.0.0.dist-info/METADATA && mv /usr/local/lib/python3.12/dist-packages/tensorstore-0.0.0.dist-info /usr/local/lib/python3.12/dist-packages/tensorstore-0.1.45.dist-info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Dockerfile.arm
Outdated
|
||
cd ${WORKDIR} && \ | ||
RUN cd ${WORKDIR} && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pstjohn I broke up this run command into two -- my understanding is i'm doing this in an earlier stage of the docker build therefore this doesn't have negative image size consequences -- am I correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all these layers end up in our release image, groan. I was thinking I would work a bit on trying to clean that up. But I think if you want to split these, that's fine; but you should move the
rm -rf llvm-project
line up to this block (if you can)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cant move the rm -rf llvm-project up, because that's where the static libraries are against which triton compiles. I will just merge them back. I split these commands for debugging purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to merging them back. I put it together because the LLVM build cache is huge and must be deleted in the same RUN command to actually reduce size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep -- makes total sense -- thanks for the feedback.
Merged the two commands again. @gagank1 can you re-review?
1bc2f6d
to
597c408
Compare
/build-ci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure LGTM if the tests pass 😅 , but tbh I'm not 100% clear on what's going on here with the ffmpeg stuff
597c408
to
b11b2fc
Compare
/build-ci |
rm -rf decord | ||
apt-get install -y ffmpeg libavcodec-dev libavfilter-dev libavformat-dev libavutil-dev | ||
# && cp /usr/lib/aarch64-linux-gnu/libnvcuvid* /usr/local/cuda/ | ||
RUN --mount=type=bind,source=./arm_build/decord_ffmpeg6_fix.patch,target=/decord_ffmpeg6_fix.patch \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my own knowledge - what is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decord doesn't build againt the latest ffmpeg. Rather than downgrading ffmpeg (which caused issues for me), someone provided a patch file to make decord compatible with the newest ffmpeg. The link in the comment has a description: dmlc/decord#186 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, i wonder why ffmpeg started causing issues now
Signed-off-by: Timur Rvachov <[email protected]>
/build-ci |
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #581 +/- ##
=======================================
Coverage 87.73% 87.73%
=======================================
Files 89 89
Lines 5758 5758
=======================================
Hits 5052 5052
Misses 706 706 ☔ View full report in Codecov by Sentry. |
Description
Updates ARM Dockerfile to work with 24.12 pytorch FW image
Type of changes
CI Pipeline Configuration
ARM build is not covered by pre-merge CI, is covered nightly. Having said that, this change might not work with CI at all because the blossom runners have a kernel/CUDA mismatch on their Grace systems.... more info TBD
Usage
Pre-submit Checklist