-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an initial test_linux_packages sub-workflow to our CI #123
Conversation
About ready - fixing a couple last things possibly with the rebase but ready for inspection! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mysteriously the pipeline failed even though I was able to download the file that is logged to be non-existing. I re-triggered the pipeline step to see where it takes us.
Also left some comments. I think we should use lower privileges for download and maybe even for unpacking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good for the initial tests
i wonder if we could pair build_linux_packages
with this to avoid having to re-download the artifacts, although the file sizes are so small it should be golden. may be something to consider when we start looking into larger artifacts
That would mean to merge the two jobs as files can only be shared within a job. However, we're planing to break the |
We also don't want to any needless compute on MI300s since they are a very hot commodity. We don't want to compile/build on MI300s. |
- name: Run rocminfo | ||
run: | | ||
echo "Running rocminfo" | ||
/bin/rocminfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the artifacts stored for run 13583436756, the recent rework of the layering and with (slightly adjusted) steps above, rocminfo
isn't unpacked to /tmp/bin/rocminfo
fior me locally. Are you running something that is already pre-existing on the system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've swapped it around to fix the issue you mentioned above. I did run into an issue where I wasn't putting it in the right place but tweaked a couple things and made an rsync with --verbose to ensure I'm copying. I'll play around with moving to /tmp maybe monday or could do in a follow-up. I could see a reasonable argument to doing not in / so we won't need sudo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now you can just skip moving files at all :) This also makes sure we don't pollute test machines if test are not running containerized. Commented as part of my last review.
Ready for another review. Addressed all comments and incorporated fileset_tool in order to unpack paths. |
push: | ||
branches: | ||
- ADHOCBUILD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftofter? Or should this rather be an input passed to workflow_dispatch
and workflow_call
, similar as in the build_linux_packages
workflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stanza is required even if empty if the workflow can be called by another workflow.
pushd "${BUILD_ARTIFACTS_DIR}" | ||
mkdir output_dir | ||
python ${{ github.workspace }}/build_tools/fileset_tool.py artifact-flatten *.tar.xz -o output_dir --verbose | ||
sudo rsync --archive --keep-dirlinks --verbose output_dir/* / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with that you have now you can just drop this line
sudo rsync --archive --keep-dirlinks --verbose output_dir/* / |
in Run rocminfo
you can then run
${{ github.workspace }}/${BUILD_ARTIFACTS_DIR}/output_dir/bin/rocminfo
- name: Run rocminfo | ||
run: | | ||
echo "Running rocminfo" | ||
/bin/rocminfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/bin/rocminfo | |
${{ github.workspace }}/${BUILD_ARTIFACTS_DIR}/output_dir/bin/rocminfo |
Going to land this as is and then address comments in follow-up to make it a bit easier to debug with the workflow checked in on main! That ok on you @marbre ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let's land this and improve in-tree :)
This PR introduces testing on our MI300 fleet focused on installing rocminfo and its dependencies that were built and uploaded into s3 from the previous build_linux_packages phase.
A few callouts here:
A few other notes, running this fast is a bit challenging. We may want to add a couple other github actions to make debugging easier. What I ended up doing was debugging with only the test part and a hardcoded prior CI run on an ubuntu machine first. Switched to an MI300. Then finally switched to depending on the build_linux_packages workflow.