rocmPackages.composable_kernel: compress output #299589

mschwaig · 2024-03-27T21:26:48Z

Description of changes

I made a mistake in my ROCm 6.0 PR #287846, not realizing that rocmPackages.composable_kernel would be larger than Hydra's size limit. See the comments here for details: #287846 (comment)

~~This PR compresses the huge lib/libdevice_operations.a file that's in the output of that derivation, and then decompresses it again in another derivation based on runCommandLocal.~~

~~Is this kind of fix acceptable?~~
~~Is how I implemented this configurable enough and the derivations carry the correct metadata?~~
~~How else could this issue be addressed?~~
~~Not having that derivation cached is a huge pain, since the build takes a few hours at least.~~

~~I am running nixpkgs-review on this right now.~~

Things done

Add a 👍 reaction to pull requests you find important.

mschwaig · 2024-03-28T10:29:57Z

Result of nixpkgs-review pr 299589 run on x86_64-linux 1

1 package marked as broken and skipped:

rocmPackages.migraphx

2 packages built:

rocmPackages.composable_kernel
rocmPackages.miopen

ulrikstrid · 2024-03-28T13:20:11Z

I'm going to invoke @wegank and @SuperSandro2000 to give their view of this

SuperSandro2000 · 2024-03-31T18:29:42Z

lib/libdevice_operations.a

Do we even need the static library? Is it stripped?

GZGavinZhao · 2024-04-03T04:55:44Z

Do we even need the static library?

It is intended to be consumed as a static library. Building it as a shared library needs patching. I've tried that in the past, and every time MIOpen would fail to find composable_kernel properly if I do that. I will try this again and see if the situation now improves.

GZGavinZhao · 2024-04-03T05:04:10Z

An option would be to cherry-pick llvm/llvm-project@7e28234 to ROCm's LLVM.

Or maybe backport ROCm/composable_kernel#1044 and ROCm/MIOpen#2526. I've tried this a few months ago (GZGavinZhao/composable_kernel@c8d9ac8) and I don't recall getting too much space improvement, but I could've mis-remembered since my original intention wasn't to save space.

mschwaig · 2024-04-03T10:14:05Z

lib/libdevice_operations.a

Do we even need the static library? Is it stripped?

Yes, we need it. It is a fat binary file with a a lot of of GPU-specific code for each target GPU, which is used by other parts of ROCm.

I checked the log and yes, it is stripped:

stripping (with command strip and flags -S -p) in  /nix/store/sfybgqj64vhgp386fd3xw0fkjx0wvzwf-composable_kernel-6.0.2/lib

I would prefer if we first found an acceptable solution to get caching to work for users of ROCm, because especially composable_kernel is a real pain to build in terms of hardware requirements and how long the build takes. We could then always switch to a better solution when we find one that we can implement with a reasonable amount of effort.

Another alternative to this PR would probably be setting a long enough meta.timeout for the few consumers of composable_kernel (migraphx and miopen) so it can be rebuilt on the fly while building those.
Though I barely know anything about Hydra, so I don't know if it would behave how I would imagine.

In case the bandwidth and storage costs are actually proportional to compressed size I think it would make sense to tie Hydra's output size limit to compressed size to avoid these kinds of workarounds as much as possible.

mschwaig · 2024-04-03T10:29:04Z

An option would be to cherry-pick llvm/llvm-project@7e28234 to ROCm's LLVM.

Thanks for linking this, I was aware of the -compress option for clang-offload-bundler and that it is not present yet in version of LLVM we use in ROCm, but I was not aware of the associated back and forth. I might find the time to see if this works out of the box, but currently I cannot invest a lot of time into getting it to work.

An option would be to cherry-pick llvm/llvm-project@7e28234 to ROCm's LLVM.

Or maybe backport ROCm/composable_kernel#1044 and ROCm/MIOpen#2526. I've tried this a few months ago (GZGavinZhao/composable_kernel@c8d9ac8) and I don't recall getting too much space improvement, but I could've mis-remembered since my original intention wasn't to save space.

To me it looks like this just splits up that huge file in a few smaller ones that add up to roughly the same total size again, so it does not help us with the self-imposed limit we have here. Going by that first PR it also looks like it would need corresponding changes in other places. I cannot invest the time required to try this.

mschwaig · 2024-04-03T10:32:50Z

We could also try dropping the gfx940 and gfx941 targets as suggested here #298388 (comment) to see if that gets us below the size limit.

mschwaig · 2024-04-03T10:48:20Z

Here is a bit of an overview of what's in that file:

$ bloaty /nix/store/sfybgqj64vhgp386fd3xw0fkjx0wvzwf-composable_kernel-6.0.2/lib/libdevice_operations.a
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  75.4%  2.78Gi  93.7%  2.78Gi    .hip_fatbin
   8.1%   307Mi   0.0%       0    .strtab
   5.9%   221Mi   0.0%       0    [AR Symbol Table]
   5.8%   219Mi   4.7%   142Mi    [954581 Others]
   2.4%  90.7Mi   0.0%       0    [ELF Section Headers]
   0.8%  29.9Mi   1.0%  29.9Mi    .rodata.str1.1
   0.6%  23.6Mi   0.0%       0    .symtab
   0.4%  13.6Mi   0.4%  13.6Mi    .eh_frame
   0.2%  8.49Mi   0.0%       0    .rela.eh_frame
   0.1%  5.64Mi   0.0%       0    .group
   0.1%  3.86Mi   0.1%  3.86Mi    .text
   0.1%  2.97Mi   0.0%       0    .rela.text
   0.0%   877Ki   0.0%       0    [Unmapped]
   0.0%   688Ki   0.0%       0    .rela.text._ZN2ck15get_device_nameB5cxx11Ev
   0.0%   522Ki   0.0%   522Ki    .text._ZN2ck15get_device_nameB5cxx11Ev
   0.0%   361Ki   0.0%       0    .llvm_addrsig
   0.0%   310Ki   0.0%       0    .rela.text._Z15hip_check_error10hipError_t
   0.0%   165Ki   0.0%       0    .rela.text._ZNK2ck16tensor_operation6device12BaseOperator17GetTypeIdHashCodeB5cxx11Ev
   0.0%   160Ki   0.0%   160Ki    .text._Z15hip_check_error10hipError_t
   0.0%   153Ki   0.0%   153Ki    .text._ZNK2ck16tensor_operation6device12BaseOperator17GetTypeIdHashCodeB5cxx11Ev
   0.0%   105Ki   0.0%       0    .rela.data.rel.ro
 100.0%  3.69Gi 100.0%  2.97Gi    TOTAL

.hip_fatbin contains the GPU specific code and this is what clang-offload-bundler will be able to compress in the future.

Flakebi · 2024-04-03T12:14:27Z

The workaround here looks good to me.
(Much more appealing than rebuilding for its dependencies.)

Thanks for putting so much time and effort into this.

mschwaig · 2024-04-03T14:04:36Z

Another alternative to this PR would probably be setting a long enough meta.timeout for the few consumers of composable_kernel (migraphx and miopen) so it can be rebuilt on the fly while building those.
Though I barely know anything about Hydra, so I don't know if it would behave how I would imagine.

Turns out this would not even work because miopen is practically the same size as composable_kernel so it will not get cached either. In practice this PR still reduces the problem by orders of magnitude, since building miopen will only take ROCm users about 10 minutes. Packages further downstream like torchWithRocm should be cached, since they are a lot smaller.

mschwaig · 2024-04-03T19:15:55Z

We could also try dropping the gfx940 and gfx941 targets as suggested here #298388 (comment) to see if that gets us below the size limit.

I tried this as well now and it does not work either. It only gets us down to 3.1 GB.

mschwaig · 2024-04-04T11:51:56Z

An option would be to cherry-pick llvm/llvm-project@7e28234 to ROCm's LLVM.

I have backported this commit here mschwaig/llvm-project@527b6ea and it looks like that will work. The size of composable_kernel and miopen goes down from 3.7 GB to 1.3 GB when I do this.

What I'm still having trouble with:

Applying this commit as a patch in Nix. For some reason none of the changes, besides the ones creating new files, apply at all because the files in question are not found. I suspect that this has something to do with sourceRoot, but I could not figure out how to get past that issue.
Right now I'm just wrapping clang-offload-bundler to add the -compress argument instead of only modifying the relevant invocations, which I was not able to identify yet. This might be fine, but I am not sure.
~~I have not tested that applications keep working.~~

You can find what I have here: mschwaig@fe795d4

To conserve reviewer's time I am moving this back into the draft stage.

nixos-discourse · 2024-04-04T22:19:35Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/trouble-patching-llvm-source-code-for-rocm/42772/1

This patches the clang-offload-bundler tool to add a compression option from a more recent version of clang. This compression option reduces the size of ROCm's fat binaries. Those binaries contain .hip_fatbin sections with GPU-specific code, for each target. Compression is automatically turned on for all produced outputs via a wrapper, because it's difficult to identify all the places where the -compression argument would be needed. Once upsteam introduces handeling for this argument, we should drop the wrapper again. This transistion will create inconsistsency, but I do not think that it will impact any actual users and it's what's practical to implement.

mschwaig · 2024-04-07T15:29:24Z

I have a working version with the -compress option for clang-offload-bundler now, so this is ready for review again.

Result of nixpkgs-review pr 299589 run on x86_64-linux 1

11 packages marked as broken and skipped:

rocmPackages.llvm.flang
rocmPackages.llvm.flang.doc
rocmPackages.llvm.flang.info
rocmPackages.llvm.flang.man
rocmPackages.llvm.libclc
rocmPackages.migraphx
rocmPackages.mivisionx
rocmPackages.mivisionx-cpu
rocmPackages.mivisionx-hip
rocmPackages.rdc
rocmPackages.rdc.doc

88 packages built:

blender-hip
rocmPackages.clang-ocl
rocmPackages.clr
rocmPackages.clr.icd
rocmPackages.composable_kernel
rocmPackages.half
rocmPackages.hip-common
rocmPackages.hipblas
rocmPackages.hipcc
rocmPackages.hipcub
rocmPackages.hipfft
rocmPackages.hipfort
rocmPackages.hipify
rocmPackages.hiprand
rocmPackages.hipsolver
rocmPackages.hipsparse
rocmPackages.hsa-amd-aqlprofile-bin
rocmPackages.llvm.bintools
rocmPackages.llvm.clang
rocmPackages.llvm.clang-tools-extra
rocmPackages.llvm.clang-tools-extra.doc
rocmPackages.llvm.clang-tools-extra.info
rocmPackages.llvm.clang-tools-extra.man
rocmPackages.llvm.clang-unwrapped
rocmPackages.llvm.clang-unwrapped.doc
rocmPackages.llvm.clang-unwrapped.info
rocmPackages.llvm.clang-unwrapped.man
rocmPackages.llvm.compiler-rt
rocmPackages.llvm.libc
rocmPackages.llvm.libc.doc
rocmPackages.llvm.libcxx
rocmPackages.llvm.libcxx.doc
rocmPackages.llvm.libcxxabi
rocmPackages.llvm.libunwind
rocmPackages.llvm.libunwind.doc
rocmPackages.llvm.lld
rocmPackages.llvm.lld.doc
rocmPackages.llvm.lldb
rocmPackages.llvm.lldb.doc
rocmPackages.llvm.lldb.info
rocmPackages.llvm.lldb.man
rocmPackages.llvm.llvm
rocmPackages.llvm.llvm.doc
rocmPackages.llvm.llvm.info
rocmPackages.llvm.llvm.man
rocmPackages.llvm.mlir
rocmPackages.llvm.openmp
rocmPackages.llvm.openmp.doc
rocmPackages.llvm.openmp.info
rocmPackages.llvm.openmp.man
rocmPackages.llvm.polly
rocmPackages.llvm.polly.doc
rocmPackages.llvm.polly.info
rocmPackages.llvm.polly.man
rocmPackages.llvm.pstl
rocmPackages.llvm.rocmClangStdenv
rocmPackages.miopen
rocmPackages.rccl
rocmPackages.rocalution
rocmPackages.rocblas
rocmPackages.rocdbgapi
rocmPackages.rocdbgapi.doc
rocmPackages.rocfft
rocmPackages.rocgdb
rocmPackages.rocm-cmake
rocmPackages.rocm-comgr
rocmPackages.rocm-core
rocmPackages.rocm-device-libs
rocmPackages.rocm-runtime
rocmPackages.rocm-smi
rocmPackages.rocm-thunk
rocmPackages.rocminfo
rocmPackages.rocmlir
rocmPackages.rocmlir-rock
rocmPackages.rocmlir.external
rocmPackages.rocprim
rocmPackages.rocprofiler
rocmPackages.rocr-debug-agent
rocmPackages.rocsolver
rocmPackages.rocsparse
rocmPackages.rocthrust
rocmPackages.roctracer
rocmPackages.rocwmma
rocmPackages.rpp (rocmPackages.rpp-hip)
rocmPackages.rpp-cpu
rocmPackages.rpp-opencl
rocmPackages.tensile
rocmPackages.tensile.dist

ulrikstrid

I think this looks good now. Landing this would be huge since it takes quite some time to build composable_kernel

mschwaig · 2024-04-13T14:59:09Z

I think this looks good now. Landing this would be huge since it takes quite some time to build composable_kernel

As the author of this PR I think its ready too. 👍 I ran nixpkgs-review on it and manually tested if torchWithRocm still works.

(This PR is also a prerequisite for merging #298388 as is, because that PR simplifies the rocblas build in a way that would otherwise exceed the cache size limit as well.)

ulrikstrid · 2024-04-14T07:33:28Z

#298388 (comment)

This comment is interesting, maybe we can drop those versions that are mentioned? This can be done in a separate PR however so that we can merge this without anything that could be considered controversial sneaking in.

mschwaig · 2024-04-14T09:49:46Z

I agree that dropping those ISAs is worth looking into.
I have opened issue #303994 for it, since I do not have the time to open a PR for it in the coming week.

mschwaig · 2024-04-16T07:43:42Z

Unfortunately I made a mistake when testing this PR and this segfaults when properly tested with llama.cpp:

#0  0x00007fffec3bb452 in amd::Elf::getElfSize(void const*) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#1  0x00007fffec035025 in hip::CodeObject::ElfSize(void const*) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#2  0x00007fffec0b043e in hip::FatBinaryInfo::ExtractFatBinaryUsingCOMGR(std::vector<hip::Device*, std::allocator<hip::Device*> > const&) ()
   from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#3  0x00007fffec0b1b6e in hip::FatBinaryInfo::ExtractFatBinary(std::vector<hip::Device*, std::allocator<hip::Device*> > const&) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#4  0x00007fffec038add in hip::StatCO::digestFatBinary(void const*, hip::FatBinaryInfo*&) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#5  0x00007fffec276a2c in PlatformState::digestFatBinary(void const*, hip::FatBinaryInfo*&) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#6  0x00007fffec276430 in PlatformState::init() () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#7  0x00007fffec04d0f6 in hip::init(bool*) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#8  0x00007fffec066dcd in void std::__invoke_impl<void, void (&)(bool*), bool*>(std::__invoke_other, void (&)(bool*), bool*&&) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#9  0x00007fffec066d9d in std::__invoke_result<void (&)(bool*), bool*>::type std::__invoke<void (&)(bool*), bool*>(void (&)(bool*), bool*&&) ()
   from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#10 0x00007fffec066d6c in std::call_once<void (&)(bool*), bool*>(std::once_flag&, void (&)(bool*), bool*&&)::{lambda()#1}::operator()() const ()
   from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#11 0x00007fffec066d44 in std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<void (&)(bool*), bool*>(std::once_flag&, void (&)(bool*), bool*&&)::{lambda()#1}>(void (&)(bool*))::{lambda()#1}::operator()() const () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#12 0x00007fffec066d11 in std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<void (&)(bool*), bool*>(std::once_flag&, void (&)(bool*), bool*&&)::{lambda()#1}>(void (&)(bool*))::{lambda()#1}::__invoke() () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#13 0x00007fffebbf50df in __pthread_once_slow () from /nix/store/ddwyrxif62r8n6xclvskjyy6szdhvj60-glibc-2.39-5/lib/libc.so.6
#14 0x00007fffec05e5eb in __gthread_once(int*, void (*)()) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#15 0x00007fffec061fe1 in void std::call_once<void (&)(bool*), bool*>(std::once_flag&, void (&)(bool*), bool*&&) () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#16 0x00007fffec077144 in hipGetDeviceCount () from /nix/store/md9m0qlr5qzv1yswqqhx4md4kj8njawf-clr-6.0.2/lib/libamdhip64.so.6
#17 0x00007ffff6515442 in (anonymous namespace)::TensileHost::TensileHost() () from /nix/store/rydjpmr5awlzznyp4a1r3fhwvyb4gnaj-rocblas-6.0.2/lib/librocblas.so.4
#18 0x00007ffff65098de in (anonymous namespace)::get_library_and_adapter(std::shared_ptr<Tensile::MasterSolutionLibrary<Tensile::ContractionProblem, Tensile::ContractionSolution> >*, std::shared_ptr<hipDeviceProp_tR0600>*, int) () from /nix/store/rydjpmr5awlzznyp4a1r3fhwvyb4gnaj-rocblas-6.0.2/lib/librocblas.so.4
#19 0x00007ffff7e32d56 in ggml_cuda_init() () from /nix/store/9sq0q30g65xyd02yzvixfl39y6wdw5df-llama-cpp-rocm-0.0.0/lib/libllama.so
#20 0x00007ffff7e33399 in ggml_backend_cuda_get_device_count () from /nix/store/9sq0q30g65xyd02yzvixfl39y6wdw5df-llama-cpp-rocm-0.0.0/lib/libllama.so
#21 0x000055555559c31f in _GLOBAL__sub_I_llama_bench.cpp ()
#22 0x00007fffebb8c23e in __libc_start_main_impl () from /nix/store/ddwyrxif62r8n6xclvskjyy6szdhvj60-glibc-2.39-5/lib/libc.so.6
#23 0x0000555555572275 in _start ()

mschwaig · 2024-04-16T08:05:49Z

clang-offload-bundler in rocmPackages.llvm.openmp also needs rocmPackages.llvm.llvm on it's path to even get there.

@GZGavinZhao do you happen to have a working version of those the patch for llvm/clang?
I saw in llvm/llvm-project#67162 that you have GZGavinZhao/rocm-llvm-project@c859f20

ulrikstrid · 2024-04-16T11:50:27Z

Should we roll back this @mschwaig ? Or does someone have time to fix it before 24.05?

GZGavinZhao · 2024-04-16T15:32:39Z

@GZGavinZhao do you happen to have a working version of those the patch for llvm/clang?

I couldn't get LLVM to build with my patches. IIRC it would fail during the test phase, which was sort of expected since I removed the test file changes in the patches to prevent conflicts, but I didn't have time to dig deep into it.

From the stack trace, it seems like the problem is not with LLVM's side, but that the HIP clr runtime and/or ROCm-CompilerSupport doesn't support compressed device binaries yet.

I was able to reproduce this segmentation fault just with the vectoradd_hip example file (hipcc vectoradd_hip.cpp -o vectoradd_hip). However, I'd note that upon inspecting the verbose compilation output, it seems like clang-offload-bundler compress the device binary regardless of whether --offload-compress was passed. You can verify this by adding the --verbose flag when compiling and you would see this

 "/nix/store/y44kwxkk8pc4wdj0gqhnbdzjm4c7w82w-rocm-llvm-clang-6.0.2/bin/clang-offload-bundler" -type=o -bundle-align=4096 -targets=host-x86_64-unknown-linux,hipv4-amdgcn-amd-amdhsa--gfx1032,hipv4-amdgcn-amd-amdhsa--gfx90c -input=/dev/null -input=/tmp/vectoradd_hip-gfx1032-406a80.out -input=/tmp/vectoradd_hip-gfx90c-2193ac.out -output=/tmp/vectoradd_hip-71eb0a.hipfb -verbose
Compressed bundle format version: 1
Compression method used: zlib
Binary size before compression: 67184 bytes
Binary size after compression: 27668 bytes
Truncated MD5 hash: 0x5296b68059d8fbbe
===-------------------------------------------------------------------------===
                     Timer group for clang offload bundler
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0051 seconds (0.0051 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0049 ( 99.4%)   0.0000 (  0.0%)   0.0049 ( 96.8%)   0.0050 ( 96.8%)  Compression time
   0.0000 (  0.6%)   0.0001 (100.0%)   0.0002 (  3.2%)   0.0002 (  3.2%)  Hash calculation time
   0.0050 (100.0%)   0.0001 (100.0%)   0.0051 (100.0%)   0.0051 (100.0%)  Total

This is weird because according to llvm/llvm-project#67162, device binary compression should be off by default, but it seems like clang-offload-bundler would compress the device binary even if it doesn't receive the -compress flag.

mschwaig · 2024-04-16T16:39:51Z

Thanks for the minimal test case and all of the additional info.

Compression always being on is due to the wrapper I added here:

nixpkgs/pkgs/development/rocm-modules/6/llvm/stage-1/clang-unwrapped.nix

Lines 42 to 51 in 772dbad

    
             extraPostInstall = '' 
        
               mv bin/clang-tblgen $out/bin 
        
               # add wrapper to compress embedded accelerator-specific code 
        
               # this makes the output of composable_kernel significantly smaller right now 
        
               # TODO: remove this once ROCm does it out of the box 
        
               mv $out/bin/clang-offload-bundler $out/bin/clang-offload-bundler-unwrapped 
        
               makeWrapper $out/bin/clang-offload-bundler-unwrapped $out/bin/clang-offload-bundler \ 
        
                 --add-flags '-compress' 
        
             '';

mschwaig · 2024-04-16T17:34:26Z

Should we roll back this @mschwaig ? Or does someone have time to fix it before 24.05?

I don't quite understand the timelines involved, as far as I know this would have to be fixed until tomorrow, to get into the release.

Until tomorrow (or within the next week even) I don't think this change can be salvaged.
I could spend more time to look into it starting in a few days, and maybe find and easy fix but the much more likely outcome is that this is not possible yet, because not all required components support it and we have to look at my previous proposal for working around the size limit again (f199cf1).

We could

revert the changes in this PR, or
replace them with what I proposed previously.

I would assume that getting around the Hydra size limit with either solution is something we could backport into the release later on as well (maybe after we have something working on unstable for some time)?
In that case I think we should just revert this PR.

GZGavinZhao · 2024-04-17T00:25:56Z

I personally suggest a revert as well. I just dug into clr and comgr code to check and found that even AMD themselves currently have no support for compressed device binaries. This means that we would essentially have to implement support for compressed device binaries in comgr by ourselves. It shouldn't be hard, but I don't think it's something doable in one or two days. A week is probably enough, but if the deadline for 24.05 is tomorrow, then I suggest we should just revert the PR.

wegank · 2024-04-17T00:39:56Z

The deadline is certainly not tomorrow, as it only applies to breaking changes to Release Critical Packages.

EDIT: reverted in #304672.

surfaceflinger · 2024-04-17T08:33:13Z

Not sure how ROCm works, but if we're not getting ROCm in Hydra in 24.05 then would it be possible instead to make an override to build a single target? (if that's even possible)
Could be done as a module that would have effect on everything that uses ROCm. Tried building ollama yesterday on OC'd 5600x and it was like 2-3 hours before failing :(

GZGavinZhao · 2024-04-20T23:59:30Z

Thanks to a very new PR (4 days ago) llvm/llvm-project#88827, I managed to make compressed device binaries work for vectoradd_hip.cpp. Next week I will start cleaning up my work locally, try backporting the commit, and run tests on composable_kernel.

ulrikstrid · 2024-04-21T08:04:47Z

@GZGavinZhao please ping me for testing etc

ofborg bot requested review from Flakebi and Madouura March 27, 2024 22:12

ofborg bot added 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Mar 27, 2024

mschwaig marked this pull request as ready for review March 28, 2024 10:30

ulrikstrid requested review from SuperSandro2000 and wegank March 28, 2024 13:20

mschwaig mentioned this pull request Apr 1, 2024

rocmPackages: extend ISA compatibility #298388

Merged

13 tasks

mschwaig marked this pull request as draft April 4, 2024 11:52

mschwaig force-pushed the compress-composable-kernel branch from f199cf1 to b9f41cf Compare April 5, 2024 19:38

ofborg bot requested review from lovesegfault and acowley April 5, 2024 20:03

ofborg bot added 10.rebuild-darwin: 1-10 10.rebuild-linux: 101-500 and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin labels Apr 5, 2024

mschwaig force-pushed the compress-composable-kernel branch 3 times, most recently from 7fd6313 to f7b817c Compare April 7, 2024 14:55

mschwaig added 2 commits April 7, 2024 17:09

rocmPackages.llvm: replace --replace with --replace-fail (cleanup)

772dbad

mschwaig force-pushed the compress-composable-kernel branch from f7b817c to 772dbad Compare April 7, 2024 15:10

mschwaig marked this pull request as ready for review April 7, 2024 15:29

mschwaig mentioned this pull request Apr 8, 2024

Build failure: rocblas times out on hydra #301937

Closed

ulrikstrid approved these changes Apr 12, 2024

View reviewed changes

wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Apr 12, 2024

ulrikstrid mentioned this pull request Apr 14, 2024

NixOS 24.05 — Feature Freeze & Release Blockers #303286

Closed

39 tasks

wegank merged commit e8d27c9 into NixOS:master Apr 14, 2024
23 checks passed

wegank mentioned this pull request Apr 17, 2024

Revert "rocmPackages.composable_kernel: compress output" #304672

Merged

GZGavinZhao mentioned this pull request Apr 22, 2024

rocmPackages: use compressed device binaries to save space #305920

Draft

15 tasks

mschwaig mentioned this pull request Jun 10, 2024

rocmPackages.composable_kernel: compress composable_kernel using extra derivation #318926

Merged

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rocmPackages.composable_kernel: compress output #299589

rocmPackages.composable_kernel: compress output #299589

mschwaig commented Mar 27, 2024 •

edited

Loading

mschwaig commented Mar 28, 2024

ulrikstrid commented Mar 28, 2024

SuperSandro2000 commented Mar 31, 2024

GZGavinZhao commented Apr 3, 2024

GZGavinZhao commented Apr 3, 2024 •

edited

Loading

mschwaig commented Apr 3, 2024

mschwaig commented Apr 3, 2024

mschwaig commented Apr 3, 2024

mschwaig commented Apr 3, 2024

Flakebi commented Apr 3, 2024

mschwaig commented Apr 3, 2024

mschwaig commented Apr 3, 2024

mschwaig commented Apr 4, 2024 •

edited

Loading

nixos-discourse commented Apr 4, 2024

mschwaig commented Apr 7, 2024

ulrikstrid left a comment

mschwaig commented Apr 13, 2024

ulrikstrid commented Apr 14, 2024

mschwaig commented Apr 14, 2024 •

edited

Loading

mschwaig commented Apr 16, 2024

mschwaig commented Apr 16, 2024 •

edited

Loading

ulrikstrid commented Apr 16, 2024 •

edited

Loading

GZGavinZhao commented Apr 16, 2024 •

edited

Loading

mschwaig commented Apr 16, 2024

mschwaig commented Apr 16, 2024

GZGavinZhao commented Apr 17, 2024

wegank commented Apr 17, 2024 •

edited

Loading

surfaceflinger commented Apr 17, 2024

GZGavinZhao commented Apr 20, 2024

ulrikstrid commented Apr 21, 2024

rocmPackages.composable_kernel: compress output #299589

rocmPackages.composable_kernel: compress output #299589

Conversation

mschwaig commented Mar 27, 2024 • edited Loading

Description of changes

Things done

mschwaig commented Mar 28, 2024

ulrikstrid commented Mar 28, 2024

SuperSandro2000 commented Mar 31, 2024

GZGavinZhao commented Apr 3, 2024

GZGavinZhao commented Apr 3, 2024 • edited Loading

mschwaig commented Apr 3, 2024

mschwaig commented Apr 3, 2024

mschwaig commented Apr 3, 2024

mschwaig commented Apr 3, 2024

Flakebi commented Apr 3, 2024

mschwaig commented Apr 3, 2024

mschwaig commented Apr 3, 2024

mschwaig commented Apr 4, 2024 • edited Loading

nixos-discourse commented Apr 4, 2024

mschwaig commented Apr 7, 2024

ulrikstrid left a comment

Choose a reason for hiding this comment

mschwaig commented Apr 13, 2024

ulrikstrid commented Apr 14, 2024

mschwaig commented Apr 14, 2024 • edited Loading

mschwaig commented Apr 16, 2024

mschwaig commented Apr 16, 2024 • edited Loading

ulrikstrid commented Apr 16, 2024 • edited Loading

GZGavinZhao commented Apr 16, 2024 • edited Loading

mschwaig commented Apr 16, 2024

mschwaig commented Apr 16, 2024

GZGavinZhao commented Apr 17, 2024

wegank commented Apr 17, 2024 • edited Loading

surfaceflinger commented Apr 17, 2024

GZGavinZhao commented Apr 20, 2024

ulrikstrid commented Apr 21, 2024

mschwaig commented Mar 27, 2024 •

edited

Loading

GZGavinZhao commented Apr 3, 2024 •

edited

Loading

mschwaig commented Apr 4, 2024 •

edited

Loading

mschwaig commented Apr 14, 2024 •

edited

Loading

mschwaig commented Apr 16, 2024 •

edited

Loading

ulrikstrid commented Apr 16, 2024 •

edited

Loading

GZGavinZhao commented Apr 16, 2024 •

edited

Loading

wegank commented Apr 17, 2024 •

edited

Loading