compile without AVX512 instructions #276

Closed
schnusch opened this issue Jan 22, 2025 · 2 comments

Comments

@schnusch

schnusch commented Jan 22, 2025

When running nixfmt on a VPS I get an illegal instruction signal. I initially tried nixfmt from NixOS/nixpkgs@107d5ef (EDIT: I also tried NixOS/nixpkgs@a0f3e10, see below), but then tried building it myself:

$ git checkout 8d4bd69
$ nix-shell
$ cabal new-build
$ ./dist-newstyle/build/x86_64-linux/ghc-9.4.8/nixfmt-0.6.0/x/nixfmt/build/nixfmt/nixfmt < /etc/nixos/*.nix
Illegal instruction (core dumped)

I dug a little, recompiled it with cabal new-build --enable-debug-info=3, and ran it with gdb:

(gdb) continue
[...]
Thread 1 "nixfmt" received signal SIGILL, Illegal instruction.
0x00000000005753b7 in measure_off_avx ()
(gdb) disassemble 
Dump of assembler code for function measure_off_avx:
   0x00000000005753a0 <+0>:     mov    rax,rdi
   0x00000000005753a3 <+3>:     mov    rdi,rsi
   0x00000000005753a6 <+6>:     lea    rsi,[rsi-0x3f]
   0x00000000005753aa <+10>:    mov    rcx,rdx
   0x00000000005753ad <+13>:    cmp    rax,rsi
   0x00000000005753b0 <+16>:    jae    0x5753e8 <measure_off_avx+72>
   0x00000000005753b2 <+18>:    mov    edx,0xffffffbf
=> 0x00000000005753b7 <+23>:    vpbroadcastb zmm0,edx
   0x00000000005753bd <+29>:    jmp    0x5753cc <measure_off_avx+44>
   0x00000000005753bf <+31>:    nop
   0x00000000005753c0 <+32>:    add    rax,0x40
   0x00000000005753c4 <+36>:    sub    rcx,rdx
   0x00000000005753c7 <+39>:    cmp    rax,rsi
   0x00000000005753ca <+42>:    jae    0x5753e8 <measure_off_avx+72>
[...]

This answer on StackOverflow says vpbroadcastb zmm0,edx is an AVX512 instruction. Judging by /proc/cpuinfo, my CPU only supports AVX2:

VPS's /proc/cpuinfo
processor       : 0                                                 
vendor_id       : AuthenticAMD                                      
cpu family      : 23                                                
model           : 49                                                                                                                                                                                                                                                              
model name      : AMD EPYC 7282 16-Core Processor                                                                                                                                                                                                                                 
stepping        : 0                                                 
microcode       : 0x830107a                                                                                                              
cpu MHz         : 2794.748                                          
cache size      : 512 KB                                            
physical id     : 0                                                 
siblings        : 4                                                 
core id         : 0                                                 
cpu cores       : 4                                                 
apicid          : 0                                                 
initial apicid  : 0                                                 
fpu             : yes                                               
fpu_exception   : yes                                               
cpuid level     : 16                                                
wp              : yes                                               
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat umip rdpid arch_capabilities          
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso                                 
bogomips        : 5589.49                                           
TLB size        : 1024 4K pages                                     
clflush size    : 64                                                
cache_alignment : 64                                                
address sizes   : 40 bits physical, 48 bits virtual                 
power management:

Weirdly, I can run nixfmt from NixOS/nixpkgs@a0f3e10 on my notebook, which does not seem to support AVX512 either:

Notebook's /proc/cpuinfo
processor       : 0                                                                                                                                                                                                                                                               
vendor_id       : AuthenticAMD                                                                                                                                                                                                                                                    
cpu family      : 23                                                                                                                                                                                                                                                              
model           : 24                                                                                                                                                                                                                                                              
model name      : AMD Ryzen 3 3200U with Radeon Vega Mobile Gfx                                                                                                                                                                                                                   
stepping        : 1                                                                                                                                                                                                                                                               
microcode       : 0x8108102                                                                                                                                                                                                                                                       
cpu MHz         : 3378.967                                                                                                                                                                                                                                                        
cache size      : 512 KB                                                                                                                                                                                                                                                          
physical id     : 0                                                                                                                                                                                                                                                               
siblings        : 4                                                                                                                                                                                                                                                               
core id         : 0                                                                                                                                                                                                                                                               
cpu cores       : 2                                                                                                                                                                                                                                                               
apicid          : 0                                                                                                                                                                                                                                                               
initial apicid  : 0                                                                                                                                                                                                                                                               
fpu             : yes                                                                                                                                                                                                                                                             
fpu_exception   : yes                                                                                                                                                                                                                                                             
cpuid level     : 13                                                                                                                                                                                                                                                              
wp              : yes                                                                                                                                                                                                                                                             
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es                             
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso div0 ibpb_no_ret                                                                                                                                                         
bogomips        : 5190.28                                                                                                                                                                                                                                                         
TLB size        : 2560 4K pages                                                                                                                                                                                                                                                   
clflush size    : 64                                                                                                                                                                                                                                                              
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]
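
To compare flags lines like the two above without reading them by eye, a whole-token match is enough. The following is a minimal sketch in C; `cpuinfo_has_flag` is a hypothetical helper name, and it assumes the caller has already extracted the value part of the `flags` line. The Linux spellings to look for would be `avx512f`, `avx512bw`, `avx512vl` and so on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if `flag` appears as a whole
 * whitespace-separated token in `flags`, where `flags` is the value part
 * of a "flags : ..." line from /proc/cpuinfo.
 * A plain substring search would wrongly match "avx" inside "avx2". */
static bool cpuinfo_has_flag(const char *flags, const char *flag) {
    size_t n = strlen(flag);
    if (n == 0) {
        return false;
    }
    const char *p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the beginning or after whitespace ... */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ... and must end at the end of the string or before whitespace. */
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends) {
            return true;
        }
        p += 1; /* keep searching after this partial match */
    }
    return false;
}
```

Calling `cpuinfo_has_flag(line, "avx512f")` on either flags line above should return false, while `"avx2"` should return true. Note that this only reflects what the kernel chose to list, which may differ from what a raw CPUID query returns inside a VM.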

How do I tell cabal to build nixfmt without AVX512? Or, ideally, how would I override pkgs.nixfmt-rfc-style to do that?

@schnusch
Author

It seems that wrong cpuid extended feature flags are reported for (or by) the AMD EPYC 7282 processor.

measure_off_avx comes from Haskell's text package, which is bundled with GHC.

https://github.com/haskell/text/blob/1ae86be323ff0561a313810303f779add0f7da76/cbits/measure_off.c#L46-L52

The call to __get_cpuid_count seems to return wrong results: per its specs, the AMD EPYC 7282 is not supposed to support AVX512, yet the following test program reports support for AVX512:

#include <cpuid.h>
#include <stdint.h>
#include <stdio.h>

static void print_register_bits(uint32_t x, const char *s) {
    for(size_t i = 0; i < 32; ++i) {
        printf("%s & (1 << %zu) = %lu\n", s, i, (unsigned long)(x & ((uint32_t)1 << i)));
    }
}

int main(void) {
    uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
    __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
    // https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features
    print_register_bits(ebx, "ebx");
    return 0;
}

Namely, the following extended feature flags are reported as set, although they should not be supported:

  • avx512-f ebx & (1 << 16)
  • avx512-dq ebx & (1 << 17)
  • avx512-ifma ebx & (1 << 21)
  • avx512-cd ebx & (1 << 28)
  • avx512-bw ebx & (1 << 30)
  • avx512-vl ebx & (1 << 31)
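
The bit positions listed above can be decoded by name instead of scanning 32 lines of output. This is a rough sketch, not the text package's code: the enum names and the helpers `ebx_bit`, `ebx_reports_avx512_vl_bw` and `read_leaf7_ebx` are made up for illustration, and requiring avx512-f in addition to bw and vl is my own assumption. It also checks the return value of `__get_cpuid_count`, which is 0 when the leaf is not supported.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in EBX for CPUID leaf 7, sub-leaf 0, as listed above. */
enum {
    AVX512_F    = 16,
    AVX512_DQ   = 17,
    AVX512_IFMA = 21,
    AVX512_CD   = 28,
    AVX512_BW   = 30,
    AVX512_VL   = 31
};

/* Test one bit; shifting the unsigned register value right avoids the
 * undefined behaviour of (1 << 31) on a signed int. */
static bool ebx_bit(uint32_t ebx, unsigned bit) {
    return ((ebx >> bit) & 1u) != 0;
}

/* Hypothetical check: F, BW and VL must all be reported. */
static bool ebx_reports_avx512_vl_bw(uint32_t ebx) {
    return ebx_bit(ebx, AVX512_F) && ebx_bit(ebx, AVX512_BW) &&
           ebx_bit(ebx, AVX512_VL);
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read EBX of leaf 7, sub-leaf 0; returns false if the leaf is unsupported. */
static bool read_leaf7_ebx(uint32_t *ebx_out) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    *ebx_out = ebx;
    return true;
}
#endif
```

On the VPS described here, `ebx_reports_avx512_vl_bw` would presumably return true for the value from `read_leaf7_ebx`, which is exactly the misreport in question, so this only makes the symptom easier to see.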

My current fix is to override pkgs.haskell.packages.ghc966 and add a patch where has_avx512_vl_bw always reports no support for AVX512.

final: prev: {
  haskell = prev.haskell // {
    packages = prev.haskell.packages // {
      ghc966 = prev.haskell.packages.ghc966.override (prevArgs: {
        ghc = prevArgs.ghc.overrideAttrs (prevAttrs: {
          patches = (prevAttrs.patches or [ ]) ++ [
            ./haskell-text-no-avx512.patch
          ];
        });
      });
    };
  };
}

diff --git a/cbits/measure_off.c b/cbits/measure_off.c
index 5f098f8..231f4d3 100644
--- a/libraries/text/cbits/measure_off.c
+++ b/libraries/text/cbits/measure_off.c
@@ -42,6 +42,7 @@
 
 #if defined(__x86_64__) && defined(COMPILER_SUPPORTS_AVX512)
 bool has_avx512_vl_bw() {
+  return false;
 #if (__GNUC__ >= 7 || __GNUC__ == 6 && __GNUC_MINOR__ >= 3) || defined(__clang_major__)
   uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
   __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);

Overriding Haskell packages proved quite complicated (pkgs.haskell.packages.text is null because text is actually provided by GHC) and I am still not totally satisfied with the above overlay.
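
A less blunt alternative to always returning false might be to additionally ask whether the operating system has enabled the AVX512 register state, since CPUID feature bits alone do not guarantee that ZMM registers are usable. The sketch below reads XCR0 with xgetbv after confirming OSXSAVE (CPUID leaf 1, ECX bit 27). The function and macro names are made up for illustration, and I have not verified whether this check would actually have returned false on the VPS above.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits that must be enabled for AVX512 code. */
#define XCR0_SSE       (1ull << 1)
#define XCR0_AVX       (1ull << 2)
#define XCR0_OPMASK    (1ull << 5)  /* k0-k7 mask registers */
#define XCR0_ZMM_HI256 (1ull << 6)  /* upper halves of zmm0-zmm15 */
#define XCR0_HI16_ZMM  (1ull << 7)  /* zmm16-zmm31 */

/* Pure check on an XCR0 value: all five state components must be enabled. */
static bool xcr0_enables_avx512(uint64_t xcr0) {
    const uint64_t need = XCR0_SSE | XCR0_AVX | XCR0_OPMASK |
                          XCR0_ZMM_HI256 | XCR0_HI16_ZMM;
    return (xcr0 & need) == need;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical runtime check: true only if the OS exposes xgetbv
 * (OSXSAVE) and XCR0 has the AVX512 state components enabled. */
static bool os_enables_avx512(void) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    if (!(ecx & (1u << 27))) {
        return false; /* no OSXSAVE: executing xgetbv would fault */
    }
    uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return xcr0_enables_avx512(((uint64_t)hi << 32) | lo);
}
#endif
```

A patched has_avx512_vl_bw could then require both the CPUID bits and `os_enables_avx512()`, which would keep AVX512 enabled on machines that really support it.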

@infinisil
Member

Great detective work!
