supporting Arm Neoverse V2 CPUs: NVIDIA Grace, AWS Graviton 4, Google Axion #845
I was wondering whether we could inspect which instructions our binaries actually require.
This seems to be possible with https://github.com/pkgw/elfx86exts (which says it works for both x86 and ARM).
We could inspect the instructions required by the software and have CI complain if they go beyond the commonly supported set.
Step 0:
Sample output from $ elfx86exts $(which elfx86exts):
File format and CPU architecture: Elf, X86_64
MODE64 (call)
CMOV (cmova)
BMI2 (mulx)
AVX (vpxor)
NOVLX (vpxor)
AVX2 (vpbroadcastq)
BMI (tzcnt)
SSE2 (pause)
SSE1 (xorps)
BWI (vpbroadcastb)
VLX (vpbroadcastb)
AVX512 (kortestw)
DQI (kshiftrb)
Instruction set extensions used: AVX, AVX2, AVX512, BMI, BMI2, BWI, CMOV, DQI, MODE64, NOVLX, SSE1, SSE2, VLX
CPU Generation: Unknown
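A minimal sketch of the CI check suggested above, assuming we capture the elfx86exts output as text and only look at its "Instruction set extensions used:" summary line. The contents of ALLOWED_X86_64 are a hypothetical placeholder, not an agreed EESSI baseline.

```python
from typing import Set

# Hypothetical placeholder baseline for a generic x86_64 build (not an agreed EESSI policy).
ALLOWED_X86_64 = {"MODE64", "CMOV", "SSE1", "SSE2"}

SUMMARY_PREFIX = "Instruction set extensions used:"


def used_extensions(elfx86exts_output: str) -> Set[str]:
    """Extract the extension names from the elfx86exts summary line."""
    for line in elfx86exts_output.splitlines():
        line = line.strip()
        if line.startswith(SUMMARY_PREFIX):
            names = line[len(SUMMARY_PREFIX):].split(",")
            return {name.strip() for name in names if name.strip()}
    raise ValueError("no 'Instruction set extensions used:' line found")


def disallowed_extensions(elfx86exts_output: str, allowed: Set[str]) -> Set[str]:
    """Extensions the binary uses that fall outside the allowed baseline."""
    return used_extensions(elfx86exts_output) - allowed
```

A CI job could run elfx86exts on each installed binary, feed the output to disallowed_extensions, and fail whenever the returned set is non-empty.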
Instruction set extensions cover a range of assembly instructions relevant to different categories (security, performance, AI/ML, cryptography, ...). We could consider keeping a catalogue of which assembly instructions each software package requires. We'd still need a map from the assembly instructions back to CPU features to know whether a particular CPU can run the code for a particular architecture branch of EESSI. I couldn't easily find this information, so I asked ChatGPT for a table for NVIDIA Grace (which implements the Armv9-A ISA) and got something that is, at the very least, a good starting point:
We only really care about a certain set of categories, so small differences in Arm instruction set extensions may not be relevant to the software stacks we ship.
So a hardware check would be more like asking the question: this software stack requires
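One way to phrase such a hardware check, sketched under the assumption that we compare against the feature flags the Linux kernel reports in /proc/cpuinfo ("Features" lines on aarch64, "flags" lines on x86_64) rather than executing instructions. REQUIRED_FEATURES and "example-package" are hypothetical placeholders for the catalogue idea above.

```python
from typing import Set

# Hypothetical catalogue: CPU feature flags (as spelled in /proc/cpuinfo)
# that each software package needs. Illustrative placeholder values only.
REQUIRED_FEATURES = {
    "example-package": {"asimd", "sve", "sve2"},
}


def cpu_features(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from /proc/cpuinfo text.

    aarch64 kernels list them on 'Features' lines, x86_64 kernels on 'flags' lines.
    """
    features: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Features", "flags"):
            features.update(value.split())
    return features


def missing_features(package: str, cpuinfo_text: str) -> Set[str]:
    """Features the package needs that this CPU does not report."""
    return REQUIRED_FEATURES[package] - cpu_features(cpuinfo_text)
```

On a real node this would be called with the contents of /proc/cpuinfo; an empty result means every catalogued feature is reported by the CPU.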
It seems you can check this with a compiler:
#include <stdio.h>
int main() {
    __asm__ volatile ("mov %%eax, %%eax" ::: "eax");
    printf("Instruction executed!\n");
    return 0;
}
I wonder if we could compile a small executable containing the full list of instructions the stack uses and then just run it on the CPU: if it doesn't throw an "Illegal instruction" error, is the CPU supported? Is that enough, or could things like cache sizes also have an impact?
EDIT: I tested the AI-generated code above; it doesn't seem to work out of the box, but could it work in principle?
Indeed, in principle that approach could work:
ocaisa@LAPTOP-O6HF2IKC:~$ cat test_instruction_sapphaire_rapids.c
#include <stdio.h>
int main() {
    __asm__ volatile (".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0"); // AMX-TILE: these bytes encode `tilerelease` (`tilezero tmm0` would be 0xc4, 0xe2, 0x7b, 0x49, 0xc0)
    printf("AMX instruction executed!\n");
    return 0;
}
ocaisa@LAPTOP-O6HF2IKC:~$ gcc test_instruction_sapphaire_rapids.c
ocaisa@LAPTOP-O6HF2IKC:~$ ./a.out
Illegal instruction (core dumped)
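A variation on the same idea, sketched here as an assumption rather than what was used in this thread: build and run each probe in its own child process and read the exit status, so a trapping instruction cannot disturb the test harness itself. probe_source, classify and probe are hypothetical helpers, and probe needs a C compiler (gcc by default) on PATH.

```python
import os
import signal
import subprocess
import tempfile

# One-instruction probe program; basic asm (no operands), so '%' needs no escaping.
PROBE_TEMPLATE = """\
int main(void) {
    __asm__ volatile ("@ASM@");
    return 0;
}
"""


def probe_source(asm: str) -> str:
    """Return C source that executes a single inline-asm statement."""
    escaped = asm.replace("\\", "\\\\").replace('"', '\\"')
    return PROBE_TEMPLATE.replace("@ASM@", escaped)


def classify(returncode: int) -> str:
    """Interpret a probe's exit status as reported by subprocess."""
    if returncode == 0:
        return "supported"
    if returncode == -signal.SIGILL:  # on POSIX, -N means "killed by signal N"
        return "illegal instruction"
    return f"other failure (exit status {returncode})"


def probe(asm: str, cc: str = "gcc") -> str:
    """Compile and run one probe in a child process."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write(probe_source(asm))
        subprocess.run([cc, src, "-o", exe], check=True)
        return classify(subprocess.run([exe]).returncode)
```

The cost is one compile per instruction; the benefit is that each result is independent of the others.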
A couple more AI-generated proofs of concept. Given an instructions.json like
[
  {
    "name": "AVX vaddps",
    "assembly": "vaddps %xmm0, %xmm1, %xmm2"
  },
  {
    "name": "AMX tilezero",
    "assembly": ".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0"
  },
  {
    "name": "SSE movaps",
    "assembly": "movaps %xmm0, %xmm1"
  }
]
you could have a Python code generator like
import json
import re
# Template for the generated C file (literal C braces are doubled as `{{ }}` so str.format() leaves them alone)
C_TEMPLATE = """\
#include <stdio.h>
#include <setjmp.h>
#include <signal.h>
// Global variable for signal handling
sigjmp_buf buf;
// Signal handler for illegal instructions
void sigill_handler(int sig) {{
    (void)sig;
    siglongjmp(buf, 1);  // Jump back to safety
}}
// Function to test an instruction
void test_instruction(const char *name, void (*instr_func)(void)) {{
    printf("[*] Testing: %s... ", name);
    fflush(stdout);  // Make sure the name is printed even if the instruction traps
    if (sigsetjmp(buf, 1) == 0) {{
        instr_func();  // Run the instruction
        printf("Success\\n");
    }} else {{
        printf("Failed (Illegal Instruction)\\n");
    }}
}}
// Instruction test functions
{function_definitions}
int main(void) {{
    // Set up signal handler for SIGILL
    signal(SIGILL, sigill_handler);
    printf("\\n=== CPU Instruction Test ===\\n");
    // Run all tests
{function_calls}
    printf("\\n=== Test Complete ===\\n");
    return 0;
}}
"""
def generate_c_code(instructions):
    """Generates C code based on the provided instructions."""
    function_definitions = []
    function_calls = []
    for instr in instructions:
        # Turn the test name into a valid C identifier, e.g. "AVX vaddps" -> test_avx_vaddps
        func_name = "test_" + re.sub(r"\W", "_", instr["name"]).lower()
        function_definitions.append(f"void {func_name}(void) {{ __asm__ volatile (\"{instr['assembly']}\\n\"); }}")
        function_calls.append(f"    test_instruction(\"{instr['name']}\", {func_name});")
    return C_TEMPLATE.format(
        function_definitions="\n".join(function_definitions),
        function_calls="\n".join(function_calls)
    )
def main():
    # Load JSON file
    with open("instructions.json", "r") as f:
        instructions = json.load(f)
    # Generate C code
    c_code = generate_c_code(instructions)
    # Write to file
    with open("test_instructions.c", "w") as f:
        f.write(c_code)
    print("[+] Generated 'test_instructions.c' successfully!")
if __name__ == "__main__":
    main()
and then
ocaisa@LAPTOP-O6HF2IKC:~$ python generate_c_code.py
[+] Generated 'test_instructions.c' successfully!
ocaisa@LAPTOP-O6HF2IKC:~$ gcc test_instructions.c
ocaisa@LAPTOP-O6HF2IKC:~$ ./a.out
=== CPU Instruction Test ===
[*] Testing: AVX vaddps... Success
[*] Testing: AMX tilezero... Failed (Illegal Instruction)
[*] Testing: SSE movaps... Success
=== Test Complete ===
I could see having individual
To me, this looks promising:
[ocaisa@login1 ~]$ srun --partition=x86-64-generic-node --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
[ocaisa@x86-64-generic-node1 ~]$ ./a.out
=== CPU Instruction Test ===
[*] Testing: AVX vaddps... Success
[*] Testing: AMX tilezero... Failed (Illegal Instruction)
[*] Testing: SSE movaps... Success
[*] Testing: AVX512 vaddps... Failed (Illegal Instruction)
[*] Testing: AVX512 vpdpbusd... Failed (Illegal Instruction)
[*] Testing: FMA vfmadd231ps... Success
[*] Testing: BMI2 pext... Success
[*] Testing: BMI2 mulx... Success
[*] Testing: SHA sha256rnds2... Failed (Illegal Instruction)
[*] Testing: TSX xbegin... Success
=== Test Complete ===
[ocaisa@x86-64-generic-node1 ~]$ exit
exit
[ocaisa@login1 ~]$ srun --partition=x86-64-intel-skylake-node --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
[ocaisa@x86-64-intel-skylake-node1 ~]$ ./a.out
=== CPU Instruction Test ===
[*] Testing: AVX vaddps... Success
[*] Testing: AMX tilezero... Failed (Illegal Instruction)
[*] Testing: SSE movaps... Success
[*] Testing: AVX512 vaddps... Success
[*] Testing: AVX512 vpdpbusd... Success
[*] Testing: FMA vfmadd231ps... Success
[*] Testing: BMI2 pext... Success
[*] Testing: BMI2 mulx... Success
[*] Testing: SHA sha256rnds2... Failed (Illegal Instruction)
[*] Testing: TSX xbegin... Success
=== Test Complete ===
[ocaisa@x86-64-intel-skylake-node1 ~]$ exit
exit
[ocaisa@login1 ~]$ srun --partition=x86-64-intel-srapids-node --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
[ocaisa@x86-64-intel-srapids-node1 ~]$ ./a.out
=== CPU Instruction Test ===
[*] Testing: AVX vaddps... Success
[*] Testing: AMX tilezero... Success
[*] Testing: SSE movaps... Success
[*] Testing: AVX512 vaddps... Success
[*] Testing: AVX512 vpdpbusd... Success
[*] Testing: FMA vfmadd231ps... Success
[*] Testing: BMI2 pext... Success
[*] Testing: BMI2 mulx... Success
[*] Testing: SHA sha256rnds2... Success
[*] Testing: TSX xbegin... Success
=== Test Complete ===
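To compare partitions mechanically, the harness output can be parsed back into per-instruction results. A minimal sketch, assuming the exact "[*] Testing: NAME... Success/Failed" line format printed above; parse_results and supported are hypothetical helper names.

```python
import re
from typing import Dict, Set

# Matches result lines printed by the generated test_instructions.c harness.
RESULT_RE = re.compile(r"^\[\*\] Testing: (?P<name>.+?)\.\.\. (?P<status>Success|Failed)")


def parse_results(output: str) -> Dict[str, bool]:
    """Map each tested instruction name to True (ran) or False (SIGILL)."""
    results = {}
    for line in output.splitlines():
        m = RESULT_RE.match(line.strip())
        if m:
            results[m.group("name")] = m.group("status") == "Success"
    return results


def supported(output: str) -> Set[str]:
    """Names of the instructions that executed successfully."""
    return {name for name, ok in parse_results(output).items() if ok}
```

For example, supported(srapids_output) - supported(skylake_output) lists what the Sapphire Rapids node can run that the Skylake node cannot.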
While looking into implementing support in archdetect for detecting the Neoverse V2-based Graviton 4, I noticed that the CPU features of the CPUs that implement this microarchitecture only partially overlap (based on https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html). Features that differ between them:
sve2
paca
pacg
rng
sm3
sm4
svesm4
ssbs (Google Axion: not supported)
ssbs is particularly interesting, since its absence on Axion means that our aarch64/neoverse_v1 installations may not even work there...
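Given per-CPU feature sets (for instance from the table linked above, or from each machine's /proc/cpuinfo), "which features can a build rely on across all of these CPUs?" is a set intersection, and "can this target run on that CPU?" is a set difference. A sketch; common_baseline and gaps are hypothetical helpers, and any CPU names or feature sets passed in are up to the caller.

```python
from typing import Dict, Set


def common_baseline(cpu_features: Dict[str, Set[str]]) -> Set[str]:
    """Features reported by every CPU in the group (needs at least one CPU)."""
    return set.intersection(*cpu_features.values())


def gaps(cpu_features: Dict[str, Set[str]], target: Set[str]) -> Dict[str, Set[str]]:
    """For each CPU, the target's required features it lacks (empty set = target runs there)."""
    return {cpu: target - feats for cpu, feats in cpu_features.items()}
```

With placeholder data where one CPU lacks ssbs, any target requiring ssbs shows up as a non-empty gap for that CPU, which is the kind of situation described above for aarch64/neoverse_v1 on Axion.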