AES-NI #1433
Comments
AES instructions differ very substantially between x86 and ARM. The proposal should address a way to unify over these differences. |
POWER also has AES instructions, FWIW:
They were all added in POWER8 (ISA 2.07). That said, I don't think these fit here since they're not really relaxed. Is there anywhere to gather ideas for a second version of the simd proposal? |
@Maratyszcza I added some details for x86 and ARM. ARM has slightly more low-level intrinsics, but it seems they all fit into one API. Later I'll add a couple more instructions related to AES |
I added another operation - |
Looks reasonable to me. |
+1 to these being useful. @nemequ we can consider them 'relaxed' if we extend the definition to "each 128-bit block within a vector", right? SVE2 does that (unfortunately it's an optional extension), and that would also cover x86 VAES. |
@jan-wassenberg, AFAIK, this proposal is limited to 128 bit operations, however we can probably consider it SIMD 2.0 of sorts, which should let us include these operations. I would be interested to pull this into flexible vectors to have the semantics you described 😉 |
@penzn OK :) I'd also welcome both v128 and flexible AES. |
Explicit instructions for AES support are useful but I share the concern/question that it doesn't really fit here. What SIMD instructions are being relaxed here? Don't we expect the same result no matter the platform? What's the fall back for the VM if a VM doesn't support these relaxed instructions? |
You can follow the same strategy as with It is also worth remembering that these instructions execute in constant time, which is quite important in terms of security.
|
I recently implemented a constant-time fallback using basic SIMD only. |
As far as I know, the most problematic part of non-SIMD emulation is Galois field multiplication |
This reminds me of some instructions we have added late in the SIMD proposal. I think it would be doable, but we would need to prototype and measure.
I don't think we should worry about scalar-only VMs, especially after SIMD making it to stage 5. |
Just to be pedantic, I don't think we should restrict this proposal to "relaxed" variants of existing SIMD instructions, assuming that that's what you were intending to say here. IMO any 128-bit SIMD operation whose best implementation might have different corner case behavior on some platform of interest should be fair game for the proposal (fma being a case in point). |
Actually, I don't mind if it's a |
Does this have different behavior on different platforms? FMA fits because the results will be different on different platforms since the operations may or may not actually be fused. The summary for this proposal is:
So unless I'm missing something (definitely possible, I'm not really familiar with how these work internally) this seems out of scope to me. FWIW, I'd very much like to start gathering potential instructions for a SIMD 2.0 extension, and think this would be a great addition to that list. |
Again, being pedantic, FMA is not a "relaxed" version of any previous instruction, it is a new instruction with relaxed behavior. IMO all 128-bit instructions with relaxed behavior are in scope. I don't know whether the AES instructions fall into that category; I was simply reacting to @jlb6740's wording and wanted to clarify that instructions with new behaviors are not out of scope for relaxed-simd. |
As I understand it, Relaxed SIMD is a category of SIMD operations which do not guarantee deterministic behavior (e.g. with NaN or signed zero) or accuracy (as in the case of the FMA instruction on platforms where it is not available). In the case of AES we have only one non-determinism: we cannot guarantee the same performance (cost model) on all platforms. In fact, many of the already standardized Fixed SIMD instructions cannot guarantee this either. So as far as I'm concerned this really belongs with SIMD 2.0 |
I think we agree that these AES instructions don't fit into relaxed-simd, but we also don't have a SIMD 2.0 proposal. As concrete next step, we can approach this as a new feature and follow the process that we have in place for new features, starting with filing an issue on the design repository (I can also transfer this issue there). It is up to interested participants to scope out the proposal: it could be a SIMD 2.0 proposal consisting of even more SIMD instructions, or it could be only AES instructions (as detailed here), or it could be a broader crypto instructions proposal. This will be driven by the proposal champion(s). I can provide pointers and help guide the process if @MaxGraey (or anyone else) is willing to take this on. Update: in the 2021-08-06 SIMD sync, we discussed a potential streamlined/lightweight process to make small changes to the spec, this is not confirmed yet, but could also work for AES (if/when it happens). |
@MaxGraey, from discussion at the SIMD subgroup meeting, we should probably highlight some use cases for these instructions, then it can be taken to the CG. |
@penzn the "What use cases are there?" point at the end lists some use cases. To summarize, these instructions are used mainly in two cases: to speed up AES encryption/decryption and to speed up cryptographic and non-cryptographic hash functions. Cryptographic hashes can often be used for databases and for very simple HashMap / HashSet containers with high resistance to HashDoS attacks. |
Thanks @penzn and @MaxGraey this sounds like a good start, I'll transfer this issue to the design repository as it is out of scope for relaxed-simd. Please note that a streamlined/lightweight process will still need a champion and satisfy the requirements outlined in the Phases document. |
I'm not sure if this is the right place for this proposal, but I would like to suggest some very useful instructions to speed up cryptography, especially cryptographic and non-cryptographic hashing.
What are the instructions being proposed?
AES-NI (Advanced Encryption Standard New Instructions) is an instruction set extension that accelerates AES encryption / decryption.
v128.aes.enc(a, b)
Perform one round of an AES encryption flow
general:
x86:
aesenc
ARM:
AESMC + AESE + EOR
PPC:
vcipher
v128.aes.enc_last(a, b)
Perform the last round of an AES encryption flow
general:
x86:
aesenclast
ARM:
AESE + EOR
PPC:
vcipherlast
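As a rough illustration of what the two encryption-round operations compute, here is a scalar Python sketch. It assumes the x86 `aesenc` / `aesenclast` semantics (ShiftRows, SubBytes, optional MixColumns, then XOR with the round key) and the FIPS-197 column-major byte layout; the function names `aes_enc` and `aes_enc_last` are placeholders chosen for this example, and the code is neither constant-time nor a proposed implementation.

```python
# Illustrative scalar model only; names are hypothetical and this is not constant-time.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_sbox():
    """Derive the AES S-box: multiplicative inverse, then the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):            # x^254 is the inverse of x; 0 maps to 0
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):       # XOR in the four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def shift_sub(state):
    """ShiftRows followed by SubBytes on a 16-byte column-major state."""
    return bytes(SBOX[state[r + 4 * ((c + r) % 4)]]
                 for c in range(4) for r in range(4))

def mix_columns(state):
    """Multiply each column by the fixed MixColumns matrix (2, 3, 1, 1)."""
    out = bytearray(16)
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            out[4 * c + r] = (gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                              ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return bytes(out)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def aes_enc(a, round_key):
    """One middle round: ShiftRows, SubBytes, MixColumns, XOR round key."""
    return xor(mix_columns(shift_sub(a)), round_key)

def aes_enc_last(a, round_key):
    """Final round: same as above but without MixColumns."""
    return xor(shift_sub(a), round_key)

if __name__ == "__main__":
    state = bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808")
    rk1 = bytes.fromhex("a0fafe1788542cb123a339392a6c7605")
    print(aes_enc(state, rk1).hex())
```

The sample state and round key are the round 1 values from the FIPS-197 Appendix B walkthrough, so the printed result can be compared with the "start of round 2" state given there.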
v128.aes.dec(a, b)
Perform one round of an AES decryption flow
general:
x86:
aesdec
ARM:
AESIMC + AESD + EOR
PPC:
vncipher
v128.aes.dec_last(a, b)
Perform the last round of an AES decryption flow
general:
x86:
aesdeclast
ARM:
AESD + EOR
PPC:
vncipherlast
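The decryption-round operations can be modelled the same way. The sketch below assumes the x86 `aesdec` / `aesdeclast` semantics (InvShiftRows, InvSubBytes, optional InvMixColumns, then XOR with the round key), which correspond to the "equivalent inverse cipher" ordering in FIPS-197. The names `aes_dec` and `aes_dec_last` are hypothetical, and the code is a non-constant-time illustration only.

```python
# Illustrative scalar model only; names are hypothetical and this is not constant-time.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_inv_sbox():
    """Derive the forward S-box, then invert the table."""
    inv_sbox = [0] * 256
    for x in range(256):
        inv = 1
        for _ in range(254):            # x^254 is the inverse of x; 0 maps to 0
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):       # affine map: XOR of four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        inv_sbox[s ^ 0x63] = x
    return inv_sbox

INV_SBOX = build_inv_sbox()

def inv_shift_sub(state):
    """InvShiftRows followed by InvSubBytes on a 16-byte column-major state."""
    return bytes(INV_SBOX[state[r + 4 * ((c - r) % 4)]]
                 for c in range(4) for r in range(4))

def inv_mix_columns(state):
    """Multiply each column by the inverse matrix (0e, 0b, 0d, 09)."""
    out = bytearray(16)
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            out[4 * c + r] = (gf_mul(a[r], 0x0E) ^ gf_mul(a[(r + 1) % 4], 0x0B)
                              ^ gf_mul(a[(r + 2) % 4], 0x0D)
                              ^ gf_mul(a[(r + 3) % 4], 0x09))
    return bytes(out)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def aes_dec(a, round_key):
    """One middle round: InvShiftRows, InvSubBytes, InvMixColumns, XOR key."""
    return xor(inv_mix_columns(inv_shift_sub(a)), round_key)

def aes_dec_last(a, round_key):
    """Final round: same as above but without InvMixColumns."""
    return xor(inv_shift_sub(a), round_key)

if __name__ == "__main__":
    shifted = bytes.fromhex("e9317db5cb322c723d2e895faf090794")
    print(aes_dec_last(shifted, bytes(16)).hex())
```

Note that with these semantics the round keys for the middle decryption rounds must first be passed through InvMixColumns (x86 provides `aesimc` for this), which is one reason a unified API needs to state the expected key schedule explicitly.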
v128.aes.keygen(a, imm8)
Generating the round keys used for encryption
general:
x86:
aeskeygenassist
ARM: Efficient emulation on ARM (See emulating-x86-aes-intrinsics-on-armv8-a):
PPC:
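For reference, the key-generation helper can be sketched as follows. This assumes the x86 `aeskeygenassist` definition: SubWord is applied to dwords 1 and 3 of the input, and each result is stored both as-is and rotated with the round constant XORed into its low byte. The name `aes_keygen` is a placeholder, and the code is an illustration rather than a proposed lowering.

```python
# Illustrative scalar model only; names are hypothetical and this is not constant-time.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_sbox():
    """Derive the AES S-box: multiplicative inverse, then the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):            # x^254 is the inverse of x; 0 maps to 0
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def sub_word(w):
    """Apply the S-box to each of the four bytes."""
    return bytes(SBOX[b] for b in w)

def rot_word(w):
    """Rotate the little-endian dword right by 8 bits (byte 1 moves to byte 0)."""
    return w[1:] + w[:1]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def aes_keygen(a, imm8):
    """Model of aeskeygenassist on a 16-byte little-endian vector."""
    rcon = bytes([imm8 & 0xFF, 0, 0, 0])
    x1 = sub_word(a[4:8])               # dword 1 of the input
    x3 = sub_word(a[12:16])             # dword 3 of the input
    return x1 + xor(rot_word(x1), rcon) + x3 + xor(rot_word(x3), rcon)

if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    assist = aes_keygen(key, 0x01)
    # The top dword, XORed with the first key word, gives the first word
    # of the next round key in the AES-128 key schedule.
    print(xor(assist[12:16], key[0:4]).hex())
```

The key used here is the AES-128 example key from FIPS-197, so the printed word can be checked against the key expansion table in that document.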
For details about operations like MixColumns, ShiftRowsInv, etc., see Intel's white paper.
How does behavior differ across processors? What new fingerprinting surfaces will be exposed?
What use cases are there?