-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypto/sha256: add native SHA256 instruction implementation for AMD64 #50543
Comments
Change https://golang.org/cl/377514 mentions this issue: |
Roughly what percentage of consumer CPUs are expected to have these instructions? Following https://en.wikipedia.org/wiki/Intel_SHA_extensions, it seems like it should be most Intel since Ice Lake (2019) and AMD since Zen (2017), so it seems significantly newer than x86-64-v3, working on CPUs released back in 2015. Not pushing back by any means, I just wonder how many users should expect to see an improvement e.g. once 1.19 releases in 8 months. |
@mvdan My specific impetus here is less about the consumer space. I probably should have noted this in the description, but the impetus for me is measurably improving the speed of content addressable storage using sha256, specifically container image generation and verification. So speeding up pretty much every container build and container registry access on x86_64. Because CI/CD pipelines for such tasks tend to run in datacenters (which are refreshed about on a 3 year cycle because of perf/watt concerns), and because developer local systems tend to be refreshed at a quicker pace than most consumer systems there should be more than a critical mass of users receiving benefits from this. Because of the developer benefits, I'd personally love to see this in 1.19. |
CC @golang/security |
Re licensing, the code needs to not introduce new notice requirements. What is the exact license on the code you are proposing to add? Also, if it's just an instruction we can use, it seems like we could write our own assembly and not worry about copying someone else's. /cc @FiloSottile |
Re licensing again, the other code in that file was contributed directly by Intel using Go's CLA in https://go-review.googlesource.com/22608, so we were not just "copying" the Linux version and therefore do not have to use that license. The right path might be to try to get Intel to contribute this one too. |
This proposal has been added to the active column of the proposals project |
@rsc Re licensing comment 1: The original is dual licensed GPLv2 and a 3-Clause BSD which exactly matches golang's license text. An additional license clause on binaries shouldn't be necessary for golang to include this AFAIUI. Re licensing comment 2: I misunderstood the git blame, so that's on me. I investigated that particular part of the problem before I had access to gerrit. What I will say is that the prologue and epilogue are different, so the inner loop is all that's taken from there, and there's essentially only one way, being dependent on hardware accel instructions which have implicit register arguments. That inner loop additionally appears to be documented by Intel as the only performance stable way for it to be implemented in their whitepaper. My guess is they want to use macro-op fusion on some of these ops in the future. I would hesitate going off of the beaten path there. |
@monocasa A direct copy of the code in https://github.com/torvalds/linux/blob/master/arch/x86/crypto/sha256_ni_asm.S would require an additional license. That code says, among other things, "Copyright(c) 2015 Intel Corporation." and "Redistributions of source code must retain the above copyright notice." It's true that the license has the same restrictions as the Go license, but the copyright is different, and as such the new license would need to be included by anybody who might be using the Go standard library with that code included. We're trying to avoid license proliferation for people who distribute Go code. Thanks. (But copyright only protects individual expression, not ideas, so it's quite possible that the same sequence of instructions is not covered by copyright if that is the only way to write those instructions. But let's avoid that argument if we can.) |
@ianlancetaylor Ultimately I stand by the idea that the code here isn't ultimately derived in a way which would impede as simply the algorithm itself is taken (as stated in the comment), and there's sort of one performant way to do this, and it's what's documented in Intel's white papers as the correct use of the instructions. That being said, I understand, and obviously defer to the golang maintainers here. I would ask that if this is determined to be a non starter, then at least I'd appreciate the chance to rewrite it (with probably a slight performance degradation, and an ongoing performance related technical debt for using a non recommended code sequence) rather than simply closing the proposal. |
I appreciate that the code is commented and fits in the existing structure, but the assembly policy is also about testing, and I don't see anything about making sure that both this code and the code it hides do the right thing. Do we even have test coverage on our builders for these instructions? Would we notice a regression if we changed test infrastructure? Moreover, there are enough new-to-me instructions in here that I don't think I can do a useful review of the code itself. The easiest solution for that would be to get Intel to do a review while approving the submission under the CLA. |
@FiloSottile The functionality of this code is covered under existing sha256 tests. Additionally any bug in this code leaves you with a compiler that can't get to the unit tests since go internally uses sha256 to address intermediate objects it seems (the compiler is very unhappy when this code is wrong, ends up printing intermediate objects as strings, and lays waste to my tty requiring a shell reset). On top of that, my main concern over the existing test vectors was overflowing the DATA_PTR at a 32 bit boundary since golang's assembler doesn't make it as easy to track if you're using 32bit or 64bit registers as GAS does. I did this using AES to create a deterministic but pseudo random 8GB byte stream in a slice, and validated this new test vector against existing implementations both in golang, and externally via libsodium. I didn't include this as adding close to a minute of test time in the case of a lack of sha hardware instructions or avx2 support seemed gratuitous compared to the existing compiler build time. Is there a location for long relatively running tests like that to be submitted into the tree? I know I would feel better with that under more tests. Beyond that, I attempted to follow the existing nuances WRT to commenting (particularly following the standard set in other crypto hardware accel asm functions) and making the code easy to follow, but I can certainly add to that. It's a little under commented for my general asm sensibilities, but I took a 'when in Rome' approach. I'd love to add anything that would help. Towards that end (and because I don't like the idea of golang accepting code they don't understand whether from Intel or from me), here's some of what I left off that I would normally have added (and have negative qualms with adding if desired):
followed by rounds 20-23
to make this rotation clearer but decided that unused variables to a #define was a larger sin. I welcome calls to change that back. Form there at rounds 52-59, we no longer have to message schedule for the next rounds, because that work has already been done in previous round, so the SHA256MSG1 can be left off (and should for correctness). Finally rounds 60-63 have significantly less work as the epilogue of the unrolled loop and can simply mix in the existing schedule, and perform the rounds, not having to do any additional work for non existent later rounds. WRT the instructions themselves, this patch doesn't add them to the compiler. They've existed in golang for nearly four years, having been added to cmd/internal/obj/x86 by Intel when they added AVX-512 support. However if additional test coverage is required there, I'm more than happy to add it. Can you suggest an example to follow? As for "would we notice a regression", I imagine that Google has been using Intel chips manufactured in the past 3 years or any of the AMD Zen chips in it's test infrastructure. Because of how the new instructions are core to the round and message scheduling any miscompilation of the shani instructions would both cause the sha256 test vectors to fail, and leave you with a useless compiler. If a miscompilation of the accel instructions that didn't result in failures in the test vectors, this would represent most of the way to a viable collision attack on sha256 proper because of how each round message schedule is necessary to fully smear the entropy in a way to get a compliant hash. Finally, I of course welcome a review from Intel, but lack knowledge as to Intel's current relationship with golang. Is making that a requirement a viable option for moving forward, or does that practically kill the proposal? |
I'd like to also mention that I opened #48720 a while ago which implements this and I've implemented that with specifically only the Intel reference documentation. I've also address already a few bits of feedback there but there's still a bunch of open questions. |
@dbussink Unfortunately, basing the code off of Intel reference documentation seems to be a more of a no go from a license perspective since the Intel whitepaper's license states
So either the decision is that there isn't enough here to be a true copyright claim and we're allowed take the underlying flow regardless, or the Linux side is safer from a license perspective already being 3-clause BSD from an entity in the AUTHORS file, or neither are safe and we truly require Intel's explicit signoff to use their instructions as described. |
Ah, where did you find this license? I don't see it on https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sha-extensions.html but I might be overlooking something? Although it would seem very surprising that the documentation itself would not allow an implementation based on it licensing wise 😕. |
@dbussink The last page of https://www.intel.com/content/dam/develop/external/us/en/documents/intel-sha-extensions-white-paper.pdf which is the original source of that information in your blog post (and unfortunately the blog post doesn't grant another license). I agree that it's incredibly goofy to give example code and then ban you from using it directly, but I'm not sure else how to interpret that. |
The statement says only that the paper doesn't grant any license to any intellectual property rights. The statement is unnecessary because the white paper doesn't grant any such rights without it, but lawyers like to add those statements to leave nothing in doubt. You don't need a license to use machine instructions in your own code. The code sequences in the white paper are clearly designated as examples, so you can follow them in your own code. However you cannot copy the whitepaper into any repository or copy Intel code from other open source projects into the Go source code. The Contributor License Agreement (CLA) requirement by Google prevents the latter even if the code would have the same license terms as the Go code. |
Licensing isn't based on what makes sense, it's based on the license text. It very clearly says that you aren't granted a license to any IP in the document, which would include copyright over code sequences. Google v Oracle ended with the de minimis defense still squashed by the courts.
We're talking about the exact same code sequences in question both in the white paper and in the Linux kernel. Either it doesn't matter at all because they're both exactly the same code sequences, or neither are valid sources of these sequences, or the source with the same license explicitly designated as 3 clause BSD by an entity already in the AUTHORs file is a valid source of these sequences. |
@monocasa You are arguing here that you need an explicit license by Intel to use Intel(r) machine instructions for their intended purpose and as documented to implement a public standard. If that would be true, we would live in a very poor world. At the end it is not for you and me to assess the legal risk here and whether to accept it. |
@ulikunitz > You are arguing here that you need an explicit license by Intel to use Intel(r) machine instructions for their intended purpose and as documented to implement a public standard. If that would be true, we would live in a very poor world. The document doesn't just say "these are what these instructions do" like normal Intel documentation. It walks through a full fairly large code sequence on the order of a hundred lines or so, that is verbatim what is at question here, down to the variable names and indentation. It is the exact same code sequence in both cases. A legal text at the end granting no license to any IP is very clear in the US. Hence why I found an alternative with a much clearer license that appears more than compatible. That's how I knew about the lack of license being granted by that document in the first place. |
For the Go project, the only way we are going to accept this code is to get Intel to agree to contribute it under our CLA, as they have done in the past. That's the path forward. Arguments about what the law is are beside the point. |
@TocarIP, are you still at Intel and do you know who could agree to contribute the SHANI SHA256 code under CLA as we did for the other implementations? Thanks. |
I'm no longer with Intel. @aregm and @intel-go in general should be able to help with licence issues. FWIW I've sent https://go-review.googlesource.com/c/go/+/135378 when I was at Intel. |
@rsc let me ping the guys who are supposedly in charge. |
addresses proposal #53084 required by sha-256 change list developed for #50543 Change-Id: I5454d746fce069a7a4993d70dc5b0a5544f8eeaf Reviewed-on: https://go-review.googlesource.com/c/go/+/408794 Run-TryBot: Keith Randall <[email protected]> Reviewed-by: Martin Möhrmann <[email protected]> Reviewed-by: Keith Randall <[email protected]> Reviewed-by: Keith Randall <[email protected]> Run-TryBot: Keith Randall <[email protected]>
Given that it's a small amount of assembly and we've gotten it contributed directly from Intel under CLA now, this seems fine. |
No change in consensus, so accepted. 🎉 |
Change https://go.dev/cl/408795 mentions this issue: |
In go1.21 the go std has a new SHANI accelerated rountine (just like minio's sha256). See golang/go#50543. Reduce the code surface by dropping what is now a redondent librairy. Benchmarks using go1.21 and go1.20 shows no significant difference between minio and the std for sha256: ``` name old time/op new time/op delta SumAllLarge/sha2-256-12 8.63ms ± 3% 8.63ms ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 23.1ms ± 3% 23.0ms ± 3% ~ (p=0.361 n=18+20) SumAllLarge/dbl-sha2-256-12 36.0ms ± 4% 8.6ms ± 4% -76.16% (p=0.000 n=20+20) name old speed new speed delta SumAllLarge/sha2-256-12 1.94GB/s ± 4% 1.95GB/s ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 725MB/s ± 3% 729MB/s ± 3% ~ (p=0.350 n=18+20) SumAllLarge/dbl-sha2-256-12 467MB/s ± 4% 1958MB/s ± 4% +319.53% (p=0.000 n=20+20) ``` `dlb-sha2-256` doubled in speed because it was never wired to use minio's sha256 implementation and thus was bumped by `crypto/sha256`'s improvement. I think we should remove `github.com/multiformats/go-multihash/register/miniosha256` one go1.22 is released (~1 year) since it would be purposeless, we could also stub it to github.com/multiformats/go-multihash/register/sha256 for all versions but [no one imports it directly](https://pkg.go.dev/github.com/multiformats/go-multihash/register/miniosha256?tab=importedby).
In go1.21 the go std has a new SHANI accelerated rountine (just like minio's sha256). See golang/go#50543. Reduce the code surface by dropping what is now a redondent librairy. Benchmarks using go1.21 and go1.20 shows no significant difference between minio and the std for sha256: ``` name old time/op new time/op delta SumAllLarge/sha2-256-12 8.63ms ± 3% 8.63ms ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 23.1ms ± 3% 23.0ms ± 3% ~ (p=0.361 n=18+20) SumAllLarge/dbl-sha2-256-12 36.0ms ± 4% 8.6ms ± 4% -76.16% (p=0.000 n=20+20) name old speed new speed delta SumAllLarge/sha2-256-12 1.94GB/s ± 4% 1.95GB/s ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 725MB/s ± 3% 729MB/s ± 3% ~ (p=0.350 n=18+20) SumAllLarge/dbl-sha2-256-12 467MB/s ± 4% 1958MB/s ± 4% +319.53% (p=0.000 n=20+20) ``` `dlb-sha2-256` doubled in speed because it was never wired to use minio's sha256 implementation and thus was bumped by `crypto/sha256`'s improvement. I think we should remove `github.com/multiformats/go-multihash/register/miniosha256` one go1.22 is released (~1 year) since it would be purposeless, we could also stub it to github.com/multiformats/go-multihash/register/sha256 for all versions but [no one imports it directly](https://pkg.go.dev/github.com/multiformats/go-multihash/register/miniosha256?tab=importedby).
In go1.21 the go std has a new SHANI accelerated rountine (just like minio's sha256). See golang/go#50543. Reduce the code surface by dropping what is now a redondent librairy. Benchmarks using go1.21 and go1.20 shows no significant difference between minio and the std for sha256: ``` name old time/op new time/op delta SumAllLarge/sha2-256-12 8.63ms ± 3% 8.63ms ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 23.1ms ± 3% 23.0ms ± 3% ~ (p=0.361 n=18+20) SumAllLarge/dbl-sha2-256-12 36.0ms ± 4% 8.6ms ± 4% -76.16% (p=0.000 n=20+20) name old speed new speed delta SumAllLarge/sha2-256-12 1.94GB/s ± 4% 1.95GB/s ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 725MB/s ± 3% 729MB/s ± 3% ~ (p=0.350 n=18+20) SumAllLarge/dbl-sha2-256-12 467MB/s ± 4% 1958MB/s ± 4% +319.53% (p=0.000 n=20+20) ``` `dlb-sha2-256` doubled in speed because it was never wired to use minio's sha256 implementation and thus was bumped by `crypto/sha256`'s improvement. I think we should remove `github.com/multiformats/go-multihash/register/miniosha256` one go1.22 is released (~1 year) since it would be purposeless, we could also stub it to github.com/multiformats/go-multihash/register/sha256 for all versions but [no one imports it directly](https://pkg.go.dev/github.com/multiformats/go-multihash/register/miniosha256?tab=importedby).
In go1.21 the go std has a new SHANI accelerated rountine (just like minio's sha256). See golang/go#50543. Reduce the code surface by dropping what is now a redondent librairy. Benchmarks using go1.21 and go1.20 shows no significant difference between minio and the std for sha256: ``` name old time/op new time/op delta SumAllLarge/sha2-256-12 8.63ms ± 3% 8.63ms ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 23.1ms ± 3% 23.0ms ± 3% ~ (p=0.361 n=18+20) SumAllLarge/dbl-sha2-256-12 36.0ms ± 4% 8.6ms ± 4% -76.16% (p=0.000 n=20+20) name old speed new speed delta SumAllLarge/sha2-256-12 1.94GB/s ± 4% 1.95GB/s ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 725MB/s ± 3% 729MB/s ± 3% ~ (p=0.350 n=18+20) SumAllLarge/dbl-sha2-256-12 467MB/s ± 4% 1958MB/s ± 4% +319.53% (p=0.000 n=20+20) ``` `dlb-sha2-256` quadrupled in speed because it was never wired to use minio's sha256 implementation and thus was brought back up to speed by `crypto/sha256`'s improvement. I think we should remove `github.com/multiformats/go-multihash/register/miniosha256` one go1.22 is released (~1 year) since it would be purposeless, we could also stub it to github.com/multiformats/go-multihash/register/sha256 for all versions but [no one imports it directly](https://pkg.go.dev/github.com/multiformats/go-multihash/register/miniosha256?tab=importedby).
In go1.21 the go std has a new SHANI accelerated rountine (just like minio's sha256). See golang/go#50543. Reduce the code surface by dropping what is now a redondent librairy. Benchmarks using go1.21 and go1.20 shows no significant difference between minio and the std for sha256: ``` name old time/op new time/op delta SumAllLarge/sha2-256-12 8.63ms ± 3% 8.63ms ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 23.1ms ± 3% 23.0ms ± 3% ~ (p=0.361 n=18+20) SumAllLarge/dbl-sha2-256-12 36.0ms ± 4% 8.6ms ± 4% -76.16% (p=0.000 n=20+20) name old speed new speed delta SumAllLarge/sha2-256-12 1.94GB/s ± 4% 1.95GB/s ± 1% ~ (p=0.220 n=20+17) SumAllLarge/sha2-512-12 725MB/s ± 3% 729MB/s ± 3% ~ (p=0.350 n=18+20) SumAllLarge/dbl-sha2-256-12 467MB/s ± 4% 1958MB/s ± 4% +319.53% (p=0.000 n=20+20) ``` `dlb-sha2-256` quadrupled in speed because it was never wired to use minio's sha256 implementation and thus was brought back up to speed by `crypto/sha256`'s improvement. I think we should remove `github.com/multiformats/go-multihash/register/miniosha256` one go1.22 is released (~1 year) since it would be purposeless, we could also stub it to github.com/multiformats/go-multihash/register/sha256 for all versions but [no one imports it directly](https://pkg.go.dev/github.com/multiformats/go-multihash/register/miniosha256?tab=importedby).
@rsc Does the implementation include support for AVX512? Or just AVX2? |
The sha_ni sha256 instructions have been shown to provide an ~4x increase in hash rate on newer amd64 systems versus the avx2 implementation. Transliterating the Linux implementation shows an up to 3.79x increase in hash rate under benchmarks.
https://gist.github.com/monocasa/befeed9c8c5827417c0921e231703f2c
Q&A:
Q: Linux is GPL, is this OK to bring into golang?
A: It appears so. The Linux implementation file is dual BSD-3 clause/gpl. A look through the history of that file shows that all relevant changes have come from Intel employees, Intel has signed the CLA, and in fact the current AVX2 implementation submitted by Russ Cox is called out as arriving from the same source.
Q: Was the Assembly Policy followed?
A: I believe so, even to the point of leaving a measurable but ~1% perf increase on the table in exchange for simplifying constant table access and sharing that with the existing avx2 implementation.
The text was updated successfully, but these errors were encountered: