Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99108: Import SHA2-224 and SHA2-256 from HACL* #99109

Merged
merged 36 commits into from
Feb 7, 2023

Conversation

msprotz
Copy link
Contributor

@msprotz msprotz commented Nov 4, 2022

Issue #99108 is about replacing hashlib primitives (for the non-OpenSSL case) with verified implementations from HACL*. This is the first PR in the series, and focuses specifically on SHA2-256 and SHA2-224.

This PR imports Hacl_Streaming_SHA2 into the Python tree. This is the HACL* implementation of SHA2, which combines a core implementation of SHA2 along with a layer of buffer management that allows updating the digest with any number of bytes. This supersedes the previous implementation in the tree.

@franziskuskiefer was kind enough to benchmark the changes: in addition to being verified (thus providing significant safety and security improvements), this implementation also provides a sizeable performance boost!

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
Sha2_256_Streaming            3163 ns      3160 ns       219353     // this PR
LibTomCrypt_Sha2_256          5057 ns      5056 ns       136234     // library used by Python currently

The changes in this PR are as follows:

  • import the subset of HACL* that covers SHA2-256/224 into Modules/_hacl
  • rewire sha256module.c to use the HACL* implementation

To review the changes, one should focus on sha256module.c, and possibly Hacl_Streaming_SHA2.h which is the API entry point. The rest of the files are part of HACL* and are cherry-picked straight from the HACL* repository, so as to minimize the amount of hand-edits and facilitate future refreshes of the code, should HACL* land some performance improvements.

The original code comes from https://github.com/project-everest/hacl-star/tree/master/dist/gcc-compatible. The various .h files in include/, krmllib/, etc. will be shared across (hopefully) future primitives imported from HACL*. I can trim those files to minimize the amount of includes, but then this will make refreshing the code in the future harder (since the trimming will have to be redone for each refresh).

The code was originally authored by @karthikbhargavan and I; @polubelova and @R1kM provided a considerable amount of help over the past week, as we were overhauling our implementation to offer a significant performance boost (we want this first PR to be a good one!). Thanks to everyone involved in this team effort.

There are a couple remarks left in the source code, but for now, I think it's better to open up a PR and start discussing. As promised, tagging @alex for this PR. Thanks so much!

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@cpython-cla-bot
Copy link

cpython-cla-bot bot commented Nov 4, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@gpshead gpshead self-requested a review November 5, 2022 01:54
@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

Copy link
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be useful for this kind of thing to include a script (somewhere under Tools?.. where to put it can be figured out later) that can generate and update the hacl-star vendored code implementation. As I assume we'll not want this to be a one time code drop but periodically update it, at least for every major Python release, as hacl-star evolves.

@gpshead
Copy link
Member

gpshead commented Nov 7, 2022

Please include a benchmark comparing to the OpenSSL based sha2 as well.

Copy link
Member

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add all new headers to MODULE__SHA256_DEPS in Makefile.pre.in.

I would prefer if you keep the sha256module.c file in its current place and only replace libtom code with HACL calls. It's easier to review and keeps the history for untouched lines.

Modules/Setup.stdlib.in Outdated Show resolved Hide resolved
@franziskuskiefer
Copy link

Please include a benchmark comparing to the OpenSSL based sha2 as well.

Adding to the numbers in the PR description #99109 (comment) we get the following. Note that OpenSSL is compiled without assembly support (no-asm no-hw) to get a fair comparison.

OpenSSL_Sha2_256           3000 ns         2999 ns       233849

@gpshead
Copy link
Member

gpshead commented Nov 8, 2022

Note that OpenSSL is compiled without assembly support (no-asm no-hw) to get a fair comparison.

LOL, thanks, but fair enough. 😄 This does at least answer my broader question about the feasibility of also getting rid of the OpenSSL backed _hashlib in the long run: not today.

Which seems to just be a copyright message update.
We do this with other third party code as well, it avoids linking and
dynamic linking ODR violations when extension module or embedding
applications also use their own version of the library code in question.

+ Minor annotation on the test, use `python -m test -u cpu`
+ Add a link to the readme.
@gpshead
Copy link
Member

gpshead commented Jan 31, 2023

Okay, I think this is ready. I pushed a few changes to the PR branch:

  • Improvements to refresh.sh.
  • A README.md in Modules/_hacl/ to go along with this third party code.
  • Used C #define hacks to "namespace" the Hacl_ prefixed symbols from the .c file so that they can't cause ODR violations if a process winds up linking with other code that linked against its own version of HACL*. (we do this on some of our other third party things within CPython).

The list confined to its own .h file was manually generated and thus needs manual updating, but I doubt the list will change often. I'd like a test to confirm that no Hacl_ C symbols exist in _sha256.so or _sha256.pyd... but that can remain a future todo wishlist item for now.

@gpshead gpshead self-assigned this Jan 31, 2023
@gpshead gpshead added extension-modules C modules in the Modules dir 3.12 bugs and security fixes labels Jan 31, 2023
Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small style nit. Also, there's a cast warning on line 127 in Modules/sha256module.c.

Autoconf changes looks good to me.

Modules/sha256module.c Outdated Show resolved Hide resolved
@msprotz
Copy link
Contributor Author

msprotz commented Jan 31, 2023

Thanks both @gpshead and @erlend-aasland!

@erlend-aasland what warning are you seeing? Is it a cast from python's size_t to uint32_t? I wasn't able to reproduce (neither on gcc-11/x86_64/Linux, nor {clang,gcc-12}/x86_64/OSX), but will happily add a cast if that's what you're seeing.

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Jan 31, 2023

Yeah, it's a cast warning. You'll see it in the diff tab on the PR; it's one of the GitHub windows runners that's complaining, IIRC (I'm on mobile now).

@msprotz
Copy link
Contributor Author

msprotz commented Feb 7, 2023

I fixed the final review comments... @erlend-aasland do I need to re-request a review from you, or is there anything I can do? thanks!

Copy link
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is in good shape! But on its own - lets call this step "2a" - it'll be a little odd to ship in 3.12 if we don't also have some natural related followups to include as well:

  • 2b) SHA384 & SHA512 support (sha512module.c and related hacl code).
  • 3) SHA3 support (more generated hacl code).

I'll go ahead and merge this, and be responsible for reverting it if we decide we don't wind up just the shorter half of SHA2 using it if the follow-ons aren't ready yet before 3.12-beta1 in May.

Could you go ahead and prepare follow-on PRs?

@gpshead gpshead merged commit 1fcc0ef into python:main Feb 7, 2023
@msprotz
Copy link
Contributor Author

msprotz commented Feb 7, 2023

Yup! I'll respond on the issue so that we don't scatter the overall discussion across individual PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes extension-modules C modules in the Modules dir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants