Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go code cosmetic fixes #949

Open
4 of 9 tasks
FiloSottile opened this issue Apr 28, 2021 · 13 comments
Open
4 of 9 tasks

Go code cosmetic fixes #949

FiloSottile opened this issue Apr 28, 2021 · 13 comments

Comments

@FiloSottile
Copy link

FiloSottile commented Apr 28, 2021

  • use uint64 instead of uint1 and drop addcarryxU64 and subborrowxU64 wrappers

  • use := instead of var declarations

var x25 uint64
var x26 uint64
x26, x25 = bits.Mul64((arg1[7]), ((arg2[4]) * 0x2))
x26, x25 := bits.Mul64((arg1[7]), ((arg2[4]) * 0x2))
var x196 uint64 = (x193 >> 58)
x196 := x193 >> 58
  • remove superflous parens
bits.Mul64((arg1[7]), ((arg2[4]) * 0x2))
bits.Mul64(arg1[7], arg2[4] * 0x2)
  • don't use references for integer returns
func addcarryxU58(out1 *uint64, out2 *uint1, arg1 uint1, arg2 uint64, arg3 uint64) {
  var x1 uint64 = ((uint64(arg1) + arg2) + arg3)
  var x2 uint64 = (x1 & 0x3ffffffffffffff)
  var x3 uint1 = uint1((x1 >> 58))
  *out1 = x2
  *out2 = x3
}

addcarryxU58(&x20, &x21, 0x0, x1, (x19 & 0x3ffffffffffffff))
func addcarryxU58(arg1 uint1, arg2 uint64, arg3 uint64) (out1 uint64, out2 uint64) {
  var x1 uint64 = ((uint64(arg1) + arg2) + arg3)
  var x2 uint64 = (x1 & 0x3ffffffffffffff)
  var x3 uint1 = uint1((x1 >> 58))
  out1 = x2
  out2 = x3
  return
}

x20, x21 := addcarryxU58(0x0, x1, (x19 & 0x3ffffffffffffff))
  • emit types for different bounds
// TightElement ...
//
// Bounds:
//
//     [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x200000000000000]
type TightElement *[8]uint64
  • emit types for different domains (Montgomery and not Montgomery)

  • make comments shorter

/*
   The function Opp negates a field element.
   Postconditions:
     eval out1 mod m = -eval arg1 mod m
   Input Bounds:
     arg1: [[0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x400000000000000], [0x0 ~> 0x200000000000000]]
   Output Bounds:
     out1: [[0x0 ~> 0xc00000000000000], [0x0 ~> 0xc00000000000000], [0x0 ~> 0xc00000000000000], [0x0 ~> 0xc00000000000000], [0x0 ~> 0xc00000000000000], [0x0 ~> 0xc00000000000000], [0x0 ~> 0xc00000000000000], [0x0 ~> 0xc00000000000000], [0x0 ~> 0x600000000000000]]
 */
/*inline*/
func Opp(out1 *[9]uint64, arg1 *[9]uint64) {
// Opp negates a field element.
//
//    eval out1 mod m = -eval arg1 mod m
//
func Opp(out1 *LooseElement, arg1 *TightElement) {
  • add autogenerated header

This should be the first line in the file, which by convention will get recognized by various tooling as autogenerated code and left alone.

// Code generated by fiat-crypto. DO NOT EDIT.
  • run gofmt on the output (which will only fix some of the whitespace
@mdempsky
Copy link

I think these would be nice improvements if they're easy and don't add complexity to fiat-crypto's output routines (which aren't verified), but I don't think any of these affect the assembly ultimately generated by the Go compilers?

@armfazh
Copy link
Contributor

armfazh commented Apr 28, 2021

@JasonGross definitely types for differentiating between Montgomery domain. +1

JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Apr 28, 2021
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Apr 28, 2021
@mdempsky
Copy link

FWIW, the syntax rules for Go documentation are described at: https://golang.org/pkg/go/doc/#ToHTML

I think the main noteworthy things for fiat-go code are that blank lines separate paragraphs, and runs of indented lines get formatted with <pre>.

JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Apr 29, 2021
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Apr 29, 2021
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Apr 29, 2021
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Apr 30, 2021
@JasonGross
Copy link
Collaborator

I think the main noteworthy things for fiat-go code are that blank lines separate paragraphs, and runs of indented lines get formatted with <pre>.

@mdempsky What's the rule for runs of indented lines not separated from the previous line by a newline. That is, how is something like

line1

line2
  line3
  line4

line5

  line6
  line7
line8

  line9
  line10

Which indented lines become <pre> blocks?

@JasonGross
Copy link
Collaborator

JasonGross commented Apr 30, 2021

  • run gofmt on the output (which will only fix some of the whitespace

This one is a bit tricky logistically, I'd rather not make gofmt a requirement for synthesizing the go files via the Makefile. I can change the output to match the spacing conventions of gofmt, though?

@mdempsky
Copy link

mdempsky commented Apr 30, 2021

Which indented lines become <pre> blocks?

It looks like lines 3 and 4, 6 and 7, and 9 and 10 become pre blocks:

testing with `go doc`
mdempsky@mdempsky:/tmp/test$ ls
test.go
mdempsky@mdempsky:/tmp/test$ cat test.go
/*
line1
continued

line2
continued
  line3
  line4
continued

line5
continued

  line6
  line7
line8
continued

  line9
  line10
*/
package p
mdempsky@mdempsky:/tmp/test$ go doc .
package p // import "."

line1 continued

line2 continued

    line3
    line4

continued

line5 continued

    line6
    line7

line8 continued

    line9
    line10

(This is the command-line output, but it uses the same parser. The 4-space indented blocks correspond to HTML pre tags; the unindented blocks correspond to HTML p tags.)

This one is a bit tricky logistically, I'd rather not make gofmt a requirement for synthesizing the go files via the Makefile.

Out of curiosity, what's the logistic difficulty here? I thought Go is pretty generally available nowadays.

I can change the output to match the spacing conventions of gofmt, though?

gofmt can be somewhat finnicky about things; e.g., it'll format 1 * 2 and 3 + 4, but then 1*2 + 3.

However, to the extent you're able to better match the output, that sounds good and desirable.

@JasonGross
Copy link
Collaborator

Out of curiosity, what's the logistic difficulty here? I thought Go is pretty generally available nowadays.

The files are regenerated by the default makefile target. I'd like to be able to say all of the following:

  • The dependencies for the default makefile target are just Coq and OCaml
  • The default makefile target regenerates the generated files
  • If regenerating the generated files changes their contents (modulo path separators changing on Linux vs Windows), then there's a bug

@mdempsky
Copy link

Makes sense. So we just need a formally verified implementation of gofmt in Coq? :)

@FiloSottile
Copy link
Author

  • run gofmt on the output (which will only fix some of the whitespace

This one is a bit tricky logistically, I'd rather not make gofmt a requirement for synthesizing the go files via the Makefile. I can change the output to match the spacing conventions of gofmt, though?

Sure, you can run gofmt -d to see the changes it would apply to a file.

JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Apr 30, 2021
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue May 1, 2021
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue May 1, 2021
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue May 3, 2021
Part of mit-plv#949

Indent Go code and adjust spacing

Minor spacing changes

Add `--doc-newline-before-package-declaration`

Now we have package docs back for Go code, when we want them.

Add `--doc-prepend-header{,-raw}`, use `--doc-text-before-function-name`

Fixes mit-plv#958
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue May 3, 2021
Part of mit-plv#949

Indent Go code and adjust spacing

Minor spacing changes

Add `--doc-newline-before-package-declaration`

Now we have package docs back for Go code, when we want them.

Add `--doc-prepend-header{,-raw}`, use `--doc-text-before-function-name`

Fixes mit-plv#958
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue May 3, 2021
Part of mit-plv#949

Indent Go code and adjust spacing

Minor spacing changes

Add `--doc-newline-before-package-declaration`

Now we have package docs back for Go code, when we want them.

Add `--doc-prepend-header{,-raw}`, use `--doc-text-before-function-name`

Fixes mit-plv#958
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue May 3, 2021
Part of mit-plv#949

Indent Go code and adjust spacing

Minor spacing changes

Add `--doc-newline-before-package-declaration`

Now we have package docs back for Go code, when we want them.

Add `--doc-prepend-header{,-raw}`, use `--doc-text-before-function-name`

Fixes mit-plv#958
JasonGross added a commit that referenced this issue May 4, 2021
Part of #949

Indent Go code and adjust spacing

Minor spacing changes

Add `--doc-newline-before-package-declaration`

Now we have package docs back for Go code, when we want them.

Add `--doc-prepend-header{,-raw}`, use `--doc-text-before-function-name`

Fixes #958
gopherbot pushed a commit to golang/go that referenced this issue May 9, 2021
Fiat Cryptography (https://github.com/mit-plv/fiat-crypto) is a project
that produces prime order field implementations (the code that does
arithmetic modulo a prime number) based on a formally verified model.

The formal verification covers some of the most subtle and hard to test
parts of an elliptic curve implementation, like carry chains. It would
probably have prevented #20040 and #43786.

This CL imports a 64-bit implementation of the P-521 base field,
replacing the horribly slow and catastrophically variable time big.Int
CurveParams implementation.

The code in p521_fiat64.go is generated reproducibly by fiat-crypto,
building and running the Dockerfile according to the README.

The code in fiat/p521.go is a thin and idiomatic wrapper around the
fiat-crypto code. It includes an Invert method generated with the help
of github.com/mmcloughlin/addchain.

The code in elliptic/p521.go is a line-by-line port of the CurveParams
implementation. Lsh(x, N) was replaced with repeated Add(x, x) calls.
Mul(x, x) was replaced with Square(x). Mod calls were removed, as all
operations are modulo P. Likewise, Add calls to bring values back to
positive were removed. The ScalarMult ladder implementation is now
constant time, copied from p224ScalarMult. Only other notable changes
are adding a p512Point type to keep (x, y, z) together, and making
addJacobian and doubleJacobian methods on that type, with the usual
receiver semantics to save 4 allocations per step.

This amounts to a proof of concept, and is far from a mature elliptic
curve implementation. Here's a non-exhaustive list of things that need
improvement, most of which are pre-existing issues with crypto/elliptic.
Some of these can be fixed without API change, so can't.

- Marshal and Unmarshal still use the slow, variable time big.Int
  arithmetic. The Curve interface does not expose field operations, so
  we'll have to make our own abstraction.

- Point addition uses an incomplete Jacobian formula, which has variable
  time behaviors for points at infinity and equal points. There are
  better, complete formulae these days, but I wanted to keep this CL
  reviewable against the existing code.

- The scalar multiplication ladder is still heavily variable time. This
  is easy to fix and I'll do it in a follow-up CL, but I wanted to keep
  this one easier to review.

- Fundamentally, values have to go in and out of big.Int representation
  when they pass through the Curve interface, which is both slow and
  slightly variable-time.

- There is no scalar field implementation, so crypto/ecdsa ends up using
  big.Int for signing.

- Extending this to P-384 would involve either duplicating all P-521
  code, or coming up with some lower-level interfaces for the base
  field. Even better, generics, which would maybe let us save heap
  allocations due to virtual calls.

- The readability and idiomaticity of the autogenerated code can
  improve, although we have a clear abstraction and well-enforced
  contract, which makes it unlikely we'll have to resort to manually
  modifying the code. See mit-plv/fiat-crypto#949.

- We could also have a 32-bit implementation, since it's almost free to
  have fiat-crypto generate one.

Anyway, it's definitely better than CurveParams, and definitely faster.

name                   old time/op    new time/op    delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8    4.18ms ± 3%    0.86ms ± 2%  -79.50%  (p=0.000 n=10+9)
ScalarMult/P521-8        4.17ms ± 2%    0.85ms ± 6%  -79.68%  (p=0.000 n=10+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8              4.23ms ± 1%    0.94ms ± 0%  -77.70%  (p=0.000 n=9+8)
Verify/P521-8            8.31ms ± 2%    1.75ms ± 4%  -78.99%  (p=0.000 n=9+10)
GenerateKey/P521-8       4.15ms ± 2%    0.85ms ± 2%  -79.49%  (p=0.000 n=10+9)

name                   old alloc/op   new alloc/op   delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8    3.06MB ± 3%    0.00MB ± 0%  -99.97%  (p=0.000 n=10+10)
ScalarMult/P521-8        3.05MB ± 1%    0.00MB ± 0%  -99.97%  (p=0.000 n=9+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8              3.03MB ± 0%    0.01MB ± 0%  -99.74%  (p=0.000 n=10+8)
Verify/P521-8            6.06MB ± 1%    0.00MB ± 0%  -99.93%  (p=0.000 n=9+9)
GenerateKey/P521-8       3.02MB ± 0%    0.00MB ± 0%  -99.96%  (p=0.000 n=9+10)

name                   old allocs/op  new allocs/op  delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8     19.8k ± 3%      0.0k ± 0%  -99.95%  (p=0.000 n=10+10)
ScalarMult/P521-8         19.7k ± 1%      0.0k ± 0%  -99.95%  (p=0.000 n=9+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8               19.6k ± 0%      0.1k ± 0%  -99.63%  (p=0.000 n=10+10)
Verify/P521-8             39.2k ± 1%      0.1k ± 0%  -99.84%  (p=0.000 n=9+10)
GenerateKey/P521-8        19.5k ± 0%      0.0k ± 0%  -99.91%  (p=0.000 n=9+10)

Updates #40171

Change-Id: Ic898b09a2388382bf51ec007d9a79d72d44efe10
Reviewed-on: https://go-review.googlesource.com/c/go/+/315271
Run-TryBot: Filippo Valsorda <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Katie Hockman <[email protected]>
Trust: Katie Hockman <[email protected]>
Trust: Filippo Valsorda <[email protected]>
@JasonGross JasonGross mentioned this issue May 18, 2021
4 tasks
@Yawning
Copy link

Yawning commented Jul 28, 2021

*  use `uint64` instead of `uint1` and drop addcarryxU64 and subborrowxU64 wrappers

This is more than cosmetic. Getting rid of the wrappers and assuming that bits.Add64 behaves as advertised ("The carryOut output is guaranteed to be 0 or 1") improves curve25519 CarryMul performance by ~35%[0]. I assume CarrySquare will also see a significant performance gain.

	var x51 uint64
	var x52 uint1
	x51, x52 = addcarryxU64(x13, x7, 0x0)
	var x53 uint64
	x53, _ = addcarryxU64(x14, x8, x52)
	var x51 uint64
	var x52 uint64
	x51, x52 = bits.Add64(x13, x7, 0x0)
	var x53 uint64
	x53, _ = bits.Add64(x14, x8, x52)

While we're wishing for various fixes to the Go output, Selectznz specifies a private type for one of it's arguments (arg1 uint1), severely limiting the usefulness of the routine.

[0]: With go 1.16.5. 1.17beta1 does a better job of optimizing the current code so the gain is "only" ~31%.

@JasonGross
Copy link
Collaborator

This is more than cosmetic. Getting rid of the wrappers and assuming that bits.Add64 behaves as advertised ("The carryOut output is guaranteed to be 0 or 1") improves curve25519 CarryMul performance by ~35%[0]. I assume CarrySquare will also see a significant performance gain.

If you typedef uint1 to uint64, does that give you the performance benefit?

If removing the wrappers is really that important, I can add a rewriting pass to relax the bounds for go output.

@Yawning
Copy link

Yawning commented Jul 28, 2021

If you typedef uint1 to uint64, does that give you the performance benefit?

name \ time/op  baseline     typedef-ed   no-wrapper
CarryMult-8     66.4ns ± 1%  50.8ns ± 1%  45.9ns ± 1%
name \ time/op  baseline-1.17  typedef-ed-1.17  no-wrapper-1.17
CarryMult-8       61.7ns ± 0%      48.8ns ± 0%      44.7ns ± 0%

It is considerably faster, but the wrapper still has overhead, even though it gets inlined, because of what appears to be inline marking.

	0x023d 00573 (curve25519.go:404)	XCHGL	AX, AX
	0x023e 00574 (curve25519.go:406)	XCHGL	AX, AX
	0x023f 00575 (curve25519.go:409)	XCHGL	AX, AX
	0x0240 00576 (curve25519.go:411)	XCHGL	AX, AX
	0x0241 00577 (curve25519.go:413)	MOVQ	$2251799813685247, R9
	0x024b 00587 (curve25519.go:413)	ANDQ	R9, R12
	0x024e 00590 (curve25519.go:416)	XCHGL	AX, AX
	0x024f 00591 (curve25519.go:418)	XCHGL	AX, AX
	0x0250 00592 (curve25519.go:421)	XCHGL	AX, AX
	0x0251 00593 (curve25519.go:423)	XCHGL	AX, AX
	0x0252 00594 (curve25519.go:426)	XCHGL	AX, AX
	0x0253 00595 (curve25519.go:428)	XCHGL	AX, AX
	0x0254 00596 (curve25519.go:431)	XCHGL	AX, AX
	0x0255 00597 (curve25519.go:433)	XCHGL	AX, AX
	0x0256 00598 (curve25519.go:436)	XCHGL	AX, AX
	0x0257 00599 (curve25519.go:438)	XCHGL	AX, AX
	0x0258 00600 (curve25519.go:441)	XCHGL	AX, AX
	0x0259 00601 (curve25519.go:443)	XCHGL	AX, AX
	0x025a 00602 (curve25519.go:446)	XCHGL	AX, AX
	0x025b 00603 (curve25519.go:448)	XCHGL	AX, AX
	0x025c 00604 (curve25519.go:451)	XCHGL	AX, AX
	0x025d 00605 (curve25519.go:453)	XCHGL	AX, AX
	0x025e 00606 (curve25519.go:456)	XCHGL	AX, AX
	0x025f 00607 (curve25519.go:462)	XCHGL	AX, AX
	0x0260 00608 (curve25519.go:468)	XCHGL	AX, AX
	0x0261 00609 (curve25519.go:474)	XCHGL	AX, AX
	0x0262 00610 (curve25519.go:475)	SUBQ	CX, DX
	0x0265 00613 (curve25519.go:476)	MOVQ	DI, CX
	0x0268 00616 (curve25519.go:476)	SHRQ	$51, DI
	0x026c 00620 (curve25519.go:476)	SHLQ	$13, DX

Since I don't see soul crushing NOP-walls of doom in the version of the routine where I manually removed the wrapper, I assume that the compiler doesn't emit a InlMark (or gets rid of it at some point) for direct calls to bits.Add64 because of the special cases in the compiler's SSA code.

At some point (in the far future) the compiler behavior may resolve itself, but at least one related issue has been open since 2019 (golang/go#29571).

JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Aug 3, 2021
This is a partial fix for item 1 of
mit-plv#949 (comment) also
mentioned at golang/go#40171 (comment)

Unfortunately, we still cast the arguments to uint1 because we don't
have enough information in the right places to relax the casts.  We
really want a fused absint/rewriting pass, where we get access to true
bounds information rather than just casts.
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Aug 4, 2021
This is a partial fix for item 1 of
mit-plv#949 (comment) also
mentioned at golang/go#40171 (comment)

Unfortunately, we still cast the arguments to uint1 because we don't
have enough information in the right places to relax the casts.  We
really want a fused absint/rewriting pass, where we get access to true
bounds information rather than just casts.
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Aug 4, 2021
This is a partial fix for item 1 of
mit-plv#949 (comment) also
mentioned at golang/go#40171 (comment)

Unfortunately, we still cast the arguments to uint1 because we don't
have enough information in the right places to relax the casts.  We
really want a fused absint/rewriting pass, where we get access to true
bounds information rather than just casts.
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Aug 4, 2021
This is a partial fix for item 1 of
mit-plv#949 (comment) also
mentioned at golang/go#40171 (comment)

Unfortunately, we still cast the arguments to uint1 because we don't
have enough information in the right places to relax the casts.  We
really want a fused absint/rewriting pass, where we get access to true
bounds information rather than just casts.
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Aug 6, 2021
This is a partial fix for item 1 of
mit-plv#949 (comment) also
mentioned at golang/go#40171 (comment)

Unfortunately, we still cast the arguments to uint1 because we don't
have enough information in the right places to relax the casts.  We
really want a fused absint/rewriting pass, where we get access to true
bounds information rather than just casts.
JasonGross added a commit that referenced this issue Aug 7, 2021
This is a partial fix for item 1 of
#949 (comment) also
mentioned at golang/go#40171 (comment)

Unfortunately, we still cast the arguments to uint1 because we don't
have enough information in the right places to relax the casts.  We
really want a fused absint/rewriting pass, where we get access to true
bounds information rather than just casts.
@mmcloughlin
Copy link
Contributor

  • use uint64 instead of uint1 and drop addcarryxU64 and subborrowxU64 wrappers

We should either do this or export uint1 as Uint1. At the moment there are exported functions taking arguments of unexported types.

func Selectznz(out1 *[5]uint64, arg1 uint1, arg2 *[5]uint64, arg3 *[5]uint64) {

FiloSottile added a commit to FiloSottile/nistec that referenced this issue May 13, 2022
Fiat Cryptography (https://github.com/mit-plv/fiat-crypto) is a project
that produces prime order field implementations (the code that does
arithmetic modulo a prime number) based on a formally verified model.

The formal verification covers some of the most subtle and hard to test
parts of an elliptic curve implementation, like carry chains. It would
probably have prevented #20040 and #43786.

This CL imports a 64-bit implementation of the P-521 base field,
replacing the horribly slow and catastrophically variable time big.Int
CurveParams implementation.

The code in p521_fiat64.go is generated reproducibly by fiat-crypto,
building and running the Dockerfile according to the README.

The code in fiat/p521.go is a thin and idiomatic wrapper around the
fiat-crypto code. It includes an Invert method generated with the help
of github.com/mmcloughlin/addchain.

The code in elliptic/p521.go is a line-by-line port of the CurveParams
implementation. Lsh(x, N) was replaced with repeated Add(x, x) calls.
Mul(x, x) was replaced with Square(x). Mod calls were removed, as all
operations are modulo P. Likewise, Add calls to bring values back to
positive were removed. The ScalarMult ladder implementation is now
constant time, copied from p224ScalarMult. Only other notable changes
are adding a p512Point type to keep (x, y, z) together, and making
addJacobian and doubleJacobian methods on that type, with the usual
receiver semantics to save 4 allocations per step.

This amounts to a proof of concept, and is far from a mature elliptic
curve implementation. Here's a non-exhaustive list of things that need
improvement, most of which are pre-existing issues with crypto/elliptic.
Some of these can be fixed without API change, so can't.

- Marshal and Unmarshal still use the slow, variable time big.Int
  arithmetic. The Curve interface does not expose field operations, so
  we'll have to make our own abstraction.

- Point addition uses an incomplete Jacobian formula, which has variable
  time behaviors for points at infinity and equal points. There are
  better, complete formulae these days, but I wanted to keep this CL
  reviewable against the existing code.

- The scalar multiplication ladder is still heavily variable time. This
  is easy to fix and I'll do it in a follow-up CL, but I wanted to keep
  this one easier to review.

- Fundamentally, values have to go in and out of big.Int representation
  when they pass through the Curve interface, which is both slow and
  slightly variable-time.

- There is no scalar field implementation, so crypto/ecdsa ends up using
  big.Int for signing.

- Extending this to P-384 would involve either duplicating all P-521
  code, or coming up with some lower-level interfaces for the base
  field. Even better, generics, which would maybe let us save heap
  allocations due to virtual calls.

- The readability and idiomaticity of the autogenerated code can
  improve, although we have a clear abstraction and well-enforced
  contract, which makes it unlikely we'll have to resort to manually
  modifying the code. See mit-plv/fiat-crypto#949.

- We could also have a 32-bit implementation, since it's almost free to
  have fiat-crypto generate one.

Anyway, it's definitely better than CurveParams, and definitely faster.

name                   old time/op    new time/op    delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8    4.18ms ± 3%    0.86ms ± 2%  -79.50%  (p=0.000 n=10+9)
ScalarMult/P521-8        4.17ms ± 2%    0.85ms ± 6%  -79.68%  (p=0.000 n=10+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8              4.23ms ± 1%    0.94ms ± 0%  -77.70%  (p=0.000 n=9+8)
Verify/P521-8            8.31ms ± 2%    1.75ms ± 4%  -78.99%  (p=0.000 n=9+10)
GenerateKey/P521-8       4.15ms ± 2%    0.85ms ± 2%  -79.49%  (p=0.000 n=10+9)

name                   old alloc/op   new alloc/op   delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8    3.06MB ± 3%    0.00MB ± 0%  -99.97%  (p=0.000 n=10+10)
ScalarMult/P521-8        3.05MB ± 1%    0.00MB ± 0%  -99.97%  (p=0.000 n=9+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8              3.03MB ± 0%    0.01MB ± 0%  -99.74%  (p=0.000 n=10+8)
Verify/P521-8            6.06MB ± 1%    0.00MB ± 0%  -99.93%  (p=0.000 n=9+9)
GenerateKey/P521-8       3.02MB ± 0%    0.00MB ± 0%  -99.96%  (p=0.000 n=9+10)

name                   old allocs/op  new allocs/op  delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8     19.8k ± 3%      0.0k ± 0%  -99.95%  (p=0.000 n=10+10)
ScalarMult/P521-8         19.7k ± 1%      0.0k ± 0%  -99.95%  (p=0.000 n=9+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8               19.6k ± 0%      0.1k ± 0%  -99.63%  (p=0.000 n=10+10)
Verify/P521-8             39.2k ± 1%      0.1k ± 0%  -99.84%  (p=0.000 n=9+10)
GenerateKey/P521-8        19.5k ± 0%      0.0k ± 0%  -99.91%  (p=0.000 n=9+10)

Updates #40171

Change-Id: Ic898b09a2388382bf51ec007d9a79d72d44efe10
Reviewed-on: https://go-review.googlesource.com/c/go/+/315271
Run-TryBot: Filippo Valsorda <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Katie Hockman <[email protected]>
Trust: Katie Hockman <[email protected]>
Trust: Filippo Valsorda <[email protected]>
bassosimone pushed a commit to ooni/oocrypto that referenced this issue May 21, 2022
Fiat Cryptography (https://github.com/mit-plv/fiat-crypto) is a project
that produces prime order field implementations (the code that does
arithmetic modulo a prime number) based on a formally verified model.

The formal verification covers some of the most subtle and hard to test
parts of an elliptic curve implementation, like carry chains. It would
probably have prevented #20040 and #43786.

This CL imports a 64-bit implementation of the P-521 base field,
replacing the horribly slow and catastrophically variable time big.Int
CurveParams implementation.

The code in p521_fiat64.go is generated reproducibly by fiat-crypto,
building and running the Dockerfile according to the README.

The code in fiat/p521.go is a thin and idiomatic wrapper around the
fiat-crypto code. It includes an Invert method generated with the help
of github.com/mmcloughlin/addchain.

The code in elliptic/p521.go is a line-by-line port of the CurveParams
implementation. Lsh(x, N) was replaced with repeated Add(x, x) calls.
Mul(x, x) was replaced with Square(x). Mod calls were removed, as all
operations are modulo P. Likewise, Add calls to bring values back to
positive were removed. The ScalarMult ladder implementation is now
constant time, copied from p224ScalarMult. Only other notable changes
are adding a p512Point type to keep (x, y, z) together, and making
addJacobian and doubleJacobian methods on that type, with the usual
receiver semantics to save 4 allocations per step.

This amounts to a proof of concept, and is far from a mature elliptic
curve implementation. Here's a non-exhaustive list of things that need
improvement, most of which are pre-existing issues with crypto/elliptic.
Some of these can be fixed without API change, so can't.

- Marshal and Unmarshal still use the slow, variable time big.Int
  arithmetic. The Curve interface does not expose field operations, so
  we'll have to make our own abstraction.

- Point addition uses an incomplete Jacobian formula, which has variable
  time behaviors for points at infinity and equal points. There are
  better, complete formulae these days, but I wanted to keep this CL
  reviewable against the existing code.

- The scalar multiplication ladder is still heavily variable time. This
  is easy to fix and I'll do it in a follow-up CL, but I wanted to keep
  this one easier to review.

- Fundamentally, values have to go in and out of big.Int representation
  when they pass through the Curve interface, which is both slow and
  slightly variable-time.

- There is no scalar field implementation, so crypto/ecdsa ends up using
  big.Int for signing.

- Extending this to P-384 would involve either duplicating all P-521
  code, or coming up with some lower-level interfaces for the base
  field. Even better, generics, which would maybe let us save heap
  allocations due to virtual calls.

- The readability and idiomaticity of the autogenerated code can
  improve, although we have a clear abstraction and well-enforced
  contract, which makes it unlikely we'll have to resort to manually
  modifying the code. See mit-plv/fiat-crypto#949.

- We could also have a 32-bit implementation, since it's almost free to
  have fiat-crypto generate one.

Anyway, it's definitely better than CurveParams, and definitely faster.

name                   old time/op    new time/op    delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8    4.18ms ± 3%    0.86ms ± 2%  -79.50%  (p=0.000 n=10+9)
ScalarMult/P521-8        4.17ms ± 2%    0.85ms ± 6%  -79.68%  (p=0.000 n=10+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8              4.23ms ± 1%    0.94ms ± 0%  -77.70%  (p=0.000 n=9+8)
Verify/P521-8            8.31ms ± 2%    1.75ms ± 4%  -78.99%  (p=0.000 n=9+10)
GenerateKey/P521-8       4.15ms ± 2%    0.85ms ± 2%  -79.49%  (p=0.000 n=10+9)

name                   old alloc/op   new alloc/op   delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8    3.06MB ± 3%    0.00MB ± 0%  -99.97%  (p=0.000 n=10+10)
ScalarMult/P521-8        3.05MB ± 1%    0.00MB ± 0%  -99.97%  (p=0.000 n=9+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8              3.03MB ± 0%    0.01MB ± 0%  -99.74%  (p=0.000 n=10+8)
Verify/P521-8            6.06MB ± 1%    0.00MB ± 0%  -99.93%  (p=0.000 n=9+9)
GenerateKey/P521-8       3.02MB ± 0%    0.00MB ± 0%  -99.96%  (p=0.000 n=9+10)

name                   old allocs/op  new allocs/op  delta
pkg:crypto/elliptic goos:darwin goarch:arm64
ScalarBaseMult/P521-8     19.8k ± 3%      0.0k ± 0%  -99.95%  (p=0.000 n=10+10)
ScalarMult/P521-8         19.7k ± 1%      0.0k ± 0%  -99.95%  (p=0.000 n=9+10)
pkg:crypto/ecdsa goos:darwin goarch:arm64
Sign/P521-8               19.6k ± 0%      0.1k ± 0%  -99.63%  (p=0.000 n=10+10)
Verify/P521-8             39.2k ± 1%      0.1k ± 0%  -99.84%  (p=0.000 n=9+10)
GenerateKey/P521-8        19.5k ± 0%      0.0k ± 0%  -99.91%  (p=0.000 n=9+10)

Updates #40171

Change-Id: Ic898b09a2388382bf51ec007d9a79d72d44efe10
Reviewed-on: https://go-review.googlesource.com/c/go/+/315271
Run-TryBot: Filippo Valsorda <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Katie Hockman <[email protected]>
Trust: Katie Hockman <[email protected]>
Trust: Filippo Valsorda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants