sha3: use unaligned reads and xors on x86 and x64
Speedup of about 1.4x on x64. Added benchmarks that use the
ShakeHash interface, which doesn't require copying the state.

Unaligned or generic xorIn and copyOut functions chosen via
buildline, but both are tested.

Substantial contributions from Eric Eisner.

See golang.org/cl/151630044 for the previous CR.

(There are also some minor edits/additions to the documentation.)

Change-Id: I9500c25682457c82487512b9b8c66df7d75bff5d
Reviewed-on: https://go-review.googlesource.com/2132
Reviewed-by: Adam Langley <[email protected]>
coruus authored and agl committed Jan 12, 2015
1 parent 160b2e1 commit 4ed45ec
Showing 6 changed files with 313 additions and 189 deletions.
42 changes: 20 additions & 22 deletions sha3/doc.go
@@ -12,7 +12,8 @@
// Guidance
//
// If you aren't sure what function you need, use SHAKE256 with at least 64
// bytes of output.
// bytes of output. The SHAKE instances are faster than the SHA3 instances;
// the latter have to allocate memory to conform to the hash.Hash interface.
//
// If you need a secret-key MAC (message authentication code), prepend the
// secret key to the input, hash with SHAKE256 and read at least 32 bytes of
@@ -21,45 +22,42 @@
//
// Security strengths
//
// The SHA3-x functions have a security strength against preimage attacks of x
// bits. Since they only produce x bits of output, their collision-resistance
// is only x/2 bits.
// The SHA3-x (x equals 224, 256, 384, or 512) functions have a security
// strength against preimage attacks of x bits. Since they only produce "x"
// bits of output, their collision-resistance is only "x/2" bits.
//
// The SHAKE-x functions have a generic security strength of x bits against
// all attacks, provided that at least 2x bits of their output is used.
// Requesting more than 2x bits of output does not increase the collision-
// resistance of the SHAKE functions.
// The SHAKE-256 and -128 functions have a generic security strength of 256 and
// 128 bits against all attacks, provided that at least 2x bits of their output
// is used. Requesting more than 64 or 32 bytes of output, respectively, does
// not increase the collision-resistance of the SHAKE functions.
//
//
// The sponge construction
//
// A sponge builds a pseudo-random function from a pseudo-random permutation,
// by applying the permutation to a state of "rate + capacity" bytes, but
// hiding "capacity" of the bytes.
// A sponge builds a pseudo-random function from a public pseudo-random
// permutation, by applying the permutation to a state of "rate + capacity"
// bytes, but hiding "capacity" of the bytes.
//
// A sponge starts out with a zero state. To hash an input using a sponge, up
// to "rate" bytes of the input are XORed into the sponge's state. The sponge
// has thus been "filled up" and the permutation is applied. This process is
// is then "full" and the permutation is applied to "empty" it. This process is
// repeated until all the input has been "absorbed". The input is then padded.
// The digest is "squeezed" from the sponge by the same method, except that
// output is copied out.
// The digest is "squeezed" from the sponge in the same way, except that
// output is copied out instead of input being XORed in.
//
// A sponge is parameterized by its generic security strength, which is equal
// to half its capacity; capacity + rate is equal to the permutation's width.
//
// Since the KeccakF-1600 permutation is 1600 bits (200 bytes) wide, this means
// that security_strength == (1600 - bitrate) / 2.
// that the security strength of a sponge instance is equal to (1600 - bitrate) / 2.
//
//
// Recommendations, detailed
// Recommendations
//
// The SHAKE functions are recommended for most new uses. They can produce
// output of arbitrary length. SHAKE256, with an output length of at least
// 64 bytes, provides 256-bit security against all attacks.
//
// The Keccak team recommends SHAKE256 for most applications upgrading from
// SHA2-512. (NIST chose a much stronger, but much slower, sponge instance
// for SHA3-512.)
// 64 bytes, provides 256-bit security against all attacks. The Keccak team
// recommends it for most applications upgrading from SHA2-512. (NIST chose a
// much stronger, but much slower, sponge instance for SHA3-512.)
//
// The SHA-3 functions are "drop-in" replacements for the SHA-2 functions.
// They produce output of the same length, with the same security strengths
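The relation between rate, capacity and security strength described in the doc.go comments above can be checked with a little arithmetic. The following is a minimal sketch, not part of this commit: the function name securityStrength is hypothetical, and the rates (168, 136 and 72 bytes) are the FIPS 202 values for SHAKE128, SHAKE256 and SHA3-512 rather than anything shown in this diff.

```go
package main

import "fmt"

// permutationWidthBits is the width of the KeccakF-1600 permutation in bits.
const permutationWidthBits = 1600

// securityStrength (hypothetical helper) returns the generic security
// strength in bits of a sponge with the given rate in bytes. The strength is
// half the capacity, and capacity = permutation width - bitrate.
func securityStrength(rateBytes int) int {
	bitrate := rateBytes * 8
	return (permutationWidthBits - bitrate) / 2
}

func main() {
	// Rates in bytes as given in FIPS 202 (an assumption of this sketch,
	// not taken from the diff).
	instances := []struct {
		name string
		rate int
	}{
		{"SHAKE128", 168},
		{"SHAKE256", 136},
		{"SHA3-512", 72},
	}
	for _, in := range instances {
		fmt.Printf("%s: rate %d bytes, strength %d bits\n",
			in.name, in.rate, securityStrength(in.rate))
	}
}
```

The SHA3-512 row illustrates the remark about NIST's choice: a smaller rate means fewer input bytes absorbed per permutation call, which is where the slowdown relative to SHAKE256 comes from.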
67 changes: 17 additions & 50 deletions sha3/sha3.go
@@ -4,10 +4,6 @@

package sha3

import (
"encoding/binary"
)

// spongeDirection indicates the direction bytes are flowing through the sponge.
type spongeDirection int

@@ -30,25 +26,25 @@ type state struct {
buf []byte // points into storage
rate int // the number of bytes of state to use

// dsbyte contains the "domain separation" value and the first bit of
// the padding. In sections 6.1 and 6.2 of [1], the SHA-3 and SHAKE
// functions are defined with bits appended to the message: SHA-3
// functions have 01 and SHAKE functions have 1111. Because of the way
// that bits are numbered from the LSB upwards, that ends up as
// 00000010b and 00001111b, respectively. Then the padding rule from
// section 5.1 is applied to pad to a multiple of the rate, which
// involves adding a 1 bit, zero or more zero bits and then a final one
// bit. The first one bit from the padding is merged into the dsbyte
// value giving 00000110b (0x06) and 00011111b (0x1f), respectively.
//
// [1] http://csrc.nist.gov/publications/drafts/fips-202/fips_202_draft.pdf,
// dsbyte contains the "domain separation" bits and the first bit of
// the padding. Sections 6.1 and 6.2 of [1] separate the outputs of the
// SHA-3 and SHAKE functions by appending bitstrings to the message.
// Using a little-endian bit-ordering convention, these are "01" for SHA-3
// and "1111" for SHAKE, or 00000010b and 00001111b, respectively. Then the
// padding rule from section 5.1 is applied to pad the message to a multiple
// of the rate, which involves adding a "1" bit, zero or more "0" bits, and
// a final "1" bit. We merge the first "1" bit from the padding into dsbyte,
// giving 00000110b (0x06) and 00011111b (0x1f).
// [1] http://csrc.nist.gov/publications/drafts/fips-202/fips_202_draft.pdf
// "Draft FIPS 202: SHA-3 Standard: Permutation-Based Hash and
// Extendable-Output Functions (May 2014)"
dsbyte byte
storage [maxRate]byte

// Specific to SHA-3 and SHAKE.
fixedOutput bool // whether this is a fixed-output-length instance
outputLen int // the default output size in bytes
state spongeDirection // current direction of the sponge
state spongeDirection // whether the sponge is absorbing or squeezing
}

// BlockSize returns the rate of sponge underlying this hash function.
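The dsbyte comment in the hunk above describes how the domain-separation bitstrings become 0x06 and 0x1f. As a rough illustration only (the helper dsByte is hypothetical and does not appear in the package), the bit packing can be reproduced like this, assuming the bitstring is shorter than 8 bits:

```go
package main

import "fmt"

// dsByte (hypothetical helper) packs a domain-separation bitstring into a
// byte, placing the first bit of the string in the least significant
// position, and then sets the next bit to 1 as the first bit of the padding.
// It assumes len(bits) < 8.
func dsByte(bits string) byte {
	var b byte
	for i := 0; i < len(bits); i++ {
		if bits[i] == '1' {
			b |= 1 << uint(i)
		}
	}
	// Merge in the first "1" bit of the padding, directly after the suffix.
	b |= 1 << uint(len(bits))
	return b
}

func main() {
	fmt.Printf("SHA-3 (\"01\"):   0x%02x\n", dsByte("01"))   // 0x06
	fmt.Printf("SHAKE (\"1111\"): 0x%02x\n", dsByte("1111")) // 0x1f
}
```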
@@ -79,51 +75,22 @@ func (d *state) clone() *state {
return &ret
}

// xorIn xors a buffer into the state, byte-swapping to
// little-endian as necessary; it returns the number of bytes
// copied, including any zeros appended to the bytestring.
func (d *state) xorIn(buf []byte) {
n := len(buf) / 8

for i := 0; i < n; i++ {
a := binary.LittleEndian.Uint64(buf)
d.a[i] ^= a
buf = buf[8:]
}
if len(buf) != 0 {
// XOR in the last partial uint64.
a := uint64(0)
for i, v := range buf {
a |= uint64(v) << uint64(8*i)
}
d.a[n] ^= a
}
}

// copyOut copies uint64s to a byte buffer.
func (d *state) copyOut(b []byte) {
for i := 0; len(b) >= 8; i++ {
binary.LittleEndian.PutUint64(b, d.a[i])
b = b[8:]
}
}

// permute applies the KeccakF-1600 permutation. It handles
// any input-output buffering.
func (d *state) permute() {
switch d.state {
case spongeAbsorbing:
// If we're absorbing, we need to xor the input into the state
// before applying the permutation.
d.xorIn(d.buf)
xorIn(d, d.buf)
d.buf = d.storage[:0]
keccakF1600(&d.a)
case spongeSqueezing:
// If we're squeezing, we need to apply the permutation before
// copying more output.
keccakF1600(&d.a)
d.buf = d.storage[:d.rate]
d.copyOut(d.buf)
copyOut(d, d.buf)
}
}
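The hunk above replaces the xorIn and copyOut methods with package-level functions whose bodies are not shown on this page. As a sketch of what a portable, byte-order-independent variant could look like, modelled on the removed methods: the names xorInGeneric and copyOutGeneric and the bare [25]uint64 parameter are assumptions of this example, and buffers are assumed to be a multiple of 8 bytes long (as the sponge rates are).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// xorInGeneric (assumed name) XORs buf into the state lanes, decoding each
// 8-byte group as a little-endian uint64. It assumes len(buf) is a multiple
// of 8 and at most 200 bytes.
func xorInGeneric(a *[25]uint64, buf []byte) {
	n := len(buf) / 8
	for i := 0; i < n; i++ {
		a[i] ^= binary.LittleEndian.Uint64(buf[8*i:])
	}
}

// copyOutGeneric (assumed name) writes state lanes to b as little-endian
// bytes, 8 bytes at a time, until fewer than 8 bytes of b remain.
func copyOutGeneric(a *[25]uint64, b []byte) {
	for i := 0; len(b) >= 8; i++ {
		binary.LittleEndian.PutUint64(b, a[i])
		b = b[8:]
	}
}

func main() {
	var a [25]uint64
	in := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}

	// XORing into a zero state and copying back out should round-trip.
	xorInGeneric(&a, in)
	out := make([]byte, len(in))
	copyOutGeneric(&a, out)
	fmt.Println("round trip ok:", bytes.Equal(in, out))
}
```

On little-endian hardware that tolerates unaligned access, the same effect can be had by reinterpreting the byte buffer as 64-bit words directly, which is the optimization the commit message describes for x86 and x64.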

@@ -151,7 +118,7 @@ func (d *state) padAndPermute(dsbyte byte) {
d.permute()
d.state = spongeSqueezing
d.buf = d.storage[:d.rate]
d.copyOut(d.buf)
copyOut(d, d.buf)
}
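Only the tail of padAndPermute is visible in this hunk. The padding rule that the dsbyte comment refers to (a "1" bit, zero or more "0" bits, and a final "1" bit) can be sketched on a standalone block as follows. The function pad is hypothetical and operates on a fresh slice rather than on the sponge's internal buffer; it assumes the partial input is shorter than the rate.

```go
package main

import "fmt"

// pad (hypothetical helper) returns a block of exactly rate bytes holding the
// partial input followed by the domain-separation byte and the padding.
// dsbyte already contains the first "1" bit of the padding; the final "1" bit
// is the most significant bit of the last byte, because bits are numbered
// from the LSB upwards. It assumes len(partial) < rate.
func pad(partial []byte, rate int, dsbyte byte) []byte {
	block := make([]byte, rate) // zero-filled: supplies the "0" bits
	copy(block, partial)
	block[len(partial)] ^= dsbyte
	block[rate-1] ^= 0x80
	return block
}

func main() {
	// 136 bytes is the FIPS 202 rate of SHA3-256 and SHAKE256 (an assumption
	// of this sketch, not taken from the diff).
	block := pad([]byte("abc"), 136, 0x06)
	fmt.Printf("len=%d block[3]=0x%02x block[135]=0x%02x\n",
		len(block), block[3], block[135])
}
```

When the partial input is exactly one byte short of the rate, both XORs hit the same byte, giving 0x86 for SHA-3 or 0x9f for SHAKE; using XOR rather than assignment is what makes that case work.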

// Write absorbs more data into the hash's state. It produces an error
@@ -168,7 +135,7 @@ func (d *state) Write(p []byte) (written int, err error) {
for len(p) > 0 {
if len(d.buf) == 0 && len(p) >= d.rate {
// The fast path; absorb a full "rate" bytes of input and apply the permutation.
d.xorIn(p[:d.rate])
xorIn(d, p[:d.rate])
p = p[d.rate:]
keccakF1600(&d.a)
} else {