
new hash_to_field code / expand_message_md revisited #214

Merged (12 commits) on Mar 5, 2020

Conversation

@kwantam (Collaborator) commented Feb 28, 2020

This PR updates the code in poc/ to use the new hash_to_field function from #212.

I have not yet regenerated the test vectors. Shall I do that?

@burdges commented Feb 29, 2020

Is pairing-plus or whatever up to date?

@kwantam (Collaborator, Author) commented Feb 29, 2020

> Is pairing-plus or whatever up to date?

Not yet, that's next on my list.

@mratsim (Contributor) commented Mar 1, 2020

New test vectors would be very helpful

@armfazh (Collaborator) commented Mar 2, 2020

Using this new method for obtaining field elements, does it still preserve the guarantees required by the random oracle construction?
Previously, u0 and u1 were obtained from different RO instantiations, by means of appending an extra bit:
u0 = H(M || 0) and u1 = H(M || 1)
Now, it seems to me that we are querying the same RO to obtain both u0 and u1.

@kwantam (Collaborator, Author) commented Mar 2, 2020

> Using this new method for obtaining field elements, does it still preserve the guarantees required by the random oracle construction?

Yes.

There are two ways to argue this. One is to recall that the underlying RO-to-curve construction is really defined in terms of independent elements of F. Using two hash functions H0 and H1 that are modeled as independent random oracles satisfies this requirement, but it's not the only way to satisfy it---hash_to_field (with count=2) also gives two independent elements of F.

The other slightly less direct way is to observe that the output of expand_message inside of hash_to_field gives 2 * m * L bytes that are uniformly distributed, and therefore we can define two independent random oracles in terms of expand_message, just by splitting its output into two non-overlapping chunks of m * L bytes. This is precisely what happens inside of hash_to_field, so we can understand that function as essentially returning the evaluations of H0 and H1.

(These are equivalent views of the world.)
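For concreteness, here is a minimal Python sketch of that splitting step (illustrative only, not the poc/ code; the function name is mine): given the uniform output of expand_message, each field element is read from its own non-overlapping chunk of m * L bytes, which is the sense in which u_0 and u_1 are evaluations of independent oracles H0 and H1.

```python
# Minimal sketch (not the normative poc/ code) of the splitting step:
# given the uniform output of expand_message, each field element is read
# from its own non-overlapping chunk of m * L bytes.
def split_to_field_elements(uniform_bytes, p, m, L, count=2):
    assert len(uniform_bytes) >= count * m * L
    elements = []
    for i in range(count):
        e = []
        for j in range(m):
            offset = L * (j + i * m)                  # chunk for coordinate j of element i
            tv = uniform_bytes[offset:offset + L]
            e.append(int.from_bytes(tv, "big") % p)   # OS2IP followed by reduction mod p
        elements.append(e)
    return elements                                   # u_0 = elements[0], u_1 = elements[1]
```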

@armfazh (Collaborator) commented Mar 2, 2020

Let's assume H(m, 2), and that b0 and b1 exactly match the size of a certain prime field. Then u0 = b0 mod p and u1 = b1 mod p.
My concern is that b1 depends only on b0, since b1 was calculated as the hash of b0 (plus some prefix).
So, take a message m and calculate b0. At this point, b1 is completely determined by b0, not by another call to an RO on the original message (in some sense, b0 fixes the output of b1).
I am probably missing something. Thoughts?

@kwantam (Collaborator, Author) commented Mar 2, 2020

I see what you're saying.

The scenario you describe doesn't quite work because b0 isn't output, only b0_chopped. But I think the idea is correct: consider the case where b0_chopped || b1 is the first element and a substring of b2 || b3 is the second element. Then b2 and b3 are fully determined by b1, and the adversary sees all of b1, so we still have the issue.

I went back and looked at CDMP05, and now I see where the issue is coming from. Section 5 talks about building longer outputs by computing H(X || 1), H(X || 2), etc. The important thing here is that there's no chaining of output blocks in that construction. And as you say, using chaining on values that the attacker knows makes it possible to distinguish.

(In contrast, HKDF does use chaining, but always combines the chained value with a secret value that the attacker never learns, namely, PRK.)
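(For reference, here is a rough sketch of HKDF-Expand per RFC 5869, to make the contrast concrete; T(i) chains T(i-1), but every block is computed under the secret PRK.)

```python
# Rough sketch of HKDF-Expand (RFC 5869), for contrast: each block chains the
# previous one, but every block is keyed by PRK, which the attacker never learns.
import hashlib
import hmac

def hkdf_expand(prk, info, length, hashfn=hashlib.sha256):
    hash_len = hashfn().digest_size
    t, okm = b"", b""
    for i in range(1, -(-length // hash_len) + 1):    # ceil(length / hash_len) blocks
        t = hmac.new(prk, t + info + bytes([i]), hashfn).digest()
        okm += t
    return okm[:length]
```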

I think the fix is straightforward (don't chain, just hash b0 over and over), but I'm going to think carefully about it and push some new text / code.

Thank you!!!

@armfazh (Collaborator) commented Mar 2, 2020

> The scenario you describe doesn't quite work because b0 isn't output, only b0_chopped. But I think the idea is correct: consider the case where b0_chopped || b1 is the first element and a substring of b2 || b3 is the second element. Then b2 and b3 are fully determined by b1, and the adversary sees all of b1, so we still have the issue.

This is exactly the scenario I had in mind, but I simplified to b0 and b1.

> I went back and looked at CDMP05, and now I see where the issue is coming from. Section 5 talks about building longer outputs by computing H(X || 1), H(X || 2), etc. The important thing here is that there's no chaining of output blocks in that construction. And as you say, using chaining on values that the attacker knows makes it possible to distinguish.

> (In contrast, HKDF does use chaining, but always combines the chained value with a secret value that the attacker never learns, namely, PRK.)

Exactly, another approach is to generate n values of msg_prime, one per field element requested.

@kwantam (Collaborator, Author) commented Mar 2, 2020

OK, consider the following:

Steps:
1. ell = ceil((len_in_bytes + k_in_bytes) / b_in_bytes)
2. ABORT if ell > 256
3. DST_prime = I2OSP(len(DST), 1) || DST
4. b_0 = H(DST_prime || I2OSP(0, 1) || I2OSP(len_in_bytes, 2) || msg)
5. for i in (1, ..., ell - 1):
6.   b_i = H(DST_prime || I2OSP(i, 1) || b_0)                   #### change is here
7. b_0_chopped = b_0[0 : (b_in_bytes - k_in_bytes)]
8. pseudo_random_bytes = b_0_chopped || b_1 || ... || b_(ell - 1)
9. return pseudo_random_bytes[0 : len_in_bytes]

This avoids the chaining issue, because the attacker never sees b_0.

If we were extremely paranoid, we might not even output b_0_chopped. This would mean that the minimum number of hash invocations is 2 instead of 1, which will likely make some people unhappy. But that seems unnecessary, by the security argument for chopMD. Informally, an attacker who doesn't know b_0 cannot compute any more blocks, and here the attacker would have to guess k bits of b_0 in order to reconstruct.

So I guess I'm slightly in favor of outputting b_0_chopped, but I can be convinced otherwise. Thoughts @armfazh?


One issue with the above that makes me quite nervous is that now only 1 byte (really, in the worst case, only 1 bit!!!) of the input of H is changing for each b_i. This is quite aggressive.

A slightly better solution would be to do something like

b_i = H(DST_prime || I2OSP(i, 1) || b_0 || b_(i-1))

The big downside of this is that it significantly increases the number of hash invocations. We could get around that with something like:

b_i = H(DST_prime || I2OSP(i, 1) || b_0 ^ b_(i - 1))

where ^ is bitwise XOR. This would require a special case for b_1, obviously!

@kwantam (Collaborator, Author) commented Mar 3, 2020

A third approach is to chain, but chop the output of every hash invocation, not just the first one. (The last block does not need to be chopped in this case.)

Let's think about the cost of all three approaches. For all three approaches, we have

b_0 = H(DST || I2OSP(0, 1) || I2OSP(len_in_bytes, 2) || msg)

For subsequent blocks,

  • type 1 is b_i = H(DST || I2OSP(i, 1) || b_0 || b_(i-1))
  • type 2 is b_i = H(DST || I2OSP(i, 1) || b_0 ^ b_(i-1))
  • type 3 is b_i = H(DST || I2OSP(i, 1) || b_(i-1)) where all b_i are chopped by k bits before being output, except b_(ell - 1).
  • type 4 is type 1, but b_0 is not output
  • type 5 is type 2, but b_0 is not output
  • type 6 is type 3, but b_0 is not output

Type 6 doesn't really make any sense, but I've included it for completeness.

Let's look first at output rates for different cases:

| case                          | first block | middle blocks | last block |
|-------------------------------|-------------|---------------|------------|
| SHA-256, k = 128, type 1 or 2 | 16 bytes    | 32 bytes      | 32 bytes   |
| SHA-256, k = 128, type 3      | 16 bytes    | 16 bytes      | 32 bytes   |
| SHA-512, k = 128, type 1 or 2 | 48 bytes    | 64 bytes      | 64 bytes   |
| SHA-512, k = 128, type 3      | 48 bytes    | 48 bytes      | 64 bytes   |
| SHA-512, k = 256, type 1 or 2 | 32 bytes    | 64 bytes      | 64 bytes   |
| SHA-512, k = 256, type 3      | 32 bytes    | 32 bytes      | 64 bytes   |

Types 4, 5, and 6 correspond to types 1, 2, and 3 (respectively), except that the "first block" column is always 0 (because b_0 is never output).

Now, let's look at number of compression function invocations. Assume that the input message is 32 bytes and DST is 16 bytes; this is roughly worst case (because long messages will dominate hashing cost).

| case   | first block | subsequent blocks |
|--------|-------------|-------------------|
| type 1 | 1           | 2                 |
| type 2 | 1           | 1                 |
| type 3 | 1           | 1                 |

Closed-form expressions for ell and number of invocations (a quick script evaluating these appears after the list). Below, ell is the number of blocks that contribute to the output. For types 4, 5, and 6, one must first compute b_0 and then hash ell blocks.

  • type 1:
    • ell = ceil((len_in_bytes + k_in_bytes) / b_in_bytes)
    • invocations: 1 + 2 * (ell - 1)
  • type 2:
    • ell = ceil((len_in_bytes + k_in_bytes) / b_in_bytes)
    • invocations: ell
  • type 3:
    • ell = 1 + ceil((len_in_bytes - b_in_bytes) / (b_in_bytes - k_in_bytes))
    • invocations: ell
  • type 4:
    • ell = ceil(len_in_bytes / b_in_bytes)
    • invocations: 1 + 2 * ell
  • type 5:
    • ell = ceil(len_in_bytes / b_in_bytes)
    • invocations: 1 + ell
  • type 6:
    • ell = 1 + ceil((len_in_bytes - b_in_bytes) / (b_in_bytes - k_in_bytes))
    • invocations: 1 + ell
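
As a sanity check on the numbers below, here's a small throwaway script (not part of the draft or poc/; the function name is mine) that evaluates the closed forms above:

```python
# Throwaway script that evaluates the closed forms above.
# b_in_bytes is the hash output size in bytes, k_in_bytes the security level in bytes.
from math import ceil

def cost(len_in_bytes, b_in_bytes, k_in_bytes, type_):
    if type_ in (1, 2):
        ell = ceil((len_in_bytes + k_in_bytes) / b_in_bytes)
        inv = 1 + 2 * (ell - 1) if type_ == 1 else ell
    elif type_ == 3:
        ell = 1 + ceil((len_in_bytes - b_in_bytes) / (b_in_bytes - k_in_bytes))
        inv = ell
    elif type_ in (4, 5):
        ell = ceil(len_in_bytes / b_in_bytes)
        inv = 1 + 2 * ell if type_ == 4 else 1 + ell
    else:                                             # type 6
        ell = 1 + ceil((len_in_bytes - b_in_bytes) / (b_in_bytes - k_in_bytes))
        inv = 1 + ell
    return ell, inv

# e.g., case 1a below (len_in_bytes = 48, SHA-256, k = 128): cost(48, 32, 16, 2) == (2, 2)
```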

Using the above, let's consider the cost of different scenarios:

  1. k = 128, p = 256 bits, m = 1 (like Curve25519)
    a. count = 1 (len_in_bytes = 48), SHA-256
    b. as above, SHA-512
    c. count = 2 (len_in_bytes = 96), SHA-256
    d. as above, SHA-512

  2. k = 128, p = 381 bits, m = 2 (like BLS12-381 G2)
    a. count = 1 (len_in_bytes = 128), SHA-256
    b. as above, SHA-512
    c. count = 2 (len_in_bytes = 256), SHA-256
    d. as above, SHA-512

  3. k = 256, p = 521 bits, m = 1 (like P521)
    a. count = 1 (len_in_bytes = 98), SHA-512
    b. count = 2 (len_in_bytes = 196), SHA-512

  4. k = 256, p = 581 bits, m = 8 (like BLS48-581 G2)
    a. count = 1 (len_in_bytes = 840), SHA-512
    b. count = 2 (len_in_bytes = 1680), SHA-512

| case | type 1 | type 4 | type 2 | type 5 | type 3 | type 6 |
|------|--------|--------|--------|--------|--------|--------|
| 1a   | 3      | 5      | 2      | 3      | 2      | 3      |
| 1b   | 1      | 3      | 1      | 2      | 1      | 2      |
| 1c   | 7      | 7      | 4      | 4      | 5      | 6      |
| 1d   | 3      | 5      | 2      | 3      | 2      | 3      |
| 2a   | 9      | 9      | 5      | 5      | 7      | 8      |
| 2b   | 5      | 5      | 3      | 3      | 3      | 4      |
| 2c   | 17     | 17     | 9      | 9      | 15     | 16     |
| 2d   | 9      | 9      | 5      | 5      | 5      | 6      |
| 3a   | 5      | 5      | 3      | 3      | 3      | 4      |
| 3b   | 7      | 9      | 4      | 5      | 6      | 7      |
| 4a   | 27     | 29     | 14     | 15     | 26     | 27     |
| 4b   | 53     | 55     | 27     | 28     | 52     | 53     |

Not surprisingly, type 2 is the cheapest across the board.

Maybe slightly more surprisingly, types 1 and 3 have close to the same cost, but type 3 is basically always better.


So my take is that type 2 is the winner, assuming that folks are OK requiring bitwise XOR. Otherwise, we should go with Type 3.

EDIT I think the "right" choice is really type 5, even though it costs one extra hash invocation for very short messages and very short outputs. So that's my vote.

Type 2 costs fewer hashes, but I'm less certain about its security. I'm roughly as confident in type 4 as type 5, but it's much more expensive. I'm least confident about type 3. And as I said above, type 6 really doesn't make any sense...

@kwantam (Collaborator, Author) commented Mar 3, 2020

Just to be clear, I'll write down the pseudocode for each one:

All of them start in roughly the same way:

1. ell = ((compute using closed form given above))
2. ABORT if ell > 256
3. DST_prime = I2OSP(len(DST), 1) || DST
4. chopped_length = b_in_bytes - k_in_bytes
5. b_0 = H(DST_prime || I2OSP(0, 1) || I2OSP(len_in_bytes, 2) || msg)

Type 1

6.  for i in (1, ..., ell - 1):
7.    b_i = H(DST_prime || I2OSP(i, 1) || b_0 || b_(i - 1))
8.  b_0_chopped = b_0[0 : chopped_length]
9.  pseudo_random_bytes = b_0_chopped || b_1 || ... || b_(ell - 1)
10. return pseudo_random_bytes[0 : len_in_bytes]

Type 2

6.  b_1 = H(DST_prime || I2OSP(1, 1) || b_0)
7.  for i in (2, ..., ell - 1):
8.    b_i = H(DST_prime || I2OSP(i, 1) || b_0 ^ b_(i - 1))
9.  b_0_chopped = b_0[0 : chopped_length]
10. pseudo_random_bytes = b_0_chopped || b_1 || ... || b_(ell - 1)
11. return pseudo_random_bytes[0 : len_in_bytes]

Here, ^ is bitwise xor (always applied to equal-length bytestrings in the above).
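
For concreteness, a non-normative Python sketch of Type 2, instantiated with SHA-256 and an illustrative k_in_bytes (the function names are mine, not the poc/ code):

```python
# Non-normative sketch of Type 2, instantiated with SHA-256 (b_in_bytes = 32)
# and an illustrative k_in_bytes = 16.
import hashlib
from math import ceil

def strxor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def expand_type2(msg, dst, len_in_bytes, k_in_bytes=16):
    H = lambda data: hashlib.sha256(data).digest()
    b_in_bytes = 32
    ell = ceil((len_in_bytes + k_in_bytes) / b_in_bytes)
    assert ell <= 256 and len(dst) <= 255
    dst_prime = bytes([len(dst)]) + dst               # I2OSP(len(DST), 1) || DST
    b_0 = H(dst_prime + b"\x00" + len_in_bytes.to_bytes(2, "big") + msg)
    blocks = [b_0[: b_in_bytes - k_in_bytes]]         # b_0_chopped
    b_prev = H(dst_prime + b"\x01" + b_0)             # b_1: no previous block to mix in
    blocks.append(b_prev)
    for i in range(2, ell):                           # b_2 ... b_(ell - 1)
        b_prev = H(dst_prime + bytes([i]) + strxor(b_0, b_prev))
        blocks.append(b_prev)
    return b"".join(blocks)[:len_in_bytes]
```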

Type 3

6.  for i in (1, ..., ell - 1):
7.    b_i = H(DST_prime || I2OSP(i, 1) || b_(i - 1))
8.    b_(i-1)_chopped = b_(i-1)[0 : chopped_length]
9.  pseudo_random_bytes = b_0_chopped || b_1_chopped || ... || b_(ell - 1)
10. return pseudo_random_bytes[0 : len_in_bytes]
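
And the corresponding non-normative sketch of Type 3, with the same SHA-256 / illustrative k_in_bytes assumptions (function name mine):

```python
# Non-normative sketch of Type 3: chain, chopping every block except the last.
import hashlib
from math import ceil

def expand_type3(msg, dst, len_in_bytes, k_in_bytes=16):
    H = lambda data: hashlib.sha256(data).digest()
    b_in_bytes = 32
    chopped_length = b_in_bytes - k_in_bytes
    ell = 1 + ceil((len_in_bytes - b_in_bytes) / (b_in_bytes - k_in_bytes))
    dst_prime = bytes([len(dst)]) + dst
    b_prev = H(dst_prime + b"\x00" + len_in_bytes.to_bytes(2, "big") + msg)   # b_0
    out = b""
    for i in range(1, ell):
        out += b_prev[:chopped_length]                # chop b_(i-1) before outputting it
        b_prev = H(dst_prime + bytes([i]) + b_prev)
    out += b_prev                                     # the last block is not chopped
    return out[:len_in_bytes]
```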

@kwantam (Collaborator, Author) commented Mar 3, 2020

Actually, it occurs to me that the construction as currently written (i.e., where b_1 chains into b_2) probably is OK (though it probably still makes sense to try and improve it, as long as it's not too costly).

The reason is, the field elements at the output of hash-to-field do not uniquely determine the bytes at the output of expand-message. In all cases, we take (log(q) + k) bits and reduce them to log(q) bits. Hand-wavily, this means that the indifferentiability simulator has about 2^k choices it can make when mapping field elements to byte strings.

The proof is much cleaner if we can just make expand-message a random string, so probably it's best to do that. For this purpose I favor approach 2, since it adds the least overhead...

@kwantam (Collaborator, Author) commented Mar 3, 2020

OK, I revamped my comment above to include options that don't output any part of b_0.

My vote is to go with "type 5." This is maybe slightly conservative and will make embedded folks slightly sadder. I'll update the document and code with proposed changes (we can always back them out).

@chris-wood (Collaborator) commented Mar 3, 2020

Type 5 seems like the right compromise on balance. We maintain chaining while also mixing in b_0 and keeping it hidden from the distinguisher, and the concatenated output is indistinguishable from a random string. And the additional XOR cost is negligible.

Thanks @armfazh for pointing out the issue!

kwantam added a commit to kwantam/draft-irtf-cfrg-hash-to-curve that referenced this pull request Mar 4, 2020
the new function is 'type 5' as discussed in cfrg#214.
@kwantam (Collaborator, Author) commented Mar 4, 2020

I talked with Dan (Boneh) about "type 5" today, and he had two suggestions:

  1. When computing b_0 = H(DST_prime || I2OSP(0, 1) || I2OSP(len_in_bytes, 2) || msg), there's a danger that a badly-chosen DST will cause a short msg to be split across the first block boundary. The reason this is bad is that the entropy of msg ends up broken up across two invocations of the compression function, which could end up making some attacks (think: meet-in-the-middle) easier. He suggested that we fix this by adding extra padding before msg to ensure that it starts right at the first block boundary.

    This is slightly unfortunate because it guarantees one extra invocation of the compression function even for short inputs. In principle we could avoid the extra compression by adding a special case for very short DST/msg combinations. But that's pretty ugly, because it requires knowing a lot about the hash function's construction (e.g., SHA-256 has a minimum of 65 bits of padding after the message, so msg only fits completely in the first block when len(DST) + len(msg) <= 51 bytes).

  2. Similarly, if DST is too long, then b_0 or b_0 XOR b_(i - 1) might end up broken across a block boundary in the b_i calculations. For SHA-256, limiting DST_prime to 22 bytes fixes this; for SHA-512, the magic number is 46 bytes.

It occurs to me that an alternative solution to both of the above issues is to switch the order of the inputs to the hash function; this guarantees that the critical inputs are always right at the beginning. In particular,

b_0 = H(msg || I2OSP(len_in_bytes, 2) || I2OSP(0, 1) || DST_prime)
b_1 = H(b_0 || I2OSP(1, 1) || DST_prime)
b_i = H(strxor(b_0, b_(i - 1)) || I2OSP(i, 1) || DST_prime)

This means we're doing domain separation as a suffix rather than as a prefix of the message. Does this make life more difficult when interacting with other protocols? I'm not exactly sure, but I can't see why it should.

In fact, it arguably makes life better, because any protocol that needs to compute another value of H(msg || ...) can reuse an intermediate state of that computation to save time when computing b_0!

Note that both of the worries above apply to SHA-2; for SHA-3, we don't really need to worry about any of this.
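
For concreteness, here is a rough sketch of the reordered inputs with SHA-256. This is not the final construction (in particular, it omits the zero-block prefix adopted later in this thread), and the helper names are mine:

```python
# Sketch of the reordered inputs (DST_prime and counter as a suffix), SHA-256.
import hashlib

def strxor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

H = lambda data: hashlib.sha256(data).digest()

def b0_suffix_dst(msg, len_in_bytes, dst_prime):
    # the length, counter, and DST_prime now come after the message
    return H(msg + len_in_bytes.to_bytes(2, "big") + b"\x00" + dst_prime)

def bi_suffix_dst(i, b_0, b_prev, dst_prime):
    mixed = b_0 if i == 1 else strxor(b_0, b_prev)    # b_1 has no previous block to mix in
    return H(mixed + bytes([i]) + dst_prime)
```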

kwantam added 2 commits March 3, 2020 23:53
addresses a suggestion from Dan regarding breaking b_0 or msg
across blocks
@kwantam changed the title from "new hash_to_field code" to "new hash_to_field code / expand_message_md revisited" on Mar 4, 2020
@kwantam (Collaborator, Author) commented Mar 4, 2020

Quick note: I pinged Dan about swapping the input order and he said it looked fine.

@kwantam (Collaborator, Author) commented Mar 4, 2020

Arrgh. As I was writing down a proof sketch I found an issue with swapping the argument order. Fortunately, it's not hard to fix, and there's a reasonably easy way of avoiding any performance ramifications.

Next, I'm going to push a commit that implements this. As always, we can revert...

Executive summary

Above I proposed computing

b_0 = H( msg || I2OSP(len_in_bytes, 2) || I2OSP(0, 1) || DST_prime )

To make the proof go through, we should change this to

b_0 = H(I2OSP(0, input_block_size) || msg || I2OSP(len_in_bytes, 2) || I2OSP(0, 1) || DST_prime)

In other words, we should prepend 1 (input) block of 0s to msg before hashing. For SHA-256 this is 64 bytes, for SHA-512 it's 128 bytes. This lets us rely directly on the security proofs for NMAC given in CDMP05, as well as other analysis like Bel06.
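
Putting this together, here is a non-normative sketch of the resulting expander (SHA-256, so the input block size is 64 bytes; DST_prime keeps the length-prefixed convention used earlier in this thread; b_0 is not output, i.e., "type 5"; the function name is mine):

```python
# Non-normative sketch combining the pieces discussed in this thread:
# one input block of zeros prepended to msg, DST_prime as a suffix, b_0 not output.
import hashlib
from math import ceil

def strxor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def expand_message_sketch(msg, dst, len_in_bytes):
    H = lambda data: hashlib.sha256(data).digest()
    b_in_bytes, s_in_bytes = 32, 64                   # output size, input block size
    ell = ceil(len_in_bytes / b_in_bytes)             # b_0 is not output ("type 5")
    assert ell <= 255 and len(dst) <= 255
    dst_prime = bytes([len(dst)]) + dst
    z_pad = b"\x00" * s_in_bytes                      # fixes the first input block
    b_0 = H(z_pad + msg + len_in_bytes.to_bytes(2, "big") + b"\x00" + dst_prime)
    blocks = [H(b_0 + b"\x01" + dst_prime)]           # b_1
    for i in range(2, ell + 1):                       # b_2 ... b_ell
        blocks.append(H(strxor(b_0, blocks[-1]) + bytes([i]) + dst_prime))
    return b"".join(blocks)[:len_in_bytes]
```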

In the rest of this post I'll first discuss why we ought to do this, and then describe how implementors can avoid actually computing another hash function invocation.

Why do we need this change?

Previously we were arguing that our construction was an implementation of the NMAC construction (really, the confusingly-named HMAC^f construction; for clarity I'll just call it NMAC) in Section 3.5 of CDMP05. The basis for this claim was that the first block of each invocation of H was unique, since we had the DST and the counter in there. (We can understand Dan's suggestion (1) above, padding the first block of the b_0 computation, as strengthening this argument.)

Unfortunately, if we're putting the message first, we can't directly rely on the NMAC proof: since the adversary controls the first input block to the hash function, we have no guarantee that it's unique.

CDMP05 fixes this problem by prepending a block of 0s to the message. This works because it prevents the adversary from controlling the contents of the first block. Using our notation, this is secure essentially because the probability is negligible that b_0 == 0 or strxor(b_0, b_(i-1)) == 0.

What is the cost?

Directly implementing this change costs an extra compression function invocation. But in practice it's possible to avoid this, just by saving the state of the hash function after it ingests 1 block of 0s, and hashing msg starting from that state.

The storage cost to implement this optimization is minimal (less than 100 bytes for the SHA-2 family), and in practice lots of SHA-2 implementations already let you do this. (This is true, for example, of the OpenSSL implementation.)
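
For example, with Python's hashlib one can hash the zero block once and reuse a copy of the resulting state for every message (a sketch, not the poc/ code; names are mine):

```python
# Sketch of the optimization: absorb the zero block once, keep the hash state,
# and resume from a copy of it for every message.
import hashlib

S_IN_BYTES = 64                                       # SHA-256 input block size
_zpad_state = hashlib.sha256(b"\x00" * S_IN_BYTES)    # zero block absorbed once

def h_with_zpad(data):
    h = _zpad_state.copy()                            # resume from the precomputed state
    h.update(data)
    return h.digest()
```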

The other cost here is that it puts us back in a universe where we can't share a hash prefix with other computations of H(msg || ...). But that seems like it's OK---upper-layer protocols are welcome to define a "prehash mode" if they're really worried about the cost of hashing messages.

expand_message_xmd now instantiates NMAC in a secure way
@@ -1651,7 +1651,7 @@ In this case, security holds when the inner function is modeled as a
 random transformation or as a random permutation {{BDPV08}}.

 - Otherwise, H MUST be a hash function that has been proved indifferentiable
-  from a random oracle {{MRH04}} under a reasonable cryptographic assumption.
+  from a random oracle {{MRH04}} under a widely accepted cryptographic assumption.
A collaborator commented on this diff:
:)

@chris-wood (Collaborator):
Referencing the proof wholesale is great! Most hash function implementations allow the internal state to be set, so I don't foresee this being a problem in practice (as you mention).

kwantam added 3 commits March 4, 2020 17:22
hyphen -> underscore

also, 'SHA.256' -> 'SHA-256' and similar

closes cfrg#213
@chris-wood (Collaborator):
@armfazh are you OK with this change? If so, I’ll merge it.

@armfazh (Collaborator) commented Mar 5, 2020

I am not sure whether the XOR maintains security or gives the attacker a new attack surface; I am not able to prove or disprove this claim.
Let's move forward.

@chris-wood merged commit 54818b0 into cfrg:master on Mar 5, 2020
@kwantam deleted the new-h2f-code branch on March 6, 2020