expand_message_xmd : a potential optimization #223

Closed

kwantam opened this issue Mar 14, 2020 · 7 comments

kwantam commented Mar 14, 2020

Hi folks,

It occurs to me that there's one possible modification to expand_message_xmd that would make it slightly nicer for some users. It's not a clear and unqualified win, but it's worth at least discussing.

Right now, expand_message_xmd computes the following:

Z_pad = I2OSP(0, <block length of H>)
b_0 = H(Z_pad || msg || I2OSP(len_in_bytes, 2) || I2OSP(0, 1) || DST_prime)
b_1 = H(b_0 || I2OSP(1, 1) || DST_prime)
b_i = H(strxor(b_0, b_(i-1)) || I2OSP(i, 1) || DST_prime)
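
For concreteness, here's a rough Python sketch of that computation (assuming SHA-256, so b_in_bytes = 32 and r_in_bytes = 64, and taking DST_prime as an already-built input; illustrative only, not the draft's normative pseudocode):

# A rough sketch of expand_message_xmd as currently specified, with SHA-256.
import hashlib

def expand_message_xmd(msg, dst_prime, len_in_bytes):
    b_in_bytes, r_in_bytes = 32, 64
    ell = -(-len_in_bytes // b_in_bytes)       # ceil(len_in_bytes / b_in_bytes)
    assert ell <= 255
    z_pad = b"\x00" * r_in_bytes               # Z_pad = I2OSP(0, r_in_bytes)
    l_i_b = len_in_bytes.to_bytes(2, "big")    # I2OSP(len_in_bytes, 2)
    b_0 = hashlib.sha256(z_pad + msg + l_i_b + b"\x00" + dst_prime).digest()
    b_prev = hashlib.sha256(b_0 + b"\x01" + dst_prime).digest()
    out = b_prev
    for i in range(2, ell + 1):
        mixed = bytes(x ^ y for x, y in zip(b_0, b_prev))   # strxor(b_0, b_(i-1))
        b_prev = hashlib.sha256(mixed + i.to_bytes(1, "big") + dst_prime).digest()
        out += b_prev
    return out[:len_in_bytes]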

This is to ensure that b_0 and b_i (i >= 1) are hashed with different initialization vectors. The downside of this approach is that some protocols will already have computed H(msg || ...), and the above means that they have to pass over msg again to compute b_0.

We could instead consider doing this:

b_0 = H(msg || I2OSP(len_in_bytes, 2) || I2OSP(0, 1) || DST_prime)
b_1 = H(Z_pad || b_0 || I2OSP(1, 1) || DST_prime)
b_i = H(Z_pad || strxor(b_0, b_(i-1)) || I2OSP(i, 1) || DST_prime)

This also ensures that b_0 and b_i (i >= 1) are hashed starting from different initialization vectors, but now implementors can reuse the work computing H(msg || ...) in b_0.

This looks like strictly more hashing work, since every b_i now needs to be prefixed with a block of 0s. But most hash implementations let you precompute the internal state after absorbing a fixed prefix and reuse it, so in practice this cost can be optimized away. On the other hand, anyone who doesn't implement this optimization will pay extra for hashing.
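
For example, with Python's hashlib, the state after absorbing the fixed zero block can be computed once and cloned with copy() for every subsequent hash (a sketch, assuming SHA-256):

import hashlib

r_in_bytes = 64
state_after_zpad = hashlib.sha256(b"\x00" * r_in_bytes)  # h(IV, Z_pad), paid once

def outer_hash(payload):
    h = state_after_zpad.copy()   # clone the cached midstate; Z_pad is never re-hashed
    h.update(payload)             # e.g. strxor(b_0, b_(i-1)) || I2OSP(i, 1) || DST_prime
    return h.digest()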

So the question is, do we enable a potentially nice optimization at the cost of making un-optimized implementations slower?

(This assumes, of course, that we're willing to make any change at all. But now's the time for a change, since to my knowledge no one except me has actually implemented the new hash_to_field functions yet.)

kwantam commented Mar 14, 2020

(cc @armfazh @chris-wood)

kwantam commented Mar 14, 2020

My take is that most un-optimized implementations won't see much increase in cost, whereas optimized implementations would enjoy significant savings for long msg values. So on balance I'd lean towards making this change. Otherwise, we're kind of stuck in a middle ground: un-optimized implementations are almost imperceptibly faster, but optimized ones are 2x slower in the huge-msg asymptote.


One additional advantage is that changing the design as described above might discourage people from defining prehash modes, which is probably a good thing security-wise.

kwantam commented Mar 19, 2020

Ah, thinking about this more, the suggestion doesn't quite work: if msg starts with a full block of 0s, then indifferentiability is broken.

We might instead consider:

b_0 = H(msg || I2OSP(len_in_bytes, 2) || I2OSP(0, 1) || DST_prime)
Z_pad = I2OSP(0, r_in_bytes - b_in_bytes)
b_i = H(b_0 || Z_pad || b_(i-1) || I2OSP(i, 1) || DST_prime)

Now every b_i hash starts with a block comprising b_0 || 000...000, which is distinct from the first block of msg unless an attacker can find msg starting with the block X || 000...000 where X == b_0 == H(msg || ...). I have to think more about the difficulty of this problem.

The advantage of this construction over the current one is that it lets us reuse the work of computing H(msg || ...). The cost is one extra compression function evaluation for an optimal implementation (to compute h(IV, b_0 || 000...000)). In other words, we make it slightly friendlier for reusing work, at the cost of exactly one more compression function invocation.
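
To make the reuse concrete: with hashlib, a protocol that has already absorbed msg can clone that midstate and finish b_0 without a second pass over msg (a sketch; the names here are hypothetical):

import hashlib

msg = b"..."                      # some long message the protocol already hashes
msg_state = hashlib.sha256(msg)   # midstate the protocol computed for its own H(msg || ...)

def compute_b0(len_in_bytes, dst_prime):
    h = msg_state.copy()          # reuse the absorbed msg instead of re-hashing it
    h.update(len_in_bytes.to_bytes(2, "big") + b"\x00" + dst_prime)
    return h.digest()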

Thoughts?

chris-wood commented

@kwantam ACKing this issue! I'll implement the different variants and get some benchmarks.

kwantam commented Mar 19, 2020

Thinking more about this, I think the proposal in my prior message ends up assuming a lot more about the security of the compression function than the current version of xmd does (and, in particular, seems to assume a property that isn't really standard). So I think we'd be better off with the current version, just for the sake of being conservative with security.


Details:

Recall that the current version of xmd ensures that the "inner" invocation of H (to compute b_0) starts with a different initialization vector than the "outer" invocations (to compute b_i) by fixing the first block of the inner H's input to the all-zeros block (this suffices to guarantee that the outer and inner IVs are always different, thanks to the way we use the counter). And this is precisely the reason that the work computing the inner H invocation can't be reused by other pieces of an invoking protocol---we need to prepend a fixed block to msg to change the IV of the inner hash invocation.

(Of course, the invoking protocol is welcome to use H(Z_pad || msg || ...) rather than H(msg || ...) if they want!)

The version I proposed in the prior message tries to ensure that the inner and outer invocations use different IVs by fixing the first block of the outer invocation (rather than the inner one). But for this to work, it has to be the case that the adversary can't craft an input to the inner invocation that causes the outer and inner IVs to match---and to rule this out, we need to rely on some property of the hash function that doesn't seem like a standard one.

What property do we need? Well, contrary to what I initially posted above, it's certainly not enough for H to have preimage resistance, because this isn't quite a preimage attack:

  1. Preimage resistance: given some target X, come up with a message m such that X = H(m)
  2. What we would need: come up with any X and m such that X = H(X || 000...000 || m)

(The attacker has strictly more power in the second case than in the first, so we should expect hash functions to be weaker against the second kind of attack than against a preimage attack.)

It's also not any of the standard prefix-resistance properties, since those fix a prefix for the attacker to target. We might call what we're looking for prefix-fixpoint-resistance. But since this doesn't appear to be anything like a standard notion of security, it seems at best dubious to base the security of the construction on it.

As more evidence that this isn't a great idea, we know that it's relatively easy to find a fixpoint for a Davies-Meyer--style compression function like the one used in SHA-2. I have no idea whether this can be extended to make the attack work, but it seems to be a crack in the armor...
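
For reference, here's a sketch of that well-known fixed point (writing E for the block cipher underlying the compression function, keyed by the message block m):

h(v, m) = E_m(v) XOR v                          # Davies-Meyer
v* = E_m^{-1}(0)  =>  h(v*, m) = 0 XOR v* = v*  # a fixed point, for any m

Whether this can be leveraged against the prefix-fixpoint property above is exactly the open question.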

armfazh commented Mar 19, 2020

I think we must act with caution, since it is easy to make wrong decisions that could lead to attacks.
It would be ideal to have a security proof for the extraction of pseudorandom bytes. However, even the authors of eprint.iacr.org/2018/349 had difficulty analyzing the algorithms in SP 800-90A.

As a sanity check, we could take some time to revisit the simplest secure approach and then, starting from it, make improvements.

kwantam commented Mar 20, 2020

Well, since it seems like there's no way to make this work-savings proposal happen unless we're willing to make some weird assumption about a hash function, how about we table this for now and revisit if any of us comes up with a better idea?


Meanwhile, I think @armfazh is right that we want to write down a detailed analysis of the current expand_message_xmd function.

Just for the sake of completeness, here's a variant of our current function that gets rid of the XOR operation, at the cost of one more compression function invocation. (The security analysis of this one is somewhat easier because it doesn't use XOR chaining.)

b_0 = H(Z_pad || msg || I2OSP(len_in_bytes, 2) || I2OSP(0, 1) || DST_prime)
b_0_pad = b_0 || I2OSP(0, r_in_bytes - b_in_bytes)    # pad b_0 to 1 block
for i in (1, ..., ell):
    b_i = H(b_0_pad || b_(i - 1) || I2OSP(i, 1) || DST_prime)

Here, unless b_0_pad = Z_pad (which would require finding a msg for which b_0 is all zeros, i.e., a preimage of the all-zeros string), the inner and outer H invocations have distinct IVs. Meanwhile, because of the counter, all of the messages are distinct.

The extra compression function invocation is to compute h(IV, b_0_pad) once; this can be reused for all b_i computations, which thereafter require as little as one compression function invocation (depending on the length of DST). This is cheaper than what we considered in #214,

b_i = H(b_0 || b_(i-1) || I2OSP(i, 1) || DST_prime)

which would require at least two compression function invocations for every b_i.
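
For completeness, a hashlib sketch of the XOR-free variant above, with the midstate h(IV, b_0_pad) computed once and cloned per block (assuming SHA-256 and a pre-built DST_prime; illustrative only):

import hashlib

def expand_xor_free(msg, dst_prime, len_in_bytes):
    b_in_bytes, r_in_bytes = 32, 64
    ell = -(-len_in_bytes // b_in_bytes)
    z_pad = b"\x00" * r_in_bytes
    b_0 = hashlib.sha256(z_pad + msg + len_in_bytes.to_bytes(2, "big")
                         + b"\x00" + dst_prime).digest()
    st = hashlib.sha256(b_0 + b"\x00" * (r_in_bytes - b_in_bytes))  # h(IV, b_0_pad), once
    out, b_prev = b"", b_0
    for i in range(1, ell + 1):
        h = st.copy()                          # reuse the cached midstate for every b_i
        h.update(b_prev + i.to_bytes(1, "big") + dst_prime)
        b_prev = h.digest()
        out += b_prev
    return out[:len_in_bytes]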

kwantam closed this as completed Mar 25, 2020