Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit the valid segwit address lengths. #945

Conversation

roconnor-blockstream
Copy link
Contributor

We add a referece to the Analysis of insertion in Bech32 strings (see https://gist.github.com/sipa/a9845b37c1b298a7301c33a04090b2eb) and recommend that applications with variable length data parts explicitly include their length as part of their encoding.

In response to the above analysis, we restrict segwit address to only support a subset of witness program lengths to ensure the lengths of segwit addresses always differ by at least 5.

We add a referece to the Analysis of insertion in Bech32 strings (see https://gist.github.com/sipa/a9845b37c1b298a7301c33a04090b2eb) and recommend that applications with variable length data parts explicitly include their length as part of their encoding.

In response to the above analysis, we restrict segwit address to only support a subset of witness program lengths to ensure the lengths of segwit addresses always differ by at least 5.
@roconnor-blockstream
Copy link
Contributor Author

roconnor-blockstream commented Jul 16, 2020

One design choice I made was to eliminate witnesses programs whose length is less than 10 bytes from being addressable by this segwit address scheme. My reasoning here is that any cryptographic scheme that provides 80-bits of security will require at least 80-bits of data. That said, all commonly used schemes, such as hashes, and discrete log based crypto all require 160 bits of data to provide 80-bits of security. I don't think it would be unreasonable to prevent witness programs of length less than 20 bytes from being addressable by this segwit address scheme. (Alternatively we could add back in support for 6 and 3 byte witness programs if we don't want to restrict insecure schemes from being addressable.)

Copy link
Contributor

@harding harding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me.

@@ -209,11 +214,13 @@ implementations' assumptions about lengths), but still be visually
distinct.</ref> for testnet.
* The data-part values:
** 1 byte: the witness version
** A conversion of the 2-to-40-byte witness program (as defined by [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]) to base32:
** The witness program (as defined by [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]), which MUST have a size of 10, 13, 16, 20, 23, 26, 29, 32, 36, or 40 bytes<ref>'''Why are only witness programs of sizes of 10, 13, 16, 20, 23, 26, 29, 32, 36, and 40 bytes supported?''' To overcome Bech32's [[#Limitations|limitations]], which were discovered after deployment, we have reduced the selection of witness sizes to ensure that all segwit address lengths differ by a minimum of 5 characters, while also ensuring that (1) segwit v0's 20 and 32 byte witness programs are supported; (2) the 40 byte maximum segwit program size is supported; and (3) witness programs of fewer than 10 bytes, which would not have enough entropy to provide security, are excluded.</ref>, coverted to base32:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: most typesetting rules would prefer the comma come before the footnote superscript, e.g. or 40 bytes,<ref>'''Why [...]

@gmaxwell
Copy link
Contributor

I think this sort of change is okay, but I wanted to comment on what I think is a disconnect between consensus rule design and BIP173 design.

It was my intention in BIP173 so that common error modes like dropping or adding a character or two would never result in funds going off into space, at least to the greatest extent possible. Even without any knowledge of 'padding flaw' it was my expectation that as new address versions were released their specific length limitations would be back imposed through revisions to BIP173 or additional BIPs, in order to gain the additional protection. It's just simply much better to be able to say that certain classes of error are impossible rather than saying that they are very unlikely, especially when very is "only" one in a billion-- and not something where it's not even feasible to construct an example.

For consensus rules, sure it makes sense to only restrict the part of the input space that you've assigned meaning to... And yes, we generally intended BIP173 to be forward-compatible for new script types. But that doesn't necessarily mean that BIP173 can't restrict an input even though consensus doesn't currently. There would be no reason that a length couldn't be restricted but a decade from now allowed ... at least it would only be a 1LOC change to support sending to it.

Allowing only far-spaced lengths are probably sufficient for the purpose of making likely errors impossible.

Unfortunately a payload length of 33 bytes is pretty interesting and would probably be the next most wanted length after the currently existing ones.

@sipa had some other proposal to change the checksum algorithm slightly for lengths other than 20 and 32, which would retain compatibility for existing lengths yet allow all lengths to work.

@sipa
Copy link
Member

sipa commented Jul 18, 2020

Thanks for picking this up, @roconnor-blockstream.

My thinking was slightly different than what you're proposing: namely restricting the existing scheme in BIP173 to just length 20 and 32, and then (either later or at the same time), define a new scheme which is identical except using final xor constant 0x3fffffff instead of 0x1, and using that for other lengths.

I think using a different constant rather than length-prefixing is preferable, as it doesn't unnecessarily lengthen encodings. Doing the length-prefixing implicitly is possible, but doesn't have advantages - I think - over just changing the constant.

So this would mean there are two schemes - bech32 and bech32m say - and BIP173 (or its replacement) specifies using bech32 for length 20/32 and bech32m for other lengths. But unrelated applications could pick bech32m entirely (because they're new and don't care about the historical legacy), or they could stay at bech32 entirely (because they don't suffer from length variability).

So I'm not sure it's useful to keep the set of permitted lengths beyond 20/32 - if the above is the ultimate direction we want to go in, it seems preferable that all future lengths use the new scheme, even ones that technically would be safe to remain at bech32.

@roconnor-blockstream
Copy link
Contributor Author

Based on feedback I received here and elsewhere, I now understand that the issues surrounding bc1 addresses is far more complicated that I realized. The information below was gathered from talking with gmaxwell, sipa, and others.

There are two questions that I believe need to be resolved before proceeding here:

  1. Should we relay transactions involving unassigned segwit addresses by default or not?

If transactions are relayed by default then we run the risk of users accidentally using new segwit formats, such as taproot, prior to their activation. The rules around lock-in vs activation are a little bit subtle and, for example, it is probably easy for users to mistake lock-in for activation.

However if transactions are not relayed by default then wallet developers need to effectively disable support for unassigned segwit formats, lest during a multi-party payment, for example, one output to an unassigned segwit address prevents the entire multi-party payment from being relayed. Of course, disabling support for unassigned segwit formats brings us back to the situation where every time a new segwit format is activated, it is unusable until wallet developers upgrade their software to support the new address.

There may be room for some compromise, where deployed-but-not-activated segwit address are temporarily forbidden from being relayed. I'm not sure if that is the best of both solutions or the worst of both solutions.

  1. Should insertion/deletion of 1 (or more?) characters from a valid address always create an invalid address?

The BCH code is designed so that a single character replacement always transforms a valid address to an invalid address, however when it comes to insertions, the checksum can have a 2^-30 chance of remaining valid in many circumstances. If we develop a replacement for BIP173, there are plausible ways we can enforce that insertions always create invalid addresses at the data part level by requiring some kind of padding to space out the encodings and/or adding length information to the data part, (though insertions of characters into that length field require careful consideration). Just as proof that a solution can exist, if we put a length field at both the beginning and end of the witness program, then any single character insertion would create an illegal data value, even if the checksum doesn't fail.

If we are satisfied with 2^-30 chances of failing to catch insertion errors, using a xor constant 0x3fffffff in a BIP173 replacement should be all that is need.

The answers to these two questions interplay with each other. For example, if we are not going to relay unallocated segwit addresses by default then we could perhaps postpone worries about character insertions until such time as we create segwit versions of similar but not equal lengths (which tarpoot's design has narrowly avoided by moving from 33-byte witness programs to 32-byte witness programs).

The answers to the questions will also affect how we design a BIP173 replacement, and how urgently that replacement needs to be done. For example, if we are not going to replay unallocated segwit addresses by default then we might as well simply modify BIP173 to recommend that all non-v0 segwit addresses not be supported, and recommend that addresses for taproot only be supported after taproot activation.

I'm not sure how to go about resolving these two questions, but we should attempt to resolve them before taproot deployment and activation.

@gmaxwell
Copy link
Contributor

I don't understand why you believe wallet policy and relay policy have to be in lockstep? Relay policy can't be more strict without creating issues, yes, but wallets can be more strict than relay.

Of course, disabling support for unassigned segwit formats brings us back to the situation where every time a new segwit format is activated, it is unusable until wallet developers upgrade their software to support the new address.

Not all upgrades are the same though. Adding an allowed version or length (or removing some allowed lengths) from a list is radically less work than some alternatives.

I think I generally like permitting all things but then when a new version is defined, narrowing it down to what is actually used. E.g. 33 is allowed for witness version v1 bech32, but then v1 gets deployed and the usage is restricted to make it more sensitive to errors. This doesn't break the property that it's extremely robust to additional character errors, because you would have to also change the version (e.g. adding a character at the fourth position).

Don't-relay has other negative effects. For example, there is some scammy ethereum "distributed wrapped btc" thing (I don't want to link it because I think it's actually fraud, but I think it makes for a fine example in the abstract) which is hardcoding a requirement for users to pay to particular scriptpubkey templates specifically to avoid problems caused by users requesting payments to non-standard addresses and breaking rely. As a result it's going to be the case that v0 ends up baked into some automated system which is effectively unchangeable and then it turns into a huge burden against deploying new functionality in the future. (I don't think that'll happen in this case, because the whole thing is an and-its-gone scam, but the general structure will occur elsewhere).

I think: The network should relay future stuff, clear that if you use it you get to keep the pieces. Wallets probably shouldn't allow payments to undefined verisons, however, except via warning-guarded or default-off-config-guarded settings. For automated transaction processing things that are hard to upgrade, HSMs and the like, they should allow them.

"Unknown address verision. This address version was not defined at the time of this software's release and has not been tested in this software. You should consider upgrading before using these addresses. [Abort] [I know what I'm doing, send anyways]."

@roconnor-blockstream
Copy link
Contributor Author

roconnor-blockstream commented Jul 18, 2020

I seem to have been mistaken / confused about relay policy. As I understand, it is the case that by default Bitcoin Core does relay transactions that spend to unassigned segwit addresses, but doesn't relay transactions that spend from unassigned segwit addresses. That seems to address the concerns about taproot depolyment safety as well as allowing newly deployed address to work immediately.

I'm not sure why I thought there was a problem.

@roconnor-blockstream
Copy link
Contributor Author

Wallets probably shouldn't allow payments to undefined versions,

Setting aside whether I agree with this or not, BIP173 currently reads:

but implementations MUST allow the use of any version.

which contradicts that your advice or otherwise I have no understand what that MUST statement is referring to.

If we want to remove that statement from BIP173 and replace it something like "implementations SHOULD not allow the use of any unknown version.", then dealing with this insertion issue becomes less urgent because only 32 and 20 byte witness programs are currently allocated.

@gmaxwell
Copy link
Contributor

I'm not sure why I thought there was a problem.

Well there is a potential problem, but I don't think it's solvable.

For example. Right now some people could send a bunch of payments to some future version output and then make a fuss about having specific not yet settled consensus rules.

Fortunately, they'd be unconstrained so they could just be stolen or returned. ... though if they were P2SH embedded, they'd be "safe" until the preimage was revealed. Of course if they were P2SH embedded they couldn't be blocked anyways.

People sending to future versions isn't good. But it doesn't seem possible to avoid at least having relay allow it without creating other issues which are worse.

@maaku
Copy link
Contributor

maaku commented Jul 22, 2020

@sipa implicit length prefixing doesn't interfere with changing the xor constant, and they're both pretty simple to implement. For bech32m wouldn't it make sense to do both, for defense in depth against other unidentified issues?

@sipa
Copy link
Member

sipa commented Jul 22, 2020

@maaku Implicit length prefixing is actually equivalent to having a length-dependent xor constant. So my belief is that this just complicates analysis (we'd need to run it on the effective m(x,l) for every pair of nearby lengths l1, l2, instead of just picking a constant that's always good).

@maaku
Copy link
Contributor

maaku commented Jul 22, 2020

Makes sense, thanks!

@SomberNight
Copy link
Contributor

My thinking was slightly different than what you're proposing: namely restricting the existing scheme in BIP173 to just length 20 and 32, and then (either later or at the same time), define a new scheme which is identical except using final xor constant 0x3fffffff instead of 0x1, and using that for other lengths.

I think re BIP173 (bech32 in the context of Bitcoin addresses), either an approach of

  • (1) use existing encoding for witness program lengths of 20 and 32, but bech32m (new encoding) for all other lengths (i.e. introduce changes for all witness versions but only for so far unused lengths), or
  • (2) use existing encoding for witness version 0 and version 1, but bech32m for version 2 onwards (i.e. make incompatible change but only to "far in future" versions)

would be least intrusive and most feasible.

In particular I think that for witness version 1 it would be preferable not to introduce changes to BIP173, not to add unnecessary friction to adoption.

@luke-jr
Copy link
Member

luke-jr commented May 17, 2021

Where does this stand? @sipa @gmaxwell

@sipa
Copy link
Member

sipa commented May 18, 2021

I think this can be closed. The issues are addressed by BIP 350.

@kallewoof
Copy link
Contributor

@roconnor-blockstream If you don't have alternative views on the subject, this can be closed.

@roconnor-blockstream
Copy link
Contributor Author

We can close this issue, but shouldn't there be some sort of "replaced-by: 350" metadata field?

@kallewoof
Copy link
Contributor

You mean where BIP350 replaces BIP173? Yeah, I think that is reasonable, but isn't BIP173 still used for v0?

@kallewoof
Copy link
Contributor

I guess the title ("Base32 address format for native v0-16 witness outputs") should also be modified to say "for native v0 witness outputs".

@murchandamus
Copy link
Contributor

This suggestion was declined by the BIP authors. @sipa: What do you think about the suggestion to mention in BIP173 that it is only used for segwit version 0?

@sipa
Copy link
Member

sipa commented Apr 24, 2024

@murchandamus That sounds very ressonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants