-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: limit the length of key/value #14645
Conversation
so the limit is deterministicly defined, rather than some arbitrary limit decided by underlying database.
store/types/validity.go
Outdated
|
||
const ( | ||
// 32K | ||
MaxKeyLength = math.MaxUint16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MaxUint16
is picked so the length can be encoded in 2bytes with fixed length, should be more than enough for all use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2^16 bytes (64 KiB) is definitely not enough for CosmWasm values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value is 2^32
, 2^16
is for keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any length limits for keys on DBs used that are shorter than 2^32? Personally, I don't see any reasons why we should have a shorter limit for keys than for values. The original motivation was deterministic behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👆 Data structures like sets (i.e. a map from key to nothing) store all its data in keys. So treating keys and value differently should be avoided if possible IMO.
store/types/validity.go
Outdated
// 32K | ||
MaxKeyLength = math.MaxUint16 | ||
// 2G | ||
MaxValueLength = math.MaxUint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should cover all cases even in CW?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alpe do you happen to know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is wrong, the limit is 4G, since it's uint32, I think there was no limit before? we are collecting opinions on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MaxUint32
is 4G, yeah, it's a breaking change, but I believe it's a necessary one and need to be done sooner or later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, but breaking changes have enormous and wide-ranging impact, and so must be motivated by something stronger than belief 😃 What concrete rationale supports this breaking change across all Cosmos networks?
deterministic state machine, as I explained in PR description. the goal is set a deterministic limit which are supported by all supported db backends, and can cover hopefully all existing use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we need to be a little bit smaller than hard limit, because iavl leaf node encode both key and value and a few other bytes together as value in low level db, I'd prefer 2G(aka. MaxInt32) if it's enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got feedback from Mauro on CosmWasm limits for contracts. They are much lower than the suggested values: https://github.com/CosmWasm/cosmwasm/blob/main/packages/vm/src/imports.rs#L32-L35
Please note all contract data is stored in prefix store so the actual key/value length can be a bit bigger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note there is no good reason for having different values in MAX_LENGTH_DB_KEY
and MAX_LENGTH_DB_VALUE
in CosmWasm. It was a gut feeling from the early days and if contract developers exceed those they are most likely doing something wrong. For the next iteration of that code, I'd give them both the same value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know, thanks @peterbourgon.
In this case my only concern is that the current key limit is smaller than the limit that CosmWasm set before (64 KiB + wasmd prefix + iavl wrapper?) and we have no way to know who is using values close to this limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congratulations to this clever analysis. It's not helping though. You asked for empirical data, worried about breaking and other people have been asking for our insights. And yes, in a smart contract scenario we cannot control what people are using right now within the currently enforced range.
// AssertValidKey checks if the key is valid(key is not nil, not empty and within length limit) | ||
func AssertValidKey(key []byte) { | ||
if len(key) == 0 { | ||
panic("key is nil") | ||
panic("key is nil or empty") | ||
} | ||
if len(key) > MaxKeyLength { | ||
panic("key is too large") | ||
} | ||
} | ||
|
||
// AssertValidValue checks if the value is valid(value is not nil) | ||
// AssertValidValue checks if the value is valid(value is not nil and within length limit) | ||
func AssertValidValue(value []byte) { | ||
if value == nil { | ||
panic("value is nil") | ||
} | ||
if len(value) > MaxValueLength { | ||
panic("value is too large") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid functions like these need to return errors rather than panicking, so that callers can detect when the validation checks have failed. Otherwise, there's no reason to have these functions at all, because any access of invalid values would trigger panics, anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's the status of this? @alpe @webmaster128 do you think this is good as it stands?
I think the length limit for the keys introduced here is too low, given that there has not been any restriction in the past and leveldb supports arbitrary key lengths. I do understand that long keys are not desirable for performance reasons and I am happy to discourage the use of long keys from now on. But since the goal here is to get a deterministic limit, any other value works as well. I suggest |
done, now it's:
Minus one so the max length is all ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you all for considering the CosmWasm use case ❤️
store/types/validity.go
Outdated
// 128K - 1 | ||
MaxKeyLength = 1<<17 - 1 | ||
// 2G - 1 | ||
MaxValueLength = 1<<31 - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of those values are off. Can we just write 128 * 1024
and 2 * 1024 * 1024 * 1024
and live a happy life?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just think about how many bits are needed to encode the length, that's why I prefer all 1
s value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, but 1<<17 is not 2^17 so I wonder if this notation is helpful.
Anyways, leave it up to you folks. But right now the comment and the values do not fit together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://go.dev/play/p/8pC3XGTDRZw
I think they are the same values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my bad sorry! I take everything back.
But nice TIL: In python the operator precedence is different:
>>> # What I tried
>>> 1<<17 - 1
65536
>>> # What Go does
>>> (1<<17) - 1
131071
>>> # What Python does
>>> 1<<(17 - 1)
65536
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, maybe we better to add the parentheses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's helpful, thanks. Python, Rust and JavaScript evaluate one way. Go and Swift evaluate a different way.
// AssertValidKey checks if the key is valid(key is not nil) | ||
const ( | ||
// 128K - 1 | ||
MaxKeyLength = (1 << 17) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have preferred these two constants to be variables that can be modified at startup, I think the key-lengths are reasonable but we don't have full knowledge of what other protocols are doing ( I doubt even 'them' have no clear idea of the size of their own keys)
EG in cmd/appd/main.go
:
func main() {
storetypes.MaxKeyLength = my safe limit
storetypes.MaxValueLength = my safe limit
rootCmd().Execute ....
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable to me. It would allow new chains to be much more restrictive on key lengths for example. Existing chains can scan their keys in usage and put a limit based on their real world data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving since the MaxKeyLength is now a non breaking change, but something that can be customized.
We should add a big changelog entry for this!
+1, and a section in the UPGRADING.md |
// 2G - 1 | ||
MaxValueLength = (1 << 31) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default size of a LevelDB log file is 2MB and compaction occurs at 4MB. It is a core assumption of LevelDB operations, like compaction, that these files will contain many key/val pairs. Many promises of the database, including but not limited to performance, rely on that assumption.
The SDK uses the default settings (edit: specifically these default settings) when creating a GoLevelDB backend. If the SDK writes a single key/val pair of larger than 4MB, then it will occupy its own unique file. This may not be a problem if there are only a handful of key/val pairs that are that large, but if there are more than just a few, then it does become an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large keys definitely hurt performance because they are usually part of the index structure to do binary search, and need to be loaded into ram to be performant. I guess no one is actually using large keys in practice.
Maybe a better thing to do is setting a smaller limit in sdk, but third-parties can enlarge it since it's a variable, it's safer than having a bigger limit in sdk while third-party setting a smaller one. But we'll have more communication to do in this way, WDYT? @webmaster128 @testinginprod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CosmWasm never allowed values greater than 128 KiB. I don't know how much wrapping overhead there is in wasmd (probably none) and iavl. But I guess any limit >= 256 KiB would not hurt us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large keys
We are talking about values, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keys ;D I think there's less worries about large values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2-4MB limits I discuss above are impacted by the total size of a key/val pair, not just the key size or val size in isolation. My main point is that if you want to increase either of those limits, or both of them, to something approaching those 2-4MB limits, then you would also need to increase the limits of LevelDB itself; increasing the former without changes to the latter would result in really bad outcomes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
wanted to give an update: after adr 038 is merged we will tag store package then merge this pr and work on integrating iavl changes. |
hey we just tagged v0.1.0 of store, this will be included in the next release so we can merge this now |
Description
There are always limits on size of key/values in underlying database/OS/hardware, so instead of relying on those arbitrary and indeterministic limits, we should define a deterministic and reasonable one in application.
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!
to the type prefix if API or client breaking changeCHANGELOG.md
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
!
in the type prefix if API or client breaking change