Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: limit the length of key/value #14645

Merged
merged 15 commits into from
Mar 17, 2023
Merged

feat!: limit the length of key/value #14645

merged 15 commits into from
Mar 17, 2023

Conversation

yihuang
Copy link
Collaborator

@yihuang yihuang commented Jan 17, 2023

Description

There are always limits on size of key/values in underlying database/OS/hardware, so instead of relying on those arbitrary and indeterministic limits, we should define a deterministic and reasonable one in application.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

so the limit is deterministicly defined,
rather than some arbitrary limit decided by underlying database.
@yihuang yihuang requested a review from a team as a code owner January 17, 2023 10:37
@github-prbot github-prbot requested review from a team, aaronc and testinginprod and removed request for a team January 17, 2023 10:38
CHANGELOG.md Outdated Show resolved Hide resolved
store/types/validity.go Outdated Show resolved Hide resolved

const (
// 32K
MaxKeyLength = math.MaxUint16
Copy link
Collaborator Author

@yihuang yihuang Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MaxUint16 is picked so the length can be encoded in 2bytes with fixed length, should be more than enough for all use cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2^16 bytes (64 KiB) is definitely not enough for CosmWasm values

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value is 2^32, 2^16 is for keys.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any length limits for keys on DBs used that are shorter than 2^32? Personally, I don't see any reasons why we should have a shorter limit for keys than for values. The original motivation was deterministic behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👆 Data structures like sets (i.e. a map from key to nothing) store all its data in keys. So treating keys and value differently should be avoided if possible IMO.

@yihuang yihuang changed the title limit the length of key/value feat: limit the length of key/value Jan 17, 2023
@yihuang yihuang changed the title feat: limit the length of key/value feat!: limit the length of key/value Jan 17, 2023
// 32K
MaxKeyLength = math.MaxUint16
// 2G
MaxValueLength = math.MaxUint32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should cover all cases even in CW?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alpe do you happen to know?

Copy link
Collaborator Author

@yihuang yihuang Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is wrong, the limit is 4G, since it's uint32, I think there was no limit before? we are collecting opinions on this.

Copy link
Collaborator Author

@yihuang yihuang Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MaxUint32 is 4G, yeah, it's a breaking change, but I believe it's a necessary one and need to be done sooner or later.

Copy link
Collaborator Author

@yihuang yihuang Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, but breaking changes have enormous and wide-ranging impact, and so must be motivated by something stronger than belief 😃 What concrete rationale supports this breaking change across all Cosmos networks?

deterministic state machine, as I explained in PR description. the goal is set a deterministic limit which are supported by all supported db backends, and can cover hopefully all existing use cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we need to be a little bit smaller than hard limit, because iavl leaf node encode both key and value and a few other bytes together as value in low level db, I'd prefer 2G(aka. MaxInt32) if it's enough.

Copy link
Contributor

@alpe alpe Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got feedback from Mauro on CosmWasm limits for contracts. They are much lower than the suggested values: https://github.com/CosmWasm/cosmwasm/blob/main/packages/vm/src/imports.rs#L32-L35
Please note all contract data is stored in prefix store so the actual key/value length can be a bit bigger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note there is no good reason for having different values in MAX_LENGTH_DB_KEY and MAX_LENGTH_DB_VALUE in CosmWasm. It was a gut feeling from the early days and if contract developers exceed those they are most likely doing something wrong. For the next iteration of that code, I'd give them both the same value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, thanks @peterbourgon.

In this case my only concern is that the current key limit is smaller than the limit that CosmWasm set before (64 KiB + wasmd prefix + iavl wrapper?) and we have no way to know who is using values close to this limit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congratulations to this clever analysis. It's not helping though. You asked for empirical data, worried about breaking and other people have been asking for our insights. And yes, in a smart contract scenario we cannot control what people are using right now within the currently enforced range.

store/types/validity.go Outdated Show resolved Hide resolved
Comment on lines +12 to 30
// AssertValidKey checks if the key is valid(key is not nil, not empty and within length limit)
func AssertValidKey(key []byte) {
if len(key) == 0 {
panic("key is nil")
panic("key is nil or empty")
}
if len(key) > MaxKeyLength {
panic("key is too large")
}
}

// AssertValidValue checks if the value is valid(value is not nil)
// AssertValidValue checks if the value is valid(value is not nil and within length limit)
func AssertValidValue(value []byte) {
if value == nil {
panic("value is nil")
}
if len(value) > MaxValueLength {
panic("value is too large")
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid functions like these need to return errors rather than panicking, so that callers can detect when the validation checks have failed. Otherwise, there's no reason to have these functions at all, because any access of invalid values would trigger panics, anyway.

Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the status of this? @alpe @webmaster128 do you think this is good as it stands?

@webmaster128
Copy link
Member

webmaster128 commented Jan 24, 2023

So what's the status of this? @alpe @webmaster128 do you think this is good as it stands?

I think the length limit for the keys introduced here is too low, given that there has not been any restriction in the past and leveldb supports arbitrary key lengths. I do understand that long keys are not desirable for performance reasons and I am happy to discourage the use of long keys from now on. But since the goal here is to get a deterministic limit, any other value works as well.

I suggest 126 128 KiB (131072 bytes) for keys. This ensure all 64 KiB keys of CosmWasm + wrapping overhead remains supported.

@yihuang
Copy link
Collaborator Author

yihuang commented Jan 26, 2023

So what's the status of this? @alpe @webmaster128 do you think this is good as it stands?

I think the length limit for the keys introduced here is too low, given that there has not been any restriction in the past and leveldb supports arbitrary key lengths. I do understand that long keys are not desirable for performance reasons and I am happy to discourage the use of long keys from now on. But since the goal here is to get a deterministic limit, any other value works as well.

I suggest 126 128 KiB (131072 bytes) for keys. This ensure all 64 KiB keys of CosmWasm + wrapping overhead remains supported.

done, now it's:

  • key: 128K - 1
  • value: 2G - 1

Minus one so the max length is all ones.

Copy link
Member

@webmaster128 webmaster128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all for considering the CosmWasm use case ❤️

// 128K - 1
MaxKeyLength = 1<<17 - 1
// 2G - 1
MaxValueLength = 1<<31 - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of those values are off. Can we just write 128 * 1024 and 2 * 1024 * 1024 * 1024 and live a happy life?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just think about how many bits are needed to encode the length, that's why I prefer all 1s value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, but 1<<17 is not 2^17 so I wonder if this notation is helpful.

Anyways, leave it up to you folks. But right now the comment and the values do not fit together.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://go.dev/play/p/8pC3XGTDRZw
I think they are the same values?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my bad sorry! I take everything back.

But nice TIL: In python the operator precedence is different:

>>> # What I tried
>>> 1<<17 - 1
65536

>>> # What Go does
>>> (1<<17) - 1
131071

>>> # What Python does
>>> 1<<(17 - 1)
65536

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, maybe we better to add the parentheses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's helpful, thanks. Python, Rust and JavaScript evaluate one way. Go and Swift evaluate a different way.

// AssertValidKey checks if the key is valid(key is not nil)
const (
// 128K - 1
MaxKeyLength = (1 << 17) - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have preferred these two constants to be variables that can be modified at startup, I think the key-lengths are reasonable but we don't have full knowledge of what other protocols are doing ( I doubt even 'them' have no clear idea of the size of their own keys)

EG in cmd/appd/main.go:

func main() {
       storetypes.MaxKeyLength = my safe limit
       storetypes.MaxValueLength = my safe limit
       
       rootCmd().Execute ....
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me. It would allow new chains to be much more restrictive on key lengths for example. Existing chains can scan their keys in usage and put a limit based on their real world data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, done.

@testinginprod testinginprod self-requested a review January 27, 2023 10:06
Copy link
Contributor

@testinginprod testinginprod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving since the MaxKeyLength is now a non breaking change, but something that can be customized.

We should add a big changelog entry for this!

@julienrbrt
Copy link
Member

Approving since the MaxKeyLength is now a non breaking change, but something that can be customized.

We should add a big changelog entry for this!

+1, and a section in the UPGRADING.md

Comment on lines +6 to +7
// 2G - 1
MaxValueLength = (1 << 31) - 1
Copy link

@peterbourgon peterbourgon Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default size of a LevelDB log file is 2MB and compaction occurs at 4MB. It is a core assumption of LevelDB operations, like compaction, that these files will contain many key/val pairs. Many promises of the database, including but not limited to performance, rely on that assumption.

The SDK uses the default settings (edit: specifically these default settings) when creating a GoLevelDB backend. If the SDK writes a single key/val pair of larger than 4MB, then it will occupy its own unique file. This may not be a problem if there are only a handful of key/val pairs that are that large, but if there are more than just a few, then it does become an issue.

Copy link
Collaborator Author

@yihuang yihuang Jan 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large keys definitely hurt performance because they are usually part of the index structure to do binary search, and need to be loaded into ram to be performant. I guess no one is actually using large keys in practice.
Maybe a better thing to do is setting a smaller limit in sdk, but third-parties can enlarge it since it's a variable, it's safer than having a bigger limit in sdk while third-party setting a smaller one. But we'll have more communication to do in this way, WDYT? @webmaster128 @testinginprod

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CosmWasm never allowed values greater than 128 KiB. I don't know how much wrapping overhead there is in wasmd (probably none) and iavl. But I guess any limit >= 256 KiB would not hurt us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large keys

We are talking about values, right?

Copy link
Collaborator Author

@yihuang yihuang Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys ;D I think there's less worries about large values

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2-4MB limits I discuss above are impacted by the total size of a key/val pair, not just the key size or val size in isolation. My main point is that if you want to increase either of those limits, or both of them, to something approaching those 2-4MB limits, then you would also need to increase the limits of LevelDB itself; increasing the former without changes to the latter would result in really bad outcomes.

Copy link
Contributor

@cool-develope cool-develope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@tac0turtle
Copy link
Member

wanted to give an update: after adr 038 is merged we will tag store package then merge this pr and work on integrating iavl changes.

@tac0turtle
Copy link
Member

hey we just tagged v0.1.0 of store, this will be included in the next release so we can merge this now

@tac0turtle tac0turtle enabled auto-merge (squash) March 17, 2023 14:05
@tac0turtle tac0turtle merged commit 8944ab6 into cosmos:main Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants