Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: adr-40: reduce multistore and make it atomic #9355

Merged
merged 27 commits into from
Oct 22, 2021

Conversation

robert-zaremba
Copy link
Collaborator

@robert-zaremba robert-zaremba commented May 18, 2021

Description

This ADR-40 update discusses the MultiStore removal.

Closes: #10013
Related discussion: #8297 (comment)


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer
  • Review Codecov Report in the comment section below once CI passes

@robert-zaremba robert-zaremba added C:Store T: ADR An issue or PR relating to an architectural decision record labels May 18, 2021
@github-actions github-actions bot added the T:Docs Changes and features related to documentation. label May 18, 2021
@robert-zaremba
Copy link
Collaborator Author

robert-zaremba commented May 18, 2021

Alternative to the prefix store based design is to use a concept proposed by AiB (@jgimeno , @fdymylja ):
We require that all objects stored in store will need to implement the following interface:

type Record interface {
   proto.Message
   ID() []byte
}

Where, ID method will return a unique key for each proto.Message stored in the store. It will use proto type URL as a prefix to handle conflicts between object classes, example:

coin.ID() == "cosmos/base/coin/<holder_address>/<denom>"

NOTE: AiB proposed different interface, but the concept is similar.

@robert-zaremba
Copy link
Collaborator Author

@aaronc , @alexanderbez , @alessio, @adlerjohn, @i-norden - let me know what do you think about this update? Shall we do more research before creating a PR?

@tac0turtle
Copy link
Member

We require that all objects stored in store will need to implement the following interface:

This seems like it can lead to easy user errors. I could be wrong, but it adds more boilerplate IMO.

@robert-zaremba
Copy link
Collaborator Author

For better organization, let's continue the discussion about multistore in Replace IAVL and decouple state commitment from storage #8297 discussion.

@fdymylja
Copy link
Contributor

Just as a short update on the work being done, we're moving away from the GetID interface.

The user does not need to implement GetID anymore but rather just mark the PrimaryKey field during the StateObject registration and then the store implementation already knows how to encode that field (not matter what type it is) into a valid byte key.

For example: https://github.com/allinbits/cosmos-sdk-poc/blob/frojdi/crisis/runtime/orm/schema/schema.go#L71

This removes the need to implement any kind of interface as now the store is smart enough to recognize, given a protobuf message, which is its primary key, how to get it and how to encode it.

@alexanderbez
Copy link
Contributor

alexanderbez commented May 19, 2021

@aaronc , @alexanderbez , @alessio, @adlerjohn, @i-norden - let me know what do you think about this update? Shall we do more research before creating a PR?

I like it, but it requires more lift from devs, which is probably be OK? It seems like the updated proposal from @fdymylja is a bit cleaner.

@tac0turtle
Copy link
Member

@ValarDragon brought up a use case for multistores. If I as an app developer want to use a different state commitment for my module, multistores allow this. Is there a way to integrate such a feature into this design?

Maybe @ValarDragon can also say a bit more about the use case.

@robert-zaremba robert-zaremba marked this pull request as ready for review May 28, 2021 13:37
@robert-zaremba
Copy link
Collaborator Author

If I as an app developer want to use a different state commitment for my module

I see the motivation. I don't think it justifies the complexity and other problems listed in the Multistore section of this ADR.

Moreover, currently multistore is using the same DB, so if we were to allow different mechanisms for SC we would nee to separate everything and add one more layer of merkle tree to commit to all stores.

@robert-zaremba
Copy link
Collaborator Author

Updates

  • Added section about multistore removal
  • Added requirement about accessing old state of SC
  • Added more details about the database setup.

```

Where `store.Code(prefix)` is a Huffman Code of `prefix` in the given `store`.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronc originally suggested to map a module "verbose" key to a 2 byte sequence prefix for key prefix compression, eg:

bank -> 0x0001
staking -> 0x0002
....

In both mechanisms we will need to assure that the created map will be stable (we will always construct the same mapping pairs).

NOTE: Both Huffman Codes and module key map compress the keys only for the SS. It's not needed for SC because in SC we always operate on a hash of a key.

Copy link
Collaborator Author

@robert-zaremba robert-zaremba May 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Aaron idea is easier to manage. The limitation is that we bound the number of modules to 65536 (2^16) - which is big enough to not worry about it now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this prefix model work effectively, I think there should be some stateful prefix map - possibly stored under some special restricted schema prefix (probably 0) on the RootStore. We can use varint encoding and not require a restriction to 2 bytes, but also 2 bytes is probably fine.

In this restricted schema key-space, we would have a mapping from key name -> compressed prefix, which is itself prefixed to allow for other schema use cases.

How does that sound?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How important is it to have this optimization at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String prefixes could potentially be rather long (maybe full proto msg names eventually as discussed in #9156). I think it's relatively important.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to enable variable prefix length, then huffman coding is better, because it takes frequency into account and it is also protects against common prefixes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: we decided to use varint.

@ValarDragon
Copy link
Contributor

ValarDragon commented Jun 3, 2021

IMO there are important usecases that want to be able to allow sub-trees to have a different merkelization / proof format.

E.g. for the use-case of having a multi-asset shielded pool that reuses existing SNARK circuits, we need the merkle tree to not be IAVL/ our SMT

For making SNARK proofs of merkle tree auth paths, you want different merkle tree structures (e.g. often non-binary) & different hash functions.

This was previously pretty doable by using a different store within the multistore.

Co-authored-by: Ryan Christoffersen <[email protected]>
* Update on multistore refactor and IBC proof

* cleanup whitespace

* Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

Co-authored-by: Robert Zaremba <[email protected]>

* revise for PR

* add todo

* Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

Co-authored-by: Robert Zaremba <[email protected]>

Co-authored-by: Robert Zaremba <[email protected]>
@robert-zaremba
Copy link
Collaborator Author

robert-zaremba commented Oct 6, 2021

Taking #9156 into account, a few things I think are important:

being able to generically compress any proto message typeURL to a 1-2 byte prefix for SS that is stable (I suggest a varint managed by a schema sub-store)
we probably can get in-memory caching for free because the ORM layer can handle that and even skip decoding for frequently accessed objects, so memory and transient stores may be less needed...

@aaronc I think this is out of the scope of this PR. That being said - it's a good update and I think we should design for key compression. I will add that note as a TODO in this PR and let's specify it in a new PR.

@robert-zaremba
Copy link
Collaborator Author

robert-zaremba commented Oct 6, 2021

Updates

  • I rephrased the paragraph about using two separate DB instances
  • Added a note about private stores and use of the low level interface
  • The RootStore interface is work in progress and will be updated once we will progress with the implementation.
  • Added more proposals for the key compression and left todo to make a decision about it. Let's iterate about it later on.
  • removed proposal to extend KVStore with WithPrefix

Let's move forward and merge this PR. I left few TODOs which are not directly related to this PR. They will be resolved in the next iteration - the ADR is still in DRAFT.

@robert-zaremba
Copy link
Collaborator Author

@alexanderbez , @aaronc - let's finalize this PR and merge as an iterative update. For not resolved parts I've added TODOs.

assert( !k1.hasPrefix(k2) )
```

NOTE: We need to assure that the codes won't change. Huffman Coding depends on the keys and its frequency - so we would need to generate the codes and then fix the mapping in a static variable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this couldn't change when stores are added/renamed, since it will only be used in the SS implementation. The current mapping could just be stored in a separate namespace in the DB.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense.

Copy link
Member

@aaronc aaronc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this looks good (pending @roysc's suggestions getting merged).

I don't agree with the Huffman Coding part and suggest we delete it. We don't know all prefixes a priori and this set can change. Instead, as I proposed before, we can keep a simple mapping of prefixes and use varint encoding. Specifically, how about this:

  • we define the prefix 0x0 in SS as the prefix for special schema sub-store to be used internally by RootStore
  • in the schema sub-store we store three things:
    • map of store key -> prefix
    • map of prefix -> store key
    • auto-incrementing sequence for the next prefix

We can just use auto-incrementing varints which ensure no collisions and are short and easy to assign.

@robert-zaremba robert-zaremba mentioned this pull request Oct 13, 2021
20 tasks
@tac0turtle
Copy link
Member

@robert-zaremba can you address aarons comment. Lets merge this. Its been open for 6 months

@robert-zaremba robert-zaremba added the A:automerge Automatically merge PR once all prerequisites pass. label Oct 21, 2021
@robert-zaremba robert-zaremba changed the title adr-40: reduce multistore and make it atomic doc: adr-40: reduce multistore and make it atomic Oct 21, 2021
@robert-zaremba robert-zaremba changed the title doc: adr-40: reduce multistore and make it atomic docs: adr-40: reduce multistore and make it atomic Oct 22, 2021
@robert-zaremba robert-zaremba merged commit f3ffb33 into master Oct 22, 2021
@robert-zaremba robert-zaremba deleted the robert/adr-40-update branch October 22, 2021 10:45
creachadair pushed a commit that referenced this pull request Oct 22, 2021
* adr-40: use prefix store instead of multistore

* add note about prefix.Store

* Update SC and SS setup information and historical versions sepc

* add note about key prefix optimization

* rephrased the changes related to multistore

* Apply suggestions from code review

Co-authored-by: Ryan Christoffersen <[email protected]>

* Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

* Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

* Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

Co-authored-by: Aleksandr Bezobchuk <[email protected]>

* design update

* update merkle proofs

* Apply suggestions from code review

Co-authored-by: Aleksandr Bezobchuk <[email protected]>

* reword huffman compression paragraph

* ADR-40: update on multi-store refactor and IBC proofs (#10191)

* Update on multistore refactor and IBC proof

* cleanup whitespace

* Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

Co-authored-by: Robert Zaremba <[email protected]>

* revise for PR

* add todo

* Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

Co-authored-by: Robert Zaremba <[email protected]>

Co-authored-by: Robert Zaremba <[email protected]>

* review updates

* add todo for protobuf message type compression

* add link to a discussion

* guarantee atomic commit with IBC workaround proposal

* adding more links to references

* Apply suggestions from code review

Co-authored-by: Roy Crihfield <[email protected]>

* reword the module key compression part

Co-authored-by: Ryan Christoffersen <[email protected]>
Co-authored-by: Federico Kunze <[email protected]>
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Roy Crihfield <[email protected]>
@robert-zaremba robert-zaremba mentioned this pull request Oct 26, 2021
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:automerge Automatically merge PR once all prerequisites pass. C:Store T: ADR An issue or PR relating to an architectural decision record T:Docs Changes and features related to documentation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update ADR-040 with design for multistore deprecation