Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented account state deltas storage and endpoint for getting account state deltas #418

Merged
merged 9 commits into from
Jul 27, 2024

Conversation

polydez
Copy link
Contributor

@polydez polydez commented Jul 25, 2024

Resolves #290

We decided to implement delta-based storage for public accounts (full account state for the latest state and deltas for all updates except creation of new account). This is needed for GetAccountStateDelta endpoint which is also implemented in this PR.

The only question I still have, if it enough to use block numbers for tracking account state updates on client or we need to use account nonces for that.

… account state deltas

chore: migrate to the latest `next` of `miden-base`
@polydez polydez requested review from igamigo and bobbinth July 25, 2024 15:46
@polydez polydez linked an issue Jul 25, 2024 that may be closed by this pull request
@polydez polydez marked this pull request as ready for review July 25, 2024 18:01
@polydez polydez force-pushed the polydez-account-state-deltas branch from d96b222 to 3b01b3d Compare July 25, 2024 18:03
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good Thank you! I left a few comments inline - the main ones about figuring out whether we actually want to have both sides of the range to be inclusive, or if the from part should be exclusive.

In general, I think using block_num to define the range works better than using nonces for this - but would be good for @igamigo to confirm if this is fine on the client side.

Cargo.toml Show resolved Hide resolved
proto/store.proto Outdated Show resolved Hide resolved
proto/rpc.proto Outdated Show resolved Hide resolved
Comment on lines 104 to 108
message GetAccountStateDeltaRequest {
account.AccountId account_id = 1;
fixed32 from_block_num = 2;
fixed32 to_block_num = 3;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add some comments here - especially to clarify whether from/to ranges are inclusive or exclusive. I'm thinking that from_block_num should be exclusive but to_block_num should be inclusive. That is, if you get the delta, the account state should be the same as it was after to_block_num.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to depend on semantics and expectations and there are many different ways to interpret this. As a caller I might expect from..to to be the deltas between my current state in from to get to the state in to i.e.

deltas = state[to] - state[from]

crates/store/src/db/migrations/001-init.sql Outdated Show resolved Hide resolved
Comment on lines 570 to 571
/// Returns the state delta between `from_block` and `to_block` for the given account.
pub(crate) async fn get_account_state_delta(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to one of the previous comments: here and in other similar places we should clarify whether from/to values are inclusive or exclusive.

crates/store/src/db/sql.rs Outdated Show resolved Hide resolved
@polydez polydez requested a review from bobbinth July 26, 2024 05:30
@polydez
Copy link
Contributor Author

polydez commented Jul 26, 2024

In general, I think using block_num to define the range works better than using nonces for this - but would be good for @igamigo to confirm if this is fine on the client side.

I want to add, that we might want to use nonces in order to enforce limits to endpoint for current implementation. Our current implementation is great in terms of space consumption and it's pretty fast to get needed deltas, but for some accounts it might be needed to get enormous number of deltas in order to sync to the latest state. Obvious solution is to limit number of deltas processed by one single call to the endpoint. For cases, when deltas number is too hight it might be better to just request account's full state.

I see two solutions for it:

  1. Limit maximum number of deltas processed in current solution. This will require adding block number to the response in order for the client to be aware that not all requested deltas were returned and make next requests starting from the last returned block number until it get all needed deltas. Another solution is to just return an error, if number of deltas is too high and client will request account's full latest state.
  2. Limit maximum number of account changes requested (by nonce). Return an error, if number of nonces (to_nonce - from_nonce) requested is higher than the limit. In this case client should request account's full latest state.

Copy link
Contributor

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some sql comments, but looks good!

Cargo.toml Show resolved Hide resolved
crates/store/src/db/migrations/001-init.sql Outdated Show resolved Hide resolved

PRIMARY KEY (account_id, block_num),
CONSTRAINT fk_block_num FOREIGN KEY (block_num) REFERENCES block_headers(block_num)
) STRICT, WITHOUT ROWID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful using WITHOUT ROWID as it does have performance implications:

WITHOUT ROWID tables work best when individual rows are not too large. A good rule-of-thumb is that the average size of a single row in a WITHOUT ROWID table should be less than about 1/20th the size of a database page. That means that rows should not contain more than about 50 bytes each for a 1KiB page size or about 200 bytes each for 4KiB page size. WITHOUT ROWID tables will work (in the sense that they get the correct answer) for arbitrarily large rows - up to 2GB in size - but traditional rowid tables tend to work faster for large row sizes. This is because rowid tables are implemented as B*-Trees where all content is stored in the leaves of the tree, whereas WITHOUT ROWID tables are implemented using ordinary B-Trees with content stored on both leaves and intermediate nodes. Storing content in intermediate nodes causes each intermediate node entry to take up more space on the page and thus reduces the fan-out, increasing the search cost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good gut feel for sizes yet; but I imagine most non-trivial delta's will be larger than 200 bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will remove "WITHOUT ROWID" for at least tables with integer primary key.

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left just one tiny nit inline.

Comment on lines 104 to 105
// Returns delta of the account states in the range from `from_block_num` (exclusive) to `to_block_num` (inclusive).
message GetAccountStateDeltaRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: we should wrap comments after 100 chars.

@bobbinth
Copy link
Contributor

I want to add, that we might want to use nonces in order to enforce limits to endpoint for current implementation. Our current implementation is great in terms of space consumption and it's pretty fast to get needed deltas, but for some accounts it might be needed to get enormous number of deltas in order to sync to the latest state. Obvious solution is to limit number of deltas processed by one single call to the endpoint. For cases, when deltas number is too hight it might be better to just request account's full state.

I see two solutions for it:

  1. Limit maximum number of deltas processed in current solution. This will require adding block number to the response in order for the client to be aware that not all requested deltas were returned and make next requests starting from the last returned block number until it get all needed deltas. Another solution is to just return an error, if number of deltas is too high and client will request account's full latest state.
  2. Limit maximum number of account changes requested (by nonce). Return an error, if number of nonces (to_nonce - from_nonce) requested is higher than the limit. In this case client should request account's full latest state.

I'm leaning toward the first approach. In general, I think we could introduce "epoch-based pagination" I described in #404 (comment) to a couple of endpoints and this would be one of them. We should also "stress-test" at some point to see what happens if the response is too big. The default Tonic message size is 4MB. Would be good to find out what happens if we exceed this limit (i.e., what will break and what kind of an error message we'll get). Let's create an issue for this.

@polydez polydez merged commit 5d3e0cb into next Jul 27, 2024
9 checks passed
@polydez polydez deleted the polydez-account-state-deltas branch July 27, 2024 04:34
@polydez
Copy link
Contributor Author

polydez commented Jul 27, 2024

Added issue: #422

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement delta-based historical account storage/endpoints
3 participants