Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented function for computing of old account states based on deltas #563

Open
wants to merge 44 commits into
base: next
Choose a base branch
from

Conversation

polydez
Copy link
Contributor

@polydez polydez commented Dec 4, 2024

Related to #527

This function (compute_old_account_states) computes state for given accounts for the specified block number based on deltas for accounts stored in the database.

The current solution is based on getting final account states and apply changes from the final states to the given block number in reverse order.

Copy link
Contributor

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just style/nit suggestions :) LGTM.

crates/store/src/db/sql/utils.rs Show resolved Hide resolved
crates/store/src/db/tests.rs Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
@bobbinth
Copy link
Contributor

bobbinth commented Dec 7, 2024

This function (compute_old_account_states) computes state for given accounts for the specified block number based on deltas for accounts stored in the database. Since it takes into account all deltas from the creation of account up to the current block, if account has a lot of changes, it might be more expensive than applying deltas backward from the latest account state to the specified block, so it might be a subject for future improvement.

Are you thinking of this as a temporary solution? Because I agree that computing account states based on deltas from account creation is probably not a good solution.

And if it is a temporary solution - is it actually worth it? Maybe we should go for a longer-terms solution.

In general, as I mentioned in one other comment, let's look at the whole account data storage/retrieval area more holistically. Let's list the use-cases we are trying to solve for and then see what solution would address them well.

@@ -188,7 +188,7 @@ impl api_server::Api for RpcApi {
self.block_producer.clone().submit_proven_transaction(request).await
}

/// Returns details for public (public) account by id.
/// Returns details for public account by ID.
Copy link
Collaborator

@igamigo igamigo Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit misleading. This method still works for non-public accounts, right? It just doesn't return details for them (ie, it just returns the stored hash). There are some other related functions that have the same problem IIRC (like select_accounts_by_ids which get_account_proofs uses, both for private and public accounts).

Copy link
Collaborator

@igamigo igamigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have already been discussed (apologies if so), but considering the "usability" of account states decreases as they move further and further from the chain tip, would it make sense (if it is even possible) to retrieve the latest state and subtract deltas instead of building them up from 0?

a.block_num < b.block_num
)
ORDER BY
account_id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like this work? It looks a bit more readable to me, although there's a good possibility it's not as performant (but with indices and fairly large amount of updates on each account, it might?).

SELECT
    account_id,
    nonce
FROM
    account_deltas
WHERE
    account_id IN rarray(?1)
    AND block_num <= ?2
    AND block_num = (
        SELECT MAX(block_num)
        FROM account_deltas b
        WHERE b.account_id = account_deltas.account_id AND b.block_num <= ?2
    )
ORDER BY
    account_id;

crates/store/src/db/sql/mod.rs Show resolved Hide resolved
Comment on lines 382 to 384
}

let mut rows =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could be nice to have some section separators or just comments to neatly divide each segment of the account retrieval (or maybe just make these their own functions directly)

crates/store/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines +225 to +234
asset: Asset,
is_remove: bool,
) -> Result<()> {
let account = self.try_get_mut(account_id)?;

if is_remove {
account.vault_mut().remove_asset(asset)
} else {
account.vault_mut().add_asset(asset)
}
Copy link
Contributor

@Mirko-von-Leipzig Mirko-von-Leipzig Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidebar: could be worth creating AssetUpdate which bundles Asset + is_remove, deprecating remove_asset and add_asset in favor of Asset::update(&AssetUpdate)?

I imagine we have quite a few of these if is_remove { ... } implementations around.

crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
Comment on lines +1391 to +1394
// TODO: What if account was also changed after requested block number? In this case we will
// miss the update (since `accounts` table stores only latest account states,
// the corresponding record in the table will be updated to the state beyond the requested
// block range).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, vacation brain rot occurred. What is this saying?

If the accounts table only stores the latest state then how/why do we allow querying by a block range?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's might be confusing, but accounts table does store the latest state of an account. If the account is updated, its state is replaced in the accounts table. I re-read the code now and now I'm thinking that I probably misunderstood the idea of state sync and this behavior is expected:

/// - `account_ids`: Include the account's hash if their _last change_ was in the result's block
/// range.

@bobbinth what do you think?

@polydez
Copy link
Contributor Author

polydez commented Jan 10, 2025

I had proposed to write old values in state delta updates, but later we decided to remove it, because we can find old values from previous records. But now I'm thinking, it wasn't that bad idea to have old values stored. It would simplify and speed-up such queries significantly: https://github.com/0xPolygonMiden/miden-node/pull/563/files#diff-90615b33cfa131df3c490d264026c75a73772c02c43d3adaa373a632aca40503R306-R324

But in order to implement this, we would also need to implement reverse delta calculation on delta application in miden-base.

@bobbinth, @Mirko-von-Leipzig, what do you think?

crates/store/src/db/sql/mod.rs Show resolved Hide resolved
crates/store/src/db/sql/mod.rs Outdated Show resolved Hide resolved
impl Accounts {
fn try_get_mut(&mut self, id: &AccountId) -> Result<&mut Account> {
self.0.get_mut(id).ok_or_else(|| {
DatabaseError::DataCorrupted(format!("Account not found in DB: {id}"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't DatabaseError::AccountNotFoundInDb fit this description better?

Comment on lines +245 to +247
if !id.is_public() {
return Err(DatabaseError::AccountNotOnChain(*id));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this is used, this should likely be validated at the RPC level so maybe it's not entirely necessary to validate here as well if we assume calls from RPC to the store are to be trusted. Not sure if we still would want to remove this validation since it's very harmless, but just a note.

select_latest_states_stmt.query(params![Rc::clone(&account_ids), block_number])?;
while let Some(row) = rows.next()? {
let account_details = row.get_ref(1)?.as_blob_or_null()?;
if let Some(details) = account_details {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the else for this if panic or error? Since it would mean there is something wrong in the database since you checked previously that account IDs are public (so details should be there)

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks good. I left a couple of comments inline - but I do wonder if the current approach is going to be very computationally intensive.

Taking a step back, I wonder if we should break this up into a couple of endpoints. What the client is interested in is getting to a state of a public account not too far from the chain tip and there are really 3 parts to it:

  1. Get account header + storage header.
  2. Get account code.
  3. Get account vault.
  4. Get account storage maps.

The first one we can probably just cache in the database for every update of a public account for some number of recent blocks (e.g., 100 - 200). This will make it very fast to retrieve and will probably not result in too much extra storage pressure.

The second one could be handled pretty easily by having a separate table for storing account code keyed by code commitment. This table could be periodically pruned to get rid of obsolete code entries - but I would imaging that'll happen rather rarely.

For the vault, if the vault is small enough, we could return the entire vault. The size of the vault should be pretty easy to track for each of the account updates (it is just the number of assets in the vault). If the vault is big, we'd need to come up with a different strategy - e.g., either sending vault deltas or somehow paginating through assets.

For storage maps, we could try to apply the same strategy as for the vault - i.e., if the map is small enough, just send all key-value pairs, otherwise either send deltas or do some kind of pagination over key-value pairs.

Sending vault/storage map deltas should be pretty simple - and we kind of already do that. How to do pagination is not super clear to me yet.

Overall, before knowing if the above approach makes sense, we'll need to sketch out how would it affect the existing endpoints. I think these endpoints are:

  • SyncState - maybe we should return account header for all public accounts?
  • GetAccountStateDelta - would we still need this endpoint?
  • GetAccountDetails - should we change this to be about getting vault and storage maps (either via deltas or pagination)?
  • GetAccountProofs - should we modify the map responses to send just key-value pairs if the maps are small enough? Or maybe we should transform this into a more general GetAccountHeader endpoint which could also optionally return account proofs?

Comment on lines +206 to +211
#[cfg_attr(not(test), expect(dead_code))] // TODO: remove once the function is used
pub fn compute_old_account_states(
conn: &mut Connection,
account_ids: &[AccountId],
block_number: BlockNumber,
) -> Result<Vec<Account>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move the contents of this function into a separate module? (e.g., under /src/db/sql/account_state.rs).

Also, maybe it would be better to make it work with one account at a time? Is there a significant benefit to trying to process many accounts at the same time?

Comment on lines +255 to +297
// *** Query latest states with calculated nonces ***

// We select states with possible nonce updates in order to (re)construct `Account` with final
// nonce, since we're not able to decrease nonce in `Account`.
let mut select_latest_states_stmt = transaction.prepare_cached(
"
SELECT a.account_id, a.details, d.nonce
FROM accounts a LEFT JOIN account_deltas d ON
a.account_id = d.account_id AND
d.block_num = (
SELECT MAX(block_num)
FROM account_deltas
WHERE
block_num <= ?2 AND
account_id = a.account_id
)
WHERE a.account_id IN rarray(?1)",
)?;
let mut rows =
select_latest_states_stmt.query(params![Rc::clone(&account_ids), block_number])?;
while let Some(row) = rows.next()? {
let account_details = row.get_ref(1)?.as_blob_or_null()?;
let id = read_from_blob_column(row, 0)?;
let details = account_details.ok_or_else(|| {
DatabaseError::DataCorrupted(format!(
"No details for public account {id} in the database"
))
})?;
let account = Account::read_from_bytes(details)?;
let old_nonce = row.get::<_, Option<u64>>(2)?;

// If there is updated (old) nonce, reconstruct loaded details with the nonce.
let account = if let Some(old_nonce) = old_nonce {
let (id, vault, storage, code, _nonce) = account.into_parts();
let nonce = old_nonce.try_into().map_err(DatabaseError::DataCorrupted)?;

Account::from_parts(id, vault, storage, code, nonce)
} else {
account
};

accounts.0.insert(id, account);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move this into a separate function to make reading the code easier?

Also, if I understood correctly, we do this so that we could set an older delta in the account, right? If so, this seems like a lot of work for a relatively simple update. Another option could be to enable nonce updates in minde-base. But also, maybe we could just store some amount of historical data for each account to simplify this process.

One example, is that we could store account header + storage headers for all public accounts for the last $n$ blocks (where $n$ is the "look-back period" for which we allow users to query older states).

We could also move code into a separate table where code_commitment is the primary key. Given that code doesn't change too frequently (and there is probably going to be a lot of duplication) we probably don't sacrifice much by doing so.

If we do the above, the main problem we'd need to do is how to update storage maps and asset vault to to some past state - and this actually means that we won't need to deal with full account deltas but only with storage map and asset vault deltas. This, I think, should simplify the task quite a bit.

@polydez
Copy link
Contributor Author

polydez commented Jan 27, 2025

@bobbinth thank you so much for in-deep review! Yes, this idea might help to make things simpler as well, as the new solution might work faster.

Overall, before knowing if the above approach makes sense, we'll need to sketch out how would it affect the existing endpoints. I think these endpoints are:

  • SyncState - maybe we should return account header for all public accounts?
  • GetAccountStateDelta - would we still need this endpoint?
  • GetAccountDetails - should we change this to be about getting vault and storage maps (either via deltas or pagination)?
  • GetAccountProofs - should we modify the map responses to send just key-value pairs if the maps are small enough? Or maybe we should transform this into a more general GetAccountHeader endpoint which could also optionally return account proofs?

It's unclear for me, which endpoints work good for consumers and which ones should be improved or removed. I was wondering, if @igamigo could give us more context regarding these endpoints and how useful (or not) they are from a client's perspective and what we should improve.

@igamigo
Copy link
Collaborator

igamigo commented Jan 27, 2025

It's unclear for me, which endpoints work good for consumers and which ones should be improved or removed. I was wondering, if @igamigo could give us more context regarding these endpoints and how useful (or not) they are from a client's perspective and what we should improve.

I think in general this is still the case. In terms of whether the endpoints that Bobbin mentions are useful or not, we use all of them except GetAccountStateDelta, as mentioned in the linked comment. As for the others, all of them work fine right now, except for the limitation placed on GetAccountProofs since we cannot get past states.

The way syncing works right now on the client is that we get the response, which includes the AccountSummary for any account. If the account is public, we end up using the GetAccountDetails endpoint. This is probably not future proof because public accounts can obviously get pretty big. I think we always care about the latest state of the public account here, but Bobbin suggested refactoring this to allow for non-current states. I don't think this is necessary now from the client's standpoint, but also he might have some other use case in mind.

So, in terms of what we don't have now but would be nice to have, I guess it boils down to allowing for non-current states for GetAccountProofs and also having a way to (incrementally) get a public account's state because we do want to keep public accounts in sync as well (though maybe this process should be reworked entirely). If any of this is not clear enough feel free to let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants