-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented function for computing of old account states based on deltas #563
base: next
Are you sure you want to change the base?
Conversation
# Conflicts: # crates/store/Cargo.toml
# Conflicts: # Cargo.lock # Cargo.toml # crates/block-producer/Cargo.toml # crates/store/Cargo.toml # crates/store/src/db/tests.rs
…-acc-states # Conflicts: # crates/store/src/db/sql/mod.rs # crates/store/src/db/sql/queries/compute_old_account_states.sql
…-acc-states # Conflicts: # crates/store/src/db/sql/queries/select_merged_deltas.sql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just style/nit suggestions :) LGTM.
Are you thinking of this as a temporary solution? Because I agree that computing account states based on deltas from account creation is probably not a good solution. And if it is a temporary solution - is it actually worth it? Maybe we should go for a longer-terms solution. In general, as I mentioned in one other comment, let's look at the whole account data storage/retrieval area more holistically. Let's list the use-cases we are trying to solve for and then see what solution would address them well. |
crates/rpc/src/server/api.rs
Outdated
@@ -188,7 +188,7 @@ impl api_server::Api for RpcApi { | |||
self.block_producer.clone().submit_proven_transaction(request).await | |||
} | |||
|
|||
/// Returns details for public (public) account by id. | |||
/// Returns details for public account by ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit misleading. This method still works for non-public accounts, right? It just doesn't return details for them (ie, it just returns the stored hash). There are some other related functions that have the same problem IIRC (like select_accounts_by_ids
which get_account_proofs
uses, both for private and public accounts).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might have already been discussed (apologies if so), but considering the "usability" of account states decreases as they move further and further from the chain tip, would it make sense (if it is even possible) to retrieve the latest state and subtract deltas instead of building them up from 0?
crates/store/src/db/sql/mod.rs
Outdated
a.block_num < b.block_num | ||
) | ||
ORDER BY | ||
account_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this work? It looks a bit more readable to me, although there's a good possibility it's not as performant (but with indices and fairly large amount of updates on each account, it might?).
SELECT
account_id,
nonce
FROM
account_deltas
WHERE
account_id IN rarray(?1)
AND block_num <= ?2
AND block_num = (
SELECT MAX(block_num)
FROM account_deltas b
WHERE b.account_id = account_deltas.account_id AND b.block_num <= ?2
)
ORDER BY
account_id;
crates/store/src/db/sql/mod.rs
Outdated
} | ||
|
||
let mut rows = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could be nice to have some section separators or just comments to neatly divide each segment of the account retrieval (or maybe just make these their own functions directly)
# Conflicts: # crates/proto/src/generated/account.rs # crates/rpc-proto/proto/account.proto # crates/store/src/db/sql/mod.rs # crates/store/src/db/tests.rs # proto/account.proto
asset: Asset, | ||
is_remove: bool, | ||
) -> Result<()> { | ||
let account = self.try_get_mut(account_id)?; | ||
|
||
if is_remove { | ||
account.vault_mut().remove_asset(asset) | ||
} else { | ||
account.vault_mut().add_asset(asset) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sidebar: could be worth creating AssetUpdate
which bundles Asset
+ is_remove
, deprecating remove_asset
and add_asset
in favor of Asset::update(&AssetUpdate)
?
I imagine we have quite a few of these if is_remove { ... }
implementations around.
// TODO: What if account was also changed after requested block number? In this case we will | ||
// miss the update (since `accounts` table stores only latest account states, | ||
// the corresponding record in the table will be updated to the state beyond the requested | ||
// block range). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, vacation brain rot occurred. What is this saying?
If the accounts table only stores the latest state then how/why do we allow querying by a block range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's might be confusing, but accounts
table does store the latest state of an account. If the account is updated, its state is replaced in the accounts
table. I re-read the code now and now I'm thinking that I probably misunderstood the idea of state sync and this behavior is expected:
miden-node/crates/store/src/state.rs
Lines 500 to 501 in b36a0fd
/// - `account_ids`: Include the account's hash if their _last change_ was in the result's block | |
/// range. |
@bobbinth what do you think?
Co-authored-by: Mirko <[email protected]>
I had proposed to write old values in state delta updates, but later we decided to remove it, because we can find old values from previous records. But now I'm thinking, it wasn't that bad idea to have old values stored. It would simplify and speed-up such queries significantly: https://github.com/0xPolygonMiden/miden-node/pull/563/files#diff-90615b33cfa131df3c490d264026c75a73772c02c43d3adaa373a632aca40503R306-R324 But in order to implement this, we would also need to implement reverse delta calculation on delta application in miden-base. @bobbinth, @Mirko-von-Leipzig, what do you think? |
crates/store/src/db/sql/mod.rs
Outdated
impl Accounts { | ||
fn try_get_mut(&mut self, id: &AccountId) -> Result<&mut Account> { | ||
self.0.get_mut(id).ok_or_else(|| { | ||
DatabaseError::DataCorrupted(format!("Account not found in DB: {id}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't DatabaseError::AccountNotFoundInDb
fit this description better?
if !id.is_public() { | ||
return Err(DatabaseError::AccountNotOnChain(*id)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this is used, this should likely be validated at the RPC level so maybe it's not entirely necessary to validate here as well if we assume calls from RPC to the store are to be trusted. Not sure if we still would want to remove this validation since it's very harmless, but just a note.
crates/store/src/db/sql/mod.rs
Outdated
select_latest_states_stmt.query(params![Rc::clone(&account_ids), block_number])?; | ||
while let Some(row) = rows.next()? { | ||
let account_details = row.get_ref(1)?.as_blob_or_null()?; | ||
if let Some(details) = account_details { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the else
for this if
panic or error? Since it would mean there is something wrong in the database since you checked previously that account IDs are public (so details should be there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Looks good. I left a couple of comments inline - but I do wonder if the current approach is going to be very computationally intensive.
Taking a step back, I wonder if we should break this up into a couple of endpoints. What the client is interested in is getting to a state of a public account not too far from the chain tip and there are really 3 parts to it:
- Get account header + storage header.
- Get account code.
- Get account vault.
- Get account storage maps.
The first one we can probably just cache in the database for every update of a public account for some number of recent blocks (e.g., 100 - 200). This will make it very fast to retrieve and will probably not result in too much extra storage pressure.
The second one could be handled pretty easily by having a separate table for storing account code keyed by code commitment. This table could be periodically pruned to get rid of obsolete code entries - but I would imaging that'll happen rather rarely.
For the vault, if the vault is small enough, we could return the entire vault. The size of the vault should be pretty easy to track for each of the account updates (it is just the number of assets in the vault). If the vault is big, we'd need to come up with a different strategy - e.g., either sending vault deltas or somehow paginating through assets.
For storage maps, we could try to apply the same strategy as for the vault - i.e., if the map is small enough, just send all key-value pairs, otherwise either send deltas or do some kind of pagination over key-value pairs.
Sending vault/storage map deltas should be pretty simple - and we kind of already do that. How to do pagination is not super clear to me yet.
Overall, before knowing if the above approach makes sense, we'll need to sketch out how would it affect the existing endpoints. I think these endpoints are:
SyncState
- maybe we should return account header for all public accounts?GetAccountStateDelta
- would we still need this endpoint?GetAccountDetails
- should we change this to be about getting vault and storage maps (either via deltas or pagination)?GetAccountProofs
- should we modify the map responses to send just key-value pairs if the maps are small enough? Or maybe we should transform this into a more generalGetAccountHeader
endpoint which could also optionally return account proofs?
#[cfg_attr(not(test), expect(dead_code))] // TODO: remove once the function is used | ||
pub fn compute_old_account_states( | ||
conn: &mut Connection, | ||
account_ids: &[AccountId], | ||
block_number: BlockNumber, | ||
) -> Result<Vec<Account>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move the contents of this function into a separate module? (e.g., under /src/db/sql/account_state.rs
).
Also, maybe it would be better to make it work with one account at a time? Is there a significant benefit to trying to process many accounts at the same time?
// *** Query latest states with calculated nonces *** | ||
|
||
// We select states with possible nonce updates in order to (re)construct `Account` with final | ||
// nonce, since we're not able to decrease nonce in `Account`. | ||
let mut select_latest_states_stmt = transaction.prepare_cached( | ||
" | ||
SELECT a.account_id, a.details, d.nonce | ||
FROM accounts a LEFT JOIN account_deltas d ON | ||
a.account_id = d.account_id AND | ||
d.block_num = ( | ||
SELECT MAX(block_num) | ||
FROM account_deltas | ||
WHERE | ||
block_num <= ?2 AND | ||
account_id = a.account_id | ||
) | ||
WHERE a.account_id IN rarray(?1)", | ||
)?; | ||
let mut rows = | ||
select_latest_states_stmt.query(params![Rc::clone(&account_ids), block_number])?; | ||
while let Some(row) = rows.next()? { | ||
let account_details = row.get_ref(1)?.as_blob_or_null()?; | ||
let id = read_from_blob_column(row, 0)?; | ||
let details = account_details.ok_or_else(|| { | ||
DatabaseError::DataCorrupted(format!( | ||
"No details for public account {id} in the database" | ||
)) | ||
})?; | ||
let account = Account::read_from_bytes(details)?; | ||
let old_nonce = row.get::<_, Option<u64>>(2)?; | ||
|
||
// If there is updated (old) nonce, reconstruct loaded details with the nonce. | ||
let account = if let Some(old_nonce) = old_nonce { | ||
let (id, vault, storage, code, _nonce) = account.into_parts(); | ||
let nonce = old_nonce.try_into().map_err(DatabaseError::DataCorrupted)?; | ||
|
||
Account::from_parts(id, vault, storage, code, nonce) | ||
} else { | ||
account | ||
}; | ||
|
||
accounts.0.insert(id, account); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to move this into a separate function to make reading the code easier?
Also, if I understood correctly, we do this so that we could set an older delta in the account, right? If so, this seems like a lot of work for a relatively simple update. Another option could be to enable nonce updates in minde-base
. But also, maybe we could just store some amount of historical data for each account to simplify this process.
One example, is that we could store account header + storage headers for all public accounts for the last
We could also move code
into a separate table where code_commitment
is the primary key. Given that code doesn't change too frequently (and there is probably going to be a lot of duplication) we probably don't sacrifice much by doing so.
If we do the above, the main problem we'd need to do is how to update storage maps and asset vault to to some past state - and this actually means that we won't need to deal with full account deltas but only with storage map and asset vault deltas. This, I think, should simplify the task quite a bit.
@bobbinth thank you so much for in-deep review! Yes, this idea might help to make things simpler as well, as the new solution might work faster.
It's unclear for me, which endpoints work good for consumers and which ones should be improved or removed. I was wondering, if @igamigo could give us more context regarding these endpoints and how useful (or not) they are from a client's perspective and what we should improve. |
I think in general this is still the case. In terms of whether the endpoints that Bobbin mentions are useful or not, we use all of them except The way syncing works right now on the client is that we get the response, which includes the So, in terms of what we don't have now but would be nice to have, I guess it boils down to allowing for non-current states for |
Related to #527
This function (
compute_old_account_states
) computes state for given accounts for the specified block number based on deltas for accounts stored in the database.The current solution is based on getting final account states and apply changes from the final states to the given block number in reverse order.