Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement tracking of top-N online accounts in the ledger #1361

Merged
merged 13 commits into from
Sep 9, 2020

Conversation

zeldovich
Copy link
Contributor

This PR extends the ledger's account tracking to be able to efficiently get the top-N online accounts, for any round between the account base round and the latest round.

This will be used in a subsequent PR to periodically commit to a Merkle array of the top-N online accounts in the block header.

The most annoying complication in this PR has to do with our rewards scheme. When you take rewards into account, then it's possible for there to be two accounts, A and B, where the balance of A is larger than B at some round R, and then the balance of A becomes smaller than the balance of B at a later round R'>R, even though no transactions executed on either account A or B.

This PR deals with the above complication by using a notion of a "normalized" balance, which is a hypothetical number of microalgos your account should have had at round 0 to accrue your current balance by the round in which your account was last modified. Comparisons of these normalized balances are stable over time, and thus give us a deterministic way to compute the top-N accounts, so that all nodes agree on what they are.

@zeldovich zeldovich changed the base branch from nickolai/compactcert to master August 6, 2020 17:31
@zeldovich
Copy link
Contributor Author

(Force-pushed to rebase against master, which had some conflicting changes.)

Copy link
Contributor

@tsachiherman tsachiherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if the solution wouldn't be much simpler if we were to extend the accountbase by adding another optional column onlineaccountnormalizedbalance and having a single partial index for these ?

@zeldovich
Copy link
Contributor Author

I was initially worried that the full account state might be pretty large, compared to just the state needed for my purpose (online balance and the voting key), which is why I materialized a separate table to cache top online accounts.

But, I do like the simplicity of just adding a column to accountbase. And we can always optimize it later, if the database queries start becoming slow, if need be.

So, I'll get rid of the separate table, and revise this PR accordingly. Thank you for the suggestion!

Instead, add the normalized balance as a separate column to the existing
accountbase table, and construct an index on that table.
@zeldovich
Copy link
Contributor Author

I pushed a revised version. It ends up being a little shorter, but mostly because msgp no longer generates encoding logic for the onlineAccount struct. But it is simpler in that the database only has one table that stores account state.

func accountsAddNormalizedBalance(tx *sql.Tx, proto config.ConsensusParams) error {
var exists bool
err := tx.QueryRow("SELECT 1 FROM pragma_table_info('accountbase') WHERE name='normalizedonlinebalance'").Scan(&exists)
if err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the database versioning, you dont really need to verify that the column present/missing, but it wouldn't be incorrect either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out to be convenient because in the accountdb test code, I don't want to be invoking acctupdates code, but I still need some way to add the normalizedonlinebalance column. Without this check, I would need some other way to determine if this column has been added or not.

var accountsResetExprs = []string{
`DROP TABLE IF EXISTS acctrounds`,
`DROP TABLE IF EXISTS accounttotals`,
`DROP TABLE IF EXISTS accountbase`,
`DROP TABLE IF EXISTS onlineaccountbase`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line can be dropped, right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, good catch.

@tsachiherman
Copy link
Contributor

I pushed a revised version. It ends up being a little shorter, but mostly because msgp no longer generates encoding logic for the onlineAccount struct. But it is simpler in that the database only has one table that stores account state.

thanks. I think it looks much better, and more imporrant : will perform much better on a high tx rate.

@tsachiherman
Copy link
Contributor

In onlineTop, why don’t you cache the most recent modification to any account, i.e.

modifiedAccounts := make(map[basics.Address]accountDelta)

and use these in order to update the candidates ? or maybe I'm missing something ?

i.e. either way you want to get the "latest" state for an address.

@zeldovich
Copy link
Contributor Author

That seems like a reasonable optimization. Implemented and pushed.

@tsachiherman
Copy link
Contributor

The changes that you've made so far looks good, but we also need to have the corresponding catchpoint tables changes. The catchpointbalances is using the same schema as accountbase, so that when flipping the switch, we can perform a quick rename from catchpointbalances-> accountbase.

As is, there are few missing changes:

  1. in applyCatchpointStagingBalances you'll want to recreate the index onlineaccountbals. The index will get deleted by DROP TABLE IF EXISTS accountbase_old, so you'll need to recreate it against the new table.
  2. the catchpointbalances schema need to be extended in resetCatchpointStagingBalances
  3. adding items to catchpointbalances is done by writeCatchpointStagingBalances where you'll want to re-implement your changes to accountsNewRound

The above does looks like lots of changes, but it really does boil down to handful of lines. Beside these, I think that the code looks good.

@ian-algorand ian-algorand added this to the Sprint 7 milestone Aug 10, 2020
@tsachiherman
Copy link
Contributor

btw - regarding your idea to materialize some of the accounts onto a separate table :
I think that your intuition is correct. We should do that ( eventually ), but as a MRU to accelerate the access to frequently used accounts. This would also go hand by hand with some more advanced data packing ( I don't have a concrete plan for that yet ).

@zeldovich
Copy link
Contributor Author

@tsachiherman , thanks for the pointers to where these normalized online balances need to be added for catchpoints.

I pushed the changes to support that; it'd be great if you can take a look when you have time, to see if I missed something.

One difference from what you were proposing to do lies in how indexes are handled. When a table is renamed, SQLite actually moves the indexes along with the table. So, there's no need to explicitly drop indices. However, we do need to create an index for the catchpointbalances table. The annoyance is that SQLite has no way to rename an index. Thus, there's this hack to create a unique name for the index (using the current timestamp).

@tsachiherman
Copy link
Contributor

@tsachiherman , thanks for the pointers to where these normalized online balances need to be added for catchpoints.

I pushed the changes to support that; it'd be great if you can take a look when you have time, to see if I missed something.

One difference from what you were proposing to do lies in how indexes are handled. When a table is renamed, SQLite actually moves the indexes along with the table. So, there's no need to explicitly drop indices. However, we do need to create an index for the catchpointbalances table. The annoyance is that SQLite has no way to rename an index. Thus, there's this hack to create a unique name for the index (using the current timestamp).

Thanks; I won't be able to take a look of these changes this week, but will do so right after the release.

@@ -136,60 +148,95 @@ func writeCatchpointStagingCreatable(ctx context.Context, tx *sql.Tx, addr basic
return nil
}

func writeCatchpointStagingBalances(ctx context.Context, tx *sql.Tx, bals []encodedBalanceRecord) error {
insertStmt, err := tx.PrepareContext(ctx, "INSERT INTO catchpointbalances(address, data) VALUES(?, ?)")
func writeCatchpointStagingBalances(ctx context.Context, tx *sql.Tx, bals []encodedBalanceRecord, proto config.ConsensusParams) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty confident that this isn't what you want to do:
Using the genesis config.ConsensusParams here means that accounts that would be using fast catchup would calculate the normalized online balances based on the genesis params and not by the most recent params.

I would suggest you'll:

  1. add the calculated normalized online balances to the catchpoint file
  2. "restore" it in the same way, without recalculating anything.
  3. add the calculated normalized online balances to the hashing of the account, so that the merkletrie would include that.
  4. update the version of the catchpoint file, so that it would "break" compatibility with older files.

I have yet had the need to perform #4, so I don't know how hard it would be. ( shouldn't be too hard ).

I will be available to help you with this one if you'd like next week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proto is used only for the value of proto.RewardUnit, which cannot change in a given network, so it's OK to pass the proto for the genesis block or the proto for a recent block.

Copy link
Contributor

@tsachiherman tsachiherman Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we should always use the genesis round proto in the call to
err = accountsNewRound(tx, deltas[i], creatableDeltas[i], protos[i+1]) as well.

since we don't want to make the assumption that "it won't change"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a reasonable idea; thanks! Pushed.

@zeldovich zeldovich marked this pull request as ready for review August 20, 2020 07:28
Copy link
Contributor

@tsachiherman tsachiherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

@tsachiherman tsachiherman merged commit 4849f51 into algorand:master Sep 9, 2020
@zeldovich zeldovich deleted the compactcert-top branch September 9, 2020 02:15
tsachiherman pushed a commit to tsachiherman/go-algorand that referenced this pull request Jul 7, 2021
)

Implement tracking of top-N online accounts in the ledger
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants