Catchpoint: Optimize catchpoint #4254
Conversation
ledger/catchupaccessor.go
Outdated
@@ -330,12 +337,80 @@ func (c *CatchpointCatchupAccessorImpl) processStagingBalances(ctx context.Context,
	}

	normalizedAccountBalances, err = prepareNormalizedBalancesV6(balances.Balances, c.ledger.GenesisProto())
	isNotFinalEntry = make([]bool, len(balances.Balances))
	for i, balance := range balances.Balances {
		isNotFinalEntry[i] = balance.IsNotFinalEntry
why need to store? the loop below always accesses the current value, i.e. isNotFinalEntry[i], while processing the i'th entry
The old version catchpointFileBalancesChunkV5 won't have that field. This way it won't break with old catchpoint versions.
Codecov Report
@@ Coverage Diff @@
## master #4254 +/- ##
==========================================
+ Coverage 55.19% 55.28% +0.09%
==========================================
Files 398 398
Lines 50165 50263 +98
==========================================
+ Hits 27689 27789 +100
- Misses 20159 20162 +3
+ Partials 2317 2312 -5
	}

	if moreRows {
		// we're done with this iteration.
this looks inconsistent: the original implementation supplies accountCount as a batch size, but for resources we limit it in resCb. I think it would be clearer and more consistent if processAllResources accepted a batch size like this function does. In that case we would not need to thread moreRows / chunkFull from the callback down to the readers and back.
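The suggestion above can be sketched like this: if the resource reader accepted a batch size directly, the same way the account reader takes accountCount, the caller would learn about remaining rows from the return value rather than by signalling through the callback. A hypothetical, simplified shape (an in-memory slice stands in for the DB rows; this is not the PR's actual code):

```go
package main

import "fmt"

// processAllResources is a hypothetical reader that stops after
// batchSize rows and reports whether more rows remain, mirroring how
// the account reader takes a batch size as a parameter.
func processAllResources(rows []int, batchSize int, cb func(r int)) (processed int, moreRows bool) {
	for i, r := range rows {
		if i == batchSize {
			return i, true // batch full; the caller resumes later
		}
		cb(r)
	}
	return len(rows), false
}

func main() {
	rows := []int{1, 2, 3, 4, 5}
	n, more := processAllResources(rows, 3, func(r int) { fmt.Println(r) })
	fmt.Println(n, more) // prints: 3 true
}
```

With this shape, moreRows flows back to the caller directly instead of being circled through the callback.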
@@ -79,14 +85,18 @@ type encodedBalanceRecordV6 struct {
	Address     basics.Address      `codec:"a,allocbound=crypto.DigestSize"`
	AccountData msgp.Raw            `codec:"b,allocbound=basics.MaxEncodedAccountDataSize"`
	Resources   map[uint64]msgp.Raw `codec:"c,allocbound=basics.MaxEncodedAccountDataSize"`

	// flag indicating whether there are more records for the same account coming up
	ExpectingMoreEntries bool `codec:"e"`
So, if a client is running a version that doesn't have this change, they will get a message about "unrecognized msgp field e" or similar? Seems fine ... especially if there is a consensus upgrade coming up
Catchpoint apply code appears broken
master
Verified catchpoint creation and applying on betanet, all right.
	} else {
		var sqliteErr sqlite3.Error
		if errors.As(err, &sqliteErr) && sqliteErr.Code == sqlite3.ErrConstraint && sqliteErr.ExtendedCode == sqlite3.ErrConstraintUnique {
			// address exists: overflowed account record: find addrid
didn't expect you would rely on the DB to tell you if you re-used the same account twice; I assumed you would just be able to tell, as you were going through the bals list, when you saw a duplicate. But I guess you didn't want to assume the bals list is ordered?
Maybe you could add a check here to make sure that the balance.encodedAccountData is empty?
Why should balance.encodedAccountData be empty?
doesn't this mean you're on the second chunk for this account, and it's all resources now?
the idea is that the base account info is always set, and overflowed entries are just chunks with some more data.
Nicholas had a previous version that tracked the account address from the top (see d895c5d) and it did not look elegant at all.
We do have resource counting logic, so applying the catchpoint would break if the Total* counters mismatched the actual counts (@nicholasguoalgorand do we have a test for this?), so relying on the DB looks safe here.
oh I see. So the chunks really could come in any order, then, too. OK! But only one of the encodedAccountData blobs would actually make it into the accounts table; hopefully they are all the same.
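The merge semantics being discussed can be pictured as follows: each chunk for an overflowed account repeats the account blob, so the last write wins (hopefully they are identical), while resources simply accumulate across chunks, making the result order-independent. A simplified illustration with hypothetical names, not the PR's actual code:

```go
package main

import "fmt"

// chunk is a toy stand-in for one catchpoint chunk entry of an account.
type chunk struct {
	accountData string         // repeated in every chunk for the account
	resources   map[uint64]int // the resources carried by this chunk
}

// applyChunks merges the chunks of a single account: the account blob
// is overwritten by each chunk (last write wins; the blobs are expected
// to be identical), and resources are the union across chunks.
func applyChunks(chunks []chunk) (string, map[uint64]int) {
	var data string
	res := make(map[uint64]int)
	for _, c := range chunks {
		data = c.accountData // only one copy ends up in the accounts table
		for k, v := range c.resources {
			res[k] = v
		}
	}
	return data, res
}

func main() {
	// Chunks applied in an arbitrary order still yield the same result.
	chunks := []chunk{
		{accountData: "base", resources: map[uint64]int{2: 20}},
		{accountData: "base", resources: map[uint64]int{1: 10}},
	}
	data, res := applyChunks(chunks)
	fmt.Println(data, len(res)) // prints: base 2
}
```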
		}
		err = callback(addr, aidx, &resData, buf)
		count++
		if resourceCount > 0 && count == resourceCount {
maybe a comment here to explain what this callback(..., true) is for
+1 need a comment
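A sketch of what the requested comment might clarify: when the per-chunk resource limit is hit mid-account, the callback is invoked with a flag set, telling the consumer that the remaining resources for this account arrive in a later chunk. The names and signature below are illustrative, not the PR's exact code:

```go
package main

import "fmt"

// iterateResources delivers resources through cb; the boolean argument
// signals "chunk is full, more resources for this account follow later",
// so the writer knows to set ExpectingMoreEntries on the emitted record.
func iterateResources(resources []string, chunkLimit int, cb func(r string, chunkFull bool)) {
	count := 0
	for i, r := range resources {
		count++
		// Mark the entry that fills the chunk, but only if resources
		// actually remain after it.
		full := chunkLimit > 0 && count == chunkLimit && i < len(resources)-1
		cb(r, full)
		if full {
			return // the caller resumes from the next resource later
		}
	}
}

func main() {
	iterateResources([]string{"r1", "r2", "r3"}, 2, func(r string, full bool) {
		fmt.Println(r, full)
	})
	// prints:
	// r1 false
	// r2 true
}
```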
ledger/catchpointwriter.go
Outdated
	// DefaultMaxResourcesPerChunk defines the max number of resources that go in a singular chunk
	// 3000000 resources * 300B/resource => roughly max 1GB per chunk
	DefaultMaxResourcesPerChunk = 3000000
1GB is a lot of RAM to handle a chunk, plus it's probably more with various copying/decoding activity going on... plus there are really big resources (app data) that can be greater than 300B. how about e.g. 10x or 100x smaller?
You could also be tracking the size of the encoded resources, e.g. resourceBytesCount, but maybe just setting this to 1000, 5000, 10000 or something would be easier.
changed to 10x smaller
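The sizing arithmetic under discussion, checked explicitly (the 300B/resource figure is the code comment's own estimate): 3,000,000 resources comes to roughly 858 MiB per chunk, i.e. about 1GB, and the 10x-smaller limit of 300,000 comes to roughly 85 MiB, i.e. about 100MB.

```go
package main

import "fmt"

func main() {
	const bytesPerResource = 300 // the code comment's per-resource estimate

	// Old and new per-chunk resource limits.
	for _, maxResources := range []int{3_000_000, 300_000} {
		mib := maxResources * bytesPerResource / (1 << 20)
		fmt.Printf("%d resources * %dB => ~%d MiB per chunk\n",
			maxResources, bytesPerResource, mib)
	}
}
```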
ledger/catchpointwriter.go
Outdated
@@ -35,26 +35,32 @@ const (
	// BalancesPerCatchpointFileChunk defines the number of accounts that would be stored in each chunk in the catchpoint file.
	// note that the last chunk would typically be less than this number.
	BalancesPerCatchpointFileChunk = 512

	// DefaultMaxResourcesPerChunk defines the max number of resources that go in a singular chunk
	// 3000000 resources * 300B/resource => roughly max 1GB per chunk
please update the comment as well
ledger/catchpointwriter.go
Outdated
@@ -35,26 +35,32 @@ const (
	// BalancesPerCatchpointFileChunk defines the number of accounts that would be stored in each chunk in the catchpoint file.
	// note that the last chunk would typically be less than this number.
	BalancesPerCatchpointFileChunk = 512

	// DefaultMaxResourcesPerChunk defines the max number of resources that go in a singular chunk
	// 3000000 resources * 300B/resource => roughly max 100MB per chunk
this comment still says 3,000,000, not 300k
Summary
Test Plan