
Catchpoint: Optimize catchpoint #4254

Merged: 33 commits, Aug 23, 2022

Conversation

@ghost ghost commented Jul 12, 2022

Summary

Test Plan

@ghost ghost self-assigned this Jul 12, 2022
@ghost ghost marked this pull request as draft July 12, 2022 16:01
@ghost ghost changed the title Optimize catchpoint Catchpoint: Optimize catchpoint Jul 12, 2022
@@ -330,12 +337,80 @@ func (c *CatchpointCatchupAccessorImpl) processStagingBalances(ctx context.Conte
}

normalizedAccountBalances, err = prepareNormalizedBalancesV6(balances.Balances, c.ledger.GenesisProto())
isNotFinalEntry = make([]bool, len(balances.Balances))
for i, balance := range balances.Balances {
isNotFinalEntry[i] = balance.IsNotFinalEntry
Contributor

Why do we need to store this? The loop below always accesses the current value, i.e. isNotFinalEntry[i], while processing the i'th entry.

Author

The old version, catchpointFileBalancesChunkV5, won't have that field. This way it won't break with old catchpoint versions.

@codecov
codecov bot commented Jul 27, 2022

Codecov Report

Merging #4254 (cfa250d) into master (e4d6d42) will increase coverage by 0.09%.
The diff coverage is 72.13%.

@@            Coverage Diff             @@
##           master    #4254      +/-   ##
==========================================
+ Coverage   55.19%   55.28%   +0.09%     
==========================================
  Files         398      398              
  Lines       50165    50263      +98     
==========================================
+ Hits        27689    27789     +100     
- Misses      20159    20162       +3     
+ Partials     2317     2312       -5     
Impacted Files                      Coverage Δ
ledger/catchupaccessor.go           62.15% <56.36%> (-0.87%) ⬇️
ledger/accountdb.go                 73.02% <75.45%> (+0.44%) ⬆️
ledger/catchpointtracker.go         62.89% <100.00%> (ø)
ledger/catchpointwriter.go          59.13% <100.00%> (+1.47%) ⬆️
ledger/tracker.go                   74.78% <0.00%> (ø)
ledger/acctonline.go                79.41% <0.00%> (+0.52%) ⬆️
catchup/service.go                  70.12% <0.00%> (+0.74%) ⬆️
data/transactions/verify/txn.go     44.64% <0.00%> (+0.89%) ⬆️
catchup/peerSelector.go             100.00% <0.00%> (+1.04%) ⬆️
... and 3 more


@ghost ghost marked this pull request as ready for review July 27, 2022 23:40
}

if moreRows {
// we're done with this iteration.
Contributor

This looks inconsistent: the original implementation supplies accountCount as a batch size, but for resources we limit it in resCb. I think it would be clearer and more consistent if processAllResources accepted a batch size like this function does. In that case we wouldn't need to thread moreRows / chunkFull from the callback down to the readers and back.

@@ -79,14 +85,18 @@ type encodedBalanceRecordV6 struct {
Address basics.Address `codec:"a,allocbound=crypto.DigestSize"`
AccountData msgp.Raw `codec:"b,allocbound=basics.MaxEncodedAccountDataSize"`
Resources map[uint64]msgp.Raw `codec:"c,allocbound=basics.MaxEncodedAccountDataSize"`

// flag indicating whether there are more records for the same account coming up
ExpectingMoreEntries bool `codec:"e"`
Contributor

So, if a client is running a version that doesn't have this change, they will get a message about "unrecognized msgp field e" or similar? Seems fine ... especially if there is a consensus upgrade coming up

@algorandskiy
Contributor

Catchpoint apply code appears broken.

On the feature branch:

~/go/bin-cp-oom/catchpointdump net -l -n betanet.algodev.network -r 19780000 -s
[ Done ] Downloaded http://r4.betanet.algodev.network:4160/v1/betanet-v1.0/ledger/brycg
[ Done ] Loaded
Unable to load catchpoint file into in-memory database : sql: no rows in result set

On master:

catchpointdump net -l -n betanet.algodev.network -r 19780000 -s
[ Done ] Downloaded http://r3.betanet.algodev.network:4160/v1/betanet-v1.0/ledger/brycg
[ Done ] Loaded

@algorandskiy
Contributor

vet: ledger/catchpointwriter_test.go:361:22: undeclared name: ioutil

Verified catchpoint creation and application on betanet; all good.

} else {
var sqliteErr sqlite3.Error
if errors.As(err, &sqliteErr) && sqliteErr.Code == sqlite3.ErrConstraint && sqliteErr.ExtendedCode == sqlite3.ErrConstraintUnique {
// address exists: overflowed account record: find addrid
Contributor

I didn't expect you would rely on the DB to tell you whether you used the same account twice; I assumed you would be able to tell as you went through the bals list and saw a duplicate. But I guess you didn't want to assume the bals list is ordered?

Contributor

Maybe you could add a check here to make sure that the balance.encodedAccountData is empty?

Author

Why should balance.encodedAccountData be empty?

Contributor

doesn't this mean you're on the second chunk for this account, and it's all resources now?

Contributor

The idea is that the base account info is always set, and overflowed entries are just chunks with some more data.
Nicholas had a previous version that tracked the account address from the top (see d895c5d), and it did not look elegant at all.
We do have resource-counting logic, so applying the catchpoint would break if the Total* counters mismatched the actual counts (@nicholasguoalgorand do we have a test for this?), so relying on the DB looks safe here.

Contributor

Oh I see. So the chunks really could come in any order, then, too. OK! But only one of the encodedAccountData blobs would actually make it into the accounts table; hopefully they are all the same.

}
err = callback(addr, aidx, &resData, buf)
count++
if resourceCount > 0 && count == resourceCount {
Contributor

maybe a comment here to explain what this callback(..., true) is for

Contributor

+1 need a comment


// DefaultMaxResourcesPerChunk defines the max number of resources that go in a singular chunk
// 3000000 resources * 300B/resource => roughly max 1GB per chunk
DefaultMaxResourcesPerChunk = 3000000
Contributor

1GB is a lot of RAM to handle a chunk, and it's probably more in practice with the various copying/decoding activity going on... plus there are really big resources (app data) that can be greater than 300B. How about e.g. 10x or 100x smaller?

Contributor
@cce cce Aug 22, 2022

You could also be tracking the size of the encoded resources, e.g. resourceBytesCount, but maybe just setting this to 1000, 5000, 10000 or something would be easier.

Author

changed to 10x smaller

@@ -35,26 +35,32 @@ const (
// BalancesPerCatchpointFileChunk defines the number of accounts that would be stored in each chunk in the catchpoint file.
// note that the last chunk would typically be less than this number.
BalancesPerCatchpointFileChunk = 512

// DefaultMaxResourcesPerChunk defines the max number of resources that go in a singular chunk
// 3000000 resources * 300B/resource => roughly max 1GB per chunk
Contributor

please update the comment as well

@@ -35,26 +35,32 @@ const (
// BalancesPerCatchpointFileChunk defines the number of accounts that would be stored in each chunk in the catchpoint file.
// note that the last chunk would typically be less than this number.
BalancesPerCatchpointFileChunk = 512

// DefaultMaxResourcesPerChunk defines the max number of resources that go in a singular chunk
// 3000000 resources * 300B/resource => roughly max 100MB per chunk
Contributor

this line still has 3,000,000 not 300k
