Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - atx syncer that persists results #5599

Closed
wants to merge 11 commits into from

Conversation

dshulyak
Copy link
Contributor

@dshulyak dshulyak commented Feb 24, 2024

part of: #5553

when requested we ask configured number of peers for epoch info (collection of atxs from that epoch). on a successful response we save known ids, and will ask again only in 30 minutes (configurable). also on restart we check persisted data, and potentially avoiding eager queries, if last query was made close to the epoch end.

concurrently with requesting epoch info updates, we will download atxs from peers. download is scheduled in batches, so that we can report progress. if peer advertised invalid atx id, we will evict such id after reaching max number of retries (20 in the pr).

to make error checking possible i extended errors emitted by p2p/server and fetcher.

@dshulyak dshulyak force-pushed the persisted-atxsync branch from 5d36ed9 to addbaa6 Compare March 5, 2024 07:25
@dshulyak dshulyak marked this pull request as ready for review March 5, 2024 08:20
Copy link

codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 85.11327% with 46 lines in your changes are missing coverage. Please review.

Project coverage is 79.8%. Comparing base (3e59df3) to head (11d1a45).

Files Patch % Lines
syncer/atxsync/syncer.go 89.5% 12 Missing and 8 partials ⚠️
sql/atxsync/atxsync.go 80.5% 6 Missing and 7 partials ⚠️
fetch/mesh_data.go 68.0% 8 Missing ⚠️
checkpoint/recovery.go 60.0% 1 Missing and 1 partial ⚠️
log/zap.go 60.0% 1 Missing and 1 partial ⚠️
syncer/syncer.go 83.3% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #5599    +/-   ##
========================================
  Coverage     79.7%   79.8%            
========================================
  Files          274     276     +2     
  Lines        27883   28180   +297     
========================================
+ Hits         22228   22490   +262     
- Misses        4108    4128    +20     
- Partials      1547    1562    +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

syncer/atxsync/syncer.go Show resolved Hide resolved
syncer/atxsync/syncer.go Outdated Show resolved Hide resolved
}
}

for atx, requests := range state {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something or requests will always be 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is updated in code below

if errors.As(err, &batchError) {
	for hash, err := range batchError.Errors {
		if errors.Is(err, server.ErrPeerResponseFailed) {
			state[types.ATXID(hash)]++
		} else if errors.Is(err, pubsub.ErrValidationReject) {
			state[types.ATXID(hash)] = s.cfg.RequestsLimit
		}
	}
}

syncer/atxsync/syncer.go Outdated Show resolved Hide resolved
syncer/atxsync/syncer.go Outdated Show resolved Hide resolved
@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

@ivan4th @poszu thanks for review, i addressed your comments

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Mar 7, 2024
@spacemesh-bors
Copy link

spacemesh-bors bot commented Mar 7, 2024

try

Build failed:

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

need to understand what changes wrt retries

image

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Mar 7, 2024
@spacemesh-bors
Copy link

spacemesh-bors bot commented Mar 7, 2024

try

Build succeeded:

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Mar 7, 2024
@spacemesh-bors
Copy link

spacemesh-bors bot commented Mar 7, 2024

try

Build succeeded:

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Mar 7, 2024
@spacemesh-bors
Copy link

spacemesh-bors bot commented Mar 7, 2024

try

Build succeeded:

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Mar 7, 2024
part of: #5553

when requested we ask configured number of peers for epoch info (collection of atxs from that epoch). on a successful response we save known ids, and will ask again only in 30 minutes (configurable). also on restart we check persisted data, and potentially avoiding eager queries, if last query was made close to the epoch end.

concurrently with requesting epoch info updates, we will download atxs from peers. download is scheduled in batches, so that we can report progress. if peer advertised invalid atx id, we will evict such id after reaching max number of retries (20 in the pr).

to make error checking possible i extended errors emitted by p2p/server and fetcher.
@spacemesh-bors
Copy link

spacemesh-bors bot commented Mar 7, 2024

Build failed:

  • ci-status

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors cancel

#5652

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Mar 7, 2024
part of: #5553

when requested we ask configured number of peers for epoch info (collection of atxs from that epoch). on a successful response we save known ids, and will ask again only in 30 minutes (configurable). also on restart we check persisted data, and potentially avoiding eager queries, if last query was made close to the epoch end.

concurrently with requesting epoch info updates, we will download atxs from peers. download is scheduled in batches, so that we can report progress. if peer advertised invalid atx id, we will evict such id after reaching max number of retries (20 in the pr).

to make error checking possible i extended errors emitted by p2p/server and fetcher.
@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors cancel

@spacemesh-bors
Copy link

spacemesh-bors bot commented Mar 7, 2024

Canceled.

@dshulyak
Copy link
Contributor Author

dshulyak commented Mar 7, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Mar 7, 2024
part of: #5553

when requested we ask configured number of peers for epoch info (collection of atxs from that epoch). on a successful response we save known ids, and will ask again only in 30 minutes (configurable). also on restart we check persisted data, and potentially avoiding eager queries, if last query was made close to the epoch end.

concurrently with requesting epoch info updates, we will download atxs from peers. download is scheduled in batches, so that we can report progress. if peer advertised invalid atx id, we will evict such id after reaching max number of retries (20 in the pr).

to make error checking possible i extended errors emitted by p2p/server and fetcher.
@spacemesh-bors
Copy link

spacemesh-bors bot commented Mar 7, 2024

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title atx syncer that persists results [Merged by Bors] - atx syncer that persists results Mar 7, 2024
@spacemesh-bors spacemesh-bors bot closed this Mar 7, 2024
spacemesh-bors bot pushed a commit that referenced this pull request Mar 8, 2024
…edness (#5600)

closes: #5553

this change is on top of #5599, and will be rebased after that one is merged

the change always spawns a background worker to download atxs for the ongoing epoch. such background worker for epoch X is spawned when half of the layers of epoch X have passed. this is done so that we always do some useful work to do. such worker will be asking N peers (with a default of 2) every 30 minutes, until we are ready to spawn worker for epoch X + 1.

the side effect of this change, is that we are not going to block syncedness if node has all previous atx, but not from the last epoch. so node will be able to rejoin consensus faster after restart without any risk.
@xearl4
Copy link
Contributor

xearl4 commented Mar 12, 2024

@dshulyak could you explain why atx syncer state is persisted into local.sql and not into state.sql? do i understand it correctly, that, if we want to copy a synced blockchain state.sql from one full node to another, we'd now also have to copy the atx sync state tables from local.sql?

@dshulyak
Copy link
Contributor Author

no that information is meant only as an optimization to avoid asking peers for a data every time node boots. i would suggest not to copy that, just wait a bit longer when node boots up the first time. in future versions that data will be removed completely, and peers will checking atx consistency always in background.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants