Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Hotfix for API queries for activations #5334

Closed
wants to merge 30 commits into from

Conversation

lrettig
Copy link
Member

@lrettig lrettig commented Dec 7, 2023

Modify MeshService.LayersQuery to return ONLY the single, canonical block for each queried layer. Fixes performance issues for now by not querying any activations, proposals, etc.

Modify MeshService.AccountMeshDataQuery not to query activations to prevent memory/CPU bomb

Motivation

Closes #5315
Closes #5006

Changes

  • Add method to mesh to return single, valid, applied block for a given layer
  • Modify MeshService.LayersQuery to use this new method
  • Additionally, remove query for activations (leave this empty for now)
  • Modify MeshService.AccountMeshDataQuery not to query activations

Test Plan

Tests updated

TODO

  • Explain motivation or link existing issue(s)
  • Test changes and document test plan
  • Update documentation as needed
  • Update changelog as needed

Add a method to the mesh to return the valid, applied block for a given
layer and return only this one
Copy link

codecov bot commented Dec 9, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (2e7d09e) 77.2% compared to head (6fe005f) 77.7%.
Report is 17 commits behind head on develop.

Files Patch % Lines
mesh/mesh.go 63.6% 4 Missing ⚠️
api/grpcserver/mesh_service.go 93.7% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           develop   #5334     +/-   ##
=========================================
+ Coverage     77.2%   77.7%   +0.4%     
=========================================
  Files          262     265      +3     
  Lines        30681   30862    +181     
=========================================
+ Hits         23716   23987    +271     
+ Misses        5462    5362    -100     
- Partials      1503    1513     +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lrettig lrettig changed the title Hotfix for LayersQuery Hotfix for API queries for activations Dec 12, 2023
@lrettig lrettig marked this pull request as ready for review December 13, 2023 18:11
lrettig and others added 15 commits December 14, 2023 14:52
## Motivation
Several changes listed as part of the 1.3.0 release have already been released in prior versions.

## Changes
- Updated CHANGELOG to list changes at the correct versions
- renamed 1.3.0 back to UNRELEASED, since more changes might be added before it is officially released

## Test Plan
- n/a

## TODO
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed
We're seeing a lot of `I/O deadline` and general timeouts currently in the network. Increasing the timeouts to 25s from standard 10s could help with some of them.
## Motivation
This is in anticipation for upcoming epochs, so that nodes can handle 300k+ active identities.

## Changes
- Increase sizes for various wire types
- Increase ATXHeader Cache to be able to hold the ATXs for Epoch 11 and Epoch 12 at the same time

## Test Plan
n/a

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed
## Motivation
Reverts part of #5339

## Changes
`PutUvarint` from standard library and `github.com/multiformats/go-varint` do not behave exactly the same. With Go 1.20 to the stdlib implementation was fixed to detect short reads, but in contrast to `github.com/multiformats/go-varint` it will still not return an error when decoding non-minimal encodings of integers.

## Test Plan
n/a

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed
Bumps google.golang.org/grpc from 1.59.0 to 1.60.0.
Don't return activations from AccountMeshDataQuery

Fixes #5006
This is also a hotfix. We should really return the full contents of the
block with a state/status for each tx, but the gRPC model doesn't
currently include this. Will handle this correctly in API v2.
@lrettig lrettig requested a review from ivan4th as a code owner January 4, 2024 23:50

// Save activations too
var activations []types.ATXID
) (pbLayer *pb.Layer, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: The named returns are not actually used (every return statement explicitly states what values are returned). If the names here are removed you can use err instead of err2 everywhere in this function 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, oof, i like named returns so I added this, but our strict linting rules no longer allow this =\

Comment on lines +261 to +263
type MeshAPIMockInstrumented struct {
MeshAPIMock
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I know this extends MeshAPIMock but would it make sense to instead use a gomock Mock here instead? These handwritten mocks are really hard to maintain, because changing anything usually breaks several tests at once because they are shared between them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. If we were doing this from scratch I'd agree, but the methods are all already mocked here:

type MeshAPIMock struct{}
// latest layer received.
func (m *MeshAPIMock) LatestLayer() types.LayerID {
return layerLatest
}
// latest layer approved/confirmed/applied to state
// The real logic here is a bit more complicated, as it depends whether the node
// is syncing or not. If it's not syncing, layers are applied to state as they're
// verified by Hare. If it's syncing, Hare is not run, and they are applied to
// state as they're confirmed by Tortoise and it advances pbase. This is all in
// flux right now so keep this simple for the purposes of testing.
func (m *MeshAPIMock) LatestLayerInState() types.LayerID {
return layerVerified
}
func (m *MeshAPIMock) ProcessedLayer() types.LayerID {
return layerVerified
}
func (m *MeshAPIMock) GetRewardsByCoinbase(types.Address) (rewards []*types.Reward, err error) {
return []*types.Reward{
{
Layer: layerFirst,
TotalReward: rewardAmount,
LayerReward: rewardAmount,
Coinbase: addr1,
SmesherID: rewardSmesherID,
},
}, nil
}
func (m *MeshAPIMock) GetRewardsBySmesherId(types.NodeID) (rewards []*types.Reward, err error) {
return []*types.Reward{
{
Layer: layerFirst,
TotalReward: rewardAmount,
LayerReward: rewardAmount,
Coinbase: addr1,
SmesherID: rewardSmesherID,
},
}, nil
}
func (m *MeshAPIMock) GetLayer(tid types.LayerID) (*types.Layer, error) {
if tid.After(layerCurrent) {
return nil, errors.New("requested layer later than current layer")
} else if tid.After(m.LatestLayer()) {
return nil, errors.New("haven't received that layer yet")
}
ballots := []*types.Ballot{ballot1}
blocks := []*types.Block{block1, block2, block3}
return types.NewExistingLayer(tid, ballots, blocks), nil
}
func (m *MeshAPIMock) GetLayerVerified(tid types.LayerID) (*types.Block, error) {
return block1, nil
}
func (m *MeshAPIMock) GetATXs(
context.Context,
[]types.ATXID,
) (map[types.ATXID]*types.VerifiedActivationTx, []types.ATXID) {
atxs := map[types.ATXID]*types.VerifiedActivationTx{
globalAtx.ID(): globalAtx,
globalAtx2.ID(): globalAtx2,
}
return atxs, nil
}
func (m *MeshAPIMock) MeshHash(types.LayerID) (types.Hash32, error) {
return types.RandomHash(), nil
}
. I don't think it makes sense to fully rewrite this at this stage, and I don't see much benefit from basing the existing MeshAPIMock type on the auto-generated MockmeshAPI type. let me know if you disagree.

mesh/mesh.go Outdated Show resolved Hide resolved
@lrettig
Copy link
Member Author

lrettig commented Jan 5, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Jan 5, 2024
Modify `MeshService.LayersQuery` to return ONLY the single, canonical block for each queried layer. Fixes performance issues for now by not querying any activations, proposals, etc.

Modify `MeshService.AccountMeshDataQuery` not to query activations to prevent memory/CPU bomb

## Motivation
Closes #5315
Closes #5006

## Changes
- Add method to mesh to return single, valid, applied block for a given layer
- Modify `MeshService.LayersQuery` to use this new method
- Additionally, remove query for activations (leave this empty for now)
- Modify `MeshService.AccountMeshDataQuery` not to query activations

## Test Plan
Tests updated

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fasching <[email protected]>
Co-authored-by: Jędrzej Nowak <[email protected]>
@spacemesh-bors
Copy link

spacemesh-bors bot commented Jan 5, 2024

Build failed:

  • systest-status

To make troubleshooting systests easier
@kacpersaw
Copy link
Contributor

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Jan 8, 2024
Modify `MeshService.LayersQuery` to return ONLY the single, canonical block for each queried layer. Fixes performance issues for now by not querying any activations, proposals, etc.

Modify `MeshService.AccountMeshDataQuery` not to query activations to prevent memory/CPU bomb

## Motivation
Closes #5315
Closes #5006

## Changes
- Add method to mesh to return single, valid, applied block for a given layer
- Modify `MeshService.LayersQuery` to use this new method
- Additionally, remove query for activations (leave this empty for now)
- Modify `MeshService.AccountMeshDataQuery` not to query activations

## Test Plan
Tests updated

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fasching <[email protected]>
Co-authored-by: Jędrzej Nowak <[email protected]>
@spacemesh-bors
Copy link

spacemesh-bors bot commented Jan 8, 2024

Build failed:

@bobbychen
Copy link

Why this hotfix haven't merge into this release~~~,I think it is a problem, hang on long time when query a layer detail.

@kacpersaw
Copy link
Contributor

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Jan 16, 2024
Modify `MeshService.LayersQuery` to return ONLY the single, canonical block for each queried layer. Fixes performance issues for now by not querying any activations, proposals, etc.

Modify `MeshService.AccountMeshDataQuery` not to query activations to prevent memory/CPU bomb

## Motivation
Closes #5315
Closes #5006

## Changes
- Add method to mesh to return single, valid, applied block for a given layer
- Modify `MeshService.LayersQuery` to use this new method
- Additionally, remove query for activations (leave this empty for now)
- Modify `MeshService.AccountMeshDataQuery` not to query activations

## Test Plan
Tests updated

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fasching <[email protected]>
Co-authored-by: Jędrzej Nowak <[email protected]>
Co-authored-by: Kacper Sawicki <[email protected]>
@spacemesh-bors
Copy link

Build failed:

@kacpersaw
Copy link
Contributor

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Jan 16, 2024
Modify `MeshService.LayersQuery` to return ONLY the single, canonical block for each queried layer. Fixes performance issues for now by not querying any activations, proposals, etc.

Modify `MeshService.AccountMeshDataQuery` not to query activations to prevent memory/CPU bomb

## Motivation
Closes #5315
Closes #5006

## Changes
- Add method to mesh to return single, valid, applied block for a given layer
- Modify `MeshService.LayersQuery` to use this new method
- Additionally, remove query for activations (leave this empty for now)
- Modify `MeshService.AccountMeshDataQuery` not to query activations

## Test Plan
Tests updated

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fasching <[email protected]>
Co-authored-by: Jędrzej Nowak <[email protected]>
Co-authored-by: Kacper Sawicki <[email protected]>
@spacemesh-bors
Copy link

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title Hotfix for API queries for activations [Merged by Bors] - Hotfix for API queries for activations Jan 16, 2024
@spacemesh-bors spacemesh-bors bot closed this Jan 16, 2024
@spacemesh-bors spacemesh-bors bot deleted the api-fix branch January 16, 2024 14:34
kacpersaw added a commit that referenced this pull request Jan 16, 2024
Modify `MeshService.LayersQuery` to return ONLY the single, canonical block for each queried layer. Fixes performance issues for now by not querying any activations, proposals, etc.

Modify `MeshService.AccountMeshDataQuery` not to query activations to prevent memory/CPU bomb

## Motivation
Closes #5315
Closes #5006

## Changes
- Add method to mesh to return single, valid, applied block for a given layer
- Modify `MeshService.LayersQuery` to use this new method
- Additionally, remove query for activations (leave this empty for now)
- Modify `MeshService.AccountMeshDataQuery` not to query activations

## Test Plan
Tests updated

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fasching <[email protected]>
Co-authored-by: Jędrzej Nowak <[email protected]>
Co-authored-by: Kacper Sawicki <[email protected]>
kacpersaw added a commit that referenced this pull request Jan 17, 2024
Backport #5334: Hotfix for API queries for activations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MeshService.LayersQuery broken AccountMeshDataQuery CPU and memory bomb
5 participants