Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Reduce RelayMiner memory consumption under load #739

Merged
merged 16 commits into from
Aug 22, 2024

Conversation

okdas
Copy link
Member

@okdas okdas commented Aug 16, 2024

Summary

Issue

Type of change

Select one or more:

  • New feature, functionality or library
  • Bug fix
  • Code health or cleanup
  • Documentation
  • Other (specify)

Testing

Documentation changes (only if making doc changes)

  • make docusaurus_start; only needed if you make doc changes

Local Testing (only if making code changes)

  • Unit Tests: make go_develop_and_test
  • LocalNet E2E Tests: make test_e2e
  • See quickstart guide for instructions

PR Testing (only if making code changes)

  • DevNet E2E Tests: Add the devnet-test-e2e label to the PR.
    • THIS IS VERY EXPENSIVE, so only do it after all the reviews are complete.
    • Optionally run make trigger_ci if you want to re-trigger tests without any code changes
    • If tests fail, try re-running failed tests only using the GitHub UI as shown here

Sanity Checklist

  • I have tested my changes using the available tooling
  • I have commented my code
  • I have performed a self-review of my own code; both comments & source code
  • I create and reference any new tickets, if applicable
  • I have left TODOs throughout the codebase, if applicable

@okdas okdas added relayminer Changes related to the Relayminer loadtest Work related to load testing scalability labels Aug 16, 2024
@okdas okdas added this to the Shannon Beta TestNet Launch milestone Aug 16, 2024
@okdas okdas self-assigned this Aug 16, 2024
go.mod Outdated

// fix upstream GHSA-h395-qcrw-5vmq vulnerability.
github.com/gin-gonic/gin => github.com/gin-gonic/gin v1.7.0
github.com/pokt-network/smt => ../smt
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contingent on pokt-network/smt#52

@okdas okdas requested a review from Olshansk August 16, 2024 01:29
@okdas
Copy link
Member Author

okdas commented Aug 20, 2024

Overall I'm pretty happy with how relayminer performs now. There might be some goroutine leaking (not 100% sure it is even happening), but we'll find out soon enough in #742! :) This particular goroutine seems suspicious.
Screenshot 2024-08-19 at 6 28 21 PM

@okdas okdas marked this pull request as ready for review August 21, 2024 01:00
@okdas okdas added the push-image CI related - pushes images to ghcr.io label Aug 21, 2024
Copy link

The image is going to be pushed after the next commit.

You can use make trigger_ci to push an empty commit.

If you also want to run E2E tests, please add devnet-test-e2e label.

go.mod Outdated

// fix upstream GHSA-h395-qcrw-5vmq vulnerability.
github.com/gin-gonic/gin => github.com/gin-gonic/gin v1.7.0
// TODO_IN_THIS_PR: bump and remove

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[linter-name (fail-on-found)] reported by reviewdog 🐶
// TODO_IN_THIS_PR: bump and remove

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okdas Let's push to finish the SMT PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Olshansk requested another review there

Copy link
Contributor

@bryanchriswhite bryanchriswhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one @okdas! 🚀

NOTE: I have not run the load test as I'm currently working on an underpowered machine. 😅

docusaurus/docs/develop/developer_guide/quickstart.md Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
Comment on lines +13 to +15
| application | 4 | 10 | 12 |
| gateway | 1 | 10 | 3 |
| supplier | 1 | 10 | 3 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that the "blocks per inc" needed to be a multiple of the blocks per session param to maintain constant rates of change across various metrics as the test scales actors. I also thought that there was a check for this somewhere in the load test helpers, around the "plans".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryanchriswhite Are you sure about this?

  1. If so -> can you help @okdas look into the code and find the reasoning.
  2. If not -> my intuition is that it keeps things simpler (to reason about) but doesn't need to be enforced.

In real life, we'll be staking / unstaking irrespective of the blocker per session so it doesn't make sense for the framework to have this limitation.

@okdas W/e the resolution ends up being, seems like the helper in the code needs a #PUC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryanchriswhite interesting. This check is the reason I adjusted the number - as I was getting an error. Given our current blocks per session is 10 (bumped from 4 a couple of weeks ago), what do you think should be the best value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okdas that's the check I was referring to. 👍 I did not realize that blocks per session was 10; this all makes sense now.

pkg/relayer/session/sessiontree.go Outdated Show resolved Hide resolved
pkg/relayer/session/sessiontree.go Outdated Show resolved Hide resolved
pkg/relayer/session/sessiontree.go Outdated Show resolved Hide resolved
if err := st.treeStore.ClearAll(); err != nil {
return err
}
// We used to `st.treeStore.ClearAll()` here, but don't need to clean up the database, causing IO load,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find!

Makefile Outdated Show resolved Hide resolved
go.mod Outdated

// fix upstream GHSA-h395-qcrw-5vmq vulnerability.
github.com/gin-gonic/gin => github.com/gin-gonic/gin v1.7.0
// TODO_IN_THIS_PR: bump and remove
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okdas Let's push to finish the SMT PR.

Comment on lines +13 to +15
| application | 4 | 10 | 12 |
| gateway | 1 | 10 | 3 |
| supplier | 1 | 10 | 3 |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryanchriswhite Are you sure about this?

  1. If so -> can you help @okdas look into the code and find the reasoning.
  2. If not -> my intuition is that it keeps things simpler (to reason about) but doesn't need to be enforced.

In real life, we'll be staking / unstaking irrespective of the blocker per session so it doesn't make sense for the framework to have this limitation.

@okdas W/e the resolution ends up being, seems like the helper in the code needs a #PUC.

Co-authored-by: Daniel Olshansky <[email protected]>
Co-authored-by: Bryan White <[email protected]>
go.mod Outdated
@@ -57,8 +61,8 @@ require (
// repo is the first obvious idea, but has to be carefully considered, automated, and is not
// a hard blocker.
github.com/pokt-network/shannon-sdk v0.0.0-20240814144717-dfa95b525d46
// TODO_IN_THIS_PR: bump after https://github.com/pokt-network/smt/pull/52 is in

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[linter-name (fail-on-found)] reported by reviewdog 🐶
// TODO_IN_THIS_PR: bump after pokt-network/smt#52 is in

go.mod Outdated
@@ -79,7 +83,11 @@ require (
gopkg.in/yaml.v2 v2.4.0
)

require golang.org/x/text v0.16.0
require (
// TODO_IN_THIS_PR: bump to the main branch commit after https://github.com/pokt-network/smt/pull/52 is in

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[linter-name (fail-on-found)] reported by reviewdog 🐶
// TODO_IN_THIS_PR: bump to the main branch commit after pokt-network/smt#52 is in

Copy link

The CI will now also run the e2e tests on devnet, which increases the time it takes to complete all CI checks.

You may need to run make trigger_ci to submit an empty commit that'll trigger the tests.

GCP workloads (requires changing the namespace to 739)
Grafana network dashboard for devnet-issue-739

@okdas okdas requested a review from bryanchriswhite August 22, 2024 18:14
Copy link
Contributor

@bryanchriswhite bryanchriswhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🚀 🚀

One final small suggestion but otherwise this LGTM! 🚢

pkg/relayer/session/sessiontree.go Outdated Show resolved Hide resolved
@okdas okdas merged commit 5455068 into main Aug 22, 2024
10 checks passed
bryanchriswhite added a commit that referenced this pull request Aug 27, 2024
…ke-transfer

* pokt/main:
  [Application] Implement unbonding period (#735)
  [Docs] Move over docs from poktroll-docker-compose-example (#757)
  [Performance] Reduce RelayMiner memory consumption under load (#739)
okdas added a commit that referenced this pull request Nov 14, 2024
## Summary

## Issue

- #551 
- #621 

## Type of change

Select one or more:

- [ ] New feature, functionality or library
- [ ] Bug fix
- [x] Code health or cleanup
- [ ] Documentation
- [ ] Other (specify)

## Testing

**Documentation changes** (only if making doc changes)
- [ ] `make docusaurus_start`; only needed if you make doc changes

**Local Testing** (only if making code changes)
- [ ] **Unit Tests**: `make go_develop_and_test`
- [ ] **LocalNet E2E Tests**: `make test_e2e`
- See [quickstart
guide](https://dev.poktroll.com/developer_guide/quickstart) for
instructions

**PR Testing** (only if making code changes)
- [ ] **DevNet E2E Tests**: Add the `devnet-test-e2e` label to the PR.
- **THIS IS VERY EXPENSIVE**, so only do it after all the reviews are
complete.
- Optionally run `make trigger_ci` if you want to re-trigger tests
without any code changes
- If tests fail, try re-running failed tests only using the GitHub UI as
shown
[here](https://github.com/pokt-network/poktroll/assets/1892194/607984e9-0615-4569-9452-4c730190c1d2)


## Sanity Checklist

- [ ] I have tested my changes using the available tooling
- [ ] I have commented my code
- [ ] I have performed a self-review of my own code; both comments &
source code
- [ ] I create and reference any new tickets, if applicable
- [ ] I have left TODOs throughout the codebase, if applicable

---------

Co-authored-by: Daniel Olshansky <[email protected]>
Co-authored-by: Bryan White <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devnet devnet-test-e2e loadtest Work related to load testing push-image CI related - pushes images to ghcr.io relayminer Changes related to the Relayminer scalability
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

3 participants