Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support using badgerDB as a repository for data regulation suppressions #2619

Merged
merged 3 commits into from
Nov 21, 2022

Conversation

atzoum
Copy link
Contributor

@atzoum atzoum commented Oct 26, 2022

Description

It is already possible to have millions of data regulation suppressions for a given workspace/namespace, thus keeping them in-memory is an non-viable solution. For that purpose, we are introducing support for using badgerDB as a backing repository to store and query suppressions at the gateway component.

Other improvements

  • Refactored the suppression package, so that some of its modules can be reused for other purposes:
    • suppression.Repository: stores and queries suppressions, we support both an in-memory and a badgerdb-backed repository
    • suppression.Syncer: polls the data regulation service and syncs suppressions to the suppression.Repository
    • suppression.handler: implements types.UserSuppression enterprise feature and uses a suppression.Repository
    • suppression.Factory: creates and glues all the modules above according to the environment's configuration
  • Increased test coverage

Benchmarks

As expected, the in-memory repository can achieve 5x faster write times and 5.5x faster read times.
Nevertheless, badgerdb's performance is satisfactory, as it exhibits 500K wps and 500Mil rps.

go test -run=^$ -bench ^BenchmarkAddAndSuppress$ github.com/rudderlabs/rudder-server/enterprise/suppress-user -v -count=1 

goos: darwin
goarch: arm64
pkg: github.com/rudderlabs/rudder-server/enterprise/suppress-user
BenchmarkAddAndSuppress
BenchmarkAddAndSuppress/badger
BenchmarkAddAndSuppress/badger-10                      1        42729787333 ns/op           518757 suppressions/s(add)   445789123 suppressions/s(read)
BenchmarkAddAndSuppress/memory
BenchmarkAddAndSuppress/memory-10                      1        12530184667 ns/op          2858211 suppressions/s(add)  2564874328 suppressions/s(read)
PASS
ok      github.com/rudderlabs/rudder-server/enterprise/suppress-user    56.053s

Further Work

To speed up suppressions' sync time in gateway containers, we are investigating a solution which involves a periodic backup of the badgerdb repo, so that any new gateway container can startup in a hot state after performing a short restore of its badgerdb data.

Notion Ticket

Link

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@atzoum atzoum force-pushed the feat.badgerSuppressions branch from 2f3d556 to 3ff091e Compare October 26, 2022 09:19
@atzoum atzoum force-pushed the feat.badgerSuppressions branch 3 times, most recently from 37e514a to ef2c282 Compare October 26, 2022 15:25
@codecov
Copy link

codecov bot commented Oct 26, 2022

Codecov Report

Base: 45.55% // Head: 45.87% // Increases project coverage by +0.31% 🎉

Coverage data is based on head (49cb34b) compared to base (4e0008c).
Patch coverage: 73.90% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2619      +/-   ##
==========================================
+ Coverage   45.55%   45.87%   +0.31%     
==========================================
  Files         290      295       +5     
  Lines       48066    48499     +433     
==========================================
+ Hits        21895    22247     +352     
- Misses      24781    24852      +71     
- Partials     1390     1400      +10     
Impacted Files Coverage Δ
app/apphandlers/gatewayAppHandler.go 0.00% <0.00%> (ø)
enterprise/config-env/configEnv.go 0.00% <0.00%> (ø)
enterprise/config-env/setup.go 28.57% <0.00%> (-7.80%) ⬇️
enterprise/replay/dumpsloader.go 0.00% <0.00%> (-1.17%) ⬇️
enterprise/replay/replay.go 0.00% <0.00%> (ø)
enterprise/replay/setup.go 0.00% <0.00%> (ø)
enterprise/replay/sourceWorker.go 0.00% <0.00%> (ø)
enterprise/suppress-user/factory.go 9.43% <9.43%> (ø)
enterprise/reporting/reporting.go 15.65% <27.27%> (-0.24%) ⬇️
...rprise/suppress-user/internal/badgerdb/badgerdb.go 83.00% <83.00%> (ø)
... and 16 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@atzoum atzoum force-pushed the feat.badgerSuppressions branch 11 times, most recently from 66c8cc7 to cbbd856 Compare November 1, 2022 11:46
@sonarqubecloud
Copy link

sonarqubecloud bot commented Nov 1, 2022

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 11 Code Smells

No Coverage information No Coverage information
9.3% 9.3% Duplication

@atzoum atzoum force-pushed the feat.badgerSuppressions branch 3 times, most recently from d5bd120 to aa5c179 Compare November 1, 2022 17:07
@atzoum atzoum changed the title [WIP] feat: support using badgerDB as a repository for data regulation suppressions feat: support using badgerDB as a repository for data regulation suppressions Nov 2, 2022
path,
log,
WithSeederSource(seederSource),
WithMaxSeedWait(config.GetDuration("BackendConfig.Regulations.maxSeedWait", 5, time.Second)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain: if restore doesn't complete in 5 seconds, the gateway container will startup. Only after restore completes will the gateway be able to successfully perform suppressions, in the meantime, no suppressions will be performed.

@atzoum atzoum force-pushed the feat.badgerSuppressions branch 3 times, most recently from bce1865 to d42344d Compare November 2, 2022 13:23
@atzoum atzoum marked this pull request as ready for review November 2, 2022 14:24
@atzoum atzoum force-pushed the feat.badgerSuppressions branch 2 times, most recently from bcfe3f7 to 283f588 Compare November 11, 2022 11:44
@atzoum atzoum requested review from chandumlg and cisse21 November 11, 2022 16:19
@atzoum atzoum force-pushed the feat.badgerSuppressions branch from 283f588 to b5f683a Compare November 15, 2022 06:51
Copy link
Contributor

@saurav-malani saurav-malani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since, we are not really making use of Restore & Backup function in this PR & we are going to handler them in future PR, some clarity on that would be really help. To think if something in this PR is required for future or is unnecessary.

enterprise/suppress-user/repository.go Outdated Show resolved Hide resolved
enterprise/suppress-user/repository.go Show resolved Hide resolved
Comment on lines +27 to +30
Backup(w io.Writer) error

// Restore restores the repository from the given reader
Restore(r io.Reader) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why do we need to expose Backup & Restore func. Given that Restore happens implicitly when we call NewRepository.

And, for Backup I am assuming that it will be implemented in next PR. So, I am not sure of the implementation that you are going to have. But, still with an assumption that it will be called implicitly from Stop() function. Since, we might want to take a backup when stoping or maybe periodically using some go routine. In either case, I feel it will be implicit inside the badgerdb package itself. So, here also won't it make sense to not expose it?

In short, if we associate NewRepository() with Restore() & Stop() with Backup(), then won't need to expose backup & restore.

Copy link
Contributor Author

@atzoum atzoum Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backup & Restore are both used now by tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I saw that, but isn't it a bad practice to export something just to test? Instead, won't it be better to have tests in the same package for these internal function. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backup & Restore are not something you wouldn't expect from a repository & we are going to use them in the future in non-testing code too.

log: log,
path: path.Join(basePath, "badgerdbv3"),
maxGoroutines: 1,
maxSeedWait: 10 * time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since, we have WithMaxSeedWait as Opt. Should we check for it, before setting a default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the default value, opts are applied later in line 70, i.e. can override the default values.

enterprise/suppress-user/handler.go Outdated Show resolved Hide resolved
enterprise/suppress-user/internal/badgerdb/badgerdb.go Outdated Show resolved Hide resolved
@atzoum atzoum force-pushed the feat.badgerSuppressions branch from b5f683a to e3d78dd Compare November 15, 2022 13:15
@atzoum atzoum requested a review from saurav-malani November 15, 2022 13:20
@atzoum atzoum force-pushed the feat.badgerSuppressions branch 6 times, most recently from b999439 to 94b7b37 Compare November 20, 2022 15:10
Copy link
Contributor

@saurav-malani saurav-malani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@atzoum atzoum requested a review from chandumlg November 21, 2022 08:47
@atzoum atzoum force-pushed the feat.badgerSuppressions branch from 94b7b37 to 49cb34b Compare November 21, 2022 08:55
Copy link
Member

@chandumlg chandumlg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very Neat!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants