-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support using badgerDB as a repository for data regulation suppressions #2619
Conversation
2f3d556
to
3ff091e
Compare
37e514a
to
ef2c282
Compare
Codecov ReportBase: 45.55% // Head: 45.87% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #2619 +/- ##
==========================================
+ Coverage 45.55% 45.87% +0.31%
==========================================
Files 290 295 +5
Lines 48066 48499 +433
==========================================
+ Hits 21895 22247 +352
- Misses 24781 24852 +71
- Partials 1390 1400 +10
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
66c8cc7
to
cbbd856
Compare
SonarCloud Quality Gate failed. 0 Bugs No Coverage information |
d5bd120
to
aa5c179
Compare
path, | ||
log, | ||
WithSeederSource(seederSource), | ||
WithMaxSeedWait(config.GetDuration("BackendConfig.Regulations.maxSeedWait", 5, time.Second))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain: if restore doesn't complete in 5 seconds, the gateway container will startup. Only after restore completes will the gateway be able to successfully perform suppressions, in the meantime, no suppressions will be performed.
bce1865
to
d42344d
Compare
bcfe3f7
to
283f588
Compare
283f588
to
b5f683a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, we are not really making use of Restore
& Backup
function in this PR & we are going to handler them in future PR, some clarity on that would be really help. To think if something in this PR is required for future or is unnecessary.
Backup(w io.Writer) error | ||
|
||
// Restore restores the repository from the given reader | ||
Restore(r io.Reader) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain why do we need to expose Backup
& Restore
func. Given that Restore happens implicitly when we call NewRepository
.
And, for Backup
I am assuming that it will be implemented in next PR. So, I am not sure of the implementation that you are going to have. But, still with an assumption that it will be called implicitly from Stop()
function. Since, we might want to take a backup when stoping or maybe periodically using some go routine. In either case, I feel it will be implicit inside the badgerdb
package itself. So, here also won't it make sense to not expose it?
In short, if we associate NewRepository()
with Restore()
& Stop()
with Backup()
, then won't need to expose backup & restore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backup
& Restore
are both used now by tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I saw that, but isn't it a bad practice to export something just to test? Instead, won't it be better to have tests in the same package for these internal function. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backup
& Restore
are not something you wouldn't expect from a repository & we are going to use them in the future in non-testing code too.
log: log, | ||
path: path.Join(basePath, "badgerdbv3"), | ||
maxGoroutines: 1, | ||
maxSeedWait: 10 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, we have WithMaxSeedWait
as Opt
. Should we check for it, before setting a default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the default value, opts are applied later in line 70, i.e. can override the default values.
b5f683a
to
e3d78dd
Compare
b999439
to
94b7b37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
94b7b37
to
49cb34b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very Neat!
Description
It is already possible to have millions of data regulation suppressions for a given workspace/namespace, thus keeping them in-memory is an non-viable solution. For that purpose, we are introducing support for using badgerDB as a backing repository to store and query suppressions at the gateway component.
Other improvements
suppression
package, so that some of its modules can be reused for other purposes:suppression.Repository
: stores and queries suppressions, we support both an in-memory and a badgerdb-backed repositorysuppression.Syncer
: polls the data regulation service and syncs suppressions to thesuppression.Repository
suppression.handler
: implementstypes.UserSuppression
enterprise feature and uses asuppression.Repository
suppression.Factory
: creates and glues all the modules above according to the environment's configurationBenchmarks
As expected, the in-memory repository can achieve 5x faster write times and 5.5x faster read times.
Nevertheless, badgerdb's performance is satisfactory, as it exhibits
500K wps
and500Mil rps
.go test -run=^$ -bench ^BenchmarkAddAndSuppress$ github.com/rudderlabs/rudder-server/enterprise/suppress-user -v -count=1 goos: darwin goarch: arm64 pkg: github.com/rudderlabs/rudder-server/enterprise/suppress-user BenchmarkAddAndSuppress BenchmarkAddAndSuppress/badger BenchmarkAddAndSuppress/badger-10 1 42729787333 ns/op 518757 suppressions/s(add) 445789123 suppressions/s(read) BenchmarkAddAndSuppress/memory BenchmarkAddAndSuppress/memory-10 1 12530184667 ns/op 2858211 suppressions/s(add) 2564874328 suppressions/s(read) PASS ok github.com/rudderlabs/rudder-server/enterprise/suppress-user 56.053s
Further Work
To speed up suppressions' sync time in gateway containers, we are investigating a solution which involves a periodic backup of the badgerdb repo, so that any new gateway container can startup in a hot state after performing a short restore of its badgerdb data.
Notion Ticket
Link
Security