-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update badger to latest version #11546
Comments
When we're ready for this, we should revive #11344 and replace/update the GC test. |
This needs to be tested thoroughly, as my experiments show that the memory profiles (v2 vs v4) look quite different: v2: https://github.com/carsonip/tbs-badger-playground/tree/main/prefetch |
Updated the playground: https://github.com/carsonip/tbs-badger-playground TLDRTLDR: The point that worries me is that the fix we applied in #13464 will be less effective as you may notice a perf regression in v2:
v4:
full resultsbadger v2
badger v4
|
Opened #14427 to update badger, still need to figure out a broken test and any change in behaviour |
broken test has been fixed, PR is ready |
Benchmark diff main vs badger v4 (tail sampling enabled and sample_rate:1):
|
To reproduce: Requirements:
apm-server:
host: "127.0.0.1:8200"
expvar:
enabled: true
sampling.tail:
enabled: true
policies:
- sample_rate: 1
output.elasticsearch:
hosts: ["localhost:9200"]
|
Looking at the benchmark results. Is it fair to say that memory usage increased by ~24-30% while throughput remained generally the same? Just to double check,
|
Given the performance degression that upgrading to v4 would bring, @axw and I discussed following alternative: Short term (
Are we overlooking some other version dependency problems here or should this solve the current issues without requiring us to update to v4 for the TBS logic ?
|
Yes, that's correct 👍
I think we can workaround the issue by pinning the ristretto version (#14469). ristretto also retracted |
For reference, I get more extreme results running it locally. The left is using
|
Quickly running the same experiment and comparing the memory profiles difference between v4 and v2, it looks to be partially attributed to ristretto cache (as @kruskall mentioned above) it showed about +250MB of memory in use plus extra allocations. Looking at the badger documentation for v2 and v4. It seems to track with the default BlockCacheSize size change from 0 to 256MB. I ran two extra benchmarks with reducing the cache size down to 1MB and 0MB (for 0MB also compression has to be disabled link). Both runs indeed reduced memory usage and allocation but inflicted throughput penalties. It seems that we need to adjust this cache size based on the available memory dynamically, similarly to how we configure go-docappender. |
Further testing related to beats version:
|
For |
@kruskall can you please update this issue with the latest evolvements and eventually close it. |
A small summary of the current situation:
right now APM Server is using badger v2 and with a replace directive to avoid the compatibility issue with jaeger (can be dropped once the new jaeger version is out). I think the current situation is unlikely to create issues in 8.x (especially that ristretto has proper versioning). Not sure if we want to close this or schedule it for 9.x. |
Agreed, that would ideal, otherwise we may face similar issues in the future. |
Just ran some macro benchmarks by running apmbench Contrary to @1pkg 's observation, when setting compression to None and block cache to 0MB, badger v4 is able to maintain badger v2's throughput, which is 20% faster than badgerv4 with 256MB block cache and compression. bench badgerv4 at commit 940de53 (disable block cache) bench badgerv4 at commit d175b05 (v4 with new default block cache) bench badger v2 at commit d1b93c9 Haven't measured memory usage yet. |
@carsonip interesting results, I didn't log well the setup that I used last time and I don't think I spent much time doing the full benchmark last time (I think I ran BenchmarkAgentAll for 1m). So the results you got should be more comprehensive, I remember eyeballing memory profile the most last time and trying to correlate cache influence over allocations and memory in use. |
Another set of result:
badgerv2: https://github.com/elastic/apm-server/actions/runs/12552635217
Similar throughput, but badgerv4 has 20% higher RSS, way bigger LSM tree (300MB vs 50MB) but smaller vlog (100MB vs 300MB). The badgerv4 vlog size number is slightly suspicious though, as it didn't change throughout the bench. edit: this is likely because ValueThreshold default changed (from v2's 1KB to v4's 1MB). Now running another benchmark with old ValueThreshold default https://github.com/elastic/apm-server/actions/runs/12553135045
main baseline without TBS:
|
marked as Critical but in Pending refinement, therefore moving this from it105 to it106 |
@raultorrecilla agree that this is important for |
As discussed, alternatively, we may adopt pebble instead of badger if #15246 goes well. The target is still 9.0. |
We are using an older badger version that is no longer supported. We should switch to the newer/latest version (currently v4)
The text was updated successfully, but these errors were encountered: