Seek for help on benchmarking of Dragonfly and KeyDB in Kubernetes #113
Hi Jianbin, very impressive work so far! I will tell you what I know and what I do not know. Facts that I know:
It's hard for me to say what causes this based on the data you posted here, because a) I do not have hands-on experience with K8s as a deployment system, and b) some data is missing. From analyzing your test setup, I assume you benchmarked both of them on the same node concurrently - am I correct? If yes, that is a really bad idea. When you put multiple pods like Redis/KeyDB/DF on the same node, do not expect that they each get dedicated networking capacity: you are bounded by the limitations of the underlying hardware, and that capacity is now divided between two hungry pods. You also did not write where you benchmark them from. Is it a different node? The same node? The same zone? What I would do is the following:
By running on a raw GCP instance, you will learn the "normal" performance ranges of each server, the normal latencies, and the optimal memtier configurations (see the sketch below). Once you have this, you can start working your way toward K8s, but I would not jump straight there. I would first run your favorite configuration from above, but with the servers running from a container instead of a native binary. If you use pipelining, be ready to reduce it. To summarize, signs of a good benchmark:
Regarding (2): for some instance types you won't be able to reach full CPU utilization (i.e. 16 cores at 100%) if they are network bound. But you should probably still see well above 1M QPS on DF on an n2 with 16 cores.
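To make the baseline step concrete, here is a minimal sketch of such a run, with memtier on a separate client VM pointed at a raw GCP server instance. The IP, thread, and client counts below are placeholders, not values taken from this thread:

# Run from a dedicated loadtest VM; the server runs natively on its own instance.
SERVER_IP="10.0.0.2"   # placeholder: internal IP of the server VM
memtier_benchmark -s "$SERVER_IP" -p 6379 \
  --threads=8 --clients=20 --pipeline=1 \
  -n 100000 -d 300 --hide-histogram \
  --key-prefix='key:' --distinct-client-seed \
  --key-pattern=R:R --ratio=1:0    # pure SET; use --ratio=0:1 for pure GET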
And do not forget to drink beer!
Just noticed your other memtier parameters. You can be a bit more frisky with the keyspace lengths; you are using big instances, so it's OK. I do not know if
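For illustration, the keyspace range in memtier is controlled with the key-range flags; the bounds below are placeholders rather than values suggested in this thread:

# Widen the keyspace so requests spread over many distinct keys.
memtier_benchmark -s "$SERVER_IP" -p 6379 \
  --key-minimum=1 --key-maximum=10000000 \
  --key-prefix='key:' --key-pattern=R:R \
  --ratio=1:0 --hide-histogram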
Thank you so much for your reply. I think your suggestion of testing with GCP instances is great. I will follow the steps and try to get an ideal P99 latency. I will update the results here once I finish the tests.
No. They are running on two different nodes in the same cluster, so the same region. And the two memtier jobs run sequentially to avoid saturating the network.
I first run
I would run memtier separately from the server node as well. Not that it's impossible to get 1M while running both on the same machine, but it greatly affects the benchmark numbers when reaching high throughput ranges.
A good point. I originally thought that running them on the same node would save some network hops. I will run them on separate instances when benchmarking with GCP instances.
Any chance you have a reproducible bash script that would cover the tests you are trying between them? These are wonderful bits of feedback; it would be interesting to make a canonical test script and deployment. While I don't have any better feedback than what @romange provided, I could take a swing at dockerizing a test script to make it more consistently reproducible, and include other platforms as well in the future.
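No canonical script was added in this thread, but a reproducible wrapper along the lines discussed might look like the following sketch (server address, request counts, and thread/client numbers are placeholders):

#!/usr/bin/env bash
# Hypothetical wrapper covering the three workloads used in this thread: pure SET, pure GET, mixed 1:3.
set -euo pipefail
SERVER="${1:?usage: $0 <server-ip>}"
PORT=6379
for ratio in 1:0 0:1 1:3; do
  echo "=== ratio ${ratio} ==="
  memtier_benchmark -s "$SERVER" -p "$PORT" \
    --threads=30 --clients=10 --pipeline=1 \
    -n 200000 -d 300 --hide-histogram \
    --key-prefix='key:' --distinct-client-seed \
    --key-pattern=R:R --ratio="$ratio"
done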
@ryanrussell in terms of priority for the project, writing canonical benchmarking scripts is less important right now.
Hey @romange, @ryanrussell, I followed your suggestions and re-ran all the tests on GCP VM instances. Dragonfly overwhelms KeyDB in P99 latency (
My next step is to benchmark with Docker and Kubernetes, and I will update the results in this issue.
Update (2022-05-09)
TL;DR
Dragonfly
Memory Usage: 2.84GiB
KeyDB
Memory Usage: 3.70G
Setup
I provisioned three VM instances:
Dragonfly:
KeyDB:
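The exact server launch commands are not captured above; purely as an illustration, typical invocations might look like this (the flags are assumptions, not the configuration used for the results below):

# Dragonfly: uses all available cores by default; --logtostderr prints logs to the console.
./dragonfly --logtostderr

# KeyDB: --server-threads sets the number of worker threads.
keydb-server --port 6379 --server-threads 16 --protected-mode no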
Dragonfly
Pure Set
Pure Get
Mixed Set-Get
Memory
Dashboard
KeyDB
Pure Set
Pure Get
Mixed Set-Get
Memory Usage
Dashboard
@drinkbeer, do not expect to reach anywhere close to 3.8M QPS on GCP. AWS networking capabilities are higher than any other public cloud's. Having said that, I would expect c2 to reach higher throughput. I will benchmark GCP and get back to you. You provided a great reference point with your results! It will take me a week or so. Hope that's OK.
If you take a look at the two dashboards, you can see that we are not even close to saturating the network, I think. But I can check with the Google folks whether our tests start to drop packets.
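For reference, drops can also be checked directly on the VMs with standard Linux tooling (eth0 is a placeholder for the actual NIC name):

# Per-interface RX/TX statistics, including dropped packets.
ip -s link show dev eth0
# NIC/driver-level drop and discard counters, where the driver exposes them.
ethtool -S eth0 | grep -i drop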
Thank you so much! I will use the results as a benchmark and continue testing with Docker and Kubernetes. Hoping we can achieve similar results in Docker and Kubernetes (I guess Docker and Kubernetes will introduce overhead, but I am curious how much overhead it is).
It is totally fine. I really appreciate your time and am looking forward to your benchmarking on GCP.
Yeah, it's not close to saturating the bandwidth. Throughput is another matter and is a bit more complicated.
Hey, don't run high-performance apps inside Kubernetes or Docker, or on AWS or Azure cloud. Pay for a skilled admin with a performance and security focus and invest the money in a bare-metal server!!! Even my 9-year-old notebook did more requests - a laboratory notebook.
@drinkbeer preliminary results... I fetched the DF binary v0.2.0 from https://github.com/dragonflydb/dragonfly/releases/download/v0.2.0/dragonfly-x86_64.unstripped.tar.gz
dev@test-c1:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
dev@test-c1:~$ uname -a
Linux test-c1 5.15.0-1008-gcp #12-Ubuntu SMP Wed Jun 1 21:29:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Disclosure: it's my development image, created via a packer pipeline defined here: https://github.com/romange/image-bakery
After scanning it now, I think the only substantial performance-relevant change I made there is turning off mitigations; besides that, it's just convenience configs and utilities.
I ran only the first SET benchmark - I copy-pasted your command:
DRAGONFLY_SERVER="10.142.0.18" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
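(As an aside on the disclosure above: disabling CPU vulnerability mitigations is a kernel boot parameter. A typical way to set it on Ubuntu is shown below; this is a generic sketch, not necessarily the exact mechanism used in the image-bakery pipeline.)

# Append mitigations=off to the kernel command line, regenerate the GRUB config, and reboot.
sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off /' /etc/default/grub
sudo update-grub
sudo reboot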
Already much better than your result. Let's try improving it.
much lower than before (4580% vs 3360%). Also, p99 is pretty good in both cases.
DRAGONFLY_SERVER="10.142.0.18" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=30 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1977947.42 --- --- 0.46646 0.43900 1.07100 1.44700 664232.85
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1977947.42 0.00 0.00 0.46646 0.43900 1.07100 1.44700 664232.85
p99.9 is too high IMHO. Let's take it down a notch:
Pretty good - p99.9 under 1ms with 1.6M QPS.
Now I see you used a 1 vCPU per core ratio. I use the regular 2 vCPUs per core.
Step 2: I took a plain Ubuntu 22.04 image. From the client (loadtest) instance:
DRAGONFLY_SERVER="10.142.0.20" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 100000 -d 300 --pipeline=1 --clients=15 --threads=50 --run-count=1 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 45 secs] 0 threads: 75000000 ops, 2052410 (avg: 1651450) ops/sec, 673.08MB/sec (avg: 541.59MB/sec), 0.36 (avg: 0.45) msec latency
50 Threads
15 Connections per thread
100000 Requests per client
ALL STATS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1788894.22 --- --- 0.45224 0.41500 0.84700 1.51900 600745.16
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1788894.22 0.00 0.00 0.45224 0.41500 0.84700 1.51900 600745.16
Seems that DF works OK on Ubuntu 22.04 out of the box. Next step is to check Debian.
Step 3: used Bullseye - everything else as before. As you can see, I can confirm that Debian 11 is very bad performance-wise. Jianbin, I think there are enough data points here to continue evaluating DF.
dev@test-c1:~$ DRAGONFLY_SERVER="10.142.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 100000 -d 300 --pipeline=1 --clients=15 --threads=50 --run-count=1 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 184 secs] 0 threads: 75000000 ops, 435703 (avg: 406529) ops/sec, 142.89MB/sec (avg: 133.32MB/sec), 1.72 (avg: 1.84) msec latency
50 Threads
15 Connections per thread
100000 Requests per client
ALL STATS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 432159.18 --- --- 1.84153 1.45500 7.32700 18.43100 145127.38
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 432159.18 0.00 0.00 1.84153 1.45500 7.32700 18.43100 145127.38
@drinkbeer hey man, did you have a chance to experiment with it?
@drinkbeer I am closing. Feel free to reopen if you have any questions.
Thank you @romange, this issue can be closed. As a next step, we will probably build Dragonfly in our staging environment and benchmark it along with the Envoy proxy (which is the proxy used with KeyDB in our prod). Here are some results of our benchmarking. The performance of Dragonfly looks great. (Updated July 4th, 2022.)
TL;DR
We deployed Dragonfly and KeyDB on
Setup
Hardware
Dragonfly
KeyDB
Memtier
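The individual server configs are elided above; purely as an illustration of the Docker variant measured below, a launch with host networking might look like this (the image name comes from the issue description, everything else is an assumption):

# Host networking avoids the bridge/NAT hop; memlock limit raised, as commonly recommended for the Dragonfly container.
docker run -d --name dragonfly --network=host --ulimit memlock=-1 \
  docker.dragonflydb.io/dragonflydb/dragonfly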
Dragonfly
Resource Usage
Set
Get
Mixed
Dragonfly (Docker)
Resource Usage
Set
Get
Mixed
KeyDB (4 threads)
Resource Usage
Set
Get
Mixed
KeyDB (16 threads)
Resource Usage
Set
Get
Mixed
@drinkbeer These are fantastic results! It really makes me happy 🕺🏼 to see that Dragonfly provides value!
I would love to. I sent you an invitation through your LinkedIn. Let's chat.
Hey, Dragonfly maintainers,
Thank you for your great work on this fantastic project. My teammates and I are impressed by the benchmark results and are trying to reproduce the benchmarking in Kubernetes (the reason we want to benchmark it in Kubernetes is that we use k8s in our production environment).
I followed the setup in the readme and the dashtable doc. I found my benchmarking results are not as good as yours, so I would like to publish my benchmarking results here and hear suggestions from all of you on how to improve the performance.
Any feedback is greatly appreciated. Thank you!
Test Environment Setup
Node: v1.22.9-gke.1500 (kernel 5.10.109+)
Dragonfly pod: docker.dragonflydb.io/dragonflydb/dragonfly
Dragonfly info:
Dragonfly yaml file (a rough, hypothetical sketch of such a manifest follows this setup section):
KeyDB pod:
KeyDB info:
We are using an internal version of KeyDB. KeyDB yaml file:
The memtier_benchmark job for Dragonfly:
The memtier_benchmark job for KeyDB:
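The yaml files referenced above are not included in this issue; as a rough, hypothetical sketch of the kind of manifest being described (names, resource numbers, and structure are placeholders, not the actual files used):

# Hypothetical minimal Pod spec for Dragonfly; only the image name comes from this issue.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dragonfly
spec:
  containers:
  - name: dragonfly
    image: docker.dragonflydb.io/dragonflydb/dragonfly
    ports:
    - containerPort: 6379
    resources:
      requests:
        cpu: "8"
        memory: 16Gi
EOF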
Test Result
Here are the results of the tests.
I am impressed by the memory utilization of Dragonfly. Dragonfly uses only (31.19/117.3*100 =) 26.59% of the memory that KeyDB uses. Dragonfly also has better Get performance (higher throughput, lower latency). But KeyDB performs better in Set throughput and latency. In the mixed Set-Get case, KeyDB also has better throughput and latency.
Pure Set
VECache (KeyDB)
Dragonfly
Pure Get
VECache (KeyDB)
Dragonfly
Mixed Set-Get (1:3)
VECache (KeyDB):
Dragonfly: