
Seek for help on benchmarking of Dragonfly and KeyDB in Kubernetes #113

Closed
drinkbeer opened this issue Jun 7, 2022 · 22 comments
Labels
question Further information is requested

Comments

@drinkbeer

drinkbeer commented Jun 7, 2022

Hey, Dragonfly maintainers,

Thank you for your great work on this fantastic project. My teammates and I are impressed by the benchmark results and are trying to reproduce the benchmark in Kubernetes (we want to benchmark in Kubernetes because we use k8s in our production environment).

I followed the setup in the readme and the dashtable doc. My benchmarking results are not as good as yours, so I would like to publish them here and hear your suggestions on how to improve the performance.

Any feedback is greatly appreciated. Thank you!

Test Environment Setup

Node:

  • n2-highmem-16 (16 vCPU, 128GB memory)
  • Kubernetes version: v1.22.9-gke.1500
  • OS: Container-Optimized OS from Google ( Kernel Version: 5.10.109+)

Dragonfly pod:

  • Image: docker.dragonflydb.io/dragonflydb/dragonfly
  • CPU: 8 vCPU
  • Memory: 16 GB

Dragonfly info:

# Server
redis_version:df-0.1
redis_mode:standalone
arch_bits:64
multiplexing_api:iouring
tcp_port:6379
uptime_in_seconds:301378
uptime_in_days:3

# Clients
connected_clients:1
client_read_buf_capacity:256
blocked_clients:0

Dragonfly yaml file:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: dragonfly
    type: test
  name: dragonfly
  namespace: jason-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dragonfly
      name: dragonfly
  serviceName: dragonfly
  template:
    metadata:
      annotations:
        ad.datadoghq.com/redis.check_names: '["redisdb"]'
        ad.datadoghq.com/redis.init_configs: '[{}]'
      labels:
        app: dragonfly
        name: dragonfly
    spec:
      automountServiceAccountToken: false
      containers:
      - image: docker.dragonflydb.io/dragonflydb/dragonfly
        command: ["/bin/sh"]
        args: ["-c", "ulimit -l unlimited && dragonfly --logtostderr"]
        imagePullPolicy: IfNotPresent
        name: redis
        ports:
        - containerPort: 6379
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 3
          successThreshold: 3
          tcpSocket:
            port: 6379
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "8"
            memory: 16000Mi
          requests:
            cpu: "8"
            memory: 16000Mi
        securityContext:
          capabilities:
            drop:
            - AUDIT_WRITE
            - CHOWN
            - DAC_OVERRIDE
            - FOWNER
            - FSETID
            - KILL
            - MKNOD
            - NET_BIND_SERVICE
            - SETGID
            - SETFCAP
            - SETPCAP
            - SETUID
            - SYS_CHROOT
          privileged: true
          runAsNonRoot: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |
          sysctl -w net.core.somaxconn=8192
          sysctl -w net.ipv4.tcp_max_syn_backlog=8192
          echo never > /host-sys/kernel/mm/transparent_hugepage/enabled
          echo never > /host-sys/kernel/mm/transparent_hugepage/defrag
        image: gcr.io/shopify-docker-images/cloud/busybox:1.0
        imagePullPolicy: IfNotPresent
        name: system-init
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 1Gi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host-sys
          name: host-sys
      nodeSelector:
        role: vecache
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 1
      tolerations:
      - effect: NoExecute
        key: app
        operator: Equal
        value: vecache
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 30
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: vecache
        maxSkew: 4
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      - labelSelector:
          matchLabels:
            app: vecache
        maxSkew: 4
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - hostPath:
          path: /sys
          type: ""
        name: host-sys
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    namebuddy.shopify.io/cname: dragonfly-0.jason-test.staging-cq-state-us-east1-1.test
    namebuddy.shopify.io/dns: ttl=5
  labels:
    app: dragonfly
    type: test
  name: dragonfly-0
  namespace: jason-test
spec:
  ports:
  - port: 6379
    protocol: TCP
    targetPort: 6379
  selector:
    statefulset.kubernetes.io/pod-name: dragonfly-0
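
Before benchmarking, it can be worth sanity-checking the per-pod Service from inside the cluster. One way to do it (the throwaway pod name and image here are arbitrary choices, not from the setup above):

```shell
# One-off pod that pings the dragonfly-0 Service, then deletes itself
kubectl -n jason-test run redis-ping --rm -it --restart=Never \
  --image=redis:7 -- redis-cli -h dragonfly-0 -p 6379 PING
```

A healthy endpoint replies PONG.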

Keydb pod:

KeyDB info

# Server
redis_version:6.0.16
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4d5c208d4d774a11
redis_mode:standalone
os:Linux 5.10.109+ x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:7.5.0
process_id:1
run_id:8d4e6f7ffe3a90f66aa4ce8f84155d463a74d546
tcp_port:6379
uptime_in_seconds:306392
uptime_in_days:3
hz:10
configured_hz:10
lru_clock:10415189
executable:/usr/local/bin/keydb-server
config_file:/etc/redis/redis.conf

# Clients
connected_clients:1
client_recent_max_input_buffer:4
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:0
thread_0_clients:1
thread_1_clients:0
thread_2_clients:0
thread_3_clients:0

We are using an internal version of KeyDB. KeyDB yaml file:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: vecache
    type: test
  name: vecache
  namespace: jason-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vecache
      name: vecache
  serviceName: vecache
  template:
    metadata:
      annotations:
        ad.datadoghq.com/redis.check_names: '["redisdb"]'
        ad.datadoghq.com/redis.init_configs: '[{}]'
      labels:
        app: vecache
        name: vecache
    spec:
      automountServiceAccountToken: false
      containers:
      - args:
        - --maxmemory 8000Mb
        - --server-threads 4
        image: gcr.io/shopify-docker-images/cloud/vecache:1.0-6.0.16
        imagePullPolicy: IfNotPresent
        name: redis
        ports:
        - containerPort: 6379
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 3
          successThreshold: 3
          tcpSocket:
            port: 6379
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "8"
            memory: 16000Mi
          requests:
            cpu: "8"
            memory: 16000Mi
        securityContext:
          capabilities:
            drop:
            - AUDIT_WRITE
            - CHOWN
            - DAC_OVERRIDE
            - FOWNER
            - FSETID
            - KILL
            - MKNOD
            - NET_BIND_SERVICE
            - SETGID
            - SETFCAP
            - SETPCAP
            - SETUID
            - SYS_CHROOT
          privileged: false
          runAsNonRoot: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |
          sysctl -w net.core.somaxconn=8192
          sysctl -w net.ipv4.tcp_max_syn_backlog=8192
          echo never > /host-sys/kernel/mm/transparent_hugepage/enabled
          echo never > /host-sys/kernel/mm/transparent_hugepage/defrag
        image: gcr.io/shopify-docker-images/cloud/busybox:1.0
        imagePullPolicy: IfNotPresent
        name: system-init
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 1Gi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host-sys
          name: host-sys
      nodeSelector:
        role: vecache
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 1
      tolerations:
      - effect: NoExecute
        key: app
        operator: Equal
        value: vecache
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 30
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: vecache
        maxSkew: 4
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      - labelSelector:
          matchLabels:
            app: vecache
        maxSkew: 4
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - hostPath:
          path: /sys
          type: ""
        name: host-sys
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    namebuddy.shopify.io/cname: vecache-0.jason-test.staging-cq-state-us-east1-1.test
    namebuddy.shopify.io/dns: ttl=5
  labels:
    app: vecache
    type: test
  name: vecache-0
  namespace: jason-test
spec:
  ports:
  - port: 6379
    protocol: TCP
    targetPort: 6379
  selector:
    statefulset.kubernetes.io/pod-name: vecache-0

The memtier_benchmark job for Dragonfly:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: memtier-dragonfly
  namespace: jason-test
spec:
  completions: 1
  parallelism: 1
  template:
    metadata:
      labels:
        app: memtier
    spec:
      containers:
      - name: memtier
        image: redislabs/memtier_benchmark
        args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=1:3"]
        resources:
          limits:
            cpu: "200m"
            memory: "250Mi"
          requests:
            cpu: "100m"
            memory: "128Mi"
        env:
         - name: REDIS_PORT
           value: "6379"
         - name: REDIS_SERVER
           value: "dragonfly-0.jason-test.svc.cluster.local"  # can be full hostname or just the resource name in k8s
        imagePullPolicy: Always
      tolerations:
      - effect: NoExecute
        key: app
        operator: Equal
        value: vecache
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 30
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30
      restartPolicy: OnFailure
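
For reference, the args array above corresponds roughly to the shell invocation below. Two quirks worth noting: elements like "-n 200000" still parse in practice, because short options accept attached values, but the single quotes in --key-prefix='key:' reach memtier literally (Kubernetes does not run args through a shell), so the actual key prefix includes the quote characters.

```shell
# Approximate CLI equivalent of the Job's container args (host from the env var)
memtier_benchmark -s dragonfly-0.jason-test.svc.cluster.local -p 6379 \
  -n 200000 -d 300 --pipeline=5 --clients=10 --threads=5 --run-count=2 \
  --hide-histogram --key-prefix="key:" --key-minimum=1 --key-maximum=10000 \
  --key-pattern=S:R --ratio=1:3

# Requests per run: 5 threads x 10 clients x 200000 = 10,000,000 ops, matching the logs
```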

The memtier_benchmark job for keydb:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: memtier-vecache
  namespace: jason-test
spec:
  completions: 1
  parallelism: 1
  template:
    metadata:
      labels:
        app: memtier
    spec:
      containers:
      - name: memtier
        image: redislabs/memtier_benchmark
        args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=1:3"]
        resources:
          limits:
            cpu: "200m"
            memory: "250Mi"
          requests:
            cpu: "100m"
            memory: "128Mi"
        env:
         - name: REDIS_PORT
           value: "6379"
         - name: REDIS_SERVER
           value: "vecache-0.jason-test.staging-cq-state-us-east1-1.test"  # can be full hostname or just the resource name in k8s
        imagePullPolicy: Always
      tolerations:
      - effect: NoExecute
        key: app
        operator: Equal
        value: vecache
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 30
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30
      restartPolicy: OnFailure

Test Result

Here are the results of the tests.

I am impressed by Dragonfly's memory utilization: it uses only (31.19/117.30 × 100 ≈) 26.59% of the memory KeyDB uses. Dragonfly also has better Get performance (higher throughput, lower latency).
But KeyDB performs better on Set throughput and latency, and in the mixed set-get case KeyDB also has better throughput and latency.
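
As a quick sanity check, the memory ratio can be recomputed from the two used_memory_human figures reported below (31.19 MiB for Dragonfly vs 117.30 MiB for KeyDB):

```shell
# Dragonfly's resident memory as a percentage of KeyDB's
awk 'BEGIN { printf "%.2f%%\n", 100 * 31.19 / 117.30 }'   # prints 26.59%
```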

[image: benchmark results summary table]

Pure Set

    args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=1:0"]

VECache (KeyDB)

  • Throughput: 25246.27 ops/sec, 8478.19 KB/sec
  • Latency: 10.07700 ms
  • used_memory_human:117.30M
➜  Documents k logs memtier-vecache-v5s97
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 405 secs]  0 threads:    10000000 ops,   39564 (avg:   24665) ops/sec, 12.98MB/sec (avg: 8.09MB/sec),  6.36 (avg: 10.08) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 404 secs]  0 threads:    10000000 ops,   31802 (avg:   24697) ops/sec, 10.43MB/sec (avg: 8.10MB/sec),  7.78 (avg: 10.07) msec latency

5         Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets        25378.16          ---          ---     10.07100      8522.48
Gets            0.00         0.00         0.00      0.00000         0.00
Waits           0.00          ---          ---      0.00000          ---
Totals      25378.16         0.00         0.00     10.07100      8522.48


WORST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets        25114.38          ---          ---     10.08300      8433.89
Gets            0.00         0.00         0.00      0.00000         0.00
Waits           0.00          ---          ---      0.00000          ---
Totals      25114.38         0.00         0.00     10.08300      8433.89


AGGREGATED AVERAGE RESULTS (2 runs)
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets        25246.27          ---          ---     10.07700      8478.19
Gets            0.00         0.00         0.00      0.00000         0.00
Waits           0.00          ---          ---      0.00000          ---
Totals      25246.27         0.00         0.00     10.07700      8478.19

Dragonfly

  • Throughput: 19307.83 ops/sec, 6483.94 KB/sec
  • Latency: 12.99300 ms
  • used_memory_human:31.19MiB
➜  Documents k logs memtier-dragonfly-j9l8m
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 519 secs]  0 threads:     9999996 ops,   23093 (avg:   19231) ops/sec, 7.58MB/sec (avg: 6.31MB/sec), 10.80 (avg: 12.96) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 522 secs]  0 threads:    10000000 ops,   21434 (avg:   19131) ops/sec, 7.03MB/sec (avg: 6.27MB/sec), 11.61 (avg: 13.03) msec latency

5         Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets        19647.63          ---          ---     12.96000      6598.05
Gets            0.00         0.00         0.00      0.00000         0.00
Waits           0.00          ---          ---      0.00000          ---
Totals      19647.63         0.00         0.00     12.96000      6598.05


WORST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets        18968.02          ---          ---     13.02600      6369.83
Gets            0.00         0.00         0.00      0.00000         0.00
Waits           0.00          ---          ---      0.00000          ---
Totals      18968.02         0.00         0.00     13.02600      6369.83


AGGREGATED AVERAGE RESULTS (2 runs)
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets        19307.83          ---          ---     12.99300      6483.94
Gets            0.00         0.00         0.00      0.00000         0.00
Waits           0.00          ---          ---      0.00000          ---
Totals      19307.83         0.00         0.00     12.99300      6483.94

Pure Get

    args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=0:1"]

VECache (KeyDB)

  • Throughput: 25357.45 ops/sec, 8364.74 KB/sec
  • Latency: 9.93550 ms
  • used_memory_human:117.30M
➜  Documents k logs memtier-vecache-xh6xj
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 400 secs]  0 threads:    10000000 ops,   46369 (avg:   24938) ops/sec, 14.94MB/sec (avg: 8.03MB/sec),  5.37 (avg:  9.97) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 398 secs]  0 threads:     9999999 ops,   57272 (avg:   25116) ops/sec, 18.45MB/sec (avg: 8.09MB/sec),  4.36 (avg:  9.90) msec latency

5         Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets            0.00          ---          ---      0.00000         0.00
Gets        25426.75     25426.75         0.00      9.90000      8387.60
Waits           0.00          ---          ---      0.00000          ---
Totals      25426.75     25426.75         0.00      9.90000      8387.60


WORST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets            0.00          ---          ---      0.00000         0.00
Gets        25288.15     25288.15         0.00      9.97100      8341.88
Waits           0.00          ---          ---      0.00000          ---
Totals      25288.15     25288.15         0.00      9.97100      8341.88


AGGREGATED AVERAGE RESULTS (2 runs)
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets            0.00          ---          ---      0.00000         0.00
Gets        25357.45     25357.45         0.00      9.93550      8364.74
Waits           0.00          ---          ---      0.00000          ---
Totals      25357.45     25357.45         0.00      9.93550      8364.74

Dragonfly

  • Throughput: 27705.71 ops/sec, 9139.37 KB/sec
  • Latency: 9.11100 ms
  • used_memory_human:31.19MiB
➜  Documents k logs memtier-dragonfly-5kzsm
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 365 secs]  0 threads:     9999999 ops,   83523 (avg:   27366) ops/sec, 26.91MB/sec (avg: 8.82MB/sec),  2.77 (avg:  9.11) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 363 secs]  0 threads:     9999999 ops,   84975 (avg:   27502) ops/sec, 27.37MB/sec (avg: 8.86MB/sec),  2.69 (avg:  9.07) msec latency

5         Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets            0.00          ---          ---      0.00000         0.00
Gets        27705.71     27705.71         0.00      9.11100      9139.37
Waits           0.00          ---          ---      0.00000          ---
Totals      27705.71     27705.71         0.00      9.11100      9139.37


WORST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets            0.00          ---          ---      0.00000         0.00
Gets            0.00         0.00         0.00      9.06700         0.00
Waits           0.00          ---          ---      0.00000          ---
Totals          0.00         0.00         0.00      9.06700         0.00


AGGREGATED AVERAGE RESULTS (2 runs)
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets            0.00          ---          ---      0.00000         0.00
Gets        13852.86     13852.86         0.00      9.08900      4569.68
Waits           0.00          ---          ---      0.00000          ---
Totals      13852.86     13852.86         0.00      9.08900      4569.68

Mixed Set-Get (1:3)

    args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=1:3"]

VECache (KeyDB):

  • Throughput: 31736.69 ops/sec, 10507.80 KB/sec
  • Latency: 8.16500 ms
  • used_memory_human:117.30M
➜  Documents k logs memtier-vecache-qvgq6
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 327 secs]  0 threads:    10000000 ops,   36385 (avg:   30555) ops/sec, 11.77MB/sec (avg: 9.88MB/sec),  6.85 (avg:  8.13) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 330 secs]  0 threads:    10000000 ops,   34522 (avg:   30286) ops/sec, 11.16MB/sec (avg: 9.79MB/sec),  7.22 (avg:  8.20) msec latency

5         Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets         8009.01          ---          ---      8.13400      2681.06
Gets        24027.04     24027.04         0.00      8.12800      7925.85
Waits           0.00          ---          ---      0.00000          ---
Totals      32036.06     24027.04         0.00      8.12900     10606.91


WORST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets         7859.33          ---          ---      8.20400      2630.95
Gets        23578.00     23578.00         0.00      8.20000      7777.73
Waits           0.00          ---          ---      0.00000          ---
Totals      31437.33     23578.00         0.00      8.20100     10408.68


AGGREGATED AVERAGE RESULTS (2 runs)
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets         7934.17          ---          ---      8.16900      2656.01
Gets        23802.52     23802.52         0.00      8.16400      7851.79
Waits           0.00          ---          ---      0.00000          ---
Totals      31736.69     23802.52         0.00      8.16500     10507.80

Dragonfly:

  • Throughput: 23444.44 ops/sec, 7762.29 KB/sec
  • Latency: 10.65200 ms
  • used_memory_human:31.19MiB
➜  Documents k logs memtier-dragonfly-ws9rr
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 425 secs]  0 threads:    10000000 ops,   25440 (avg:   23479) ops/sec, 8.23MB/sec (avg: 7.59MB/sec), 10.22 (avg: 10.60) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 430 secs]  0 threads:    10000000 ops,   31976 (avg:   23230) ops/sec, 10.34MB/sec (avg: 7.51MB/sec),  7.79 (avg: 10.71) msec latency

5         Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets         5922.85          ---          ---     10.61200      1982.71
Gets        17768.56     17768.56         0.00     10.59100      5861.35
Waits           0.00          ---          ---      0.00000          ---
Totals      23691.41     17768.56         0.00     10.59600      7844.06


WORST RUN RESULTS
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets         5799.37          ---          ---     10.71700      1941.37
Gets        17398.11     17398.11         0.00     10.70500      5739.15
Waits           0.00          ---          ---      0.00000          ---
Totals      23197.48     17398.11         0.00     10.70800      7680.52


AGGREGATED AVERAGE RESULTS (2 runs)
=========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets         5861.11          ---          ---     10.66450      1962.04
Gets        17583.33     17583.33         0.00     10.64800      5800.25
Waits           0.00          ---          ---      0.00000          ---
Totals      23444.44     17583.33         0.00     10.65200      7762.29
@romange
Collaborator

romange commented Jun 7, 2022

Hi Jianbin,

Very impressive work so far! I will tell you what I know and what I do not know.

Facts that I know:

  1. 10-25K qps is a very bad result; both KeyDB and Dragonfly can do much better than that.
  2. Similarly, 10 ms (is it avg? 99th?) is an awful number for an in-memory store. You should not get there.

Now, it's hard for me to say what causes this based on the data you posted here, because a) I do not have hands-on experience with K8S as a deployment system, and b) some data is missing.

Now, from analyzing your test setup, I assume you benchmarked both of them on the same node concurrently? Am I correct? If yes, then it's a really bad idea.

When you put multiple pods like Redis/KeyDB/DF on the same node, do not expect that they each get dedicated networking capacity: you are bound by the limitations of the underlying hardware, and now it's divided between two hungry pods.

You did not write where you benchmark them from. Is it a different node? The same node? The same zone?

What I would do is the following:

  1. Run a GCP instance (8-16 CPU cores) with plain Ubuntu, say 22.04. Download the DF/KeyDB binaries there and run them. Do not use docker images at first.
  2. Run memtier on a different node. It should be at least the same size as your software under test. Rule of thumb: it should have more CPUs than your maximal --threads argument in memtier.
  3. Run all your nodes in the same VPC and zone.
  4. Do not benchmark both servers concurrently on the same node! They will just compete over CPU/networking there.
  5. Run memtier without pipelining first. Learn your software first. --clients in the 10-40 range is fine. --threads should be set such that the server under test gives you the highest throughput while still keeping latency low. For DF it's easy to see: if it uses more than 95% of the total CPU, it has reached the limits of the underlying machine. For KeyDB it depends on the server-threads argument you pass (they suggest 4, but I used 8 in my tests). In any case, if you see in htop that KeyDB's K CPUs are at 100%, it won't go higher either. And if the average latency is above 1 ms, the server is overloaded and you should probably decrease --threads in memtier.

By running on a raw GCP instance, you will learn the "normal" performance ranges of each server, the normal latencies, and the optimal configurations for memtier.

Once you have this, you can start working your way toward K8S. But I would not jump straight there. I would first run your favorite configuration from above, but with the servers running in a container instead of as a native binary.
Now, there are some options here too. You can run docker run --network=host or with port mappings. I suspect that port mapping will degrade your numbers greatly, but maybe --network=host will also affect them.
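
The two container-networking options mentioned, as concrete commands (image as in the StatefulSet above; --logtostderr is Dragonfly's own flag):

```shell
# Host networking: the container shares the host's network stack, no NAT on the data path
docker run --rm --network=host docker.dragonflydb.io/dragonflydb/dragonfly --logtostderr

# Port mapping: traffic goes through docker-proxy/NAT and typically costs some throughput
docker run --rm -p 6379:6379 docker.dragonflydb.io/dragonflydb/dragonfly --logtostderr
```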

If you use pipelining, be ready to reduce the --clients parameter to 5-10. Pipelining affects latency as well.

To summarize, signs of a good benchmark:

  1. Low latency in non-pipeline mode (my rule was a 99th percentile around 1 ms).
  2. High CPU utilization on the instance with the server under test.
  3. As little noise as possible; use dedicated VMs.

Regarding (2): for some instance types you won't be able to reach full CPU utilization (i.e., 16 cores working at 100%) if they are network-bound. But you should probably still see well above 1M qps on DF on an n2 with 16 cores.


romange commented Jun 7, 2022

And do not forget to drink beer!


romange commented Jun 7, 2022

Just noticed your other memtier parameters. You can be a bit more aggressive with the keyspace size; you use big instances, so --key-maximum=10000000 is fine.
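For reference, quick arithmetic shows the raw dataset that --key-maximum=10000000 with -d 300 implies (per-key metadata overhead is ignored here, so the real figure will be somewhat higher):

```shell
# 10M keys x 300-byte values, expressed in GiB
awk 'BEGIN {
  keys = 10000000
  value = 300
  printf "%.2f GiB of raw values\n", keys * value / (1024 ^ 3)
}'
# ~2.79 GiB, comfortably inside the instances discussed here.
```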

I do not know if --key-pattern=S:R was a deliberate choice. I used R:R but also used --distinct-client-seed; otherwise, each client connection goes through exactly the same route...


drinkbeer commented Jun 7, 2022

Thank you so much for your reply. I think your suggestion of testing on a GCP instance is great. I will follow the steps and try to get an ideal p99 latency. I will update the results here once I finish the tests.

Now, from analyzing your test setup I assume that you benchmarked both of them on the same node concurrently? am I correct? If yes, then it's a really bad idea.

No. They are running on two different nodes in the same cluster, so the same region (us-east1) and the same zone (us-east1-d). The memtier jobs also run on those same nodes (the memtier job for KeyDB runs on the same node as the KeyDB cache; the memtier job for Dragonfly runs on the same node as the Dragonfly cache).

And the two memtier jobs run sequentially to avoid saturating the network.

I do not know if --key-pattern=S:R was a deliberate choice. I used R:R but also used --distinct-client-seed. Otherwise, each client connection goes through exactly the same route...

I first run a pure-set memtier job that writes only key:1 to key:10000000, to test set operations alone; then I run a pure-get memtier job, which has a 100% hit rate since all the keys in the keyspace are filled; then I run a mixed set-get memtier job with 25% sets and 75% gets, which also has a 100% hit rate (check the Misses/sec metric: it is 0, which means the hit rate is 100%). I think --key-pattern=S:R only affects the third (mixed-ops) memtier job. Because the hit rate is 100%, I don't think the key pattern affects performance much. But I can try the --distinct-client-seed option in my GCP instance tests.


romange commented Jun 7, 2022

I would run memtier separately from the server node as well. It's not that it's impossible to reach 1M qps running both on the same machine, but it greatly affects benchmark numbers in the high throughput ranges.

@drinkbeer

I would run memtier separately from the server node as well. It's not that it's impossible to reach 1M qps running both on the same machine, but it greatly affects benchmark numbers in the high throughput ranges.

A good point. I originally thought that running them on the same node would save some network hops. I will run them on separate instances when benchmarking with GCP instances.

@romange romange added the question Further information is requested label Jun 7, 2022
@ryanrussell

@drinkbeer

Any chance you have a reproducible bash script that covers the tests you are running between DF and KeyDB?

These are wonderful bits of feedback; it would be interesting to make a canonical test script and deployment yaml so that different users on different platforms can execute the same test suite.

While I don't have any better feedback than what @romange provided, I could take a swing at dockerizing a test script to make it more consistently reproducible, and include other platforms as well in the future.


romange commented Jun 7, 2022

@ryanrussell, in terms of priority for the project, writing canonical benchmarking scripts is less important right now.
You have great knowledge of how to improve the maintainability and manageability of the project. I think those areas will have the highest ROI if tackled sooner.

@drinkbeer

Hey @romange, @ryanrussell, I followed your suggestions and re-ran all the tests on GCP VM instances. Dragonfly clearly outperforms KeyDB in p99 latency (1.24700 ms vs 1.99100 ms), throughput (578167.18 ops/sec vs 322822.64 ops/sec), and memory used (2.84GiB vs 3.70G). But in the machine observability dashboard, I found that Dragonfly's peak CPU usage is much higher than KeyDB's (50% vs 10%).

  • In my test, Dragonfly's throughput is about twice that of KeyDB. In your benchmarking, Dragonfly achieves 3.8M QPS, 25x the throughput of Redis. Do you have suggestions on how to further optimize the Dragonfly and KeyDB tests to achieve better performance?
  • In our production environment, CPU is always under-used. In this test, Dragonfly uses 5x more CPU than KeyDB. Could we reduce Dragonfly's CPU usage?
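On the CPU question above, one option (my assumption, not something suggested in this thread) is to cap the number of cores Dragonfly uses, either with its --proactor_threads flag or externally with taskset; throughput will drop roughly in proportion:

```shell
# Limit Dragonfly to 8 I/O threads instead of one per core.
./dragonfly --alsologtostderr --proactor_threads=8

# Or pin the process to a CPU subset from the outside.
taskset -c 0-7 ./dragonfly --alsologtostderr
```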

My next step is to benchmark with Docker and Kubernetes. I will update the results in this issue.

Update (2022-05-09)

TL;DR

Dragonfly

===============================================================================================================================
Type         Ops/sec     Hits/sec       Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
-------------------------------------------------------------------------------------------------------------------------------
Sets        580235.73         0.00         0.00         0.52789         0.48700         1.34300         2.36700        194854.33
Gets        585411.39    585411.39         0.00         0.51945         0.47900         1.27900         2.71900        193733.96
Mixed       578167.18    433625.38         0.00         0.52565         0.48700         1.24700         1.71900        192042.35

Memory Usage: 2.84GiB

KeyDB

===============================================================================================================================
Type         Ops/sec     Hits/sec       Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
-------------------------------------------------------------------------------------------------------------------------------
Sets       288430.42         0.00         0.00         1.03785         0.89500         2.71900         7.16700         96860.48
Gets       380243.32    380243.32         0.00         0.80936         0.68700         1.48700         2.03100        125836.37
Mixed      322822.64    242116.98         0.00         0.95814         0.80700         1.99100         6.01500        107227.84

Memory Usage: 3.70G

Set up

I provisioned three VM instances:

  • c2-standard-60: 60 vCPU (30 physical cores), 240 GB memory
  • 10 GB balanced PD
  • OS: Debian GNU/Linux 11 (bullseye)

Dragonfly:

jchome@dragonfly-worker:~/dragonfly/build-opt$ ./dragonfly --alsologtostderr
I20220609 19:01:26.782763 22665 init.cc:56] ./dragonfly running in opt mode.
I20220609 19:01:26.783080 22665 dfly_main.cc:179] maxmemory has not been specified. Deciding myself....
I20220609 19:01:26.783149 22665 dfly_main.cc:184] Found 234.06GiB available memory. Setting maxmemory to 187.25GiB
I20220609 19:01:26.783819 22666 proactor.cc:456] IORing with 1024 entries, allocated 102720 bytes, cq_entries is 2048
I20220609 19:01:26.787039 22665 proactor_pool.cc:66] Running 30 io threads
I20220609 19:01:26.797847 22665 server_family.cc:198] Data directory is "/home/jchome/dragonfly/build-opt"
I20220609 19:01:26.797976 22665 server_family.cc:122] Checking "/home/jchome/dragonfly/build-opt/dump"
I20220609 19:01:26.798053 22669 listener_interface.cc:79] sock[96] AcceptServer - listening on port 6379

KeyDB:

jchome@keydb-worker:~/KeyDB/src$ ./keydb-server --server-threads 4 --maxmemory 188G --port 6379 --protected-mode no
97236:97236:C 10 Jun 2022 02:00:30.422 # oO0OoO0OoO0Oo KeyDB is starting oO0OoO0OoO0Oo
97236:97236:C 10 Jun 2022 02:00:30.422 # KeyDB version=255.255.255, bits=64, commit=aa032d30, modified=0, pid=97236, just started
97236:97236:C 10 Jun 2022 02:00:30.422 # Configuration loaded
97236:97236:M 10 Jun 2022 02:00:30.423 * Increased maximum number of open files to 10032 (it was originally set to 1024).
97236:97236:M 10 Jun 2022 02:00:30.423 * monotonic clock: POSIX clock_gettime

                  _
               _-(+)-_
            _-- /   \ --_
         _--   /     \   --_            KeyDB  255.255.255 (aa032d30/0) 64 bit
     __--     /       \     --__
    (+) _    /         \    _ (+)       Running in standalone mode
     |   -- /           \ --   |        Port: 6379
     |     /--_   _   _--\     |        PID: 97236
     |    /     -(+)-     \    |
     |   /        |        \   |        https://docs.keydb.dev
     |  /         |         \  |
     | /          |          \ |
    (+)_ -- -- -- | -- -- -- _(+)
        --_       |       _--
            --_   |   _--
                -(+)-        KeyDB has now joined Snap! See the announcement at:  https://docs.keydb.dev/news


97236:97236:M 10 Jun 2022 02:00:30.424 # Server initialized
97236:97236:M 10 Jun 2022 02:00:30.424 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
97236:97236:M 10 Jun 2022 02:00:30.424 * Loading RDB produced by version 255.255.255
97236:97236:M 10 Jun 2022 02:00:30.424 * RDB age 11 seconds
97236:97236:M 10 Jun 2022 02:00:30.424 * RDB memory usage when created 2.97 Mb
97236:97236:M 10 Jun 2022 02:00:30.424 # Done loading RDB, keys loaded: 0, keys expired: 0.
97236:97236:M 10 Jun 2022 02:00:30.424 * DB loaded from disk: 0.000 seconds
97236:97249:M 10 Jun 2022 02:00:30.424 * Thread 0 alive.
97236:97250:M 10 Jun 2022 02:00:30.424 * Thread 1 alive.
97236:97251:M 10 Jun 2022 02:00:30.424 * Thread 2 alive.
97236:97252:M 10 Jun 2022 02:00:30.424 * Thread 3 alive.

Dragonfly

Pure Set

jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs]  0 threads:    60000000 ops, 1283453 (avg:  570846) ops/sec, 420.91MB/sec (avg: 187.21MB/sec),  0.23 (avg:  0.52) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 106 secs]  0 threads:    60000000 ops, 1055065 (avg:  565307) ops/sec, 346.00MB/sec (avg: 185.39MB/sec),  0.28 (avg:  0.53) msec latency

30        Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       592927.98          ---          ---         0.52517         0.48700         1.27100         1.84700    199116.64
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     592927.98         0.00         0.00         0.52517         0.48700         1.27100         1.84700    199116.64


WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       567543.49          ---          ---         0.53062         0.48700         1.42300         2.67100    190592.01
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     567543.49         0.00         0.00         0.53062         0.48700         1.42300         2.67100    190592.01


AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       580235.73          ---          ---         0.52789         0.48700         1.34300         2.36700    194854.33
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     580235.73         0.00         0.00         0.52789         0.48700         1.34300         2.36700    194854.33

Pure Get

jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs]  0 threads:    60000000 ops, 1076305 (avg:  569751) ops/sec, 347.84MB/sec (avg: 184.13MB/sec),  0.28 (avg:  0.53) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 102 secs]  0 threads:    60000000 ops,  942359 (avg:  584836) ops/sec, 304.55MB/sec (avg: 189.01MB/sec),  0.32 (avg:  0.51) msec latency

30        Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       593178.84    593178.84         0.00         0.51283         0.47900         1.25500         1.79900    196304.47
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     593178.84    593178.84         0.00         0.51283         0.47900         1.25500         1.79900    196304.47


WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       577643.93    577643.93         0.00         0.52607         0.47900         1.31100         5.47100    191163.44
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     577643.93    577643.93         0.00         0.52607         0.47900         1.31100         5.47100    191163.44


AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       585411.39    585411.39         0.00         0.51945         0.47900         1.27900         2.71900    193733.96
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     585411.39    585411.39         0.00         0.51945         0.47900         1.27900         2.71900    193733.96

Mixed Set-Get

jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs]  0 threads:    60000000 ops,  992211 (avg:  570415) ops/sec, 321.85MB/sec (avg: 185.03MB/sec),  0.30 (avg:  0.52) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 105 secs]  0 threads:    60000000 ops, 1346899 (avg:  570342) ops/sec, 436.90MB/sec (avg: 185.00MB/sec),  0.22 (avg:  0.53) msec latency

30        Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       145404.28          ---          ---         0.52710         0.49500         1.24700         1.80700     48829.56
Gets       436212.84    436212.84         0.00         0.52488         0.48700         1.24700         1.80700    144358.73
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     581617.12    436212.84         0.00         0.52543         0.48700         1.24700         1.80700    193188.29


WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       143679.31          ---          ---         0.52817         0.49500         1.24700         1.67100     48250.26
Gets       431037.93    431037.93         0.00         0.52511         0.48700         1.24700         1.66300    142646.15
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     574717.24    431037.93         0.00         0.52587         0.48700         1.24700         1.66300    190896.42


AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       144541.79          ---          ---         0.52764         0.49500         1.24700         1.71900     48539.91
Gets       433625.38    433625.38         0.00         0.52499         0.48700         1.24700         1.71900    143502.44
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     578167.18    433625.38         0.00         0.52565         0.48700         1.24700         1.71900    192042.35

Memory

jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && echo "info memory" | nc $DRAGONFLY_SERVER 6379
$462
# Memory
used_memory:3047981240
used_memory_human:2.84GiB
used_memory_peak:3047981240
comitted_memory:3894657024
used_memory_rss:3181318144
used_memory_rss_human:2.96GiB
object_used_memory:2559986176
table_used_memory:480213552
num_buckets:12472320
num_entries:9999947
inline_keys:9999947
strval_bytes:2559986176
listpack_blobs:0
listpack_bytes:0
small_string_bytes:2559986176
maxmemory:201405674291
maxmemory_human:187.57GiB
cache_mode:store

Dashboard

(dashboard screenshot)

KeyDB

Pure Set

jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 211 secs]  0 threads:    60000000 ops,  696657 (avg:  283109) ops/sec, 228.47MB/sec (avg: 92.85MB/sec),  0.43 (avg:  1.06) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 203 secs]  0 threads:    60000000 ops,  605828 (avg:  295104) ops/sec, 198.68MB/sec (avg: 96.78MB/sec),  0.49 (avg:  1.02) msec latency

30        Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       293413.30          ---          ---         1.01653         0.83100         2.38300         6.71900     98533.82
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     293413.30         0.00         0.00         1.01653         0.83100         2.38300         6.71900     98533.82


WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       283447.54          ---          ---         1.05917         0.94300         3.02300         7.58300     95187.15
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     283447.54         0.00         0.00         1.05917         0.94300         3.02300         7.58300     95187.15


AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       288430.42          ---          ---         1.03785         0.89500         2.71900         7.16700     96860.48
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     288430.42         0.00         0.00         1.03785         0.89500         2.71900         7.16700     96860.48

Pure Get

jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 159 secs]  0 threads:    60000000 ops,  924097 (avg:  376554) ops/sec, 298.65MB/sec (avg: 121.69MB/sec),  0.32 (avg:  0.80) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 164 secs]  0 threads:    60000000 ops,  683873 (avg:  364556) ops/sec, 221.01MB/sec (avg: 117.82MB/sec),  0.44 (avg:  0.82) msec latency

30        Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       388569.90    388569.90         0.00         0.79598         0.66300         1.47100         2.67100    128591.95
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     388569.90    388569.90         0.00         0.79598         0.66300         1.47100         2.67100    128591.95


WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       371916.73    371916.73         0.00         0.82274         0.70300         1.50300         1.91100    123080.79
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     371916.73    371916.73         0.00         0.82274         0.70300         1.50300         1.91100    123080.79


AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       380243.32    380243.32         0.00         0.80936         0.68700         1.48700         2.03100    125836.37
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     380243.32    380243.32         0.00         0.80936         0.68700         1.48700         2.03100    125836.37

Mixed Set-Get

jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 195 secs]  0 threads:    60000000 ops,  531805 (avg:  307101) ops/sec, 172.50MB/sec (avg: 99.62MB/sec),  0.56 (avg:  0.98) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 188 secs]  0 threads:    60000000 ops,  744913 (avg:  319107) ops/sec, 241.63MB/sec (avg: 103.51MB/sec),  0.40 (avg:  0.94) msec latency

30        Threads
10        Connections per thread
200000    Requests per client


BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        82518.16          ---          ---         0.95311         0.80700         1.88700         5.69500     27711.18
Gets       247554.48    247554.48         0.00         0.93559         0.79100         1.87100         5.59900     81924.79
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     330072.63    247554.48         0.00         0.93997         0.79100         1.87100         5.63100    109635.97


WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        78893.16          ---          ---         0.99005         0.83100         2.09500         6.30300     26493.84
Gets       236679.48    236679.48         0.00         0.97173         0.81500         2.07900         6.27100     78325.87
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     315572.64    236679.48         0.00         0.97631         0.82300         2.07900         6.27100    104819.71


AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        80705.66          ---          ---         0.97158         0.82300         2.00700         6.07900     27102.51
Gets       242116.98    242116.98         0.00         0.95366         0.80700         1.98300         6.01500     80125.33
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     322822.64    242116.98         0.00         0.95814         0.80700         1.99100         6.01500    107227.84

Memory Usage

jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && echo "info memory" | nc $KEYDB_SERVER 6379
$1190
# Memory
used_memory:3977378208
used_memory_human:3.70G
used_memory_rss:5380833280
used_memory_rss_human:5.01G
used_memory_peak:5496877432
used_memory_peak_human:5.12G
used_memory_peak_perc:72.36%
used_memory_overhead:537329080
used_memory_startup:3113504
used_memory_dataset:3440049128
used_memory_dataset_perc:86.56%
allocator_allocated:3977771688
allocator_active:5311553536
allocator_resident:5373931520
total_system_memory:253563305984
total_system_memory_human:236.15G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:188000000000
maxmemory_human:175.09G
maxmemory_policy:noeviction
allocator_frag_ratio:1.34
allocator_frag_bytes:1333781848
allocator_rss_ratio:1.01
allocator_rss_bytes:62377984
rss_overhead_ratio:1.00
rss_overhead_bytes:6901760
mem_fragmentation_ratio:1.35
mem_fragmentation_bytes:1403517104
mem_not_counted_for_evict:1048576
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:none
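The INFO reply above is plain `key:value` text preceded by a RESP bulk-string length header (`$1190`) and `# Section` comments. A small parser (my own sketch, not part of the thread) makes it easy to diff these fields between runs:

```python
def parse_info(raw: str) -> dict:
    """Parse a Redis/KeyDB INFO reply into a flat dict.

    Skips the leading "$<len>" bulk-string header, "# Section"
    comment lines, and blank lines; values are kept as strings.
    """
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "$")):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info

# A few lines from the reply shown above:
sample = """$1190
# Memory
used_memory:3977378208
used_memory_human:3.70G
mem_allocator:jemalloc-5.2.1"""

print(parse_info(sample)["used_memory_human"])  # 3.70G
```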

Dashboard

[dashboard screenshot]

@romange (Collaborator) commented Jun 10, 2022

@drinkbeer , do not expect to reach anywhere close to 3.8M QPS on GCP. AWS networking capabilities are higher than any other public cloud's. Having said that, I would expect c2 to reach higher throughput. I will benchmark GCP and get back to you.

You provided a great reference point with your results! It will take me a week or so. Hope it's ok.

@drinkbeer (Author)

AWS networking capabilities are higher than any other public cloud's. Having said that, I would expect c2 to reach higher throughput.

If you take a look at the two dashboards, you can see that we are not even close to saturating the network, I think. But I can check with the Google folks whether our tests start to drop packets.

I will benchmark GCP and get back to you.

Thank you so much! I will use the results as a baseline and continue testing with Docker and Kubernetes. I hope we can achieve similar results there (I expect Docker and Kubernetes to introduce some overhead, but I am curious how much it is).

It will take me a week or so. Hope it's ok.

It is totally fine. I really appreciate your time and am looking forward to your benchmarking on GCP.

@romange (Collaborator) commented Jun 10, 2022

AWS networking capabilities are higher than any other public cloud's. Having said that, I would expect c2 to reach higher throughput.

If you take a look at the two dashboards, you can see that we are not even close to saturating the network, I think. But I can check with the Google folks whether our tests start to drop packets.

Yeah, it's not close to saturating the bandwidth. Throughput is another matter and is a bit more complicated.

  1. Clouds do not disclose this publicly, but they all put limits on Packets Per Second (PPS) for their VMs. They must, because I/O is a resource shared by all VMs on the server, unlike CPUs or memory, which are dedicated wholly to each VM.
  2. Redis has a very naive (ping-pong style) protocol that usually incurs lots of interrupt overhead.
    Essentially, a client sends a small packet and waits for a reply. The server receives a hardware interrupt from its NIC once the packet arrives, and maybe spins/waits a bit to see whether more packets arrive, so that its interrupt handler won't handle just a single tiny packet. But nothing else arrives, because the client on the other side is waiting for the response. So the server now triggers a software interrupt for a single packet. Similarly, the send flow is triggered for a single response. (In pipeline mode things get better, because a client sends N requests in a row that can be handled at once by the interrupt handler.)
  3. Cloud providers make various optimizations to offload network processing from your VM's CPU to their own custom processors on the server. However, the quality of this work differs vastly between cloud providers. Basically, all of them proudly say that they have 100Gb network capacity, but in practice AWS has done a great job of making it accessible to applications like DF or Memcached. Even on AWS, though, you will be bottlenecked on throughput rather than on bandwidth when running a benchmark like this.
  4. Benchmarking and tuning software to hardware is a very complicated job. One of the more interesting recent works I've read on this, just to show how hard it is and how much black magic it requires, is this blog post: https://talawah.io/blog/extreme-http-performance-tuning-one-point-two-million/
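Point 2 has a simple consequence for benchmarks like the ones in this thread: with `--pipeline=1`, each connection has at most one request in flight, so its throughput is capped at one request per round trip. A back-of-envelope sketch (my own, not from the thread):

```python
def pingpong_qps(connections: int, rtt_ms: float) -> float:
    """Max throughput of a ping-pong (one request in flight per
    connection) workload: each connection completes one request
    per round trip, so total rate = connections / RTT."""
    return connections * 1000.0 / rtt_ms

# 30 memtier threads x 10 clients = 300 connections; at the ~0.2 ms
# average latency later measured on c2 machines, this gives roughly
# 1.5M ops/sec, in the same ballpark as the measured ~1.4M.
print(round(pingpong_qps(30 * 10, 0.2)))
```

Doubling the RTT halves the ceiling, which is why per-connection latency, not bandwidth, dominates unpipelined Redis-protocol benchmarks.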

I will benchmark GCP and get back to you.

Thank you so much! I will use the results as a benchmark, continue testing with Docker and Kubernetes. Hoping that we can achieve similar results in Docker and Kubernetes (I guess Docker and Kubernetes will introduce overhead, but I am curious how many overhead it is).

It will take me a week or so. Hope it's ok.

It is totally fine. I am really appreciate your time and looking forward to your benchmarking in GCP.

@osevan commented Jun 12, 2022

Thank you so much for your reply. I think your suggestion of testing with a GCP instance is great. I will follow the steps and try to get an ideal P99 latency. I will update the results here once I finish the tests.

Now, from analyzing your test setup I assume that you benchmarked both of them on the same node concurrently? Am I correct? If yes, then it's a really bad idea.

No. They are running on two different nodes in the same cluster, so same region (us-east1) and same zone (us-east1-d). The memtier jobs also run on those nodes (the memtier job for KeyDB runs on the same node as the KeyDB cache; the memtier job for Dragonfly runs on the same node as the Dragonfly cache).

And two memtier jobs are running sequentially to avoid saturating the network.

I do not know if --key-pattern=S:R was a deliberate choice. I used R:R but also used --distinct-client-seed. Otherwise, each client connection goes through exactly the same route...

I first run a pure-set memtier job that writes key:1 to key:10000000 to test set operations only; then I run a pure-get memtier job, which has a 100% hit rate since all keys in the keyspace are filled; then I run a mixed set-get memtier job with 25% sets and 75% gets, which also has a 100% hit rate (check the Misses/sec metric: it is 0, which means the hit rate is 100%). I think --key-pattern=S:R only affects the third (mixed-ops) job. Because the hit rate is 100%, I don't think the key pattern affects performance much. But I could try the --distinct-client-seed option in my test on a GCP instance.
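The remark about `--distinct-client-seed` can be illustrated with a toy key generator (hypothetical; memtier's actual RNG differs): clients that share one seed issue exactly the same key sequence, so every connection follows the same route, while distinct seeds spread the requests across the keyspace.

```python
import random

def key_sequence(seed: int, n: int, keyspace: int = 10_000_000) -> list:
    """Generate n pseudo-random keys the way a seeded client would."""
    rng = random.Random(seed)
    return [f"key:{rng.randrange(keyspace)}" for _ in range(n)]

# Shared seed: every client requests identical keys in identical order.
print(key_sequence(42, 3) == key_sequence(42, 3))   # True
# Distinct seeds: the request streams diverge.
print(key_sequence(42, 3) == key_sequence(43, 3))   # False
```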

Hey, don't run high-performance apps inside Kubernetes or Docker, or on AWS or Azure cloud.

Pay for a skilled admin with a performance and security focus and invest the money in a bare-metal server!

Even my 9-year-old lab notebook handled more requests.

@romange romange self-assigned this Jun 12, 2022
@romange (Collaborator) commented Jun 12, 2022

@drinkbeer preliminary results...
I took two c2-60 machines, as you did.
Screenshot from 2022-06-12 22-42-53

fetched DF binary v0.2.0 from https://github.com/dragonflydb/dragonfly/releases/download/v0.2.0/dragonfly-x86_64.unstripped.tar.gz

ev@test-c1:~$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy

dev@test-c1:~$ uname -a
Linux test-c1 5.15.0-1008-gcp #12-Ubuntu SMP Wed Jun 1 21:29:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Disclosure: it's my development image, created via a packer pipeline defined here: https://github.com/romange/image-bakery

After scanning it now, I think the only substantial performance-relevant change I made there is turning off CPU mitigations:
sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub

Besides this, it's just convenience configs and utilities.

I ran only the first SET benchmark; I copy-pasted your command:

DRAGONFLY_SERVER="10.142.0.18" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1395818.80          ---          ---         0.23205         0.23100         0.40700         0.55100    468742.82 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1395818.80         0.00         0.00         0.23205         0.23100         0.40700         0.55100    468742.82 

CPU usage of dragonfly:
[CPU usage screenshot]

Already much better than your result. Let's try improving it.
Rerun Dragonfly with:
./dragonfly-x86_64 --logbuflevel=-1 --logtostderr --conn_use_incoming_cpu (note the last flag).

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1420922.15          ---          ---         0.22131         0.21500         0.38300         0.50300    477173.01 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1420922.15         0.00         0.00         0.22131         0.21500         0.38300         0.50300    477173.01 

but now the CPU usage is:
Screenshot from 2022-06-12 23-05-58

much lower than before (3360% now vs. 4580% before). Also, p99 is pretty good in both cases.
Now let's increase the load a bit by raising the number of clients per thread in memtier to 30:

DRAGONFLY_SERVER="10.142.0.18" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=30 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1977947.42          ---          ---         0.46646         0.43900         1.07100         1.44700    664232.85 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1977947.42         0.00         0.00         0.46646         0.43900         1.07100         1.44700    664232.85 

p99.9 is too high IMHO. Let's take it down a notch: clients=10, threads=60:

============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1677309.01          ---          ---         0.35636         0.33500         0.71100         1.01500    563272.64 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1677309.01         0.00         0.00         0.35636         0.33500         0.71100         1.01500    563272.64 

Pretty good: p99.9 under 1 ms at 1.6M QPS.

@romange (Collaborator) commented Jun 12, 2022

Now I see you used a 1 vCPU per core ratio. I use the regular 2 vCPUs per core.

@romange (Collaborator) commented Jun 13, 2022

Step 2: I took a plain Ubuntu 22.04 image.
The only thing I did before running DF was invoke ulimit -n 20000 and then ./dragonfly-x86_64 --logtostderr

Client (load-test) instance: I took an n2-custom-80-40960 just to be on the safe side, so that we won't have bottlenecks there.
I do not think it matters substantially.

DRAGONFLY_SERVER="10.142.0.20" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 100000 -d 300 --pipeline=1 --clients=15 --threads=50 --run-count=1 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0                                                                                                                           
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  45 secs]  0 threads:    75000000 ops, 2052410 (avg: 1651450) ops/sec, 673.08MB/sec (avg: 541.59MB/sec),  0.36 (avg:  0.45) msec latency

50        Threads
15        Connections per thread
100000    Requests per client


ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1788894.22          ---          ---         0.45224         0.41500         0.84700         1.51900    600745.16 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1788894.22         0.00         0.00         0.45224         0.41500         0.84700         1.51900    600745.16 

Seems that DF works OK on Ubuntu 22.04 out of the box. Next step: check Debian.

@romange (Collaborator) commented Jun 13, 2022

Step 3: I used Bullseye: projects/debian-cloud/global/images/debian-11-bullseye-v20220519
dragonfly: https://github.com/dragonflydb/dragonfly/releases/download/v0.2.0/dragonfly-x86_64.unstripped.tar.gz

Everything else is as before. As you can see, I can confirm that Debian 11 is very bad performance-wise.
I suspect it's because reaching better performance requires at least kernel 5.11, but I am not sure. In any case, Ubuntu provides a simple alternative if performance is what you need.

Jianbin, I think there are enough data points here to continue evaluating DF.

dev@test-c1:~$ DRAGONFLY_SERVER="10.142.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 100000 -d 300 --pipeline=1 --clients=15 --threads=50 --run-count=1 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0                                                                                                                           
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 184 secs]  0 threads:    75000000 ops,  435703 (avg:  406529) ops/sec, 142.89MB/sec (avg: 133.32MB/sec),  1.72 (avg:  1.84) msec latency

50        Threads
15        Connections per thread
100000    Requests per client


ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets       432159.18          ---          ---         1.84153         1.45500         7.32700        18.43100    145127.38 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals     432159.18         0.00         0.00         1.84153         1.45500         7.32700        18.43100    145127.38 

@romange romange removed their assignment Jun 13, 2022
@romange (Collaborator) commented Jul 6, 2022

@drinkbeer hey man, did you have a chance to experiment with it?

@romange (Collaborator) commented Aug 22, 2022

@drinkbeer I am closing. Feel free to reopen if you have any questions

@romange romange closed this as completed Aug 22, 2022
@drinkbeer (Author) commented Aug 24, 2022

Thank you @romange , this issue can be closed. As a next step, we will probably deploy Dragonfly in our staging environment and benchmark it behind Envoy proxy (the proxy used with KeyDB in our prod).

Here are some results of our benchmarking. The performance of Dragonfly looks great.

(updated July 4th, 2022)

TL;DR

We deployed Dragonfly and KeyDB on c2-standard-60 machines (30 cores, 240 GB RAM) with Ubuntu 22.04 out of the box. We used memtier from the Redis community for load generation and benchmarking, and tested performance and resource usage with all-set operations, all-get operations, and mixed set-get operations. The conclusion is that Dragonfly achieves both much higher throughput (3.5X) and much lower latency (14%) than KeyDB. Dragonfly's resource usage is also impressive: it fully and evenly utilizes all CPU cores, while KeyDB cannot run more than 16 server threads and therefore cannot fully utilize the machine's 30 cores; Dragonfly also uses less memory than KeyDB (76.19% of it). One thing to note about KeyDB is that adding more threads does not help with performance or resource utilization.

|                                  | Dragonfly  | KeyDB (4 threads) | KeyDB (16 threads) | Dragonfly (Docker) |
|----------------------------------|------------|-------------------|--------------------|--------------------|
| Set Latency P99.9 (ms)           | 0.52700    | 8.63900           | 21.37500           | 0.93500            |
| Get Latency P99.9 (ms)           | 0.54300    | 1.60700           | 1.56700            | 0.59900            |
| Set-Get Mixed Latency P99.9 (ms) | 0.57500    | 4.35100           | 7.03900            | 0.60700            |
| Throughput (ops/s)               | ~1.4M      | ~400K             | ~307K              | ~1.25M             |
| Memory (GB)                      | 3.68       | 4.83              | 6.25               | 3.86               |
| CPU (number of cores)            | 22.8       | 4.25              | 15.23              | 27.97              |
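The headline ratios in the summary can be sanity-checked against the table above (a quick arithmetic check of my own):

```python
# Throughput: Dragonfly ~1.4M ops/s vs. KeyDB (4 threads) ~400K ops/s.
speedup = 1_400_000 / 400_000
print(f"{speedup:.1f}X")   # 3.5X

# Memory: Dragonfly 3.68 GB vs. KeyDB (4 threads) 4.83 GB,
# i.e. Dragonfly uses this fraction of KeyDB's memory.
mem_ratio = 3.68 / 4.83
print(f"{mem_ratio:.2%}")  # 76.19%
```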

Setup

Hardware

  • c2-standard-60, 30 vCPU/30 cores, 240 GB memory
  • OS: Ubuntu 22.04 LTS
  • Dragonfly Version: v0.3.1
  • KeyDB Version: v6.2.2

Dragonfly

jchome@dragonfly-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@dragonfly-worker-ubuntu:~$ uname -a
Linux dragonfly-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@dragonfly-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@dragonfly-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"

cd ~ && \
    wget https://github.com/dragonflydb/dragonfly/releases/download/v0.3.1/dragonfly-x86_64.unstripped.tar.gz && \
    tar -xvf dragonfly-x86_64.unstripped.tar.gz
jchome@dragonfly-worker-ubuntu:~$ ./dragonfly-x86_64 --logbuflevel=-1 --logtostderr --conn_use_incoming_cpu

KeyDB

jchome@keydb-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@keydb-worker-ubuntu:~$ uname -a
Linux keydb-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@keydb-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@keydb-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"

sudo apt-get update
sudo apt-get install build-essential nasm autotools-dev autoconf libjemalloc-dev tcl tcl-dev uuid-dev libcurl4-openssl-dev git
git clone https://github.com/EQ-Alpha/KeyDB.git
cd KeyDB
make distclean
make test
make
sudo make install

jchome@keydb-worker-ubuntu:~/KeyDB/src$ ./keydb-server --server-threads 4 --maxmemory 188G --port 6379 --protected-mode no

jchome@keydb-worker-ubuntu:~/KeyDB/src$ ./keydb-server --server-threads 16 --maxmemory 188G --port 6379 --protected-mode no &
[1] 102412

Memtier

jchome@memtier-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@memtier-worker-ubuntu:~$ uname -a
Linux memtier-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@memtier-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@memtier-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"
git clone https://github.com/RedisLabs/memtier_benchmark.git && cd memtier_benchmark/
# Build and install per: https://github.com/RedisLabs/memtier_benchmark#building-and-installing

Dragonfly

Resource Usage

[resource usage screenshot]

Set

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets      1423048.66          ---          ---         0.22339         0.22300         0.39100         0.52700    477887.13
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1423048.66         0.00         0.00         0.22339         0.22300         0.39100         0.52700    477887.13

Get

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets      1376543.56   1376543.56         0.00         0.22864         0.22300         0.39900         0.54300    455548.42
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1376543.56   1376543.56         0.00         0.22864         0.22300         0.39900         0.54300    455548.42

Mixed

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       328923.70          ---          ---         0.23965         0.23900         0.42300         0.57500    110458.90
Gets       986771.11    986771.11         0.00         0.23785         0.23100         0.42300         0.57500    326558.53
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1315694.82    986771.11         0.00         0.23830         0.23100         0.42300         0.57500    437017.42

Dragonfly (Docker)

Resource Usage

[resource usage screenshot]

Set

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets      1305596.15          ---          ---         0.23675         0.23100         0.50300         0.93500    438444.31
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1305596.15         0.00         0.00         0.23675         0.23100         0.50300         0.93500    438444.31

Get

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets      1316826.64   1316826.64         0.00         0.23503         0.23100         0.41500         0.59900    435785.91
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1316826.64   1316826.64         0.00         0.23503         0.23100         0.41500         0.59900    435785.91

Mixed

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       313124.70          ---          ---         0.24759         0.24700         0.43100         0.60700    105153.29
Gets       939374.09    939374.09         0.00         0.24553         0.23900         0.43100         0.59900    310873.12
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1252498.79    939374.09         0.00         0.24604         0.23900         0.43100         0.60700    416026.41

KeyDB (4 threads)

Resource Usage

[resource usage screenshot]

Set

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       352644.13          ---          ---         0.87533         0.64700         3.71100         8.63900    118424.68
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     352644.13         0.00         0.00         0.87533         0.64700         3.71100         8.63900    118424.68

Get

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       462801.59    462801.59         0.00         0.66506         0.54300         1.16700         1.60700    153157.91
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     462801.59    462801.59         0.00         0.66506         0.54300         1.16700         1.60700    153157.91

Mixed

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        94887.31          ---          ---         0.80495         0.63900         2.33500         4.41500     31864.98
Gets       284661.92    284661.92         0.00         0.79218         0.61500         2.28700         4.35100     94205.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     379549.23    284661.92         0.00         0.79537         0.61500         2.30300         4.35100    126069.98
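The three runs above differ only in the `--ratio` flag (1:0 is set-only, 0:1 is get-only, 1:3 is the mixed workload). As a convenience, they can be driven from one small loop; this is just a hypothetical wrapper around the same commands, assuming `memtier_benchmark` sits in the current directory as above. It is written as a dry run that echoes each invocation; drop the leading `echo` to actually execute:

```shell
#!/bin/sh
# Same memtier_benchmark workload, three set:get ratios.
# Dry run: each command is printed; remove "echo" to really run it.
KEYDB_SERVER="10.128.0.27"
REDIS_PORT=6379

for RATIO in 1:0 0:1 1:3; do
  echo ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" \
    -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 \
    --run-count=2 --hide-histogram --key-prefix='key:' \
    --distinct-client-seed --key-pattern=R:R --ratio="$RATIO"
done
```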

KeyDB (16 threads)

Resource Usage

(screenshot: resource usage during the run)

Set

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       262139.40          ---          ---         1.14587         0.86300         8.83100        21.37500     88031.46
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     262139.40         0.00         0.00         1.14587         0.86300         8.83100        21.37500     88031.46

Get

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       399851.71    399851.71         0.00         0.75064         0.74300         1.05500         1.56700    132325.50
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     399851.71    399851.71         0.00         0.75064         0.74300         1.05500         1.56700    132325.50

Mixed

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        76642.55          ---          ---         0.98195         0.78300         4.86300         7.13500     25738.04
Gets       229927.65    229927.65         0.00         0.97991         0.78300         4.79900         7.03900     76091.44
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     306570.19    229927.65         0.00         0.98042         0.78300         4.79900         7.03900    101829.48
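For a rough side-by-side view, the `Totals` Ops/sec rows of the two aggregated result sets can be compared directly. The snippet below is just a convenience for the write-up (not part of memtier_benchmark); the numbers are copied from the tables above, with the first result set labeled generically since its header is outside this excerpt:

```python
# Ops/sec "Totals" copied from the aggregated result tables above.
first_set = {"set": 352644.13, "get": 462801.59, "mixed": 379549.23}
keydb_16 = {"set": 262139.40, "get": 399851.71, "mixed": 306570.19}

def pct_faster(a: float, b: float) -> float:
    """How much higher a is than b, in percent."""
    return (a / b - 1.0) * 100.0

for workload in first_set:
    delta = pct_faster(first_set[workload], keydb_16[workload])
    print(f"{workload:>5}: {first_set[workload]:>10.0f} vs "
          f"{keydb_16[workload]:>10.0f} ops/sec ({delta:+.1f}%)")
```

With these numbers the first result set comes out roughly +34.5% (set), +15.7% (get), and +23.8% (mixed) ahead of KeyDB with 16 threads.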

romange commented Aug 24, 2022

@drinkbeer These are fantastic results! It really makes me happy 🕺🏼 to see that Dragonfly provides value!
Jianbin, I would like to have a quick chat with you on discord or google meet. Will it be possible?

drinkbeer commented Aug 24, 2022

> I would like to have a quick chat with you on discord or google meet. Will it be possible?

I would love to. I sent you an invitation through your LinkedIn. Let's chat.
