
Dragonfly Unexpectedly Closes Client Connections #1763

Closed
nickamorim opened this issue Aug 29, 2023 · 7 comments
Labels
bug Something isn't working


@nickamorim

Describe the bug

When running a simple benchmark on Dragonfly via memtier_benchmark on Kubernetes, Dragonfly closes the client connection when the payload size is >225 bytes. I'm using the Memcached ASCII protocol.
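To make the ">225 bytes" threshold concrete, here is a minimal sketch of the on-wire framing memtier_benchmark uses in memcache_text mode. The key name, flags, and expiry below are illustrative assumptions, not values taken from memtier's source; the point is that an ASCII `set` request is only slightly larger than the payload itself.

```python
# Sketch of a memcached ASCII-protocol "set" request.
# Key name, flags, and exptime are illustrative assumptions.
def build_set_request(key: bytes, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # Framing: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
    header = b"set %b %d %d %d\r\n" % (key, flags, exptime, len(value))
    return header + value + b"\r\n"

# A 226-byte value, i.e. just past the threshold observed above.
req = build_set_request(b"memtier-1", b"x" * 226)
print(len(req))  # header + 226-byte payload + trailing CRLF
```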

Here are some relevant Dragonfly logs (--v=5):

I20230829 13:27:20.693219     9 listener_interface.cc:119] sock[14] Accepted 242.9.130.223:52876
I20230829 13:27:20.693253     9 dragonfly_listener.cc:288] CPU/NAPI for connection 14 is 6/0
I20230829 13:27:20.693289     9 epoll_proactor.cc:276] PRO[0] Fetched 1 cqes
I20230829 13:27:20.693297     9 listener_interface.cc:194] sock[14] Running connection
I20230829 13:27:20.693302     9 dragonfly_listener.cc:229] Opening connection 1
I20230829 13:27:20.693331     9 epoll_proactor.cc:276] PRO[0] Fetched 1 cqes
I20230829 13:27:20.693349     9 epoll_socket.cc:363] sock[14] Error system:103 on 242.9.130.223:52876
I20230829 13:27:20.693361     9 dragonfly_connection.cc:481] Before dispatch_fb.join()
I20230829 13:27:20.693368     9 dragonfly_connection.cc:483] After dispatch_fb.join()
I20230829 13:27:20.693375     9 dragonfly_connection.cc:354] Closed connection for peer 242.9.130.223:52876
I20230829 13:27:20.693382     9 listener_interface.cc:203] sock[14] After HandleRequests
I20230829 13:27:20.693387     9 dragonfly_listener.cc:256] Closing connection 1

To Reproduce
Steps to reproduce the behavior:

  1. Apply the following Dragonfly Deployment (note that this issue persists without setting proactor_threads and/or maxmemory):
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dragonfly
  name: dragonfly
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dragonfly
  template:
    metadata:
      labels:
        app: dragonfly
    spec:
      containers:
        - name: dragonfly
          image: docker.dragonflydb.io/dragonflydb/dragonfly
          args:
            - --proactor_threads=4
            - --maxmemory=4gb
            - --memcached_port=11212
            - --logtostderr
            - --v=5
          resources:
            requests:
              memory: 4Gi
              cpu: 1
            limits:
              memory: 5Gi
              cpu: 4
          ports:
            - containerPort: 11212
          securityContext:
            privileged: true
            runAsNonRoot: false
  2. Apply the following memtier_benchmark Job:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: memtier-dragonfly
spec:
  completions: 1
  template:
    metadata:
      labels:
        app: memtier
    spec:
      containers:
        - name: memtier
          image: redislabs/memtier_benchmark
          args:
            [
              "--server=242.9.130.222",
              "--port=11212",
              "--protocol=memcache_text",
              "--requests=10",
              "--data-size=225",
              "--clients=1",
              "--threads=1",
              "--run-count=1",
              "--hide-histogram",
              "--distinct-client-seed",
            ]
          resources:
            limits:
              cpu: 1
              memory: 256Mi
            requests:
              cpu: 512m
              memory: 128Mi
      restartPolicy: OnFailure
  3. Tail the logs of the benchmark and Dragonfly containers

Expected behavior

The benchmark job should run to completion without any Connection reset by peer errors.

Environment (please complete the following information):

  • OS:
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=b15e582c1dbbf0e6f06747082754e5c5a71ea426
GOOGLE_CRASH_ID=Lakitu
VERSION=101
VERSION_ID=101
BUILD_ID=17162.210.48
  • Kernel: Linux dragonfly-67f948ffc8-2p6bj 5.15.107+ #1 SMP Thu Jun 29 09:19:06 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Containerized?: Kubernetes
  • Dragonfly Version:
root@dragonfly-67f948ffc8-2p6bj:/data# dragonfly --version
dragonfly v1.8.0-7c99d2d1111e2636556ad8c2afad43c396906d01
build time: 2023-08-08 19:58:04
@nickamorim nickamorim added the bug Something isn't working label Aug 29, 2023
@nickamorim
Author

Further information: when I run the same benchmark via the Redis protocol instead of Memcached, there isn't any issue.

@romange
Collaborator

romange commented Aug 29, 2023

Fixed by #1745.

Please use the image ghcr.io/dragonflydb/dragonfly-weekly until the next version is released.

@romange romange closed this as completed Aug 29, 2023
@nickamorim
Author

Confirmed that using the weekly image worked - thanks

@nickamorim
Author

This is still a problem with payloads over 100kB when using ghcr.io/dragonflydb/dragonfly-weekly:latest. Here are the memtier_benchmark args and output:

"--protocol=memcache_text",
"--requests=100",
"--data-size=100000",
"--clients=1",
"--threads=1",
"--run-count=1",
"--hide-histogram",
"--distinct-client-seed",
error: response parsing failed.
connection dropped.
[RUN #1 0%,   1 secs]  0 threads:           0 ops,       0 (avg:       0) ops/sec, 0.00KB/sec (avg: 0.00KB/sec),  0.00 (avg:  -nan) msec latency

1         Threads
1         Connections per thread
100       Requests per client


ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals          0.00         0.00         0.00            -nan         0.00000         0.00000         0.00000         0.00

Here are the Dragonfly logs:

I20230829 17:24:00.620374     9 listener_interface.cc:119] sock[13] Accepted 242.9.130.218:59708
I20230829 17:24:00.620424     9 dragonfly_listener.cc:299] CPU/NAPI for connection 13 is 14/0
I20230829 17:24:00.620450     9 uring_socket.cc:78] sock[12] Accept [12]
I20230829 17:24:00.620483     9 listener_interface.cc:194] sock[13] Running connection
I20230829 17:24:00.620496     9 dragonfly_listener.cc:236] Opening connection 1
I20230829 17:24:00.620507     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.620529     9 uring_proactor.cc:625] PRO[0] wait_for_cqe 45954
I20230829 17:24:00.620570     9 uring_proactor.cc:641] PRO[0] Woke up after wait_for_cqe
I20230829 17:24:00.620599     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.620621     9 dragonfly_connection.cc:729] Growing io_buf to 2048
I20230829 17:24:00.620632     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.620710     9 dragonfly_connection.cc:729] Growing io_buf to 4096
I20230829 17:24:00.620724     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.620764     9 dragonfly_connection.cc:729] Growing io_buf to 8192
I20230829 17:24:00.620774     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.620821     9 dragonfly_connection.cc:729] Growing io_buf to 16384
I20230829 17:24:00.620829     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.620865     9 dragonfly_connection.cc:729] Growing io_buf to 32768
I20230829 17:24:00.620887     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.620930     9 dragonfly_connection.cc:729] Growing io_buf to 65536
I20230829 17:24:00.620936     9 uring_socket.cc:338] sock[13] Recv [13] 0
E20230829 17:24:00.620980     9 dragonfly_connection.cc:736] Request is too large, closing connection
I20230829 17:24:00.620986     9 dragonfly_connection.cc:485] Before dispatch_fb.join()
I20230829 17:24:00.620990     9 dragonfly_connection.cc:488] After dispatch_fb.join()
I20230829 17:24:00.620994     9 dragonfly_connection.cc:501] Error parser status 0
I20230829 17:24:00.621001     9 uring_socket.cc:162] sock[13] WriteSome [13] 2
I20230829 17:24:00.621045    10 uring_proactor.cc:641] PRO[1] Woke up after wait_for_cqe
I20230829 17:24:00.621050     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.621071    10 uring_proactor.cc:376] PRO[1] SchedulePeriodic 1
I20230829 17:24:00.621081     9 uring_socket.cc:338] sock[13] Recv [13] 0
I20230829 17:24:00.621086    10 uring_proactor.cc:625] PRO[1] wait_for_cqe 50481
I20230829 17:24:00.621095     9 uring_proactor.cc:625] PRO[0] wait_for_cqe 45979
I20230829 17:24:00.621158     9 uring_proactor.cc:641] PRO[0] Woke up after wait_for_cqe
I20230829 17:24:00.621170     9 dragonfly_connection.cc:667] Got event 8208
I20230829 17:24:00.621178     9 uring_socket.cc:363] sock[13] Error system:103 on 0.0.0.0:0
I20230829 17:24:00.621196     9 dragonfly_connection.cc:359] Closed connection for peer 242.9.130.218:59708
I20230829 17:24:00.621209     9 listener_interface.cc:203] sock[13] After HandleRequests
I20230829 17:24:00.621219     9 dragonfly_listener.cc:263] Closing connection 1
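The buffer growth visible in these logs can be sketched as follows. This is an inference from the log lines, not Dragonfly's actual code: the starting buffer size of 1024 is guessed from the first "Growing io_buf to 2048" line, and the 64KiB cap is the request size limit the error message refers to.

```python
# Sketch of the io_buf doubling seen in the logs, up to the 64 KiB cap.
# Starting size (1024) and cap semantics are inferred assumptions.
LIMIT = 64 * 1024

def growth_steps(start: int, needed: int, limit: int):
    """Double the buffer until it can hold the request or hits the limit."""
    sizes = []
    size = start
    while size < needed:
        if size >= limit:
            # "Request is too large, closing connection"
            return sizes, False
        size *= 2
        sizes.append(size)
    return sizes, True

# A 100,000-byte payload forces growth to 65536, then fails.
sizes, ok = growth_steps(1024, 100_000, LIMIT)
print(sizes, ok)
```

The resulting sequence 2048, 4096, ..., 65536 followed by failure matches the "Growing io_buf" lines above, while a 225-byte payload fits in the initial buffer and never grows.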

@romange
Collaborator

romange commented Aug 29, 2023

Yes, we limit the request size to 64KB:
E20230829 17:24:00.620980 9 dragonfly_connection.cc:736] Request is too large, closing connection

Is that not enough for you? Please describe your use-case.

@nickamorim
Author

Is the 64KB request size limit documented somewhere? If not, it may be worth adding, since I couldn't find it anywhere.

The default maximum item size in Memcached is 1MB, so this has been the limit we've been assuming.

@romange
Collaborator

romange commented Aug 30, 2023

Unfortunately, we have not done this yet. We have an open task for this:
dragonflydb/documentation#104
