Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] queryable timeout and lost packets #277

Open
solderdev opened this issue Aug 6, 2024 · 0 comments
Open

[Bug] queryable timeout and lost packets #277

solderdev opened this issue Aug 6, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@solderdev
Copy link

solderdev commented Aug 6, 2024

Describe the bug

I'm running a microservice stack in docker compose using zenoh for communication. I noticed timeouts with 0.11.0 and tried the 1.0.0.5 alpha (same for 1.0.0.6a).
Still the same: about 2-3 times a day a query gets a timeout (communication a few times per second).

So I tried to build a minimal (not-)working example of messages being lost --> see attached code.

It consists of a docker-compose.yml running:

  • zenoh-router (needed because multicast does not work in "internal" networks)
  • query server
  • query client with multiple threads to increase load

The Dockerfile builds everything on execution.

Place everything in a folder and run with ./run.sh (often takes a few minutes before errors appear)

Feel free to adjust the number of threads in the docker-compose.yml depending how much your machine can handle.

Example log output
zenoh_router-1    | #!/bin/ash
zenoh_router-1    | cat /entrypoint.sh
zenoh_router-1    | echo " * Starting: /$BINARY $*"
zenoh_router-1    | exec /$BINARY $*
zenoh_router-1    |  * Starting: /zenohd -l tcp/0.0.0.0:7447 --no-multicast-scouting
zenoh_router-1    | 2024-08-06T13:55:46.749233Z  INFO main ThreadId(01) zenohd: zenohd v1.0.0-alpha.5 built with rustc 1.75.0 (82e1608df 2023-12-21)
zenoh_router-1    | 2024-08-06T13:55:46.749348Z  INFO main ThreadId(01) zenohd: Initial conf: {"access_control":{"default_permission":"deny","enabled":false,"policies":null,"rules":null,"subjects":null},"adminspace":{"enabled":true,"permissions":{"read":true,"write":false}},"aggregation":{"publishers":[],"subscribers":[]},"connect":{"endpoints":[],"exit_on_failure":null,"retry":null,"timeout_ms":null},"downsampling":[],"id":"b13053ae5ada57dcead7f7b0a4a1cb89","listen":{"endpoints":["tcp/0.0.0.0:7447"],"exit_on_failure":null,"retry":null,"timeout_ms":null},"metadata":null,"mode":"router","plugins":{"rest":{"__required__":true,"http_port":"8000"}},"plugins_loading":{"enabled":true,"search_dirs":null},"queries_default_timeout":null,"routing":{"peer":{"mode":null},"router":{"peers_failover_brokering":null}},"scouting":{"delay":null,"gossip":{"autoconnect":null,"enabled":null,"multihop":null},"multicast":{"address":null,"autoconnect":null,"enabled":false,"interface":null,"listen":null,"ttl":null},"timeout":null},"timestamping":{"drop_future_timestamp":null,"enabled":null},"transport":{"auth":{"pubkey":{"key_size":null,"known_keys_file":null,"private_key_file":null,"private_key_pem":null,"public_key_file":null,"public_key_pem":null},"usrpwd":{"dictionary_file":null,"password":null,"user":null}},"link":{"protocols":null,"rx":{"buffer_size":65535,"max_message_size":1073741824},"tls":{"client_auth":null,"client_certificate":null,"client_private_key":null,"root_ca_certificate":null,"server_certificate":null,"server_name_verification":null,"server_private_key":null},"tx":{"batch_size":65535,"batching":true,"keep_alive":4,"lease":10000,"queue":{"backoff":100,"congestion_control":{"wait_before_drop":1000},"size":{"background":1,"control":1,"data":4,"data_high":2,"data_low":2,"interactive_high":1,"interactive_low":1,"real_time":1}},"sequence_number_resolution":"32bit","threads":3},"unixpipe":{"file_access_mask":null}},"multicast":{"compression":{"enabled":false},"join_interval":2500,"max_sessions":1000,"qos":{"enabled":false}},"shared_memory":{"enabled":false},"unicast":{"accept_pending":100,"accept_timeout":10000,"compression":{"enabled":false},"lowlatency":false,"max_links":1,"max_sessions":1000,"qos":{"enabled":true}}}}
zenoh_router-1    | 2024-08-06T13:55:46.749515Z  INFO main ThreadId(01) zenoh::net::runtime: Using ZID: b13053ae5ada57dcead7f7b0a4a1cb89
zenoh_router-1    | 2024-08-06T13:55:46.749638Z  INFO main ThreadId(01) zenoh::api::loader: Loading required plugin "rest"
zenoh_router-1    | 2024-08-06T13:55:46.750048Z  INFO main ThreadId(01) zenoh::api::loader: Starting required plugin "rest"
zenoh_router-1    | 2024-08-06T13:55:46.752926Z  INFO main ThreadId(01) zenoh::api::loader: Successfully started plugin rest from "/libzenoh_plugin_rest.so"
zenoh_router-1    | 2024-08-06T13:55:46.752933Z  INFO main ThreadId(01) zenoh::api::loader: Finished loading plugins
zenoh_router-1    | 2024-08-06T13:55:46.760992Z  INFO main ThreadId(01) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/172.17.14.3:7447
node_queryable-1  | ROUTER: connecting to "zenoh_router"
node_queryable-1  | 2024-08-06T13:55:47.716274Z  INFO ThreadId(01) zenoh::net::runtime: Using ZID: a914b70b024de5313839e17b9583d3ed
node_queryable-1  | 2024-08-06T13:55:47.716663Z  INFO ThreadId(01) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/172.17.14.2:34319
node_queryable-1  | 2024-08-06 13:55:47,719 INFO    root         queryables ready
node_query-1      | ROUTER: connecting to "zenoh_router"
node_query-1      | 2024-08-06T13:55:48.812274Z  INFO ThreadId(01) zenoh::net::runtime: Using ZID: 21639b40534aaf81e77a8c3b85e8e8c6
node_query-1      | 2024-08-06T13:55:48.812609Z  INFO ThreadId(01) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/172.17.14.4:39087
node_query-1      | 2024-08-06 13:55:48,815 INFO    root         starting worker 1
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         starting worker 2
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         starting worker 3
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         starting worker 4
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         starting worker 5
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,816 INFO    root         starting worker 6
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         starting worker 7
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         starting worker 8
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         starting worker 9
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         starting worker 10
node_query-1      | 2024-08-06 13:55:48,817 INFO    root         not sleeping between sends...
node_query-1      | 2024-08-06T13:57:28.192103Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187424 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.192259Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187420
node_query-1      | 2024-08-06T13:57:28.192282Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187421 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.192300Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187420
node_query-1      | 2024-08-06T13:57:28.192337Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187421
node_query-1      | 2024-08-06T13:57:28.192353Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187422 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.192368Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187421
node_query-1      | 2024-08-06T13:57:28.192406Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187422
node_query-1      | 2024-08-06T13:57:28.192420Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187423 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.192433Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187422
node_query-1      | 2024-08-06T13:57:28.192476Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187427 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.192519Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187424
node_query-1      | 2024-08-06T13:57:28.192533Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187425 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.192545Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187424
node_query-1      | 2024-08-06T13:57:28.192580Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187425
node_query-1      | 2024-08-06T13:57:28.192595Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187426 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.192608Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187425
node_query-1      | 2024-08-06 13:57:28,192 ERROR   root         request q6 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,193 ERROR   root         request q1 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,193 ERROR   root         request q4 received Error: "Timeout"
node_query-1      | 2024-08-06T13:57:28.195555Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187430 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.195638Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187427
node_query-1      | 2024-08-06T13:57:28.195650Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187428 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.195660Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187427
node_query-1      | 2024-08-06T13:57:28.195685Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187428
node_query-1      | 2024-08-06T13:57:28.195693Z  WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187429 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1      | 2024-08-06T13:57:28.195702Z  WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187428
node_query-1      | 2024-08-06 13:57:28,193 ERROR   root         request q3 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,193 ERROR   root         request q5 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,194 ERROR   root         request q9 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,194 ERROR   root         request q8 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,195 ERROR   root         request q2 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,196 ERROR   root         request q7 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,196 ERROR   root         request q10 received Error: "Timeout"
node_query-1      | 2024-08-06T13:57:28.820404Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187422 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820427Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187424 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820431Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187426 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820434Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187429 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820436Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187430 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820461Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187421 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820465Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187423 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820468Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187425 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820470Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187427 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.820473Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187428 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822699Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187421 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822716Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187422 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822719Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187423 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822721Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187424 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822723Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187425 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822725Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187426 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822727Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187427 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822729Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187428 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822731Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187429 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1      | 2024-08-06T13:57:28.822733Z  WARN  rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187430 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
^CGracefully stopping... (press Ctrl+C again to force)
[+] Stopping 3/3
[+] Stopping 3/3st-node_query-1      Stopping                                                                                                                                                                                          0.1s 
✔ Container ztest-node_query-1      Stopped                                                                                                                                                                                           1.4s 
✔ Container ztest-node_queryable-1  Stopped                                                                                                                                                                                           1.3s 
✔ Container ztest-zenoh_router-1    Stopped                                                                                                                                                                                           0.2s 
✔ Container ztest-zenoh_router-1    Stopped                                                                                                                                                                                           0.2s 
✔ Container ztest-node_query-1      Stopped                                                                                                                                                                                           1.4s 
✔ Container ztest-node_queryable-1  Stopped                                                                                                                                                                                           1.3s 
canceled
node_queryable-1  | 2024-08-06 13:57:32,035 INFO    root         STATUS: 191780 received
node_query-1      | 2024-08-06 13:57:28,192 ERROR   root         request q6 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,193 ERROR   root         request q1 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,193 ERROR   root         request q4 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,193 ERROR   root         request q3 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,193 ERROR   root         request q5 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,194 ERROR   root         request q9 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,194 ERROR   root         request q8 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,195 ERROR   root         request q2 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,196 ERROR   root         request q7 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:28,196 ERROR   root         request q10 received Error: "Timeout"
node_query-1      | 2024-08-06 13:57:31,038 INFO    root         STATUS: 191770 received, 191780 sent
node_query-1      | 2024-08-06 13:57:31,039 ERROR   root         STATUS: rx_count != tx_count: 191770 != 191780

To reproduce

(run example code with ./run.sh .. wait for errors .. press ctrl-c to stop)

Edit environment variables in docker-compose.yml for different effects:
e.g.: ZENOH_MODE=client / peer ..

(errors with both modes)
zenoh_tests.tar.gz

System info

  • zenoh-python 1.0.0a5, 1.0.0a6
  • zenoh-router: eclipse/zenoh:1.0.0-alpha.5 (alpha.6)
  • Ubuntu 22.04 host OS
  • Ubuntu 22.04 docker container
  • Docker version 27.1.1, build 6312585
  • Docker Compose version v2.29.1
@solderdev solderdev added the bug Something isn't working label Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant