You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm running a microservice stack in docker compose using zenoh for communication. I noticed timeouts with 0.11.0 and tried the 1.0.0.5 alpha (same for 1.0.0.6a).
Still the same: about 2-3 times a day a query gets a timeout (communication a few times per second).
So I tried to build a minimal (not-)working example of messages being lost --> see attached code.
It consists of a docker-compose.yml running:
zenoh-router (needed because multicast does not work in "internal" networks)
query server
query client with multiple threads to increase load
The Dockerfile builds everything on execution.
Place everything in a folder and run with ./run.sh (often takes a few minutes before errors appear)
Feel free to adjust the number of threads in the docker-compose.yml depending how much your machine can handle.
Example log output
zenoh_router-1 | #!/bin/ash
zenoh_router-1 | cat /entrypoint.sh
zenoh_router-1 | echo " * Starting: /$BINARY $*"
zenoh_router-1 | exec /$BINARY $*
zenoh_router-1 | * Starting: /zenohd -l tcp/0.0.0.0:7447 --no-multicast-scouting
zenoh_router-1 | 2024-08-06T13:55:46.749233Z INFO main ThreadId(01) zenohd: zenohd v1.0.0-alpha.5 built with rustc 1.75.0 (82e1608df 2023-12-21)
zenoh_router-1 | 2024-08-06T13:55:46.749348Z INFO main ThreadId(01) zenohd: Initial conf: {"access_control":{"default_permission":"deny","enabled":false,"policies":null,"rules":null,"subjects":null},"adminspace":{"enabled":true,"permissions":{"read":true,"write":false}},"aggregation":{"publishers":[],"subscribers":[]},"connect":{"endpoints":[],"exit_on_failure":null,"retry":null,"timeout_ms":null},"downsampling":[],"id":"b13053ae5ada57dcead7f7b0a4a1cb89","listen":{"endpoints":["tcp/0.0.0.0:7447"],"exit_on_failure":null,"retry":null,"timeout_ms":null},"metadata":null,"mode":"router","plugins":{"rest":{"__required__":true,"http_port":"8000"}},"plugins_loading":{"enabled":true,"search_dirs":null},"queries_default_timeout":null,"routing":{"peer":{"mode":null},"router":{"peers_failover_brokering":null}},"scouting":{"delay":null,"gossip":{"autoconnect":null,"enabled":null,"multihop":null},"multicast":{"address":null,"autoconnect":null,"enabled":false,"interface":null,"listen":null,"ttl":null},"timeout":null},"timestamping":{"drop_future_timestamp":null,"enabled":null},"transport":{"auth":{"pubkey":{"key_size":null,"known_keys_file":null,"private_key_file":null,"private_key_pem":null,"public_key_file":null,"public_key_pem":null},"usrpwd":{"dictionary_file":null,"password":null,"user":null}},"link":{"protocols":null,"rx":{"buffer_size":65535,"max_message_size":1073741824},"tls":{"client_auth":null,"client_certificate":null,"client_private_key":null,"root_ca_certificate":null,"server_certificate":null,"server_name_verification":null,"server_private_key":null},"tx":{"batch_size":65535,"batching":true,"keep_alive":4,"lease":10000,"queue":{"backoff":100,"congestion_control":{"wait_before_drop":1000},"size":{"background":1,"control":1,"data":4,"data_high":2,"data_low":2,"interactive_high":1,"interactive_low":1,"real_time":1}},"sequence_number_resolution":"32bit","threads":3},"unixpipe":{"file_access_mask":null}},"multicast":{"compression":{"enabled":false},"join_interval":2500,"max_sessions":1000,"qos":{"enabled":false}},"shared_memory":{"enabled":false},"unicast":{"accept_pending":100,"accept_timeout":10000,"compression":{"enabled":false},"lowlatency":false,"max_links":1,"max_sessions":1000,"qos":{"enabled":true}}}}
zenoh_router-1 | 2024-08-06T13:55:46.749515Z INFO main ThreadId(01) zenoh::net::runtime: Using ZID: b13053ae5ada57dcead7f7b0a4a1cb89
zenoh_router-1 | 2024-08-06T13:55:46.749638Z INFO main ThreadId(01) zenoh::api::loader: Loading required plugin "rest"
zenoh_router-1 | 2024-08-06T13:55:46.750048Z INFO main ThreadId(01) zenoh::api::loader: Starting required plugin "rest"
zenoh_router-1 | 2024-08-06T13:55:46.752926Z INFO main ThreadId(01) zenoh::api::loader: Successfully started plugin rest from "/libzenoh_plugin_rest.so"
zenoh_router-1 | 2024-08-06T13:55:46.752933Z INFO main ThreadId(01) zenoh::api::loader: Finished loading plugins
zenoh_router-1 | 2024-08-06T13:55:46.760992Z INFO main ThreadId(01) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/172.17.14.3:7447
node_queryable-1 | ROUTER: connecting to "zenoh_router"
node_queryable-1 | 2024-08-06T13:55:47.716274Z INFO ThreadId(01) zenoh::net::runtime: Using ZID: a914b70b024de5313839e17b9583d3ed
node_queryable-1 | 2024-08-06T13:55:47.716663Z INFO ThreadId(01) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/172.17.14.2:34319
node_queryable-1 | 2024-08-06 13:55:47,719 INFO root queryables ready
node_query-1 | ROUTER: connecting to "zenoh_router"
node_query-1 | 2024-08-06T13:55:48.812274Z INFO ThreadId(01) zenoh::net::runtime: Using ZID: 21639b40534aaf81e77a8c3b85e8e8c6
node_query-1 | 2024-08-06T13:55:48.812609Z INFO ThreadId(01) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/172.17.14.4:39087
node_query-1 | 2024-08-06 13:55:48,815 INFO root starting worker 1
node_query-1 | 2024-08-06 13:55:48,816 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,816 INFO root starting worker 2
node_query-1 | 2024-08-06 13:55:48,816 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,816 INFO root starting worker 3
node_query-1 | 2024-08-06 13:55:48,816 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,816 INFO root starting worker 4
node_query-1 | 2024-08-06 13:55:48,816 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,816 INFO root starting worker 5
node_query-1 | 2024-08-06 13:55:48,816 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,816 INFO root starting worker 6
node_query-1 | 2024-08-06 13:55:48,817 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,817 INFO root starting worker 7
node_query-1 | 2024-08-06 13:55:48,817 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,817 INFO root starting worker 8
node_query-1 | 2024-08-06 13:55:48,817 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,817 INFO root starting worker 9
node_query-1 | 2024-08-06 13:55:48,817 INFO root not sleeping between sends...
node_query-1 | 2024-08-06 13:55:48,817 INFO root starting worker 10
node_query-1 | 2024-08-06 13:55:48,817 INFO root not sleeping between sends...
node_query-1 | 2024-08-06T13:57:28.192103Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187424 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.192259Z WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187420
node_query-1 | 2024-08-06T13:57:28.192282Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187421 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.192300Z WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187420
node_query-1 | 2024-08-06T13:57:28.192337Z WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187421
node_query-1 | 2024-08-06T13:57:28.192353Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187422 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.192368Z WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187421
node_query-1 | 2024-08-06T13:57:28.192406Z WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187422
node_query-1 | 2024-08-06T13:57:28.192420Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187423 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.192433Z WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187422
node_query-1 | 2024-08-06T13:57:28.192476Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187427 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.192519Z WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187424
node_query-1 | 2024-08-06T13:57:28.192533Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187425 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.192545Z WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187424
node_query-1 | 2024-08-06T13:57:28.192580Z WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187425
node_query-1 | 2024-08-06T13:57:28.192595Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187426 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.192608Z WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187425
node_query-1 | 2024-08-06 13:57:28,192 ERROR root request q6 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,193 ERROR root request q1 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,193 ERROR root request q4 received Error: "Timeout"
node_query-1 | 2024-08-06T13:57:28.195555Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187430 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.195638Z WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187427
node_query-1 | 2024-08-06T13:57:28.195650Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187428 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.195660Z WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187427
node_query-1 | 2024-08-06T13:57:28.195685Z WARN net-0 ThreadId(03) zenoh::api::session: Received ReplyData for unknown Query: 187428
node_query-1 | 2024-08-06T13:57:28.195693Z WARN net-0 ThreadId(03) zenoh::net::routing::dispatcher::queries: Didn't receive final reply Face{0, 21639b40534aaf81e77a8c3b85e8e8c6}:187429 from Face{2, a914b70b024de5313839e17b9583d3ed}: Timeout(1s)!
node_query-1 | 2024-08-06T13:57:28.195702Z WARN net-0 ThreadId(03) zenoh::api::session: Received ResponseFinal for unknown Request: 187428
node_query-1 | 2024-08-06 13:57:28,193 ERROR root request q3 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,193 ERROR root request q5 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,194 ERROR root request q9 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,194 ERROR root request q8 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,195 ERROR root request q2 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,196 ERROR root request q7 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,196 ERROR root request q10 received Error: "Timeout"
node_query-1 | 2024-08-06T13:57:28.820404Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187422 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820427Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187424 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820431Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187426 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820434Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187429 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820436Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187430 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820461Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187421 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820465Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187423 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820468Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187425 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820470Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187427 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.820473Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route reply Face{2, a914b70b024de5313839e17b9583d3ed}:187428 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822699Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187421 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822716Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187422 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822719Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187423 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822721Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187424 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822723Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187425 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822725Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187426 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822727Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187427 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822729Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187428 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822731Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187429 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
node_query-1 | 2024-08-06T13:57:28.822733Z WARN rx-1 ThreadId(08) zenoh::net::routing::dispatcher::queries: Route final reply Face{2, a914b70b024de5313839e17b9583d3ed}:187430 from Face{2, a914b70b024de5313839e17b9583d3ed}: Query not found!
^CGracefully stopping... (press Ctrl+C again to force)
[+] Stopping 3/3
[+] Stopping 3/3st-node_query-1 Stopping 0.1s
✔ Container ztest-node_query-1 Stopped 1.4s
✔ Container ztest-node_queryable-1 Stopped 1.3s
✔ Container ztest-zenoh_router-1 Stopped 0.2s
✔ Container ztest-zenoh_router-1 Stopped 0.2s
✔ Container ztest-node_query-1 Stopped 1.4s
✔ Container ztest-node_queryable-1 Stopped 1.3s
canceled
node_queryable-1 | 2024-08-06 13:57:32,035 INFO root STATUS: 191780 received
node_query-1 | 2024-08-06 13:57:28,192 ERROR root request q6 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,193 ERROR root request q1 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,193 ERROR root request q4 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,193 ERROR root request q3 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,193 ERROR root request q5 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,194 ERROR root request q9 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,194 ERROR root request q8 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,195 ERROR root request q2 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,196 ERROR root request q7 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:28,196 ERROR root request q10 received Error: "Timeout"
node_query-1 | 2024-08-06 13:57:31,038 INFO root STATUS: 191770 received, 191780 sent
node_query-1 | 2024-08-06 13:57:31,039 ERROR root STATUS: rx_count != tx_count: 191770 != 191780
To reproduce
(run example code with ./run.sh .. wait for errors .. press ctrl-c to stop)
Edit environment variables in docker-compose.yml for different effects:
e.g.: ZENOH_MODE=client / peer ..
Describe the bug
I'm running a microservice stack in docker compose using zenoh for communication. I noticed timeouts with 0.11.0 and tried the 1.0.0.5 alpha (same for 1.0.0.6a).
Still the same: about 2-3 times a day a query gets a timeout (communication a few times per second).
So I tried to build a minimal (not-)working example of messages being lost --> see attached code.
It consists of a docker-compose.yml running:
The Dockerfile builds everything on execution.
Place everything in a folder and run with
./run.sh
(often takes a few minutes before errors appear)Feel free to adjust the number of threads in the docker-compose.yml depending how much your machine can handle.
Example log output
To reproduce
(run example code with
./run.sh
.. wait for errors .. press ctrl-c to stop)Edit environment variables in docker-compose.yml for different effects:
e.g.: ZENOH_MODE=client / peer ..
(errors with both modes)
zenoh_tests.tar.gz
System info
The text was updated successfully, but these errors were encountered: