Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undesirable reconnection behaviour #1344

Open
Steve-Mcl opened this issue Sep 29, 2024 · 5 comments
Open

Undesirable reconnection behaviour #1344

Steve-Mcl opened this issue Sep 29, 2024 · 5 comments
Labels
bug Something isn't working needs-triage Needs looking at to decide what to do

Comments

@Steve-Mcl
Copy link
Contributor

Steve-Mcl commented Sep 29, 2024

Current Behavior

While working on #1343 I noted we manually attempt to recover dashboard connection using fixed times (no jitter) and if that is unsuccessful it performs a full refresh.

In many settings (industrial/professional/signage/etc) this is undesirable. consider the following:

  1. 50 displays showing dashboards
  2. Server maintenance is required - server is taken offline for 10 minutes
  3. All 50 screens attempt a refresh and end up showing an ERR_NAME_NOT_RESOLVED
    • "This site can’t be reached. xxxxxxxxx.com’s server IP address could not be found. Try: Checking the connection, Checking the proxy, firewall and DNS configuration. ERR_NAME_NOT_RESOLVED"
    • These now require manual intervention

Expected Behavior

To use the WS built in capabilities for auto reconnect (e.g. making use of randomizationFactor for the exponential backoff jitter)

Steps To Reproduce

disconnect or power off the server and wait until the dashboard gives up reconnection attempts.

Environment

  • Dashboard version: 1.17.x (current)
  • Node-RED version: all
  • Node.js version: all
  • npm version: all
  • Platform/OS: all
  • Browser: all

Have you provided an initial effort estimate for this issue?

I have provided an initial effort estimate

@Steve-Mcl Steve-Mcl added bug Something isn't working needs-triage Needs looking at to decide what to do labels Sep 29, 2024
@colinl
Copy link
Contributor

colinl commented Sep 29, 2024

we manually attempt to recover dashboard connection using fixed times

Can you point to that code please?

@Steve-Mcl
Copy link
Contributor Author

we manually attempt to recover dashboard connection using fixed times

Can you point to that code please?

forcePageReload('Too many retries')
}

@colinl
Copy link
Contributor

colinl commented Sep 29, 2024

There is a subtlety to that code that is not entirely obvious. It actually uses a retry count to determine when to retry. This is important, particularly in the case of Android devices. If the dashboard is left in the background in Android then eventually the page is suspended. The result is that the connection is dropped. It is important that when the page is brought to the foreground that it retries quickly, even if it has been hours since the connection was dropped. Using a retry count, rather than a timer, achieves that.

@Steve-Mcl
Copy link
Contributor Author

Steve-Mcl commented Sep 29, 2024

Yes, I understand that. My beef is with the full page refresh. I believe there is a halfway house (or combined logic) between the manual retry logic and the built in retry logic that socketio provides without doing a full page refresh. Yes I understand the parse error issue and the recovery attempt via refresh (and we may keep that) but blindly refreshing the page without the server even being alive can result in the situation I've described.

What I was thinking of including was a call for HEAD to the server. If the server is alive, permit the refresh. Otherwise keep trying.

Additionally, utilise the page lifecycle events to do a quicker recovery: https://developer.chrome.com/docs/web-platform/page-lifecycle-api#new_features_added_in_chrome_68

Mostly, I want to avoid hitting "ERR_NAME_NOT_RESOLVED" at which point every dashboard requires manual intervention.

@colinl
Copy link
Contributor

colinl commented Sep 29, 2024

Understood.
I don't know whether it is a further complication if a proxy is involved. For example I use Cloudflare Zero Trust on some systems. I am not sure what happens on those if the server is not available.

I want to avoid hitting "ERR_NAME_NOT_RESOLVED" at which point every dashboard requires manual intervention.

This is not necessarily true as at least some browsers automatically recover when I stop the server and then restart it, though I am not sure that is the same error. It recovers with Edge on a PC and Chrome on an old Android tablet. Even so, it agree it would be much better to avoid the situation in the first place.

Is whatever mechanism is used in D1 relevant here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs-triage Needs looking at to decide what to do
Projects
Status: Backlog
Development

No branches or pull requests

2 participants