Undesirable reconnection behaviour #1344

Steve-Mcl · 2024-09-29T12:20:25Z

Current Behavior

While working on #1343 I noted that we manually attempt to recover dashboard connection using fixed times (no jitter) and, if that is unsuccessful, the dashboard performs a full page refresh.

In many settings (industrial/professional/signage/etc) this is undesirable. Consider the following:

50 displays showing dashboards
Server maintenance is required - server is taken offline for 10 minutes
All 50 screens attempt a refresh and end up showing an ERR_NAME_NOT_RESOLVED
- "This site can’t be reached. xxxxxxxxx.com’s server IP address could not be found. Try: Checking the connection, Checking the proxy, firewall and DNS configuration. ERR_NAME_NOT_RESOLVED"
- These now require manual intervention

Expected Behavior

Use the WS client's built-in capabilities for auto reconnect (e.g. making use of randomizationFactor for the exponential backoff jitter).
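As a rough illustration of what exponential backoff with jitter means here, the sketch below computes an approximate delay per attempt. The option names (`reconnectionDelay`, `reconnectionDelayMax`, `randomizationFactor`) are the ones socket.io-client accepts; the values are illustrative, and `backoffDelay` is a hypothetical helper that only approximates the idea, not the library's actual implementation.

```javascript
// Option names as accepted by socket.io-client; the values are illustrative only.
const reconnectOptions = {
    reconnection: true,
    reconnectionAttempts: Infinity, // keep trying instead of giving up
    reconnectionDelay: 1000, // base delay in ms
    reconnectionDelayMax: 5000, // upper bound in ms
    randomizationFactor: 0.5 // jitter: +/- 50% of the computed delay
}

// Hypothetical helper (not part of any library): approximate delay before
// the given 0-based attempt, doubling each time and spreading clients apart.
function backoffDelay (attempt, opts = reconnectOptions, rand = Math.random) {
    const base = opts.reconnectionDelay * Math.pow(2, attempt)
    // pick a point in the range [base * (1 - f), base * (1 + f)]
    const jitter = (rand() * 2 - 1) * opts.randomizationFactor * base
    return Math.min(Math.round(base + jitter), opts.reconnectionDelayMax)
}
```

With 50 displays, the jitter means the clients would not all retry at the same instant once the server returns. An options object like this could be passed as the second argument to `io(url, options)`.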

Steps To Reproduce

Disconnect or power off the server and wait until the dashboard gives up its reconnection attempts.

Environment

Dashboard version: 1.17.x (current)
Node-RED version: all
Node.js version: all
npm version: all
Platform/OS: all
Browser: all

Have you provided an initial effort estimate for this issue?

I have provided an initial effort estimate

colinl · 2024-09-29T12:41:09Z

> we manually attempt to recover dashboard connection using fixed times

Can you point to that code please?

Steve-Mcl · 2024-09-29T12:52:59Z

> > we manually attempt to recover dashboard connection using fixed times
>
> Can you point to that code please?

node-red-dashboard/ui/src/main.mjs

Lines 192 to 193 in c1b9c20

    
            forcePageReload('Too many retries')
        }

colinl · 2024-09-29T13:17:31Z

There is a subtlety to that code that is not entirely obvious. It actually uses a retry count to determine when to retry. This is important, particularly in the case of Android devices. If the dashboard is left in the background in Android then eventually the page is suspended. The result is that the connection is dropped. It is important that when the page is brought to the foreground that it retries quickly, even if it has been hours since the connection was dropped. Using a retry count, rather than a timer, achieves that.
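A minimal sketch of the count-based idea, under the assumption that each failed attempt is reported explicitly. `createRetryCounter` and its methods are hypothetical names for illustration and are not the code in `main.mjs`. Because the counter only advances when an attempt actually runs, a page that was suspended for hours resumes with the same count it had before and can retry immediately.

```javascript
// Hypothetical sketch: give up after a number of failed attempts,
// not after an amount of elapsed wall-clock time.
function createRetryCounter (maxRetries) {
    let retries = 0
    return {
        // call after each failed attempt; returns true once the limit is reached
        fail () {
            retries++
            return retries >= maxRetries
        },
        // call after a successful connection
        reset () {
            retries = 0
        },
        get count () {
            return retries
        }
    }
}
```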

Steve-Mcl · 2024-09-29T16:22:56Z

Yes, I understand that. My beef is with the full page refresh. I believe there is a halfway house (or combined logic) between the manual retry logic and the built-in retry logic that Socket.IO provides, without doing a full page refresh. Yes, I understand the parse error issue and the recovery attempt via refresh (and we may keep that), but blindly refreshing the page when the server is not even alive can result in the situation I've described.

What I was thinking of including was a HEAD request to the server. If the server is alive, permit the refresh. Otherwise keep trying.
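A sketch of how that check might look, assuming the standard `fetch` API is available. `serverIsAlive` and `reloadWhenServerAlive` are hypothetical names, and the `fetchFn`, `sleep` and `intervalMs` parameters exist only to keep the example testable and are not part of the dashboard code.

```javascript
// Hypothetical helper: resolves true if the server answers a HEAD request
// with a 2xx status, false on any error status or network failure.
async function serverIsAlive (url, fetchFn = globalThis.fetch) {
    try {
        const res = await fetchFn(url, { method: 'HEAD', cache: 'no-store' })
        return res.ok
    } catch (err) {
        // DNS failures, refused connections etc. reject the fetch promise
        return false
    }
}

// Hypothetical wrapper: only call `reload` once the probe succeeds,
// otherwise wait and probe again.
async function reloadWhenServerAlive ({
    url,
    reload,
    fetchFn = globalThis.fetch,
    intervalMs = 5000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
}) {
    while (!(await serverIsAlive(url, fetchFn))) {
        await sleep(intervalMs)
    }
    reload()
}
```

In a browser this might be called as `reloadWhenServerAlive({ url: window.location.href, reload: () => window.location.reload() })`, so the page never navigates while the server is unreachable. A fixed `intervalMs` is used for brevity; a jittered delay would fit better with many clients.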

Additionally, utilise the page lifecycle events to do a quicker recovery: https://developer.chrome.com/docs/web-platform/page-lifecycle-api#new_features_added_in_chrome_68
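A sketch of hooking into those events, assuming a `reconnect` callback is supplied by the caller. `reconnectOnResume` is a hypothetical name; `visibilitychange` is widely supported, while `resume` is the Page Lifecycle event described at the link above and is not available in every browser.

```javascript
// Hypothetical helper: call `reconnect` as soon as the page becomes active again.
// `doc` is expected to be the browser `document` (or a compatible object).
function reconnectOnResume (doc, reconnect) {
    const onVisibilityChange = () => {
        // only act when the page is shown, not when it is hidden
        if (doc.visibilityState === 'visible') reconnect()
    }
    const onResume = () => reconnect()

    doc.addEventListener('visibilitychange', onVisibilityChange)
    // fired when a frozen page is resumed (Page Lifecycle API)
    doc.addEventListener('resume', onResume)

    // return a function that removes both listeners again
    return () => {
        doc.removeEventListener('visibilitychange', onVisibilityChange)
        doc.removeEventListener('resume', onResume)
    }
}
```

Usage could be as simple as `reconnectOnResume(document, () => socket.connect())`, where `socket` is assumed to be the existing socket.io-client instance.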

Mostly, I want to avoid hitting "ERR_NAME_NOT_RESOLVED" at which point every dashboard requires manual intervention.

colinl · 2024-09-29T20:50:52Z

Understood.
I don't know whether it is a further complication if a proxy is involved. For example I use Cloudflare Zero Trust on some systems. I am not sure what happens on those if the server is not available.

> I want to avoid hitting "ERR_NAME_NOT_RESOLVED" at which point every dashboard requires manual intervention.

This is not necessarily true, as at least some browsers automatically recover when I stop the server and then restart it, though I am not sure that is the same error. It recovers with Edge on a PC and Chrome on an old Android tablet. Even so, I agree it would be much better to avoid the situation in the first place.

Is whatever mechanism is used in D1 relevant here?

Steve-Mcl added the bug and needs-triage labels Sep 29, 2024

github-project-automation bot moved this to Backlog in Dashboard Backlog Sep 29, 2024

knolleary added this to Dashboard Backlog Sep 29, 2024
