testDocWorkerId_8484 not known or not available #1386

donbowman · 2025-01-19T15:29:13Z

Describe the current behavior

I don't know what led up to this. My Grist instance had been up for about 20 days and in continuous use via API access.

I found that Grist would not load my document, giving "Error accessing document". It emitted this error to the console each time:

2025-01-19 15:16:29.264 - debug: allowHost:  req=http://grist.XXXXXXX/log, origin=https://grist.XXXXXXX, actualUrl=grist.XXXXXXX, allowedUrl=grist.XXXXXXX
2025-01-19 15:16:29.265 - warn: client error stack=Error: worker testDocWorkerId_8484 not known or not available
    at ye._onServerMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:41669)
    at b (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928861)
    at g (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928653)
    at d (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:926606)
    at c.trigger (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928546)
    at fe._processReceivedMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:35361)
    at fe.onmessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:33983)
    at ue._onWSMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:30605), message=worker testDocWorkerId_8484 not known or not available, docId=xxxxxxxxhGabV1mwdq, page=https://grist.XXXXXXX/xxxxxxxx/websites/p/2, language=en-CA, platform=Linux x86_64, userAgent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36, org=XXXXX, [email protected], userId=5, altSessionId=5bdaggrDWSHpg7VDjtztuQ
2025-01-19 15:16:29.266 grist.XXXXXXX POST /o/XXXXX/api/log {"event":{"stack":"Error: worker testDocWorkerId_8484 not known or not available\n    at ye._onServerMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:41669)\n    at b (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928861)\n    at g (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928653)\n    at d (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:926606)\n    at c.trigger (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928546)\n    at fe._processReceivedMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:35361)\n    at fe.onmessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:33983)\n    at ue._onWSMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:30605)","message":"worker testDocWorkerId_8484 not known or not available"},"docId":"xxxxxxxxhGabV1mwdq","page":"https://grist.XXXXXXX/xxxxxxxx/websites/p/2","browser":{"language":"en-CA","platform":"Linux x86_64","userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36"}} 200 4.078 ms - -

My first thought was that the SQLite database was corrupt, so I copied it aside and checked it, but it was OK.
I restarted the pod, and Grist came back. An error a number of lines further back in the log mentioned SIGKILL (which likely means out of memory?):

2025-01-19 10:49:51.653 - : Sandbox row count access=owners, docId=xxxxxxxxxhGabV1mwdq, rowCount=396149
2025-01-19 10:49:51.654 - : ActiveDoc _applyUserActions returning {actionNum: 0, retValues: [null], isModification: false} access=owners, docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:49:51.661 - : snapshot status access=owners, docId=xxxxxxxxxhGabV1mwdq, pushes=0, skippedPushes=0, errors=0, changes=560925, windowsStarted=0, windowsDone=0, lastChangeAt=2025-01-19T10:43:44.843Z, lastWindowStartedAt=undefined, lastWindowDoneAt=undefined, delay=null
2025-01-19 10:49:51.662 - : DocPluginManager.shutdown cleaning up 2 plugins
2025-01-19 10:49:51.663 - : Sandbox shutdown starting sandboxPid=193, flavor=gvisor, command=undefined, entryPoint=(default), docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:49:51.663 - : DocStorage shutdown success
2025-01-19 10:49:51.663 - : DocPluginManager.shutdown removing tmpDir /tmp/grist-tmp-21Ezd5j1pp3yCl
2025-01-19 10:49:52.664 - : Sandbox sending SIGKILL sandboxPid=193, flavor=gvisor, command=undefined, entryPoint=(default), docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:49:52.870 - : ActiveDoc shutdown complete access=owners, docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:49:53.029 - : Sandbox exited with code null signal SIGKILL sandboxPid=193, flavor=gvisor, command=undefined, entryPoint=(default), docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:50:27.121 - : Auth[GET]: grist.xxxxxxx.ca /docs/xxxxxxxxxhGabV1mwdq/tables/Insights/records customHostSession=, method=GET, host=grist.xxxxxxx.ca, path=/docs/xxxxxxxxxhGabV1mwdq/tables/Insights/records, org=xxxxxxx, [email protected], userId=5, altSessionId=xu2PTjYy9cSVzBZ64NWTmC
2025-01-19 10:50:27.130 - : DocManager.fetchDoc xxxxxxxxxhGabV1mwdq
2025-01-19 10:50:27.166 - : ActiveDoc loadDoc access=owners, userId=5, [email protected], docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:50:27.200 - : ActiveDoc Failed to load document Error: worker testDocWorkerId_8484 not known or not available

Steps to reproduce

I do not know how to reproduce this issue.

Describe the expected behavior

No response

Where have you encountered this bug?

On docs.getgrist.com
On a self-hosted instance

Instance information (when self-hosting only)

Grist instance:
- Version: 1.3.2
- URL (if it's OK for you to share it):
- Installation mode: kubernetes
- Architecture: single-worker
Browser name, version and platforms on which you could reproduce the bug: Chrome 132.0.6834.46 (but I think it's a backend issue)
Link to browser console log if relevant: NA, server logs provided
Link to server log if relevant: above


fflorent · 2025-01-19T15:36:13Z

Hello!

My guess is that you use a Redis server (the error "worker testDocWorkerId_8484 not known or not available" points to this piece of code).

Can you check the content of workers-available (using the smembers workers-available command)?

donbowman · 2025-01-19T16:24:14Z

127.0.0.1:6379> smembers workers-available
(empty array)
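
The same membership check can be scripted. This is only a minimal sketch: it assumes a client object exposing an `smembers` method (as redis-py clients do) and uses the `workers-available` key named above. The `FakeRedis` class and the `worker_is_available` helper are hypothetical, introduced here so the example runs without a Redis server.

```python
def worker_is_available(client, worker_id, key="workers-available"):
    """Return True if worker_id is a member of the Redis set `key`."""
    members = client.smembers(key)
    # Real clients may return bytes; normalize everything to str.
    names = {m.decode() if isinstance(m, bytes) else m for m in members}
    return worker_id in names


class FakeRedis:
    """Hypothetical stand-in for a real client (illustration only)."""

    def __init__(self, sets):
        self._sets = sets

    def smembers(self, key):
        return self._sets.get(key, set())


# An empty set, as in the output above, means the worker is not registered.
print(worker_is_available(FakeRedis({}), "testDocWorkerId_8484"))
```

With an empty `workers-available` set, the check reports that the worker is not registered, which matches the error message in the logs.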

fflorent · 2025-01-19T16:38:17Z

What does this URL return: https://<grist domain>/status?redis=1&db=1&docWorkerRegistered=1&ready=1 ?
(change https to http if necessary)
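
Since the `/status` path is easy to drop when copying this URL by hand, here is a small sketch that builds it from a base URL. The `build_status_url` helper and the `grist.example.com` host are hypothetical; only the path and the four query parameters come from the URL above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_status_url(base_url, checks=("redis", "db", "docWorkerRegistered", "ready")):
    """Build the health-check URL, always forcing the /status path."""
    parts = urlsplit(base_url)
    # Each requested check is enabled with "=1", as in the URL above.
    query = urlencode([(name, "1") for name in checks])
    return urlunsplit((parts.scheme, parts.netloc, "/status", query, ""))


# Hypothetical host, for illustration only.
print(build_status_url("https://grist.example.com"))
```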

donbowman · 2025-01-19T16:43:34Z

https://grist.agilicus.ca/?redis=1&db=4&docWorkerRegistered=1&ready=1

It redirects to auth, even if I port-forward underneath the ingress to the service.

If I add my API key to it, it just shows HTML.

Am I doing this right?

fflorent · 2025-01-19T16:47:35Z

Yes, the /status path was missing. It is actually: https://grist.agilicus.ca/status?redis=1&db=1&docWorkerRegistered=1&ready=1

Everything seems to be working. Can you open and work on a document now?

If so, my guess is that it was an issue with Grist connecting to Redis, but the problem seems to be solved.

donbowman · 2025-01-19T16:48:32Z

Once I restarted Grist, all was good again, but it was broken until I restarted it.

Grist server(home,docs,static) is alive (redis ok, docWorkerRegistered ok, ready ok).

That is the response once I fixed the URL to include /status.
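
For monitoring, the status line can be split into individual checks. This is a rough sketch that assumes the response keeps the exact shape shown in this comment (a trailing parenthesised list of `name state` pairs); the `parse_status` helper is hypothetical and not part of Grist.

```python
import re


def parse_status(text):
    """Extract {check: state} from the trailing '(a ok, b ok)' list."""
    # Match the last parenthesised group, optionally followed by a period.
    match = re.search(r"\(([^()]*)\)\.?\s*$", text)
    if not match:
        return {}
    result = {}
    for item in match.group(1).split(","):
        name, _, state = item.strip().rpartition(" ")
        if name:  # skip entries that are not 'name state' pairs
            result[name] = state
    return result


line = ("Grist server(home,docs,static) is alive "
        "(redis ok, docWorkerRegistered ok, ready ok).")
checks = parse_status(line)
print(checks)
print(all(state == "ok" for state in checks.values()))
```

A health probe could alert whenever any value is not `ok`, which would surface a missing doc worker registration without waiting for a user to hit the error.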

fflorent · 2025-01-19T16:53:40Z

It might relate to this issue (with no certainty though: in our scenario, we were using a multi-worker instance):
#831

dsagal · 2025-01-20T03:22:44Z

The SIGKILL may be a red herring -- it's about the process with the document's data engine shutting down, which is a sandboxed Python process, and it's often killed unceremoniously when a document is closed. The "doc worker" refers to the node server.

Redis could be a good guess, though I can't see why it would happen. Could Redis have been cleared or restarted while Grist was running?

Is there any stack trace after this line?

2025-01-19 10:50:27.200 - : ActiveDoc Failed to load document Error: worker testDocWorkerId_8484 not known or not available
