testDocWorkerId_8484 not known or not available #1386

Open
1 of 2 tasks
donbowman opened this issue Jan 19, 2025 · 8 comments

Comments

@donbowman

Describe the current behavior

I don't know what led up to this. My Grist instance had been up for about 20 days and in continuous use via API access.

I found that Grist would not load my document, giving "Error accessing document". It logged this error to the console each time:

2025-01-19 15:16:29.264 - debug: allowHost:  req=http://grist.XXXXXXX/log, origin=https://grist.XXXXXXX, actualUrl=grist.XXXXXXX, allowedUrl=grist.XXXXXXX
2025-01-19 15:16:29.265 - warn: client error stack=Error: worker testDocWorkerId_8484 not known or not available
    at ye._onServerMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:41669)
    at b (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928861)
    at g (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928653)
    at d (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:926606)
    at c.trigger (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928546)
    at fe._processReceivedMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:35361)
    at fe.onmessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:33983)
    at ue._onWSMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:30605), message=worker testDocWorkerId_8484 not known or not available, docId=xxxxxxxxhGabV1mwdq, page=https://grist.XXXXXXX/xxxxxxxx/websites/p/2, language=en-CA, platform=Linux x86_64, userAgent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36, org=XXXXX, [email protected], userId=5, altSessionId=5bdaggrDWSHpg7VDjtztuQ
2025-01-19 15:16:29.266 grist.XXXXXXX POST /o/XXXXX/api/log {"event":{"stack":"Error: worker testDocWorkerId_8484 not known or not available\n    at ye._onServerMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:41669)\n    at b (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928861)\n    at g (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928653)\n    at d (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:926606)\n    at c.trigger (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:928546)\n    at fe._processReceivedMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:35361)\n    at fe.onmessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:33983)\n    at ue._onWSMessage (https://grist.XXXXXXX/v/unknown/main.bundle.js:2:30605)","message":"worker testDocWorkerId_8484 not known or not available"},"docId":"xxxxxxxxhGabV1mwdq","page":"https://grist.XXXXXXX/xxxxxxxx/websites/p/2","browser":{"language":"en-CA","platform":"Linux x86_64","userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36"}} 200 4.078 ms - -

My first thought was that the SQLite db was corrupt, so I copied it aside and checked it, but it was OK.
I restarted the pod, and Grist is back. An entry a bunch of lines back in the log mentioned SIGKILL (which likely means out of memory?):

2025-01-19 10:49:51.653 - : Sandbox row count access=owners, docId=xxxxxxxxxhGabV1mwdq, rowCount=396149
2025-01-19 10:49:51.654 - : ActiveDoc _applyUserActions returning {actionNum: 0, retValues: [null], isModification: false} access=owners, docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:49:51.661 - : snapshot status access=owners, docId=xxxxxxxxxhGabV1mwdq, pushes=0, skippedPushes=0, errors=0, changes=560925, windowsStarted=0, windowsDone=0, lastChangeAt=2025-01-19T10:43:44.843Z, lastWindowStartedAt=undefined, lastWindowDoneAt=undefined, delay=null
2025-01-19 10:49:51.662 - : DocPluginManager.shutdown cleaning up 2 plugins
2025-01-19 10:49:51.663 - : Sandbox shutdown starting sandboxPid=193, flavor=gvisor, command=undefined, entryPoint=(default), docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:49:51.663 - : DocStorage shutdown success
2025-01-19 10:49:51.663 - : DocPluginManager.shutdown removing tmpDir /tmp/grist-tmp-21Ezd5j1pp3yCl
2025-01-19 10:49:52.664 - : Sandbox sending SIGKILL sandboxPid=193, flavor=gvisor, command=undefined, entryPoint=(default), docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:49:52.870 - : ActiveDoc shutdown complete access=owners, docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:49:53.029 - : Sandbox exited with code null signal SIGKILL sandboxPid=193, flavor=gvisor, command=undefined, entryPoint=(default), docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:50:27.121 - : Auth[GET]: grist.xxxxxxx.ca /docs/xxxxxxxxxhGabV1mwdq/tables/Insights/records customHostSession=, method=GET, host=grist.xxxxxxx.ca, path=/docs/xxxxxxxxxhGabV1mwdq/tables/Insights/records, org=xxxxxxx, [email protected], userId=5, altSessionId=xu2PTjYy9cSVzBZ64NWTmC
2025-01-19 10:50:27.130 - : DocManager.fetchDoc xxxxxxxxxhGabV1mwdq
2025-01-19 10:50:27.166 - : ActiveDoc loadDoc access=owners, userId=5, [email protected], docId=xxxxxxxxxhGabV1mwdq
2025-01-19 10:50:27.200 - : ActiveDoc Failed to load document Error: worker testDocWorkerId_8484 not known or not available

Steps to reproduce

I do not know how to reproduce this issue.

Describe the expected behavior

No response

Where have you encountered this bug?

Instance information (when self-hosting only)

  • Grist instance:

    • Version: 1.3.2
    • URL (if it's OK for you to share it):
    • Installation mode: kubernetes
    • Architecture: single-worker
  • Browser name, version and platforms on which you could reproduce the bug: Chrome 132.0.6834.46 (but I think it's a backend issue)

  • Link to browser console log if relevant: NA, server logs provided

  • Link to server log if relevant: above

@fflorent
Collaborator

fflorent commented Jan 19, 2025

Hello!

My guess is that you use a Redis server (the message "worker testDocWorkerId_8484 not known or not available" points to this piece of code).

Can you check the content of workers-available (using the smembers workers-available command)?

@donbowman
Author

127.0.0.1:6379> smembers workers-available
(empty array)
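
For anyone who wants to script the same check rather than run it by hand in redis-cli, here is a minimal sketch. It assumes a client object that exposes a redis-py style `smembers()` method; the helper name `worker_is_available` and the `SetReader` type are made up for illustration and are not part of Grist.

```python
from typing import Iterable, Protocol


class SetReader(Protocol):
    """Anything with a redis-py style smembers() method (assumption)."""

    def smembers(self, name: str) -> Iterable: ...


def worker_is_available(client: SetReader, worker_id: str,
                        key: str = "workers-available") -> bool:
    """Return True if worker_id is a member of the Redis set `key`."""
    members = client.smembers(key)
    # redis-py returns bytes unless decode_responses=True, so normalise to str.
    decoded = {m.decode() if isinstance(m, bytes) else m for m in members}
    return worker_id in decoded
```

With redis-py installed, `worker_is_available(redis.Redis(host="127.0.0.1", port=6379), "testDocWorkerId_8484")` would run this check against the instance shown above. An empty set, as in the output above, makes it return False.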

@fflorent
Collaborator

fflorent commented Jan 19, 2025

What does this URL return: https://<grist domain>/status?redis=1&db=1&docWorkerRegistered=1&ready=1 ?
(Change https to http if necessary.)
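
Since the path is easy to get wrong when typing the URL by hand, here is a small sketch that builds it from a base URL. The function names are hypothetical; only the `/status` path and the four query parameters come from the URL above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def build_status_url(base_url: str,
                     checks=("redis", "db", "docWorkerRegistered", "ready")) -> str:
    """Return <scheme>://<host>/status?<check>=1&... for the given base URL."""
    parts = urlsplit(base_url)
    query = urlencode({name: 1 for name in checks})
    # Any path on the base URL is replaced by /status.
    return urlunsplit((parts.scheme, parts.netloc, "/status", query, ""))


def fetch_status(base_url: str, timeout: float = 5.0) -> str:
    """Fetch the status page body as text (performs a real HTTP request)."""
    with urlopen(build_status_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

`build_status_url("https://grist.example.com")` uses a placeholder domain; substitute your own.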

@donbowman
Author

https://grist.agilicus.ca/?redis=1&db=4&docWorkerRegistered=1&ready=1

It redirects to auth, even if I port-forward underneath the ingress to the service.

If I add my API key to it, it just shows HTML.

Am I doing this right?

@fflorent
Collaborator

fflorent commented Jan 19, 2025

Yeah, the /status path was missing; it is actually: https://grist.agilicus.ca/status?redis=1&db=1&docWorkerRegistered=1&ready=1

Everything seems to be working. Can you open and work on a document now?

If so, my guess is that it was an issue with Grist connecting to Redis, but the problem seems to be solved.

@donbowman
Author

donbowman commented Jan 19, 2025

Once I restarted Grist, all was good again, but it was broken until I restarted it.

With the /status path fixed, the URL returns:

Grist server(home,docs,static) is alive (redis ok, docWorkerRegistered ok, ready ok).
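
If you want a monitoring script to alert on this, the status line can be turned into a dictionary. This is only a sketch based on the single response shown above; the function name is hypothetical, and the wording Grist uses for a failing check is not shown in this thread, so anything other than "ok" is simply passed through as-is.

```python
import re


def parse_status(body: str) -> dict:
    """Parse '... (redis ok, docWorkerRegistered ok, ready ok).' into a dict."""
    groups = re.findall(r"\(([^()]*)\)", body)
    if not groups:
        return {}
    result = {}
    # The per-check results are assumed to be in the last parenthesised group.
    for item in groups[-1].split(","):
        tokens = item.split(None, 1)
        if len(tokens) == 2:  # skip entries that are not "<name> <state>"
            result[tokens[0]] = tokens[1].strip()
    return result


def all_ok(body: str) -> bool:
    """True only if at least one check is reported and every one is 'ok'."""
    checks = parse_status(body)
    return bool(checks) and all(state == "ok" for state in checks.values())
```

A probe could call `all_ok()` on the response body and restart or alert when it returns False.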

@fflorent
Collaborator

It might relate to this issue (with no certainty though: in our scenario, we were using a multi-worker instance):
#831

@dsagal
Member

dsagal commented Jan 20, 2025

The SIGKILL may be a red herring -- it's about the process with the document's data engine shutting down, which is a sandboxed Python process, and it's often killed unceremoniously when a document is closed. The "doc worker" refers to the node server.

Redis could be a good guess, though I can't see why it would happen. Could Redis have been cleared or restarted while Grist was running?

Is there any stack trace after this line?

2025-01-19 10:50:27.200 - : ActiveDoc Failed to load document Error: worker testDocWorkerId_8484 not known or not available
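
To pull out whatever follows that line in a long server log, something like the sketch below could help. It assumes the log has been saved to a file and read line by line; the helper name is made up for illustration.

```python
from typing import Iterable, List


def lines_after(log_lines: Iterable[str], needle: str, count: int = 20) -> List[List[str]]:
    """For each line containing `needle`, return up to `count` lines that follow it."""
    lines = list(log_lines)
    hits = []
    for i, line in enumerate(lines):
        if needle in line:
            # Slice stops at the end of the list if fewer than `count` lines remain.
            hits.append(lines[i + 1 : i + 1 + count])
    return hits
```

For example, `lines_after(open("grist.log"), "ActiveDoc Failed to load document")` would return the lines following each failed load, where `grist.log` is a placeholder file name.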
