Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to change timeout for message queue in plug? #1021

Closed
justyns opened this issue Aug 5, 2024 · 4 comments
Closed

How to change timeout for message queue in plug? #1021

justyns opened this issue Aug 5, 2024 · 4 comments
Assignees

Comments

@justyns
Copy link
Contributor

justyns commented Aug 5, 2024

I'm starting to change the indexing logic in the silverbullet-ai plug to basically listen for page:index events (like it is currently) and then send those pages to a separate message queue that can be processed in the background to avoid blocking the main SB indexing process.

This is mostly working, but I get a lot of these errors:

[mq] Message ack timed out, requeueing {
  id: "1722840902915-000357",
  queue: "aiSummaryQueue",
  body: "otherstuff-190",
  ts: 1722840903185
}

This is expected because it unfortunately takes longer than 20-30s for some notes when generating and indexing summaries. Is there a way to increase this timeout per queue?

This is in the plug yaml:

  processEmbeddingsQueue:
    path: src/embeddings.ts:processEmbeddingsQueue
    mqSubscriptions:
    - queue: aiEmbeddingsQueue
      batchSize: 1
      autoAck: true
  processSummaryQueue:
    path: src/embeddings.ts:processSummaryQueue
    mqSubscriptions:
    - queue: aiSummaryQueue
      batchSize: 1
      autoAck: true

I looked into lib/data/mq.datastore.ts and requeueTimeouts. It handles all messages, not per queue.

I'm happy to try and add this functionality, but I'm not exactly sure what the best route would be. One problem is a timeout parameter would have to make its way from the plug manifest to requeueTimeouts. Another problem is what if multiple plugs subscribe to the same queue (is that even a good idea)?

And an alternative would be - don't time a message out if a function is still running. Maybe this could be handled by a new mq.updateTime(msgId) or something to keep the message in the processing queue longer than the timeout value?

@zefhemel
Copy link
Collaborator

zefhemel commented Aug 5, 2024

It's funny that you're looking into this, because I actually have it on my list to ask you to exactly that: take expensive indexers off the main indexing thread 👍

Let me look at a way to make those deadlines configurable.

@zefhemel
Copy link
Collaborator

I now added a pollInterval option which doubles as a timeout, can you try if that works for you?

For example:

  batchSize: 1
  autoAck: true
  pollInterval: 30000

@github-project-automation github-project-automation bot moved this from On Hold to Done in Silver Bullet Roadmap Aug 18, 2024
@justyns
Copy link
Contributor Author

justyns commented Aug 21, 2024

I'm not sure if this is working or not.. I'm still getting a lot of these timeout errors.

With this in the plug yaml:

  processEmbeddingsQueue:
    path: src/embeddings.ts:processEmbeddingsQueue
    mqSubscriptions:
    - queue: aiEmbeddingsQueue
      batchSize: 1
      autoAck: true
      pollInterval: 60000
  processSummaryQueue:
    path: src/embeddings.ts:processSummaryQueue
    mqSubscriptions:
    - queue: aiSummaryQueue
      batchSize: 1
      autoAck: true
      # should be 600 seconds / 10 minutes
      pollInterval: 600000

Part of the server log:

Requested file test page 8-96.md
Writing file test page 8-96.md
AI: Generating and indexing embeddings for file test page 8-96
AI: Generating and indexing summary for test page 8-96
AI: Indexed 2 embedding objects for page test page 8-96 in 0.185 seconds
AI: Embeddings queue stats: {"queued":0,"processing":1,"dlq":0}
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
[mq] Message ack timed out, requeueing {
  id: "1724220697698-000006",
  queue: "aiSummaryQueue",
  body: "test page 8-96",
  ts: 1724220697698
}
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
Requested file test page 8-96.md
AI: Indexed summary for page test page 8-96 in 68.144 seconds
AI: Summary queue stats: {"queued":1,"processing":0,"dlq":0}
AI: Generating and indexing summary for test page 8-96
AI: Indexed summary for page test page 8-96 in 0.007 seconds
AI: Summary queue stats: {"queued":0,"processing":1,"dlq":0}
Requested file test page 8-96.md

It took 68 seconds to index, and then ended up re-indexing immediately after. (the second time is faster due to caching).

It's weird though because only the first attempt seems like it times out and gets requeued? If I reindex the whole space, I get a bunch of those errors after 10 seconds or so, and then none for a while.

@justyns
Copy link
Contributor Author

justyns commented Aug 21, 2024

Also I'm wondering if I should even be trying to generate a summary of a note during indexing. Do you think it'd be better to have it as a once-a-day type of thing? Or more often, but only if the file hasn't been modified in the last 30 minutes?

As it is right now, the embeddings are pretty okay to be indexed immediately because each paragraph/line gets cached, and only new/changed lines need to be regenerated. But for the summary index, it generates a new summary no matter what changed in the note, and takes longer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

2 participants