Limit how much data we retrieve for a given CID #16

Closed
4 tasks done
Tracked by #47
bajtos opened this issue Sep 5, 2023 · 1 comment

bajtos commented Sep 5, 2023

At the moment, the SPARK node tries to retrieve the entire content of a CID, regardless of its size. Some CIDs represent gigabytes of data.

IMO, this is a problem - we don't want Stations to use so much bandwidth.

It also creates a problem in spark-api, where we currently store byte_length as a 32-bit signed integer, which overflows just below 2 GiB (the maximum value is 2,147,483,647).

2023-09-05T15:54:19Z app[17814d5b527638] cdg [info]error: value "2753993443" is out of range for type integer
2023-09-05T15:54:19Z app[17814d5b527638] cdg [info]    at /app/node_modules/pg-pool/index.js:45:11
2023-09-05T15:54:19Z app[17814d5b527638] cdg [info]    at runMicrotasks (<anonymous>)
2023-09-05T15:54:19Z app[17814d5b527638] cdg [info]    at processTicksAndRejections (node:internal/process/task_queues:96:5)
2023-09-05T15:54:19Z app[17814d5b527638] cdg [info]    at async setRetrievalResult (file:///app/index.js:74:5)
2023-09-05T15:54:19Z app[17814d5b527638] cdg [info]    at async handler (file:///app/index.js:12:5) 
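
To make the overflow concrete, here is a minimal sketch that checks the value from the log above against the range of a 32-bit signed integer. It assumes the column is a PostgreSQL `integer`, as the error message and the pg-pool frame suggest. The helper name `fitsInInt32` is hypothetical and not part of spark-api.

```javascript
// Range of a 32-bit signed integer (PostgreSQL `integer`).
const INT32_MIN = -(2 ** 31) // -2147483648
const INT32_MAX = 2 ** 31 - 1 // 2147483647

// Hypothetical helper: true when `value` can be stored in a 32-bit signed column.
function fitsInInt32 (value) {
  return Number.isInteger(value) && value >= INT32_MIN && value <= INT32_MAX
}

// The byte length reported in the log above.
const byteLength = 2753993443

console.log(fitsInInt32(byteLength)) // false
console.log(byteLength - INT32_MAX) // number of bytes over the limit
```

A wider column type (for example PostgreSQL `bigint`) would accept this value, but that alone does not address the bandwidth concern.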

I am proposing to introduce a new retrieval error status - content too large.
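
One way such a status could work is sketched below, under stated assumptions: the retrieval is consumed as an async iterable of `Uint8Array` chunks and stops as soon as a byte limit is exceeded. The limit value, the status strings, and the names `RetrievalByteCounter` and `retrieveWithLimit` are all hypothetical. The issue does not specify any of them.

```javascript
// Hypothetical limit; the issue does not say what the threshold should be.
const MAX_CONTENT_BYTES = 200 * 1024 * 1024

// Tracks how many bytes have been received so far.
class RetrievalByteCounter {
  constructor (maxBytes = MAX_CONTENT_BYTES) {
    this.maxBytes = maxBytes
    this.byteLength = 0
  }

  // Record one chunk. Returns false once the total exceeds the limit.
  add (chunkLength) {
    this.byteLength += chunkLength
    return this.byteLength <= this.maxBytes
  }
}

// Consume chunks until the source ends or the limit is exceeded.
// Returning from inside `for await` closes the underlying iterator,
// so no further chunks are pulled from the source.
async function retrieveWithLimit (chunks, maxBytes = MAX_CONTENT_BYTES) {
  const counter = new RetrievalByteCounter(maxBytes)
  for await (const chunk of chunks) {
    if (!counter.add(chunk.byteLength)) {
      // Hypothetical status name for the proposed "content too large" error.
      return { status: 'CONTENT_TOO_LARGE', byteLength: counter.byteLength }
    }
  }
  return { status: 'OK', byteLength: counter.byteLength }
}
```

With a limit far below 2 GiB, the reported byte length also stays within the range of a 32-bit signed integer.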

Tasks

bajtos moved this to 📥 todo in Space Meridian Sep 5, 2023
juliangruber commented

Spark clients should be allowed to abort a retrieval that is too large without being penalized. Ideally, they then won't even report the result. However, Spark itself shouldn't have a problem with retrieval testing for large CIDs: a result is a result, and it is useful.

I therefore think the solution belongs on the Station module side: the module should abort the request, and it should not be penalized for not reporting a large retrieval.
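
A rough sketch of that module-side behavior, assuming the request accepts a standard `AbortSignal`: the guard aborts once a byte budget is exceeded, and the module then skips reporting. The name `createSizeGuard` and its methods are hypothetical and do not come from the Spark codebase.

```javascript
// Hypothetical guard: aborts a request once more than `maxBytes` were received.
function createSizeGuard (maxBytes) {
  const controller = new AbortController()
  let received = 0

  return {
    // Pass this signal to the request (for example `fetch(url, { signal })`).
    signal: controller.signal,

    // Call for every received chunk; aborts the request when over budget.
    onChunk (chunkLength) {
      received += chunkLength
      if (received > maxBytes && !controller.signal.aborted) {
        controller.abort()
      }
    },

    // An aborted retrieval is not reported at all, per the suggestion above.
    shouldReport () {
      return !controller.signal.aborted
    }
  }
}
```

Whether an unreported retrieval is exempt from penalties would have to be decided on the evaluation side. This sketch covers only the client.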

bajtos moved this from 📥 todo to 🏗 in progress in Space Meridian Oct 3, 2023
bajtos self-assigned this Oct 12, 2023
bajtos moved this from 🏗 in progress to 🧊 icebox in Space Meridian Oct 23, 2023
bajtos moved this from 🧊 icebox to 📥 todo in Space Meridian Nov 6, 2023
bajtos closed this as completed Nov 29, 2023
github-project-automation bot moved this from 📥 todo to ✅ done in Space Meridian Nov 29, 2023