Piece Range retrieval #36

xmcai2016 · 2023-09-15T16:22:08Z

Piece range retrieval
Use the pieceCID field of the deal proposal to retrieve the piece over the HTTP endpoint
Make a range retrieval for the first 100 bytes and verify that it is a valid CAR V1/V2 header
If it is a CAR V2 header, check data_offset and data_size in the header to calculate how much padding has been used. In the next step, we only need to perform range retrieval within [data_offset, data_offset + data_size]
Make a range retrieval at a random offset of that piece, up to 8MiB in length
Check whether the retrieved data is all zeroes. Over time, this gives a ratio of how much datacap is underutilized because the data is padded with zeroes
Try to find the sequence [varint, CID, block, varint, CID], which delimits a valid IPLD data block. A valid IPLD block is <= 4MiB in size, so we should expect to get at least one IPLD data block within that range
Calculate the compression ratio of the block bytes using zstd compression
High compression ratio / low entropy means the data is highly repetitive (e.g. repeating "hello world")
Low compression ratio / high entropy means the data is noisy (e.g. random bytes, or data that is already compressed or encrypted)
Useful data usually has neither extremely high nor extremely low entropy, and the compression ratio can be compared to that of the original data source
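
The zero check and the compression-ratio check could be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, and zlib from the Python standard library stands in for zstd so the example has no third-party dependency. The absolute ratios would differ with zstd, but the high/low pattern should be similar.

```python
import os
import zlib


def zero_fraction(chunk: bytes) -> float:
    """Return the fraction of bytes in the chunk that are 0x00."""
    if not chunk:
        return 0.0
    return chunk.count(0) / len(chunk)


def compression_ratio(chunk: bytes) -> float:
    """Return uncompressed size / compressed size.

    Assumption: zlib is used here as a stand-in for zstd.
    """
    if not chunk:
        return 1.0
    return len(chunk) / len(zlib.compress(chunk, 6))


if __name__ == "__main__":
    samples = {
        "zero padding": bytes(1 << 16),
        "repetitive": b"hello world" * 6000,
        "random": os.urandom(1 << 16),
    }
    for name, data in samples.items():
        print(name, round(zero_fraction(data), 3), round(compression_ratio(data), 2))
```

Zero padding and repetitive text should both report a very high ratio, while random bytes should report a ratio close to 1. Any thresholds for "extremely high" or "extremely low" would have to be chosen against the original data source.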
The purpose of this retrieval type is to make sure the clients are not padding with too many zeroes or storing data that is not useful. Since the retrieval is lightweight, most of the retrieval testing will use this kind
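
A minimal sketch of the header check on the first 100 bytes is shown below, assuming the CARv2 layout of an 11-byte pragma followed by a 40-byte header (16 bytes of characteristics, then data offset, data size and index offset as little-endian uint64). The CAR V1 branch is only a heuristic that looks for a DAG-CBOR map starting with the "roots" key instead of fully decoding the header. All function names are hypothetical, and the HTTP request itself is left out; only the Range header is built.

```python
import struct

# 11-byte CARv2 pragma: varint(10) followed by the DAG-CBOR map {"version": 2}.
CARV2_PRAGMA = bytes.fromhex("0aa16776657273696f6e02")
CARV2_PREFIX_LEN = 11 + 40  # pragma + fixed-size header


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes at `offset` (end is inclusive)."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_uvarint(buf: bytes, pos: int = 0):
    """Decode an unsigned LEB128 varint; return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def inspect_car_prefix(prefix: bytes) -> dict:
    """Classify the first bytes of a piece as a CAR V1 or CAR V2 header."""
    if prefix.startswith(CARV2_PRAGMA):
        if len(prefix) < CARV2_PREFIX_LEN:
            raise ValueError("need at least 51 bytes for a CAR V2 header")
        # Skip the pragma and the 16 characteristics bytes.
        data_offset, data_size, index_offset = struct.unpack_from("<QQQ", prefix, 27)
        return {
            "version": 2,
            "data_offset": data_offset,
            "data_size": data_size,
            "index_offset": index_offset,
        }
    header_len, pos = read_uvarint(prefix)
    # Heuristic (assumption): a CAR V1 header is a 2-entry map whose first key is "roots".
    if header_len > 0 and prefix[pos:pos + 7] == b"\xa2\x65roots":
        return {"version": 1, "header_length": header_len, "data_offset": pos + header_len}
    raise ValueError("not a recognizable CAR header")
```

For a CAR V2 piece, the later random range retrieval can then be limited to the window starting at `data_offset` and spanning `data_size` bytes, for example with `range_header(data_offset + random_offset, 8 * 1024 * 1024)` clipped to that window.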

jcace · 2023-11-06T21:40:00Z

As described, this ticket is very specific to open dataset retrieval validation, not necessarily other datasets. It is not useful for the more general case of validating the contents of piece range retrieval tests.

xmcai2016 mentioned this issue Sep 15, 2023

Retrieval Bot V2 #17

Open

bajtos mentioned this issue Sep 27, 2023

SPARK CID sampling alpha space-meridian/roadmap#43

Closed

xmcai2016 added this to ActionArena Oct 31, 2023

github-project-automation bot moved this to 🍇 Backlog in ActionArena Oct 31, 2023

xmcai2016 assigned stephen-pl Oct 31, 2023

stephen-pl moved this from 🍇 Backlog to 🍰 Todo / Commited in ActionArena Oct 31, 2023

stephen-pl moved this from 🍰 Todo / Commited to 👨‍💻 In Progress in ActionArena Nov 2, 2023

stephen-pl moved this from 👨‍💻 In Progress to 🍰 Todo / Commited in ActionArena Nov 2, 2023

jcace closed this as not planned Nov 6, 2023

github-project-automation bot moved this from 🍰 Todo / Commited to 🚢 Done in ActionArena Nov 6, 2023
