Piece range retrieval
Use the pieceCID field of the deal proposal to retrieve the piece through the HTTP endpoint
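A minimal Python sketch of the range request, assuming the provider serves pieces at `<base_url>/piece/<pieceCID>` (a hypothetical layout; real endpoints may differ) and honors standard HTTP `Range` headers. `build_range_request` and `fetch_range` are illustrative names, not an existing client API.

```python
import urllib.request


def build_range_request(base_url: str, piece_cid: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET request for `length` bytes of a piece, starting at byte `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # Hypothetical URL layout: <base_url>/piece/<pieceCID>; adjust to the provider's endpoint.
    url = f"{base_url.rstrip('/')}/piece/{piece_cid}"
    # HTTP byte ranges are inclusive on both ends, so the last byte is start + length - 1.
    end = start + length - 1
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(base_url: str, piece_cid: str, start: int, length: int, timeout: float = 30.0) -> bytes:
    """Perform the range retrieval; a server that honors the Range header answers 206."""
    req = build_range_request(base_url, piece_cid, start, length)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if resp.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        return resp.read()
```

For the header check in the next step, `fetch_range(base_url, piece_cid, 0, 100)` would send `Range: bytes=0-99`.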
Make a range retrieval of the first 100 bytes and verify that they form a valid CARv1 or CARv2 header
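One way to do the header check, sketched under the assumption that the CARv1 header is DAG-CBOR encoded with its keys in canonical order (`roots` before `version`), as CAR writers produce. The CARv2 check compares the fixed 11-byte pragma; the CARv1 check is a heuristic byte match rather than a full CBOR decode, and `detect_car_version` is a hypothetical helper.

```python
from typing import Optional

# CARv2 files begin with this fixed 11-byte pragma: varint 10, then DAG-CBOR {"version": 2}.
CARV2_PRAGMA = bytes.fromhex("0aa16776657273696f6e02")


def decode_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint at `pos`; return (value, next_pos)."""
    value, shift = 0, 0
    while pos < len(buf) and shift < 64:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7
    raise ValueError("truncated or overlong varint")


def detect_car_version(head: bytes) -> Optional[int]:
    """Return 2 or 1 if `head` looks like a CARv2 / CARv1 header, else None."""
    if head.startswith(CARV2_PRAGMA):
        return 2
    try:
        header_len, p = decode_uvarint(head, 0)
    except ValueError:
        return None
    # Heuristic CARv1 check: a 2-entry DAG-CBOR map (0xa2) whose first key is "roots".
    if header_len == 0 or head[p:p + 7] != b"\xa2\x65roots":
        return None
    header = head[p:p + header_len]
    # If the whole header fits in the sample, also require the trailing "version": 1.
    if len(header) == header_len and not header.endswith(b"\x67version\x01"):
        return None
    return 1
```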
If it is a CARv2 header, read data_size from the header to calculate how much padding has been used. In the next step, we only need to perform range retrieval within [data_offset, data_offset + data_size)
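A sketch of reading the CARv2 header, following the CARv2 layout of an 11-byte pragma and a 40-byte header (16 bytes of characteristics, then little-endian uint64 data offset, data size and index offset). `padded_piece_size` is assumed to come from the deal proposal's piece size, and `non_payload_bytes` is a hypothetical helper that only gives an upper bound on padding.

```python
import struct
from typing import NamedTuple

CARV2_PRAGMA = bytes.fromhex("0aa16776657273696f6e02")
CARV2_HEADER_LEN = 40  # 16-byte characteristics + three little-endian uint64 fields


class CarV2Header(NamedTuple):
    characteristics: bytes
    data_offset: int
    data_size: int
    index_offset: int

    def data_range(self) -> tuple[int, int]:
        """Half-open byte range [start, end) of the inner CARv1 payload."""
        return self.data_offset, self.data_offset + self.data_size


def parse_carv2_header(head: bytes) -> CarV2Header:
    need = len(CARV2_PRAGMA) + CARV2_HEADER_LEN  # 51 bytes, inside the first 100
    if len(head) < need or not head.startswith(CARV2_PRAGMA):
        raise ValueError("not a CARv2 header")
    characteristics = bytes(head[11:27])
    data_offset, data_size, index_offset = struct.unpack_from("<QQQ", head, 27)
    return CarV2Header(characteristics, data_offset, data_size, index_offset)


def non_payload_bytes(header: CarV2Header, padded_piece_size: int) -> int:
    """Upper bound on padding: unpadded piece capacity minus the CARv1 payload size.

    Fr32 padding leaves 127/128 of the padded piece size for user data. The result
    also counts the CARv2 pragma, header and any index, so it overstates padding.
    """
    unpadded = padded_piece_size - padded_piece_size // 128
    return unpadded - header.data_size
```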
Make a range retrieval at a random offset within that piece, up to 8MiB in length
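A possible way to choose the sample window, assuming the range [data_offset, data_offset + data_size) from the previous step (or the whole piece for CARv1). `random_range` is an illustrative helper that caps each window at 8MiB and keeps it inside the payload.

```python
import random
from typing import Optional

MAX_RANGE = 8 * 1024 * 1024  # 8MiB cap per range retrieval


def random_range(start: int, end: int, max_len: int = MAX_RANGE,
                 rng: Optional[random.Random] = None) -> tuple[int, int]:
    """Pick (offset, length) inside the half-open range [start, end)."""
    if end <= start or max_len <= 0:
        raise ValueError("empty range")
    if rng is None:
        rng = random.Random()
    # Restrict the offset so a full max_len window fits whenever the range allows it.
    last_offset = max(start, end - max_len)
    offset = rng.randint(start, last_offset)
    return offset, min(max_len, end - offset)
```

Limiting the offset to `end - max_len` returns full-size windows whenever the payload is larger than 8MiB; the trade-off is that bytes near the payload boundaries are sampled slightly less often than bytes in the middle.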
Check whether the retrieved data is all zeroes. Over time, this gives a ratio of how much datacap is underutilized by padding data with zeroes
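A minimal sketch of the zero check and the running ratio. `ZeroPaddingTracker` is a hypothetical name; it counts whole samples rather than bytes, so the ratio approximates the share of datacap spent on zeroes assuming samples are drawn uniformly across pieces of comparable size.

```python
def is_all_zero(data: bytes) -> bool:
    """True if `data` is non-empty and every byte is 0x00."""
    return len(data) > 0 and data.count(0) == len(data)


class ZeroPaddingTracker:
    """Accumulates range-retrieval samples and reports the share that was all zeroes."""

    def __init__(self) -> None:
        self.samples = 0
        self.zero_samples = 0

    def observe(self, data: bytes) -> bool:
        zero = is_all_zero(data)
        self.samples += 1
        self.zero_samples += int(zero)
        return zero

    @property
    def zero_ratio(self) -> float:
        return self.zero_samples / self.samples if self.samples else 0.0
```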
Try to find the sequence [varint, CID, block, varint, CID], which indicates a valid IPLD data block. Since a valid IPLD block is at most 4MiB, we should expect to find at least one complete IPLD data block within that range
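A sketch of the block search, assuming the payload uses the CARv1 section framing `varint(len(CID) + len(block)) | CID | block`. To keep false positives rare it only accepts sha2-256 CIDs (CIDv0, or CIDv1 with the raw, dag-pb, dag-cbor or dag-json codec) and checks each candidate block against its CID digest; the codec allow-list and the helper names are assumptions, not part of the original design.

```python
import hashlib
from typing import Optional

MAX_BLOCK = 4 * 1024 * 1024  # valid IPLD blocks are at most 4MiB
SHA2_256 = 0x12
# Assumed allow-list of CIDv1 codecs: raw, dag-pb, dag-cbor, dag-json.
KNOWN_CODECS = {0x55, 0x70, 0x71, 0x0129}


def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    value, shift = 0, 0
    while pos < len(buf) and shift < 64:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7
    raise ValueError("truncated or overlong varint")


def parse_cid(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Parse a sha2-256 CID at `pos`; return (digest, end_pos)."""
    if buf[pos:pos + 2] == b"\x12\x20" and pos + 34 <= len(buf):
        return bytes(buf[pos + 2:pos + 34]), pos + 34  # CIDv0: bare multihash
    version, p = decode_uvarint(buf, pos)
    if version != 1:
        raise ValueError("not a CIDv1")
    codec, p = decode_uvarint(buf, p)
    mh_code, p = decode_uvarint(buf, p)
    mh_len, p = decode_uvarint(buf, p)
    if codec not in KNOWN_CODECS or mh_code != SHA2_256 or mh_len != 32 or p + 32 > len(buf):
        raise ValueError("unsupported or truncated CID")
    return bytes(buf[p:p + 32]), p + 32


def parse_section(buf: bytes, pos: int) -> tuple[bytes, bytes, int]:
    """Parse one CAR section (varint | CID | block) and verify the block hash."""
    section_len, cid_start = decode_uvarint(buf, pos)
    digest, block_start = parse_cid(buf, cid_start)
    block_len = section_len - (block_start - cid_start)
    end = block_start + block_len
    if not 0 <= block_len <= MAX_BLOCK or end > len(buf):
        raise ValueError("implausible section length")
    block = bytes(buf[block_start:end])
    if hashlib.sha256(block).digest() != digest:
        raise ValueError("block does not match its CID")
    return bytes(buf[cid_start:block_start]), block, end


def find_ipld_block(buf: bytes) -> Optional[tuple[int, bytes, bytes]]:
    """Scan for [varint, CID, block, varint, CID]; return (offset, cid, block) of the first hit."""
    for start in range(len(buf)):
        try:
            cid, block, end = parse_section(buf, start)
            _, next_cid = decode_uvarint(buf, end)  # next section's length prefix...
            parse_cid(buf, next_cid)  # ...followed by a parseable CID
            return start, cid, block
        except ValueError:
            continue
    return None
```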
Calculate the compression ratio of the block bytes using zstd compression
A high compression ratio (low entropy) means the data is highly repetitive (e.g. "hello world" repeated)
A low compression ratio (high entropy) means the data is noisy (e.g. random bytes, or data that is already compressed or encrypted)
Useful data usually has neither extremely high nor extremely low entropy, and its compression ratio can be compared against that of the original data source
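A sketch of the entropy check. It swaps in zlib from the Python standard library for zstd, which is not in the standard library, so absolute ratios will differ from zstd's; the thresholds `REPETITIVE_RATIO` and `NOISY_RATIO` are hypothetical and would need calibrating against the original data source.

```python
import os
import zlib

# Hypothetical thresholds; calibrate them against the original dataset.
REPETITIVE_RATIO = 10.0
NOISY_RATIO = 1.05


def compression_ratio(block: bytes, level: int = 9) -> float:
    """Uncompressed size divided by compressed size (higher = more repetitive)."""
    if not block:
        raise ValueError("empty block")
    # zlib stands in for zstd here.
    return len(block) / len(zlib.compress(block, level))


def classify(block: bytes) -> str:
    ratio = compression_ratio(block)
    if ratio >= REPETITIVE_RATIO:
        return "repetitive"  # low entropy, e.g. "hello world" repeated
    if ratio <= NOISY_RATIO:
        return "noisy"  # high entropy: random, already compressed, or encrypted
    return "typical"


if __name__ == "__main__":
    print(classify(b"hello world" * 1000))  # repetitive
    print(classify(os.urandom(4096)))  # noisy
```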
The purpose of this retrieval type is to make sure clients are not padding with too many zeroes or storing data that is not useful. Since the retrieval is lightweight, most retrieval testing will use this kind
As described, this ticket is very specific to open dataset retrieval validation and does not necessarily apply to other datasets. It is not useful for the more general case of validating the contents of piece range retrieval tests.