
FRC: Retrieval Checking Requirements #1089

Open · wants to merge 9 commits into base: master

Conversation

@bajtos commented Dec 4, 2024

When we set out to build Spark, a protocol for testing whether the payload of Filecoin deals can be retrieved, we designed it based on how Boost worked at that time (mid-2023). Soon after FIL+ allocator compliance started to use the Spark retrieval success score (Spark RSR) in mid-2024, we learned that Venus Droplet, an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Things have evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release.

This FRC has the following goals:

  1. Document the retrieval process based on IPFS/IPLD.
  2. Specify what Spark needs from miner software.
  3. Collaborate with the community to tweak the requirements to work well for all parties involved.
  4. Let this spec and the building blocks like IPNI Reverse Index empower other builders to design & implement their own retrieval-checking networks as alternatives to Spark.

Discussion

#1086

Progress

  • Simple Summary
  • Abstract
  • Change Motivation
  • Specification
  • Design Rationale
  • Backwards Compatibility
  • Test Cases
  • Security Considerations
  • Incentive Considerations
  • Product Considerations
  • Implementation
  • TODO


bajtos commented Dec 4, 2024

Tagging @steven004 @LexLuthr @magik6k @masih @willscott @juliangruber @patrickwoodhead for visibility.


#### Link on-chain MinerId and IPNI provider identity

Storage providers are required to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
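A hedged sketch of how a retrieval checker might verify this linkage. The `PeerId` field of the `Filecoin.StateMinerInfo` JSON-RPC response and the `AddrInfo.ID` field of the cid.contact `/providers` response are assumed shapes; treat this as illustration, not a normative check.

```python
# Sketch (not normative): verify that a miner's on-chain PeerID matches
# the identity it uses when advertising to IPNI. The response shapes
# below are assumptions modeled on lotus and cid.contact.

def peer_ids_linked(miner_info: dict, ipni_provider: dict) -> bool:
    """Return True when the on-chain PeerId equals the IPNI provider ID."""
    on_chain = miner_info.get("PeerId")                 # Filecoin.StateMinerInfo
    ipni = ipni_provider.get("AddrInfo", {}).get("ID")  # cid.contact /providers
    return on_chain is not None and on_chain == ipni
```

Note that, as discussed below, this check only tests equality of the two IDs; it does not require the PeerID to be unique to one miner.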


This requirement cannot be fulfilled in Curio. We no longer have a concept of a minerID <> unique peerID binding. IPNI must be extended to support other key types, like the worker key, to sign ads.

@bajtos (Author):

I am aware of that; see the note in the text below this paragraph.

> [!NOTE]
> This is open to extensions in the future, we can support more than one form of linking
> index-provides to filecoin-miners. See e.g. [ipni/spec#33](https://github.com/ipni/specs/issues/33).

From my point of view, I prefer not to block progress on this FRC until the Curio team figures out how to extend IPNI to support other key types. Instead, I'd like this FRC to document the solution that works with Boost & Venus now and then enhance it with the new mechanism Curio needs once that new solution is agreed on.

@lanzafame (Contributor):

I am probably mistaken here but Droplet (Venus' Boost) supports multiple minerIDs being associated with a single PeerID (see docs), does that mean if I am using Droplet, I need to limit myself to a 1:1 relationship to meet this requirement?

@bajtos (Author):

Great call, @lanzafame! I am still learning more about how Venus Droplet works and what features it offers.

Based on the docs you linked to, I believe you can have multiple minerIDs associated with a single Droplet PeerID and still meet this requirement.

In Spark, we need the PeerID returned by Filecoin.StateMinerInfo to match the PeerID used in IPNI advertisements. Spark does not check whether that PeerID is unique or shared by multiple miners.

@bajtos (Author):

Referencing for visibility - we are discussing a possible solution for this problem here:

filecoin-project/curio#377

@bajtos marked this pull request as ready for review December 18, 2024 12:56
Signed-off-by: Miroslav Bajtoš <[email protected]>
@jsoares (Member) left a comment:

Left a few editorial comments. I do not know enough about the specific topic to be able to opine on a technical level. I also found the explanation somewhat unclear, but that could be a consequence of my lack of knowledge, so not holding that against the draft.

Others will be better suited to provide a full review.

Comment on lines +37 to +43
When we set out to build [Spark](https://filspark.com), a protocol for testing whether _payload_ of Filecoin deals can be retrieved back, we designed it based on how [Boost](https://github.com/filecoin-project/boost) worked at that time (mid-2023). Soon after FIL+ allocator compliance started to use Spark retrieval success score (Spark RSR) in mid-2024, we learned that [Venus](https://github.com/filecoin-project/venus) [Droplet](https://github.com/ipfs-force-community/droplet), an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Things evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release.

This FRC has the following goals:
1. Document the retrieval process based on IPFS/IPLD.
2. Specify what Spark needs from miner software.
3. Collaborate with the community to tweak the requirements to work well for all parties involved.
4. Let this spec and the building blocks like [IPNI Reverse Index](https://github.com/filecoin-project/devgrants/issues/1781) empower other builders to design & implement their own retrieval-checking networks as alternatives to Spark.
Member:

This reads more like motivation than an abstract. It'd be useful for the abstract to summarise the actual requirements/spec.

3. Map `(PieceCID, PieceSize)` to IPNI `ContextID` value.
4. Query IPNI reverse index for a sample of payload blocks advertised by `ProviderID` with
`ContextID` (see the [proposed API
spec](https://github.com/ipni/xedni/blob/526f90f5a6001cb50b52e6376f8877163f8018af/openapi.yaml)).
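For illustration only: the FRC defines the exact `ContextID` byte layout, so the encoding below — an 8-byte big-endian `PieceSize` prefix followed by the raw `PieceCID` bytes — is an assumption used to make the mapping in step 3 concrete.

```python
# Hypothetical sketch of step 3: derive an IPNI ContextID from
# (PieceCID, PieceSize). The byte layout here is an assumption for
# illustration; consult the FRC text for the normative encoding.

def context_id(piece_cid_bytes: bytes, piece_size: int) -> bytes:
    """Map (PieceCID, PieceSize) to a deterministic ContextID value."""
    # 8-byte big-endian PieceSize, then the raw binary PieceCID
    return piece_size.to_bytes(8, "big") + piece_cid_bytes
```

Because the mapping is deterministic, any checker that knows a deal's `(PieceCID, PieceSize)` pair can reconstruct the `ContextID` and use it in the reverse-index query of step 4.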
Member:

Should this be in the FRC or is it out of scope? The link is fine, but trying to understand whether we see it as central.


#### Link on-chain MinerId and IPNI provider identity

Storage providers are requires to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
Member:

Suggested change
Storage providers are requires to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
Storage providers are required to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).

}
```

IPNI provider status ([query](https://cid.contact/providers/12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys)):
Member:

Not a huge fan of these arbitrary links on a document that's intended to be frozen for a long time.

Comment on lines +268 to +274
1. It's inefficient.

1. Each retrieval check requires two requests - one to download ~8MB chunk of a piece, the second one to download the payload block found in that chunk.

1. Spark typically repeats every retrieval check 40-100 times. Scanning CAR byte range 40-100 times does not bring enough value to justify the network bandwidth & CPU cost.

1. It's not clear how retrieval checkers can discover the address where the SP serves piece retrievals.
Member:

The 1 numbered list renders fine, but is not great for reading in raw md.

Member:

Just like this FRC is dictating a convention, you could dictate a convention that at least one of the on-chain Multiaddrs stored in the miner actor of the SP is where you could retrieve from.

Comment on lines +288 to +290
[Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes:
- Miner software must construct IPNI `ContextID` values in a specific way.
- Because such ContextIDs are scoped per piece (not per deal), miner software must de-duplicate advertisements for deals storing the same piece.
Member:

Just to be clear, and given this is an FRC, what are we breaking exactly?

Comment on lines +24 to +31
To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements:

1. Link on-chain MinerId and IPNI provider identity ([spec](#link-on-chain-minerid-and-ipni-provider-identity)).
2. Provide retrieval service using the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/).
3. Advertise retrievals to IPNI.
4. In IPNI advertisements, construct the `ContextID` field from `(PieceCID, PieceSize)` ([spec](#construct-ipni-contextid-from-piececid-piecesize))

Meeting these requirements needs support in software implementations like Boost, Curio & Venus Droplet but potentially also updates in settings configured by the individual SPs.
Member:

Suggested change
To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements:
1. Link on-chain MinerId and IPNI provider identity ([spec](#link-on-chain-minerid-and-ipni-provider-identity)).
2. Provide retrieval service using the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/).
3. Advertise retrievals to IPNI.
4. In IPNI advertisements, construct the `ContextID` field from `(PieceCID, PieceSize)` ([spec](#construct-ipni-contextid-from-piececid-piecesize))
Meeting these requirements needs support in software implementations like Boost, Curio & Venus Droplet but potentially also updates in settings configured by the individual SPs.
To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. This FRC outlines requirements that SPs and their software stacks should meet to allow 3rd-party networks to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content.

The goal here is not to go into technical detail. I left a non-binding suggestion; something along these lines would be preferable.

The content in 26-29 would potentially be a good fit for the abstract; see comment below.


### Retrieval Requirements

1. Whenever a deal is activated, the SP MUST advertise all IPFS/IPLD payload block CIDs found in the Piece to IPNI. See the [IPNI Specification](https://github.com/ipni/specs/blob/main/IPNI.md) and [IPNI HTTP Provider](https://github.com/ipni/specs/blob/main/IPNI_HTTP_PROVIDER.md) for technical details.
Contributor:

Can a client opt out of their deal payload being indexed?

Member:

It'll depend on the market software, Boost should support optionality here (see) and I imagine Curio will follow suit because there's already been demonstrated demand for "private" deals.

Contributor:

this is all in the context of data that's claimed to be 'retrievable'

We're not defining a standard here for signalling between client and SP / deal making for how a client communicates to an SP that data should be retrievable or not.


bajtos commented Jan 23, 2025

Thank you for the feedback! I'll take a look and respond to your comments (early) next week.


<!--"If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the FIP.-->

To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements:
Member:

My quibble with this paragraph is that you're asserting a need for "3rd-party networks", yet the structure you outline below isn't the only way to achieve the goal you've stated (because it's a "need"). This is more like Spark's preferred path, which is fine, it's just not a generalisable need. Maybe it'd be best to not give the impression to the reader that this is the only way to do this.

I'd prefer a more direct approach here: To allow Spark to sample deals, we need the following to be true.

It's nice to have a standard that's specified via FRC but it's not like we have a queue of retrieval checkers waiting for such a standard in order to get going.

1. Let's assume the maximum CAR block size is 4 MB and we have deal's `PieceCID` and `PieceSize`.
2. Pick a random offset $o$ in the piece so that $0 <= o <= PieceSize - 2*4 MB$.
3. Send an HTTP range-retrieval request to retrieve the bytes in the range `(o, o+2*4MB)`.
4. Parse the bytes to find a sequence that looks like a CARv1 block header.
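Steps 1–3 above can be sketched as follows, with the 4 MB maximum CAR block size taken from step 1 as a working assumption:

```python
# Sketch of steps 1-3: pick a random window spanning two maximum-size CAR
# blocks inside the piece, then build the HTTP range request for it.
import random

MAX_BLOCK = 4 * 1024 * 1024  # assumed maximum CAR block size (step 1)


def sample_range(piece_size: int, rng=random):
    """Return (start, end) byte offsets of a 2*4MB window in the piece."""
    window = 2 * MAX_BLOCK
    assert piece_size >= window, "piece too small to sample"
    # step 2: 0 <= o <= PieceSize - 2*4MB (inclusive upper bound)
    start = rng.randrange(0, piece_size - window + 1)
    return start, start + window


def range_header(start: int, end: int) -> dict:
    # step 3: HTTP Range is inclusive on both ends
    return {"Range": f"bytes={start}-{end - 1}"}
```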
Member:

A sequence that looks like a CARv1 will either start at offset 0, or be discoverable from a CARv2 header (which is 51 bytes long, starting at offset 0), or alternatively be a PoDSI container, which should also be discoverable from offset 0.
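A minimal sketch of the CARv2 case described above: the fixed-size prelude (11-byte pragma, 16-byte characteristics bitfield, then three little-endian uint64 offsets, 51 bytes in total per the CARv2 spec) locates the inner CARv1 without any scanning.

```python
# Sketch: parse the fixed 51-byte CARv2 prelude to locate the inner
# CARv1 payload deterministically, instead of scanning from a random
# offset.
import struct

CARV2_PRAGMA = bytes.fromhex("0aa16776657273696f6e02")  # dag-cbor {"version": 2}


def parse_carv2_header(buf: bytes):
    """Return (data_offset, data_size, index_offset) of the inner CARv1."""
    assert buf[:11] == CARV2_PRAGMA, "not a CARv2 file"
    # skip the 16-byte characteristics bitfield, then read three LE uint64s
    data_off, data_size, idx_off = struct.unpack_from("<QQQ", buf, 11 + 16)
    return data_off, data_size, idx_off
```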

Comment on lines +263 to +264
5. Extract the CID from the CARv1 block header.
6. Hash the block's payload bytes and verify that the digest equals to the CID.
Member:

CID in a CARv1 block header isn't a CID of the payload, it's typically (but not always, and may even be omitted) the root of the entire DAG within the CAR.

But the approach here could be generalised as something like the following: download a small chunk from the start of a piece, determine whether it's readable as either a CARv2, CARv1 or PoDSI container, then use that knowledge to read further sections of the containing CAR(s) to find CIDs to retrieve and build a catalog by progressive byte-range scanning of CAR section headers.

It's true that this is inefficient and would require a stored state to build up knowledge of a piece, but it is possible to do an IPLD block discovery by doing many, progressive, and small, piece retrievals if the SP is exposing the /piece/ endpoint.

You're still back at the problem of trusting the SP that the "piece" they are serving you corresponds to the PieceCID which you trust. As long as the SP is responsible for serving you the response or reporting CIDs to IPNI, you're at the mercy of the SP to tell you what a piece contains rather than the client who ought to have a canonical mapping of piece->blocks (or at least did have at the beginning). Or, to be properly trustless, downloading the entire piece and verifying the PieceCID matches the piece they gave you, but I guess you have to rule this option out and therefore be trusting the SP to play nice.
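The distinction made above — the bytes are verified against the CID in each length-prefixed CAR *section*, not against the root CID in the CARv1 header — can be sketched for the common case of a CIDv1 with a sha2-256 multihash (other CID versions, codecs, and hash functions are out of scope for this sketch):

```python
# Sketch: parse one CARv1 section (varint length, CID, block bytes) and
# verify the block bytes hash to the section's own CID. Covers only
# CIDv1 + sha2-256 (multihash code 0x12).
import hashlib


def read_varint(buf: bytes, pos: int):
    """Decode an unsigned LEB128 varint; return (value, next_pos)."""
    shift = value = 0
    while True:
        b = buf[pos]; pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7


def verify_section(buf: bytes, pos: int = 0) -> bool:
    """Parse one CAR section and check its block bytes against its CID."""
    length, pos = read_varint(buf, pos)    # section length (CID + data)
    end = pos + length
    version, pos = read_varint(buf, pos)   # CID version (expect 1)
    codec, pos = read_varint(buf, pos)     # multicodec of the block data
    hash_code, pos = read_varint(buf, pos) # multihash function code
    digest_len, pos = read_varint(buf, pos)
    digest = buf[pos:pos + digest_len]; pos += digest_len
    block = buf[pos:end]
    return hash_code == 0x12 and hashlib.sha256(block).digest() == digest
```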

Contributor:

we should extend range reads over pieces to include the intermediate proof tree (which may already be in the PoDSI index, or can be generated on the side like the CID index) to allow validation that the range is indeed the expected sub-section of the overall PieceCID


_We need the server to return an inclusion proof up to the PieceCID root._

4. Parse the bytes to find a sequence that looks like a CARv1 block header.
Member:

see my point above, this doesn't need to be a random offset


8 participants