Unified Provide Interface for Content Routers #10097

Open
3 tasks done
guillaumemichel opened this issue Aug 23, 2023 · 8 comments
Labels
effort/weeks Estimated to take multiple weeks exp/expert Having worked on the specific codebase is important kind/feature A new feature kind/maintenance Work required to avoid breaking changes or harm to project's status quo P2 Medium: Good to have, but can wait until someone steps up

Comments

@guillaumemichel

Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

Problem Statement:

Currently, Kubo is responsible for managing the DHT's provide and reprovide operations. However, with the evolution of Content Routers beyond just the DHT, it's evident that the existing mechanism is not optimal. The reasons are:

  1. The reprovide strategy Kubo uses was designed mainly for DHTs and is not always suitable for newer Content Routers such as IPNI, which uses a different advertising mechanism.
  2. The DHT cannot optimize its reprovide strategy as it doesn't have a direct insight into the content that needs to be republished.

Proposed Solution

To better streamline the providing mechanism across different content routers, we propose a unified interface that shifts the responsibility from Kubo to the individual content routers. The proposed interface includes:

  • StartProvide(CIDs): Instructs the content router to begin advertising that the Kubo node is storing the specified CIDs. This advertisement (or republishing) should continue until a StopProvide is invoked for these CIDs.
  • StopProvide(CIDs): Commands the content router to cease the advertisement for the given CIDs.
  • ListProvides: Returns the list of CIDs currently being advertised by the content router.
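
As a rough illustration, the three operations above could be written as a Go interface. This is a minimal sketch, not an agreed design: the method names come from the proposal, but the signatures, the `CID` stand-in type (used instead of `cid.Cid` to keep the example dependency-free), and the in-memory `memProvider` are all assumptions made for illustration.

```go
package main

import (
	"sort"
	"sync"
)

// CID is a hypothetical stand-in for cid.Cid from github.com/ipfs/go-cid.
type CID string

// Provider mirrors the three proposed operations.
// The signatures are assumptions; the issue only names the methods.
type Provider interface {
	StartProvide(cids ...CID) error
	StopProvide(cids ...CID) error
	ListProvides() ([]CID, error)
}

// memProvider is a toy implementation that only records which CIDs
// should be advertised. It performs no network operations.
type memProvider struct {
	mu   sync.Mutex
	cids map[CID]struct{}
}

// Compile-time check that memProvider satisfies Provider.
var _ Provider = (*memProvider)(nil)

func newMemProvider() *memProvider {
	return &memProvider{cids: make(map[CID]struct{})}
}

// StartProvide marks the CIDs as "to be advertised until stopped".
func (p *memProvider) StartProvide(cids ...CID) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range cids {
		p.cids[c] = struct{}{}
	}
	return nil
}

// StopProvide removes the CIDs from the advertised set.
func (p *memProvider) StopProvide(cids ...CID) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range cids {
		delete(p.cids, c)
	}
	return nil
}

// ListProvides returns the advertised CIDs, sorted for stable output.
func (p *memProvider) ListProvides() ([]CID, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := make([]CID, 0, len(p.cids))
	for c := range p.cids {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out, nil
}
```

A DHT-backed or IPNI-backed router would implement the same interface with its own republishing logic behind `StartProvide`.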

Benefits

  1. Flexibility: With a generic interface, different content routers can easily integrate with Kubo without being tied to a DHT-specific strategy.
  2. Optimization Opportunities: Allows the DHT and other content routers to implement their own specific provide strategies, optimized for their use cases. In the DHT, this change of interface is necessary to implement Reprovide Sweep (IPFS Thing 2023 presentation), allowing a resource efficient reprovide strategy, enabling large content providers to advertise content to the DHT.
  3. Clarity of Responsibilities: Removing the responsibility from Kubo makes the system modular, allowing each component to focus on its core functionality.

Feedback and Collaboration

The proposed interface is just a draft for now. The goal of this issue is to gather feedback and start a public discussion about specific interface needs for different content routers, especially IPNI and the DHT. This issue will probably be followed up by an IPIP in ipfs/specs, once we have listed the requirements of all (known) Content Routers.

References

cc: @masih @ischasny @aschmahmann @Jorropo @dennis-tra @iand

@guillaumemichel guillaumemichel added the kind/feature A new feature label Aug 23, 2023
@aschmahmann
Contributor

This issue will probably be followed up by an IPIP

I don't see why this will require an IPIP, IIUC there are no spec things here. This is just agreement on what some Go code should look like so we can use multiple content advertising systems sanely.

StartProvide(CIDs)

  • There should be a contract with Start/Stop around atomicity. What's supposed to happen if the program dies in the middle of the operation?
  • What is the expected behavior if the same CID is provided twice? What about StartProvide(CID); StartProvide(CID); StopProvide(CID)?
    • Current IPNI uses are heavy into duplicates (e.g. advertise a group of CIDs associated with some abstract concept like a pin, block collection, etc.) so will attempting deduplication in the API (i.e. the result is no provided data) be feasible/reasonable?
    • Dealing with duplicate advertisements in the IPFS Public DHT is mostly just wasteful, so will handling duplicates in the API (i.e. the result is provided data) and then deduplicating internally be feasible/reasonable?
  • Related to duplicates is the question of assigning context/metadata to groupings
    • IPNI leans into this already with contextIDs
    • This might compose nicely with any separation around if/why some CID should be advertised. For example, users might not want to advertise data they've downloaded but haven't pinned, but some of the underlying blocks in the same DAG might be pinned.
      • Note: while this might feel like some existing proposals for named pins and ref-counting block GC this is not dependent on those since this is only about the advertisements and not the block data. That being said it would probably pave the way towards making those things easier for anyone who wanted to tackle them in the future.
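
One way to picture the grouping idea above is a per-context reference count on advertisements. The sketch below is only an illustration under assumptions: the `groupedProvides` type, its methods, and the use of a plain string as a context ID are hypothetical, and `CID` is a stand-in for `cid.Cid`. It tracks advertisements only, not block data.

```go
package main

// CID is a hypothetical stand-in for cid.Cid.
type CID string

// groupedProvides records which contexts (for example a pin or a block
// collection) want each CID advertised. Not safe for concurrent use.
type groupedProvides struct {
	byContext map[string]map[CID]struct{} // context ID -> CIDs in that group
	refs      map[CID]int                 // CID -> number of groups containing it
}

func newGroupedProvides() *groupedProvides {
	return &groupedProvides{
		byContext: make(map[string]map[CID]struct{}),
		refs:      make(map[CID]int),
	}
}

// Start adds cids to the given context and returns the CIDs that no
// context was advertising before, i.e. the ones a deduplicating router
// such as a DHT client would newly announce.
func (g *groupedProvides) Start(contextID string, cids ...CID) []CID {
	set := g.byContext[contextID]
	if set == nil {
		set = make(map[CID]struct{})
		g.byContext[contextID] = set
	}
	var added []CID
	for _, c := range cids {
		if _, ok := set[c]; ok {
			continue // already in this context: repeating the call is a no-op
		}
		set[c] = struct{}{}
		g.refs[c]++
		if g.refs[c] == 1 {
			added = append(added, c)
		}
	}
	return added
}

// Stop drops a whole context and returns the CIDs that no remaining
// context wants advertised (in no particular order).
func (g *groupedProvides) Stop(contextID string) []CID {
	var removed []CID
	for c := range g.byContext[contextID] {
		g.refs[c]--
		if g.refs[c] == 0 {
			delete(g.refs, c)
			removed = append(removed, c)
		}
	}
	delete(g.byContext, contextID)
	return removed
}
```

With this shape, a router that works with groups could consume `byContext` directly, while a router that only cares about unique CIDs could act on the values returned by `Start` and `Stop`.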

ListProvides

We'll need to define the atomicity guarantees and guarantees around duplicates here. I'm not sure how this function is planned to be used, so probably easier to define things here after Start/Stop are well defined.

@willscott
Contributor

What is the expected behavior if the same CID is provided twice?

I would propose it's idempotent

There should be a contract with Start/Stop around atomicity. What's supposed to happen if the program dies in the middle of the operation?

I would propose the contract is that nothing is promised until the method returns. Failing during execution leaves the state undefined, and it is the caller's responsibility to re-call the method (see idempotency above).
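
Under that contract, the caller's side could look roughly like the sketch below. It assumes a hypothetical `startFunc` signature and a stand-in `CID` type; `errTransient` exists only to make the example testable. Re-calling is safe here only because the operation is assumed to be idempotent.

```go
package main

import (
	"errors"
	"fmt"
)

// CID is a hypothetical stand-in for cid.Cid.
type CID string

// startFunc is an assumed shape for an idempotent StartProvide.
type startFunc func(cids []CID) error

// errTransient is an illustrative error for failures mid-operation.
var errTransient = errors.New("provide interrupted")

// startWithRetry re-calls start until it returns nil or the attempts
// run out. Nothing is assumed about the router's state after a failed
// call, so the whole batch is submitted again each time.
func startWithRetry(start startFunc, cids []CID, attempts int) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = start(cids); err == nil {
			return nil
		}
	}
	return fmt.Errorf("StartProvide failed after %d attempts: %w", attempts, err)
}
```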

@Jorropo
Contributor

Jorropo commented Aug 24, 2023

I would propose the contract is that nothing is promised until the method returns. Failing during execution leaves the state undefined, and it is the caller's responsibility to re-call the method (see idempotency above).

I think it's better if it's eventually transactional: if StartProvide fails, none of the CIDs are provided. It is fine if some CIDs are provided temporarily, but eventually they must not be. (This allows parallelising the database write with the DHT provides, for example: if writing to the DB fails, no CID is enqueued, and whatever has been provided until then will stop being provided in ~1 day.)
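
A minimal sketch of that "eventually transactional" behaviour follows, under stated assumptions: `reprovideQueue`, `persist`, and `announce` are hypothetical names, `CID` is a stand-in for `cid.Cid`, and record expiry is left to the network rather than modelled. The first announcement runs in parallel with the durable write, but CIDs join the reprovide queue only if the write succeeded.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// CID is a hypothetical stand-in for cid.Cid.
type CID string

// errPersist is an illustrative error for a failed database write.
var errPersist = errors.New("database write failed")

// reprovideQueue holds the CIDs that will keep being re-announced.
type reprovideQueue struct {
	mu   sync.Mutex
	cids map[CID]struct{}
}

func newReprovideQueue() *reprovideQueue {
	return &reprovideQueue{cids: make(map[CID]struct{})}
}

func (q *reprovideQueue) size() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.cids)
}

// startProvide announces the CIDs while persisting the batch. If persist
// fails, nothing is enqueued, so any record already announced is never
// refreshed and simply expires on the network.
func (q *reprovideQueue) startProvide(cids []CID, persist func([]CID) error, announce func(CID)) error {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for _, c := range cids {
			announce(c)
		}
	}()
	err := persist(cids)
	wg.Wait()
	if err != nil {
		return fmt.Errorf("start provide: %w", err)
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, c := range cids {
		q.cids[c] = struct{}{}
	}
	return nil
}
```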

@guillaumemichel
Author

I would propose it's idempotent

I agree with @willscott

I can see multiple ways forward:

  1. Delegated Responsibility:

    • Upon calling StartProvide([]cid.Cid), the function returns immediately.
    • The onus is on the content router to advertise the given Content Identifiers (CIDs). Even if there are initial failures, it assumes that the operation will eventually succeed.
    • Content routers will manage two lists:
      • CIDs awaiting advertisement.
      • CIDs already advertised.
    • An additional method, ProvideStatus(cid.Cid), can be queried to get the status of a specific CID's advertisement. This method might return statuses like advertised, pending, retrying, or failed.
  2. Caller's Responsibility:

    • In this pattern, StartProvide([]cid.Cid) error will return nil if all the CIDs were advertised successfully.
    • If at least one CID fails to be advertised, an error is returned.
    • It's up to the caller to retry in case of failures. This gives more control to the caller but also demands that they handle retries and error management. Note that error handling should make no assumption about the nature of the content router.
  3. Channel-based Feedback:

    • The method StartProvide([]cid.Cid) chan update returns a channel that provides real-time updates regarding the state of the CID advertisements.
    • As (groups of) CIDs are processed (either successfully or with failures), updates are written to this channel.
    • The channel is closed once all CIDs have been addressed.
    • This method could be designed to either:
      a) Let the application manage retries
      b) Hand over retry responsibility to the content router but still keep the application informed.
      • An additional method, ProvideStatus(cid.Cid), can be queried to get the status of a specific CID's advertisement. This method might return statuses like advertised, pending, retrying, or failed.

I have a preference for 3b because the retry logic may be content router specific and the application should make no assumption on the content router. Also 3b keeps the application informed of ongoing statuses, facilitating informed decision-making. This approach strikes a balance between delegation and oversight.
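
To make option 3b more concrete, here is a minimal sketch in which the content router owns the retries and reports progress on a channel. Everything beyond the names `StartProvide` and the four statuses is an assumption: the `Update` type, the `advertise` callback, the `maxAttempts` parameter, and `ErrNotAdvertised` are hypothetical, and `CID` is a stand-in for `cid.Cid`.

```go
package main

import "errors"

// CID is a hypothetical stand-in for cid.Cid.
type CID string

// Status values follow the ones suggested for ProvideStatus.
type Status int

const (
	Pending Status = iota
	Advertised
	Retrying
	Failed
)

// Update is an assumed shape for the values sent on the channel.
type Update struct {
	CID    CID
	Status Status
}

// ErrNotAdvertised is an illustrative router-side failure.
var ErrNotAdvertised = errors.New("advertisement not accepted")

// startProvide returns immediately. The router retries each CID up to
// maxAttempts times, keeps the caller informed, and closes the channel
// once every CID has reached Advertised or Failed. The caller must drain
// the channel, since sends block on this unbuffered channel.
func startProvide(cids []CID, advertise func(CID) error, maxAttempts int) <-chan Update {
	updates := make(chan Update)
	go func() {
		defer close(updates)
		for _, c := range cids {
			updates <- Update{CID: c, Status: Pending}
			final := Failed
			for attempt := 1; attempt <= maxAttempts; attempt++ {
				if advertise(c) == nil {
					final = Advertised
					break
				}
				if attempt < maxAttempts {
					updates <- Update{CID: c, Status: Retrying}
				}
			}
			updates <- Update{CID: c, Status: final}
		}
	}()
	return updates
}
```

The application ranges over the returned channel and never needs to know how or why the router retries, which is the separation argued for above.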

@lidel lidel added exp/expert Having worked on the specific codebase is important kind/maintenance Work required to avoid breaking changes or harm to project's status quo effort/weeks Estimated to take multiple weeks P2 Medium: Good to have, but can wait until someone steps up labels Sep 4, 2023
@lidel
Member

lidel commented Sep 10, 2024

cc @gammazero

@gammazero
Contributor

gammazero commented Jan 24, 2025

if all the CIDs were advertised successfully

What does "advertised successfully" mean, at least re. IPNI?

  • That the provide system has successfully received (and persisted) the StartProvide request?
  • That an advertisement has been created and announced?
  • That an indexer has fetched the advertisement or it has been delivered to an indexer in some way?
  • That a query to the indexer returns the location of some/all of the most recently advertised content?

@gammazero
Contributor

In order to reduce noise from advertising short-term content, or advertising content availability on short-lived providers, it has been proposed that the provide system be online for a minimum amount of time before advertising content. A time of 12h has been suggested.

  • Is this a reasonable feature?
  • Should the client be able to override this?
  • Does this reset every time the client goes offline?
  • Does the indexer need to check that content is retrievable?
  • Does the online check include network connectivity (and to where) or just that the service is running?

@aschmahmann
Contributor

In order to reduce noise from advertising short-term content, or advertising content availability on short-lived providers, it has been proposed that the provide system be online for a minimum amount of time before advertising content. A time of 12h has been suggested.

I'd say this is unreasonable to apply across the board (e.g. Amino DHT + IPNI). However, if say IPNI is not equipped to handle certain scenarios then given we're adding new functionality it seems reasonable to roll it out more slowly / carefully. For example, if IPNI is going to heavily punish users if they're not online the X times a day they get pinged and the user isn't running a high availability server it might not be worth allowing pushing data to IPNI at all.

For something like the Amino DHT I might do something like:

  1. Data can be provided immediately, or maybe wait for something small like 5 minutes that could be overridden by user config
  2. Track how recently the (sorted in XOR space) data was provided so we don't constantly restart from the beginning
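
Item 2 could be sketched as a sweep over keys kept in sorted order with a resumable cursor. This is only an illustration under assumptions: the `sweep` type and its fields are hypothetical, CIDs are plain strings hashed with SHA-256 as a stand-in for the DHT's key derivation, and the cursor is a simple index (a real implementation would more likely persist the last provided key, since the key set changes over time). Sorting the hashed keys as byte strings places keys that share long prefixes, and are therefore close in XOR distance, next to each other.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"sort"
)

// sweep walks the keyspace in sorted order and remembers where it stopped.
type sweep struct {
	keys   [][32]byte // hashed keys, sorted bytewise
	cursor int        // index of the next key to provide
}

// newSweep builds the sorted keyspace and resumes from a saved cursor.
func newSweep(cids []string, savedCursor int) *sweep {
	keys := make([][32]byte, len(cids))
	for i, c := range cids {
		keys[i] = sha256.Sum256([]byte(c))
	}
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i][:], keys[j][:]) < 0
	})
	cursor := 0
	if len(keys) > 0 {
		// Normalise so that stale or negative cursors still land in range.
		cursor = ((savedCursor % len(keys)) + len(keys)) % len(keys)
	}
	return &sweep{keys: keys, cursor: cursor}
}

// next returns up to n keys starting at the cursor, wrapping around at
// the end of the keyspace, and advances the cursor past them.
func (s *sweep) next(n int) [][32]byte {
	if n > len(s.keys) {
		n = len(s.keys)
	}
	if n <= 0 {
		return nil
	}
	batch := make([][32]byte, 0, n)
	for i := 0; i < n; i++ {
		batch = append(batch, s.keys[s.cursor])
		s.cursor = (s.cursor + 1) % len(s.keys)
	}
	return batch
}
```

Persisting `cursor` between runs means a restarted node continues with the region it had not yet reprovided instead of starting again from the first key.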

Does the online check include network connectivity (and to where) or just that the service is running?

There are a number of off-the-beaten-path types of Kubo deployments that don't really interact with "mainnet"; we should probably think about how to answer this within that context. For example, we already disable IPNI by default if PNET is enabled given the user is indicating they want to control their network interactions and they are unlikely to benefit from the IPNI interactions.

What does "advertised successfully" mean, at least re. IPNI?

@guillaumemichel wrote the comment and might disagree, but my 2c is that it means "That the provide system has successfully received (and persisted) the StartProvide request". The other types of information (e.g. knowing whether the data is ready to be fetched by a third party) seem like a useful API as well. This would hopefully allow easier standardization with DHT + IPNI since both could have local databases to write the Start/Stop Provide requests to as well as ways to track how much of the data they expect to be available (e.g. reprovided to the DHT recently, has been ingested by IPNI and we've passed our uptime checks, etc.).
