Sharding? #199

Closed
v-stickykeys opened this issue Sep 16, 2021 · 5 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@v-stickykeys

v-stickykeys commented Sep 16, 2021

By default, go-ipfs provides a sharding option for the datastore, but when using this plugin the datastore is not sharded.

As described in previous issues, the serialization in the datastore_spec is not 1:1: when I try to add shardFunc, it results in an error.

Is there a way to achieve sharding for the data stored in S3?

@v-stickykeys v-stickykeys added the need/triage Needs initial labeling and prioritization label Sep 16, 2021
@welcome

welcome bot commented Sep 16, 2021

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@Stebalien
Member

Specifically, the flatfs (flat-file backed datastore) provides a sharding option because some filesystems don't handle large directories very well. None of the other datastores provide such an option.

Honestly, sharding just doesn't make sense in S3 and would massively complicate the query logic.

@v-stickykeys
Author

@Stebalien It would be great to know why you think so, as the js-ipfs plugin for S3 does support sharding, and in our case it has been useful for avoiding rate limiting from S3.

We really don't see it as possible to use this plugin as it is without sharding.
cc @zachferland to provide any additional thoughts.

@v-stickykeys
Author

Also, @Stebalien, please see this discussion about why sharding is useful in S3 in general, and more specifically why it is useful for the IPFS S3 datastore: ipfs/js-datastore-s3#27

@Stebalien
Member

Interesting, I stand corrected.

In JavaScript, this isn't actually a feature in the S3 datastore but in a "wrapper" datastore that transforms keys. That's probably the correct way to implement this and that implementation would live in https://github.com/ipfs/go-datastore/.
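
To illustrate the key-transforming idea, here is a minimal sketch, assuming single-component keys such as block keys. The names `nextToLast`, `convertKey`, and `invertKey` are hypothetical and are not part of go-datastore; the shard rule is modeled on the next-to-last scheme that flatfs uses, and plain strings stand in for datastore key types.

```go
package main

import (
	"fmt"
	"strings"
)

// nextToLast returns the n characters that precede the final character of
// name, left-padding with "_" when name is too short.
// (Hypothetical helper, modeled on the flatfs next-to-last shard rule.)
func nextToLast(name string, n int) string {
	padded := strings.Repeat("_", n+1) + name
	start := len(padded) - n - 1
	return padded[start : start+n]
}

// convertKey maps a flat key such as "/CIQABCDE" to a sharded key such as
// "/CD/CIQABCDE". It assumes the key has a single path component.
func convertKey(key string, n int) string {
	name := strings.TrimPrefix(key, "/")
	return "/" + nextToLast(name, n) + "/" + name
}

// invertKey undoes convertKey by dropping the shard component.
func invertKey(sharded string) string {
	i := strings.LastIndex(sharded, "/")
	return "/" + sharded[i+1:]
}

func main() {
	k := convertKey("/CIQABCDE", 2)
	fmt.Println(k, invertKey(k))
}
```

A wrapper would apply `convertKey` on the way into the child datastore and `invertKey` on keys coming back out. With S3 as the child, the shard component becomes a leading segment of the object key, the aim being to spread requests across key prefixes, per the rate-limiting concern raised above.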

However, it's going to be non-trivial to correctly handle queries, offsets, etc. Basically, every query would need to iterate over all shards at the same time, interleaving the results.
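
The interleaving can be sketched as a k-way merge. This is only an illustration under simplifying assumptions: each shard's results are already sorted by key and are held in memory as string slices, whereas a real datastore would stream query results. `cursor`, `cursorHeap`, and `mergeShards` are hypothetical names, not go-datastore APIs.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks the read position within one shard's sorted results.
type cursor struct {
	keys []string
	pos  int
}

// cursorHeap is a min-heap of cursors ordered by the key each one points at.
type cursorHeap []*cursor

func (h cursorHeap) Len() int { return len(h) }
func (h cursorHeap) Less(i, j int) bool {
	return h[i].keys[h[i].pos] < h[j].keys[h[j].pos]
}
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeShards interleaves per-shard sorted results into one globally sorted
// list, then applies offset and limit (limit <= 0 means no limit).
func mergeShards(shards [][]string, offset, limit int) []string {
	h := &cursorHeap{}
	for _, s := range shards {
		if len(s) > 0 {
			*h = append(*h, &cursor{keys: s})
		}
	}
	heap.Init(h)

	var out []string
	skipped := 0
	for h.Len() > 0 {
		c := (*h)[0] // cursor holding the smallest current key
		key := c.keys[c.pos]
		c.pos++
		if c.pos == len(c.keys) {
			heap.Pop(h) // this shard is exhausted
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
		if skipped < offset {
			skipped++
			continue
		}
		out = append(out, key)
		if limit > 0 && len(out) == limit {
			break
		}
	}
	return out
}

func main() {
	shards := [][]string{{"/a", "/d"}, {"/b"}, {"/c", "/e"}}
	fmt.Println(mergeShards(shards, 1, 3))
}
```

Note that offset and limit are applied to the merged stream. Pushing them down to each shard would skip or truncate the wrong entries, which is part of why query handling is the hard piece.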

If you want to submit a datastore to do this, take a look at how queries are handled in https://github.com/ipfs/go-datastore/blob/ed11f242ef104130b10a1e86728ab3779cd23c64/mount/mount.go#L209.
