This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

Listing possible chunking algorithm #7

Closed
DonaldTsang opened this issue Sep 18, 2018 · 10 comments
Labels
kind/support A question or request for support

Comments

@DonaldTsang

Does it have static chunking and dynamic chunking algorithms?

@Stebalien
Member

Could you explain what you mean by "static" and "dynamic" in this context?

@Stebalien added the kind/support label Sep 19, 2018
@DonaldTsang
Author

Static as in a uniform chunk size; dynamic as in Rabin-Karp or other algorithms that chunk using something context-aware or use a "sliding window" technique for chunking.

See: https://github.com/YADL/yadl/wiki/Rabin-Karp-for-Variable-Chunking
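
To make the distinction concrete, here is a minimal sketch in Go. It is not the go-ipfs-chunker implementation: `splitFixed`, `splitContentDefined`, `countShared`, the toy polynomial hash, and the window and mask parameters are all illustrative assumptions. A real Rabin chunker updates its fingerprint incrementally as the window slides and also enforces minimum and maximum chunk sizes, which this sketch leaves out.

```go
package main

// splitFixed is the "static" strategy: every chunk is exactly size bytes,
// except possibly the last one. size must be positive.
func splitFixed(data []byte, size int) [][]byte {
	if size <= 0 {
		panic("size must be positive")
	}
	var chunks [][]byte
	for len(data) > size {
		chunks = append(chunks, data[:size])
		data = data[size:]
	}
	if len(data) > 0 {
		chunks = append(chunks, data)
	}
	return chunks
}

// splitContentDefined is a toy "dynamic" strategy: it hashes the last
// `window` bytes at every position and cuts wherever the masked hash is zero.
// Cut points therefore depend only on nearby content, not on byte offsets.
// The hash is recomputed from scratch at each position for clarity; a rolling
// hash would produce the same cuts in a single pass. window must be positive.
func splitContentDefined(data []byte, window int, mask uint32) [][]byte {
	if window <= 0 {
		panic("window must be positive")
	}
	var chunks [][]byte
	start := 0
	for end := window; end <= len(data); end++ {
		var h uint32
		for _, b := range data[end-window : end] {
			h = h*31 + uint32(b) // illustrative hash, not a Rabin fingerprint
		}
		if h&mask == 0 {
			chunks = append(chunks, data[start:end])
			start = end
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// countShared reports how many chunks of a also occur somewhere in b.
func countShared(a, b [][]byte) int {
	seen := make(map[string]bool, len(b))
	for _, c := range b {
		seen[string(c)] = true
	}
	n := 0
	for _, c := range a {
		if seen[string(c)] {
			n++
		}
	}
	return n
}
```

Calling these from a `main` function on a buffer and on the same buffer with one byte prepended shows the practical difference. With `splitFixed`, every boundary shifts by one byte, so chunks generally stop matching. With `splitContentDefined`, every chunk after the first one is reproduced exactly, which is what makes deduplication possible.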

@Stebalien
Member

Ah. Yes, we actually have both. You can find the different splitters listed in the godoc: https://godoc.org/github.com/ipfs/go-ipfs-chunker.

@Stebalien
Member

(closing for tracking purposes, feel free to ask followup questions and/or reopen)

@DonaldTsang
Author

@Stebalien I have heard from @flyingzumwalt that dynamic chunking does not work, possibly because the blocks are too large or because there are better algorithms for dynamic chunking.

@Stebalien
Member

Dynamic chunking does work. There are better algorithms but the current algorithm definitely works (well, as long as the data isn't compressed).

@DonaldTsang
Author

So what is the default chunk size in dynamic chunking? Also, have there been any benchmarks that prove the chunker works? References: ipfs-inactive/archives#134, ipfs-inactive/archives#142, ipfs/notes#183, restic/chunker#19, ipfs-inactive/archives#137

@Stebalien
Member

> So what is the default chunk size in dynamic chunking?

256KiB (same as the static chunker) with a min of 85KiB and a max of 512KiB.
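
As a rough worked example of what those bounds imply, the sketch below computes how many chunks a file of a given size could produce. The constants are taken directly from the figures quoted above; the names `minChunk`, `avgChunk`, `maxChunk`, `ceilDiv`, and `chunkCountBounds` are hypothetical and are not part of go-ipfs-chunker. It assumes every chunk except the last is at least the minimum size and no chunk exceeds the maximum.

```go
package main

const KiB = 1024

// Chunk sizes as quoted in the thread, in bytes.
const (
	minChunk = 85 * KiB
	avgChunk = 256 * KiB
	maxChunk = 512 * KiB
)

// ceilDiv returns a/b rounded up, for positive b.
func ceilDiv(a, b int64) int64 {
	return (a + b - 1) / b
}

// chunkCountBounds returns the fewest chunks possible (all chunks at the
// maximum size), the count expected if chunks hit the target average, and
// the most chunks possible (all chunks at the minimum size).
func chunkCountBounds(fileSize int64) (fewest, typical, most int64) {
	fewest = ceilDiv(fileSize, maxChunk)
	typical = ceilDiv(fileSize, avgChunk)
	most = ceilDiv(fileSize, minChunk)
	return fewest, typical, most
}
```

For a 1 GiB file, `chunkCountBounds(1 << 30)` gives 2048, 4096, and 12337: the static chunker would produce exactly 4096 blocks, while the dynamic chunker could land anywhere between the two extremes depending on the content.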

> Also, have there been any benchmarks that prove the chunker works?

Published? Not that I know of. I've seen several of those threads, but both @jbenet and I have tested rabin locally and it seems to work quite well on some workloads (but not others). Specifically, he was able to significantly compress his collection of keynote presentations.

(better chunkers and better benchmarks would, of course, be welcome)

@yuanjingsong

Hi, I found an open-source implementation of FastCDC in Go; maybe it's helpful :) https://github.com/tigerwill90/fastcdc
