-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gateway/dir-index-html: switch dir listing sizes to Tsize #9058
Comments
This assume that we also disable resolving types (if we didn't already I'm not sure). This could also be used in the fast mode (where size and type are disabled), to still give a size estimation, but then it's a feature request not a perf improvement (as the perf come from not fetching direct links). |
Yes, the idea is to disable resolving types too. An equivalent of
|
Related to my question in ipfs/specs#283 (comment) wondering if this kind of thing is covered by the spec, or if it's a per-implementation "do whatever you want" kind of thing. If the latter than sure 🤷, we can always make more changes if people complain given that it's basically just an abstract size hint that's not super usable for anything other than rough estimates (i.e. a) it's not how big the DAG is on disk, even when the data structure is correct, since it doesn't account for deduplication b) for files its not how big the file would be on disk since it's not loading the first file block with the size). |
An aside, although not sure where else to put this. From what I can tell the less and more abstract we make the ways UnixFS directories load on gateways the easier it becomes to push for non-UnixFS data being loaded by gateways. For example, what's the difference between rendering a map or list as a dir-index HTML directory vs loading in a UnixFS directory? Is it just This probably plays on some tensions between where it's worth changing |
Synthesis of ideas from: ipfs/kubo#8455 and ipfs/kubo#9058
Yes, in my mind this is per-implementation "do whatever you want" kind of thing. I've added "Generated HTML with directory index" section to ipfs/specs@9fc9a9c and included basic suggestions for implementers, but also noted they are free to make their own decisions and innovations here. As for Kubo:
|
* feat: initial HTTP gateway specs This adds gateway specs under ./http-gateways directory. The aim is to document _current_ behavior (implementation in go-ipfs 0.13) and switch the way we do the gateway work to be specs-driven. Long term goal is to provide language and implementation agnostic specification that anyone can use to implement compatible gateways. * gateway: add Content-Range * gateway: registerProtocolHandler uri router * CODEOWNERS: add lidel for ./http-gateways * gateway: resolving an advanced DNSLink chain * gateway: only-if-cached HEAD behavior * gateway: suggestions from reviewers Co-authored-by: Adrian Lanzafame <[email protected]> Co-authored-by: Vasco Santos <[email protected]> Co-authored-by: Oli Evans <[email protected]> * gateway: include CIDv1 node in summary * gateway: reorder URI router section As suggested in #283 (comment) * gateway: add Denylists section * gateway: switch only-if-cached miss to 412 Rationale: ipfs/kubo#8783 (comment) * gateway: apply suggestions from review Co-authored-by: Thibault Meunier <[email protected]> * gateway: apply suggestions from Cloudflare #283 (review) * gateway: add X-Content-Type-Options * gateway: simplify dnslink summary https://github.com/ipfs/specs/pull/283/files#r898709569 * gateway: document 412 Precondition Failed https://github.com/ipfs/specs/pull/283/files#r898686654 * gateway: link to ipld codecs explainer https://github.com/ipfs/specs/pull/283/files#r898687052 * gateway: stub about handling traversal errors https://github.com/ipfs/specs/pull/283/files#r892845860 * gateway: expand HTTP caching considerations * gateway: editorial fixes Co-authored-by: Steve Loeppky <[email protected]> * gateway: expand on Host header parsing https://github.com/ipfs/specs/pull/283/files#r898703765 * gateway: editorial fixes * gateway: X-Forwarded-Proto and X-Forwarded-Host * gateway: editorial fixes * gateway: X-Trace-Id optional header suggested in: #283 (comment) rationale: having specific name as a suggestion of 'best practice' in the specs will simplify debugging across ecosystem * gateway: Generated HTML with directory index Synthesis of ideas from: ipfs/kubo#8455 and ipfs/kubo#9058 Co-authored-by: Adrian Lanzafame <[email protected]> Co-authored-by: Vasco Santos <[email protected]> Co-authored-by: Oli Evans <[email protected]> Co-authored-by: Thibault Meunier <[email protected]> Co-authored-by: Steve Loeppky <[email protected]>
go-ipfs 0.13 shipped with #8853 which introduced a bad-aid where we hide "size" column in big directories.
This allows us to skip child block resolution for directories bigger than 100 items, making the entire thing load really fast.
Sadly, directories smaller than 100 are still slow.
They load slower than a directory with 101 items.
Why showing size and type is expensive
The root node of every UnixFS DAG includes information about node type and the size of raw data inside of it (without metadata).
It also has links to other DAGs representing files and directories, and the total size of DAGs they represent (data + metadata)
The "size" on directory listing is the data without metadata.
To know the exact size and type of items in a directory listing, every item triggers additional block fetch, which baloons the time it takes to return response with a directory listing.
Proposed Change
Replace "size" based on raw dfata with "DAG size" based on
Tsize
value already present in the root UnixFS node (see logical format). The interface will look the same.What is user benefit?
Every directory will load fast, as soon the root UnixFS node is available.
This makes an extreme difference on the first load, when the directory is not present in local cache.
Loading
bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm
(10k items)will be as fast as a directory with 100.
Won't this be causing issues?
These values are provided in HTML dir listing only for quick eyeballing, and the difference between raw data and
Tsize
will usually be small enough to not impact this purpose:But just to be sure, we will add on-hover tooltip explanation of the value (right now, there is none).
Is this worth it?
To illustrate, I started a new, empty repo each time, and listed a directory with 1864 items.
go-ipfs 0.12 (which fetched every UnixFS child block to read the size) took nearly 3 minutes:
Same directory, but listed with change to
Tsize
:From nearly 3 minutes to under 1s. I'd say worth it.
The text was updated successfully, but these errors were encountered: