# Explainer: Subresource loading with Web Bundles Last updated: Apr 2022 We propose a new approach to load a large number of resources efficiently using a format that allows multiple resources to be bundled, e.g. [Web Bundles](https://web.dev/web-bundles/). <!-- TOC --> - [Backgrounds](#backgrounds) - [Requirements](#requirements) - [`<script>`-based API](#script-based-api) - [Example](#example) - [The bundle](#the-bundle) - [The main document](#the-main-document) - [Request's mode and credentials mode](#requests-mode-and-credentials-mode) - [Request's destination](#requests-destination) - [CORS and CORP for subresource requests](#cors-and-corp-for-subresource-requests) - [Content Security Policy (CSP)](#content-security-policy-csp) - [Defining the scopes](#defining-the-scopes) - [Serving constraints](#serving-constraints) - [Extensions](#extensions) - [Subsequent loading and Caching](#subsequent-loading-and-caching) - [Compressed list of resources](#compressed-list-of-resources) - [Alternate designs](#alternate-designs) - [`<link>`-based API](#link-based-api) - [Resource Bundles](#resource-bundles) - [Summarizing the contents of the bundle](#summarizing-the-contents-of-the-bundle) - [Approximate Membership Query datastructure](#approximate-membership-query-datastructure) - [No declarative scope](#no-declarative-scope) - [Naming](#naming) <!-- /TOC --> ## Backgrounds - Loading many unbundled resources is still slower in 2020. We concluded that [bundling was necessary in 2018](https://v8.dev/features/modules#bundle), and our latest local measurement still suggests that. - The output of JS bundlers (e.g. webpack) doesn't interact well with the HTTP cache. They are pretty good tools but configuring them to work in an optimal way is tough, and sometimes they'are also incompatible with new requirements like [dynamic bundling](https://github.com/azukaru/progressive-fetching/blob/master/docs/dynamic-bundling/index.md) (e.g. small edit with tree shaking could invalidate everything). - With JS bundlers, execution needs to wait for the full bytes to come. Ideally loading multiple subresources should be able to utilize full streaming and parallelization, but that's not possible if all resources are bundled as one javascript. (For JS modules execution still needs to be waited for the entire tree due to the current [deterministic execution model](https://docs.google.com/document/d/1MJK0zigKbH4WFCKcHsWwFAzpU_DZppEAOpYJlIW7M7E/edit#heading=h.5652gd5ks5id)) - Related issues: [#411](https://github.com/WICG/webpackage/issues/411), [#526](https://github.com/WICG/webpackage/issues/526) ## Requirements Web pages will declare that some of their subresources are provided by the [Web Bundle](https://wpack-wg.github.io/bundled-responses/draft-ietf-wpack-bundled-responses.html) at a particular URL. It's likely that the HTML parser will encounter some of the bundle's subresources before it receives the bundle's index. The declaration needs to somehow prevent the parser from double-fetching those bytes, which it can accomplish in a couple ways. We don't see an initial need for an associated Javascript API to pull information out of the bundle. We also don't address a way for Service Workers to use bundles to fill a Cache. Service Workers can technically unpack a bundle into [`cache.put()`](https://developer.mozilla.org/en-US/docs/Web/API/Cache/put) calls themselves, and, while the result may take an inefficient amount of browser-internal communication, letting some sites experiment with this will give us a better chance of designing the right API. This feature is a powerful feature that can replace any subresources in the page. So we limit the use of this feature only in [secure contexts](https://www.w3.org/TR/powerful-features/). This feature is NOT related to [Signed Exchanges](https://web.dev/signed-exchanges/), that is a common misunderstanding. The bundle doesn't have to be signed. ## `<script>`-based API Developers will write ```html <script type="webbundle"> { "source": "https://example.com/dir/subresources.wbn", "resources": ["https://example.com/dir/a.js", "https://example.com/dir/b.js", "https://example.com/dir/c.png"] } </script> ``` to tell the browser that subresources specified in `resources` can be found within the `https://example.com/dir/subresources.wbn` bundle. When the browser parses such a `script` element, it: 1. Fetches the specified Web Bundle, `https://example.com/dir/subresources.wbn`. 2. Records the `resources` and _delays_ fetching a subresource specified there if a subresource's origin is the [same origin](https://html.spec.whatwg.org/#same-origin) as the bundle's origin and its [path](https://url.spec.whatwg.org/#concept-url-path) contains the bundle's [shortened](https://url.spec.whatwg.org/#shorten-a-urls-path) path as a prefix. 3. As the bundle arrives, the browser fulfills those pending subresource fetches from the bundle's contents. 4. If a fetch isn't actually contained inside the bundle, it's probably better to fail that fetch than to go to the network, since it's easier for developers to fix a deterministic network error than a performance problem. The primary requirement to avoid fetching the same bytes twice is that "If a specified subresource is needed later in the document, that later fetch should block until at least the index of the bundle has downloaded to see if it's there." It seems secondary to then say that if a specified subresource isn't in the bundle, its fetch should fail or otherwise notify the developer: that just prevents delays in starting the subresource fetch. ## Example ### The bundle Suppose that the bundle, `subresources.wbn`, includes the following resources: ``` - https://example.com/dir/a.js (which depends on ./b.js) - https://example.com/dir/b.js - https://example.com/dir/c.png - … (omitted) ``` A URL of the resource in the bundle can be a [relative URL](https://url.spec.whatwg.org/#syntax-url-relative) to the bundle. A browser must [parse a URL](https://html.spec.whatwg.org/#parse-a-url) using bundle's URL. ### The main document ```html <script type="webbundle"> { "source": "https://example.com/dir/subresources.wbn", "resources": ["https://example.com/dir/a.js", "https://example.com/dir/b.js", "https://example.com/dir/c.png"] } </script> <script type=”module” src=”https://example.com/dir/a.js”></script> <img src=https://example.com/dir/c.png> ``` Then, a browser must fetch the bundle, `subresources.wbn`, and load subresources, `a.js`, `b.js`, and `c.png`, from the bundle. A URL in `source` can be a [relative URL](https://url.spec.whatwg.org/#syntax-url-relative) and must be resolved on document's [base URL](https://html.spec.whatwg.org/#document-base-url). A URL in `resources` and `scopes` can be a [relative URL](https://url.spec.whatwg.org/#syntax-url-relative) and must be resolved on the bundle's URL. `<script type="webbundle">` doesn't support `src=` attribute. The rule must be inline. ## Request's mode and credentials mode A [request](https://fetch.spec.whatwg.org/#concept-request) for a bundle will have its [mode][request mode] set to "`cors`" and its [credentials mode][credentials mode] set to "`same-origin`" unless a `credentials` is specified in its JSON as follows: ``` html <script type="webbundle"> { "source": "https://example.com/dir/subresources.wbn", "credentials": "omit", "resources": ["https://example.com/dir/a.js", "https://example.com/dir/b.js", "https://example.com/dir/c.png"] } </script> ``` A possible value is "`omit`", "`same-origin`", or "`include"`. See [the fetch spec][credentials mode] for details. If other values are specified, a [credentials mode][credentials mode] is set to "`same-origin`" . Note: `<script>` element's [crossorigin][crossorigin attribute] attribute is not used. [crossorigin attribute]: https://html.spec.whatwg.org/multipage/semantics.html#attr-link-crossorigin [request mode]: https://fetch.spec.whatwg.org/#concept-request-mode [credentials mode]: https://fetch.spec.whatwg.org/#concept-request-credentials-mode ## Request's destination With the `<script>`-based API, a [request](https://fetch.spec.whatwg.org/#concept-request) for a bundle will have its [destination](https://fetch.spec.whatwg.org/#concept-request-destination) set to "`webbundle`" ([whatwg/fetch#1120](https://github.com/whatwg/fetch/issues/1120)). ## CORS and CORP for subresource requests [CORS](https://fetch.spec.whatwg.org/#http-cors-protocol) and [CORP](https://fetch.spec.whatwg.org/#cross-origin-resource-policy-header) checks on subresources in bundles are based on the URL and response headers of requested subresource. For example, if a cors request is made to a cross-origin subresource in a bundle, and the subresource does not have an `Access-Control-Allow-Origin:` header, the request will fail. Similarly, if a no-cors request is made to a cross-origin subresource in a bundle, and the subresource has `Cross-Origin-Resource-Policy: same-origin` header, the request will fail. ## Content Security Policy (CSP) For resources loaded from bundles, URL matching of CSP is done based on the URL of the resource, not the URL of the bundle. For example, given this CSP header: ``` Content-Security-Policy: script-src https://example.com/script/ ``` In the following, `a.js` will be loaded, but `b.js` will be blocked: ``` <script type="webbundle"> { "source": "https://example.com/subresources.wbn", "resources": ["https://example.com/script/a.js", "https://example.com/b.js"] } </script> <script src=”https://example.com/script/a.js”></script> <script src=”https://example.com/b.js”></script> ``` ## Defining the scopes Instead of including a list of resources, the `<script>` defines a `scopes`. ```html <script type="webbundle"> { "source": "https://example.com/dir/subresources.wbn", "scopes": ["https://example.com/dir/js/", "https://example.com/dir/img/", "https://example.com/dir/css/"] } </script> ``` Any subresource under the `scopes` will be fetched from the bundle. ## Serving constraints See the [Serving constraints](https://wpack-wg.github.io/bundled-responses/draft-ietf-wpack-bundled-responses.html#name-serving-constraints) for response headers which MUST be included when serving Web Bundles over HTTP. ## Extensions There are several extensions to this explainer, aiming to support various use cases which this explainer doesn't support: - [Subresource loading with Web Bundles: Support opaque origin iframes](./subresource-loading-opaque-origin-iframes.md) See [issue #641](https://github.com/WICG/webpackage/issues/641) for the motivation of splitting the explainer into the core part, this explainer, and the extension parts. ## Subsequent loading and Caching [Dynamic bundle serving with WebBundles](https://docs.google.com/document/d/11t4Ix2bvF1_ZCV9HKfafGfWu82zbOD7aUhZ_FyDAgmA/edit) is a detailed exploration of how to efficiently retrieve only updated resources on the second load. The key property is that the client's request for a bundle embeds something like a [cache digest](https://httpwg.org/http-extensions/cache-digest.html) of the resources it already has, and the server sends down the subset of the bundle that the client doesn't already have. ## Compressed list of resources As discussed in [Dynamic bundle serving with WebBundles](https://docs.google.com/document/d/11t4Ix2bvF1_ZCV9HKfafGfWu82zbOD7aUhZ_FyDAgmA/edit), simply including a list of resources in the HTML [may cost as little as 5 bytes per URL on average after the HTML is compressed](https://github.com/yoavweiss/url_compression_experiments). ## Alternate designs ### `<link>`-based API This explainer had used `<link>`-based API before adopting `<script>`-based API: ```html <link rel="webbundle" href="https://example.com/dir/subresources.wbn" resources="https://example.com/dir/a.js https://example.com/dir/b.js https://example.com/dir/c.png" /> ``` However, we abandoned `<link>`-based API, in favor of `<script>`-based API. See [issue #580](https://github.com/WICG/webpackage/issues/580) for the motivation. Note that some of the following alternate designs were proposed at the era of `<link>`-based API. This explainer doesn't rewrite them with `<script>`-based API yet. ### Resource Bundles A [resource bundle] is the same effort, with a particular scope. A [resource bundle] has a good [FAQ](https://github.com/WICG/resource-bundles/blob/main/faq.md#q-how-does-this-proposal-relate-to-the-web-packageweb-packagingweb-bundlesbundled-exchange-effort-repo) which explains how this proposal and a [resource bundle] are related. We have been collaborating closely to gather more feedback to draw a shared conclusion. [resource bundle]: https://github.com/WICG/resource-bundles ### Summarizing the contents of the bundle Several other mechanisms are available to give the bundler more flexibility or to compress the resource list. #### Approximate Membership Query datastructure A page still executes correctly, albeit slower than optimal, if a resource that's in a bundle is fetched an extra time, or a resource that's not in a bundle waits for the bundle to arrive before its fetch starts. That raises the possibility of putting a Bloom filter or other _approximate membership query_ datastructure, like a cuckoo filter or quotient filter, in the scoping attribute. In this case, it must not be an error if a resource matches the filter but turns out not to be in the bundle, since that's an expected property of this datastructure. ```html <link rel="webbundle" href="https://example.com/dir/subresources.wbn" digest="cuckoo-CwAAAAOztbwAAAM2AAAAAFeafVZwIPgAAAAA" /> ``` #### No declarative scope In some cases, the page might be able to control when it issues fetches for all of the resources contained in a bundle. In that case, it doesn't need to describe the bundle's scope in the `<link>` element but can instead listen for its `load` event: ```html <link rel="webbundle" href="https://example.com/dir/subresources.wbn" onload="startUsingTheSubresources()" /> ``` Since the web bundles format includes an index before the content, we can optimize this by firing an event after the index is received (which expresses the bundle's exact scope) but before the content arrives: ```html <link rel="webbundle" href="https://example.com/dir/subresources.wbn" onscopereceived="startUsingTheSubresources()" /> ``` ### Naming We might be able to use a link type as general as `"bundle"`, especially if it also uses the MIME type of the bundle resource to determine how to process it. We'll need to disambiguate between a bundle meant for preloading subresources and a bundle meant as an alternative form of the current page. The second can use `<link rel="alternate" type="application/web-bundle">`. # Acknowledgements Thanks to https://github.com/yoavweiss/cache-digests-cuckoo and https://github.com/google/brotli for the software used to generate sample attribute values.