Optimizing static bundle size? #296
Personally I think that the benefits of using modular code outweigh the size overhead by far. We should also remember that these files can usually be cached very aggressively (as long as they contain only application code). By splitting code into multiple lazy bundles that are loaded on demand, I think the syntactic noise is no longer a problem. It's good to think about it and to optimize things, but IMHO there are more important problems than file size 😉
gzip can reduce the overhead of repeating patterns.
It would be very interesting to build a CommonJS inliner tool as a pre-minification step, before using uglify for the lower-level optimizations. Would it be possible to get some real-world figures for how much of a difference this require overhead makes percentage-wise in d3, both with and without gzip and uglify?
Fortunately, this was easy to test with real data, because @sebmarkbage has already done the conversion automatically; see the fork sebmarkbage/d3. Using this fork, I ran the following commands:

```shell
browserify d3.js -o d3.bundle.js
uglifyjs d3.bundle.js -c -m > d3.bundle.min.js
```

The generated bundle was 352K, minified 156K, and minified and gzipped 52K. The concatenated files are 264K, 124K and 44K respectively, which translates to overheads of 33%, 26% and 18%. This is quite a bit higher than my initial estimate of 6%, which could have something to do with the automatic conversion; it's quite possible that doing the conversion manually would produce more optimal results. But anyway, now we have a ballpark figure.
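As a sanity check, the overhead percentages follow directly from those sizes (a quick sketch; sizes in KB, taken from the figures above):

```javascript
// Bundling overhead relative to plain concatenation, from the sizes above.
var bundled = { raw: 352, minified: 156, gzipped: 52 };  // browserify + uglify
var concat  = { raw: 264, minified: 124, gzipped: 44 };  // concatenated files

Object.keys(bundled).forEach(function (key) {
  var overhead = Math.round((bundled[key] / concat[key] - 1) * 100);
  console.log(key + ': ' + overhead + '% overhead');
});
// → raw: 33% overhead, minified: 26% overhead, gzipped: 18% overhead
```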
There's a lot of overhead in internal exports that are shared between modules, especially when those are non-constant: they can't be renamed within the modules. You could force a rename of internal exports; that should save you a few bytes. There are also modules that are purely internal, which could be inlined together with other modules.

This code base is actually an excellent example of where static exports and `import *` are useful, since there are so many cyclic dependencies and late updates to shared variables. An ES6-based module system makes this very easy, and more static assumptions lead to smaller files.

Converting this code base to ES6-style modules is simple. Add a file called all.js containing `export * from ...;` for every file in the Makefile, then add `import * from "all"` to the top of every file, and add `export` before every top-level variable or function declaration. Then you can easily do a sound analysis to package it, or convert it to Node-style modules. Therefore I'd recommend using ES6 modules in the source code.

You could also make unsound assumptions in tools like browserify to overcome the limitations of the Node-style module format; then you could make the packaging even smaller. However, maintaining the current code base as idiomatic Node modules would be a significant shift in style, as is evident from the conversion.
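As a sketch of that recipe (file names are illustrative, and the bare `import * from` form is the pre-standard syntax being discussed here, not valid standardized ES2015):

```js
// all.js — one `export *` line per file listed in the Makefile
export * from "./scale";
export * from "./interpolate";

// scale.js — pulls in the shared namespace, exports its own declarations
import * from "./all";
export var scale = {};

// interpolate.js — can reference `scale` even with cyclic dependencies
import * from "./all";
export function interpolate(a, b) { /* ... */ }
```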
Oh, and if you don't want to have incompatible syntax in your editor and other tooling, you can use Labeled Modules instead of ES6: https://github.com/sebmarkbage/link.js
I would also add that there is a lot of duplication that could be avoided; I am sure there are more instances, as d3 is a big library. For the above, I would also imagine that you could just do `d3.interpolate = require('./lib/interpolate');` and now all of the methods are exposed under `d3.interpolate.array`, etc. I think if you work from a modular standpoint from the start, you end up with a slightly different view on your requires and how to best make use of the space. Once you have everything organized via modules and require, you can very easily generate a dependency graph to start seeing what is used where. You can also make tools to identify unused requires and clean things up. The important takeaway is that it is MUCH easier to both test the code in parts and figure out wtf is going on when you have a module system.
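The namespacing pattern described there can be sketched like this (`./lib/interpolate` is an illustrative path; a stand-in object replaces the `require` call so the snippet is self-contained):

```javascript
// What require('./lib/interpolate') might return: one object per submodule.
var interpolate = {
  number: function (a, b) { return function (t) { return a * (1 - t) + b * t; }; },
  array: function (a, b) {
    return function (t) { return a.map(function (x, i) { return x * (1 - t) + b[i] * t; }); };
  }
};

// One assignment exposes every method under the d3.interpolate namespace.
var d3 = {};
d3.interpolate = interpolate;

console.log(d3.interpolate.number(0, 10)(0.5)); // → 5
```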
I've used an alternative bundler which mangles the require paths into numeric ids.
@sokra based on the numbers you have provided, I don't see how you got your 2%. Anyhow, I think all of this is trivial, and the more important thing is for the project to start using require. We are talking about less than a 10K difference when gzipped. This is beyond trivial and will get better with time. The gains of sane JS development outweigh this 10K, IMHO.
@shtylman I just want to point out that it's not the require mechanism itself but the repeated module paths that account for the size difference.
I think @sokra means 45K compared to 44K 😄. I really like @sokra's approach of moving all the resolving stuff into the bundling process. But as @shtylman pointed out: 10K isn't worth the discussion, IMHO. I'd appreciate it if someone wrote a module that squeezes out another 10K, but in production there are other ways of saving much more size, like serving minified images (via JPEGmini or PNGGauntlet).
@sokra Which alternative bundler did you use?
In this case the only (relevant) difference is that for a module

```javascript
var add = require("./math").add;
exports.increment = function increment(i) {
    return add(i, 1);
};
```

browserify generates modules like this:

```javascript
0: [function(require, module, exports) {
    var add = require("./math").add;
    exports.increment = function increment(i) {
        return add(i, 1);
    };
}, {"./math": 1}]
```

and webpack generates modules like this:

```javascript
0: function(module, exports, require) {
    var add = require(/* ./math */1).add;
    exports.increment = function increment(i) {
        return add(i, 1);
    };
}
```

That's the whole magic of the 10K. Minimized:

```javascript
0:[function(n,a,t){var r=n("./math").add;t.increment=function(n){return r(n,1)}},{"./math":1}]
0:function(n,r,a){var t=a(1).add;r.increment=function(n){return t(n,1)}}
```
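For a rough per-module measurement, the two minified lines above differ like this (the strings are copied verbatim from the snippet):

```javascript
// Byte difference between the browserify and webpack minified module forms.
var browserifyForm = '0:[function(n,a,t){var r=n("./math").add;t.increment=function(n){return r(n,1)}},{"./math":1}]';
var webpackForm = '0:function(n,r,a){var t=a(1).add;r.increment=function(n){return t(n,1)}}';
console.log(browserifyForm.length - webpackForm.length + ' bytes saved per module');
```

Across a bundle with hundreds of modules, per-module savings of this size add up to the kind of 10K figure discussed here.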
I think we all agree that there are big benefits to using modular code, but I think the purpose of this ticket is to ensure that there isn't a massive overhead in doing so.

I work on a project which currently has >2000 require statements, rising as we increase modularity. We recently switched to browserify from a home-rolled solution, and everything was great, except for a ~10% increase in the size of our minified JavaScript, adding an overhead of around 80K! While a significant chunk of that would be removed by gzip during transfer, we also care about overall size for another reason: we cache JavaScript on the client using localStorage, which is quite size-sensitive.

10% overhead isn't massive, but it can be avoided using an approach like the one @sokra suggested, meaning everyone can get the advantages of sane modular development without incurring an overhead. I've hacked together a solution in browserify to prove the kind of gains this gives; see rowanbeentje/node-detective@d4f6cd0. This changes browserify to perform require() lookups by index. The implementation isn't massively neat; suggestions on how to implement this in a neater fashion are accepted, though I'm hoping @substack will now jump in and implement it properly considering the advantages shown :)
Nice! Replacing module filenames with ids makes total sense to me. 😄 Why do you cache JavaScript using localStorage? Shouldn't the HTTP cache be used for static resources, in combination with long-time caching and hashed bundle names?
@jhnns That's very much a discussion for elsewhere, but think mobile phones, offline usage, managed upgrades, and appCache issues :) |
+1 for (optionally) replacing the module filenames with ids! |
i think browserify shouldn't replace filenames with ids, it just complicates things and is not worth the effort
@guybrush Is that with 16-character filename hashes? As above, the solution I had up and running for a while, which uses numerically indexed ids, saved around 80K (10% of minified size) for a large project; I keep meaning to come back to it...
Here is a tool for converting module names to integer ids: intreq.
@substack thanks for the link! We'd used browserify in the past for front-end stuff, but I recently experimented with using browserify to minify waterline. It's adapter-based and supports streams, so I was curious, and a bit skeptical, whether we could use it client-side. Ran it through browserify. There's only one problem: download size. To make it a realistic solution for us to use on projects as an ORM with things like Angular, we've got to get it a little smaller. I reckon we could require fewer things, but we haven't invested the time yet. I'll try out intreq and report back on the gains.
I am having a bit of trouble setting up intreq with browserify. I used a custom packer function like this:

```javascript
function pack(params) {
    var intreq = require('intreq'),
        browserPack = require('browser-pack');
    params.raw = false;
    params.sourceMapPrefix = '//#';
    return intreq().pipe(browserPack(params));
}

browserify({ pack: pack });
```

The above code snippet is meant to resemble the default packer. Any hints on what I might be doing wrong?
I finally got around to this: bundle-collapser. Give it a bundle.js as input and it collapses the module paths down to integer ids.
@substack awesome news!! |
Nice work! |
So nice! |
Nice! |
The static bundling in browserify 2 looks very promising. However, one of my concerns is that it will increase the code size of the generated bundle as compared to simple concatenation, due to the boilerplate for each require'able file.
I expect it is possible to reduce the size of the generated static bundle to levels comparable to concatenation, but this might break the design goals of browserify (say, by not allowing bundled files to be require'd from outside the bundle). So I wanted to hear your thoughts on this issue before I considered taking a crack at it myself.
As a contrived example, consider the following file, length.js:
Ignoring the fixed overhead (196 bytes) for the bundle, the incremental size of just this file minified in the static bundle is about 61 bytes:
In a non-browserify world, the length function might instead be implemented as:
Which minifies to 30 bytes:
So, for the purpose of discussion, we can estimate that there is a per-file overhead of about 30 bytes when using static bundling versus concatenation. (Of course, building custom bundles using only the needed code via static analysis would produce a bundle that is far smaller than concatenating everything, but for this discussion I’m concerned with the default case, say where d3js.org provides a pre-built bundle with default functionality for convenience.)
If I extend this 30 bytes to D3’s 200 or so separate files, the resulting overhead is about 6KB, which represents about a 5% overhead on top of the 124KB d3.min.js. This isn’t huge, but at the same time, if I can avoid the overhead, that would make me happy.
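The arithmetic behind that estimate, as a quick check (the 30-byte and 200-file figures are the rough numbers from above):

```javascript
// Estimated total wrapper overhead for D3's default bundle.
var perFileOverhead = 30;      // bytes of boilerplate per bundled file
var fileCount = 200;           // approximate number of D3 source files
var minifiedSize = 124 * 1000; // d3.min.js is about 124KB

var totalOverhead = perFileOverhead * fileCount; // 6000 bytes ≈ 6KB
var percent = totalOverhead / minifiedSize * 100;
console.log(totalOverhead + ' bytes, ' + percent.toFixed(1) + '%'); // → 6000 bytes, 4.8%
```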
One approach to reducing the size of the static bundle is to try to make it equivalent to concatenation. This might not be possible with certain edge cases, such as circular require's, but it wouldn’t be hard for a common usage pattern. For example, the minified length.js in the bundle might appear as:
This is only 31 bytes, identical to concatenation. And subsequently, all instances of `require("length")` would be replaced inline with `L`. (In practice, browserify could use long names for each required module, such as `_require_length`, and then uglifyjs could reduce them to minimal variable names without collision.) Assuming that all bundled modules are listed in dependency order, this should function equivalently to the current approach, but be quite a bit smaller.
I guess the biggest downside of this approach is that non-exported local variables in the file would need to be namespaced so as to avoid leaking into other modules. This might be challenging to implement and would further increase the size of the non-minified bundle.
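That long-name scheme can be sketched like so (the module body is made up here, since the original length.js contents aren't shown in this thread):

```javascript
// Hypothetical concatenation-style output: each module becomes a single
// top-level var with a collision-free long name, and each
// require("length") call site is replaced inline with that name.
var _require_length = function (s) { return s.length; }; // stands in for length.js

// Elsewhere in the bundle, `require("length")("hello")` becomes:
var result = _require_length("hello");
console.log(result); // → 5
```

uglifyjs would then shorten `_require_length` to a one-letter name, giving output essentially identical to hand-written concatenation.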
Anyway, curious to hear your thoughts. I might still be willing to go ahead given all the other benefits, but I always like to have my cake and eat it too.