Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement jest-haste-map instead of node-haste #896

Closed
wants to merge 10 commits into from

Conversation

cpojer
Copy link
Member

@cpojer cpojer commented Apr 15, 2016

This is a new haste map implementation for Jest which is much more scalable than node-haste.

node-haste2 isn't well designed and not scalable for short-lived services like Jest. The startup time of node-haste1 vs. node-haste2 on www is almost the same, both between 6 and 8 seconds which is not acceptable for our engineers. This implementation is attempting to accomplish a much reduced and more scalable startup time. It also has reduced scope – the goal of this is to only build a haste map and provide a way to resolve a one-level deep dependency tree, which is all that Jest really needs from node-haste. jest-haste-map can serve as an ideal basis for rewriting node-haste2 (into node-haste3!).

With this, the cold start time (building the entire haste map) is now about 14 seconds on www (that's acceptable) but the incremental invocation is only 2 seconds (@kyldvs will love me). I haven't heavily micro-optimized the JavaScript in packages/jest-haste-map/src/index.js and there is a bunch of data copying with HasteResolver.js – I believe I can get close to 1 second with these optimizations once I'm done. One of the goals I have is to allow tacking on data (list of mocks) to the haste map and serialize the haste map and read that one in directly in the workers instead of keeping two caches.

On every startup node-haste2 asks watchman (or node) for a list of all files. It builds up an in memory-file system and then processes every single file through an insane amount of promise chaining. It takes about 3 seconds on react-native to do this on a warm start. With this new implementation the cold start is about 1.5 seconds and warm start is way less than a second.

jest-haste-map only does the minimum processing necessary: it retrieves a list of all files only on the first start when using watchman and subsequent invocations only receive the deltas of changed files since the last time Jest has run. It further processes only files that have been changed or added. The node crawler isn't as optimized but it should be faster than node-haste2 as well.

What to review:

  • packages/jest-haste-map/src/index.js
  • packages/jest-haste-map/src/worker.js
  • packages/jest-haste-map/src/crawlers/
  • The other modules in jest-haste-map were lifted from node-haste2 and just received small code style updates (removed babel compilation etc.)
  • src/resolvers/HasteResolver.js for usage.

cc @davidaurelio @kentaromiura @jeffmo @vjeux
cc @amasad – if you could take a quick look, I would greatly appreciate it :)

@cpojer
Copy link
Member Author

cpojer commented Apr 15, 2016

Actually the time it takes to run a fast test (~10ms) on www with this is 2s overall. That includes the 500ms overhead of loading jsdom too, so we are already pretty close to 1s for building the haste map.

@cpojer cpojer force-pushed the jest-haste-map branch 3 times, most recently from ab2582a to db220d7 Compare April 15, 2016 08:05
path.sep + '**' + path.sep + '*.{' + extensions.join(',') + '}';

return Promise.all(roots.map(
root => globPromise(root + pattern)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this nice and simple implementation of the node crawler actually turns out to have 2x worse performance than the one in node-haste. I might revert to that one instead. Didn't think glob would be sooo slow.

@cpojer cpojer force-pushed the jest-haste-map branch 2 times, most recently from 8e447d9 to 10f8caf Compare April 15, 2016 09:03
@cpojer
Copy link
Member Author

cpojer commented Apr 15, 2016

Done in e3189c4: "Clean up HasteResolver and file caching for all of this". Now we only create one cache file. It is amended by "HasteResolver" with mocking information and is directly read from a TestWorker.

One nice side-effect of all this work is that Jest's own test suite is almost 2x as fast now. It only takes 2.5s instead of 4-5s.

/*
* original author: dead_horse <[email protected]>
* ported by: yaycmyk <[email protected]>
*/
Copy link
Contributor

@davidaurelio davidaurelio Apr 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could take over the “fastpath” name on npm :-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"fasterpath" :)

@davidaurelio
Copy link
Contributor

congrats on the speed improvements, that’s awesome!

I don’t have too much to say about the code. The indentation level of promises can also get quite deep :-)
What was our case for multiple roots? Do we really need that?

@davidaurelio
Copy link
Contributor

Should we write a dex with the haste resolution rules?

@davidaurelio
Copy link
Contributor

we should try to get the “fastpath” module and npm and publish the two file crawlers seperately + an watchman auto-detection function.

@davidaurelio
Copy link
Contributor

A quick synthetic benchmark seems to suggest that native Map and Set are really fast in node.js, even faster than property lookups on objects with many entries. So there isn’t really a case for Object.create(null). Deserialization from JSON also doesn’t need the setPrototypeOf trick. Serialization, on the other hand, is uglier (.stringify(Array.from(map.entries())))

@cpojer
Copy link
Member Author

cpojer commented Apr 16, 2016

Thanks for the review.

  • We use testPathDirs as separate roots extensively on www. This reduces the scope of files to crawl
  • Yes, we should probably write a dex :D
  • It is probably fine to publish this as jest-haste-map as a first version and then consider splitting out certain packages into separate modules, depending on what other people might need.
  • Yes, Map and Set are super fast but they don't serialize well to JSON. Is writing to a Map or Set as fast as it is to an Object? We are doing so many writes during startup, which is quite critical when you have 100k+ items. https://github.com/cognitect/transit-js could help to speed up serialization – it is basically a custom serializer that is really fast and can encode/decode to immutable data structures. I think right now that is more than I need but if it yields real performance gains, I'd consider going for it :)

@cpojer cpojer force-pushed the jest-haste-map branch 7 times, most recently from 01bd200 to b743e43 Compare April 18, 2016 08:13
@cpojer
Copy link
Member Author

cpojer commented Apr 19, 2016

I made a bunch of micro-optimizations in ed2ee78

Since both my metadata objects for all files/modules contain the same fixed amount of items, using constant keys and arrays was a great optimization and the code complexity stayed pretty much the same (module[H.ID] instead of module.id, for example).

The file size reduction of using constant keys and arrays instead of long string names and objects was 10 mb (!) down from 26 to 16 on a large code base. I was then able to exclude 5 % of the files from the original list and am now down to 14 mb. The parse time went down from 500ms to 250ms. The write time, interestingly, is still at 300ms. I looked into compacting the paths but that would only yield a win of another 2mb or so but it would add additional post-processing. I believe with this model the file size will grow to 30 mb over the next 3-4 years and decode in less than a second which will be more than good enough. Then, someone can consider different serialization models but right now we are in the diminishing-returns territory. The data model is simple enough and the code is easy enough to change.

Things I tried:

  • JSONStream: this is memory efficient but actually very slow
  • Decoding every metadata object individually so I could store them using my own data format. loop over JSON.parse(item) had exactly the same performance characteristics as JSON.parse(allItems);.

@cpojer cpojer force-pushed the jest-haste-map branch 2 times, most recently from 0607cf7 to ead50f7 Compare April 19, 2016 06:49
function findNative(roots, extensions, ignore, callback) {
const args = [].concat(roots);
args.push('-type', 'f');
extensions.forEach((ext, index) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need parentheses around the extensions

find x -type f -name '*.json' -o -name '*.js' is not equivalent to
find x -type f \( -name '*.json' -o -name '*.js' \)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, what's the difference? It seems to work fine on www for example. This is what it was like in node-haste1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first one would also match directories with ending with “.js”

static getCacheFilePath(tmpdir) {
const hash = crypto.createHash('md5');
Array.from(arguments).slice(1).forEach(arg => hash.update(arg));
return path.join(tmpdir, hash.digest('hex'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to add a human-readable prefix here. That makes it easier to delete these files manually if necessary for debugging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea! Let's use the name from the options and make it tmpdir/<name>-hash.

return path.join(tmpdir, hash.digest('hex'));
}

build() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I’d prefer this to be a static method that returns a promise that resolves to an instance. That way, there’s no need to call this.build().then(…) / hasteMap.build().then(…) anywhere.

But that’s a cosmetic change, so feel free to ignore this comment :-)

@davidaurelio
Copy link
Contributor

davidaurelio commented Apr 21, 2016

This looks solid in general.

To make this the base of node-haste (and have RN packager continue to use node-haste), we need better decoupling of caching, crawling, building the map, and extracting requires.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 14, 2021
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants