
Keep data in a single place #28

Open
rxaviers opened this issue Jan 22, 2016 · 4 comments

@rxaviers (Owner)

Can we keep the data in one single, global place to avoid unnecessary downloads and unnecessary disk usage? For example, in ~/.npm/cldr-data/<version>. More details: the files of this repo would normally be installed by npm and therefore placed in each local node_modules path, but our post-install script would download CLDR into a global place (skipping the download if it is already there), then link those files into the respective node_modules install so they can be accessed.

Benefits:

  • Avoid re-downloading CLDR data across multiple projects that depend on cldr-data: i.e., multiple cldr-data installs (using the same CLDR version) on the same machine would all share the same JSON files.
  • Avoid re-downloading CLDR data between multiple runs of Continuous Integration tests. Note that this setup requires access to the machine, so the cached files can be put in place once.

TODOs:

  • Can we reuse the same path npm already uses for caching, e.g., ~/.npm/cldr-data/<version> on Linux? If so, how do we figure out that path? Is it available via an npm config option or environment variable?
  • Does require('cldr-data/<arbitrary-json-file-path.json') still work using this approach? For Linux and macOS, we can use a link (ideally a symlink should be enough, but hard links would surely do). For MS Windows I don't know. If there's no solution for MS Windows, we could apply the single/global shared-place optimization on Linux and macOS only.
@bajtos (Contributor) commented Sep 1, 2016

FWIW, you can take a look at how https://www.npmjs.com/package/phantomjs-prebuilt is handling re-use of downloaded data.

@puzrin (Contributor) commented Mar 21, 2017

> Does require('cldr-data/<arbitrary-json-file-path.json') still work using this approach?

That's dangerous and makes the project fragile. The npm cache can be cleared for various reasons by external tools, and that should not break the project.

There is still the option of caching the downloads, but those are about 10x smaller than the unpacked data. Not a big deal.

@rxaviers (Owner, Author)

Basically the feedback I got so far is: let's cache the downloaded packages, not the actual uncompressed cldr-data.

This would prevent re-downloading CLDR, but it wouldn't help de-duplicate disk usage. Not saying it's good or bad, I'm just posting this comment to sum it up.

@SlexAxton

I think I agree in general with that approach (cache the resource but redo the work). It might be nice to support checking a SHA sum of the expected files somewhere, so one dependency can't poison the other dependencies by putting bad data in the right spot.

In my experience, reducing network activity will probably pay off far more than saving a few MB on the hard drive.
