
Convert npm's compressed tarballs to uncompressed. #40

Closed

Daniel15 opened this issue Apr 30, 2016 · 16 comments

Comments

@Daniel15

Daniel15 commented Apr 30, 2016

Just discovered this project today through a comment on Reddit. Interesting project!

One thing that came to mind is that source control systems may not handle compressed tarballs very well, as every update to a package would result in a new copy of the entire file in the repo, which can make the repo very large. Perhaps uncompressed tarballs (i.e. .tar files, not .tar.gz or .tgz) would be better since they can actually be diffed? I'm not sure if Git still treats tars as binary though, in which case it might not actually add any value. Just a thought :)

@webuniverseio

Hi @Daniel15, I was experimenting with git lfs to get around the issue, but unfortunately it appeared to be quite slow: every checkout from lfs took about 3 seconds per package to download. I haven't done any research yet, but I hope there is a way to speed up that process, because if you have 1k packages it will take forever to check out.

@JamieMason
Owner

Thanks for raising this @Daniel15. Initially I'm a little wary of diverging from npm's approach by using tarballs which aren't identical to theirs; however, the point raised is a good one and should be investigated.

Unless someone offers a PR, I'll pick this up as part of performance improvements after bugs and features have been completed.

@JamieMason changed the title from "Uncompressed tarballs?" to "Convert npm's compressed tarballs to uncompressed." on Jun 23, 2016
@JamieMason
Owner

Digging into it this morning, it seems to be as simple as running e.g. gunzip lodash-4.11.1.tgz, which converts the tarball from compressed to uncompressed in place.

I've run this on my Mac, then tried an npm install lodash-4.11.1--uncompressed.tar on Windows, which worked.

I will probably implement this behind an option such as shrinkpack --uncompressed.

@JamieMason
Owner

Note to self: I've sketched up this script, which should convert a .tgz to a .tar on Mac and Linux:

const fs = require('fs');
const path = require('path');
const spawn = require('child_process').spawn;

function getTarFromTgz(tgz, tar) {
  return new Promise((resolve, reject) => {
    // pipe `gunzip -c package.tgz` into package.tar, leaving the .tgz untouched
    const gunzip = spawn('gunzip', ['-c', tgz]);
    const wStream = fs.createWriteStream(tar);

    // note: no setEncoding here, the tar contents are binary and would be
    // corrupted by decoding them as utf8
    gunzip.stdout.on('data', appendToTar);
    gunzip.on('close', onClose);

    function appendToTar(data) {
      wStream.write(data);
    }

    function onClose(code) {
      wStream.end();
      if (code !== 0) {
        reject(new Error(`gunzip process exited with code ${code}`));
      } else {
        resolve(tar);
      }
    }
  });
}

// EXAMPLE USAGE
const readFrom = path.resolve('package.tgz');
const writeTo = path.resolve('package.tar');

getTarFromTgz(readFrom, writeTo)
  .then(tar => console.log(tar), err => console.error(err));

I will probably release this without Windows support initially, then add that in a later version.

@JamieMason
Owner

Increasing priority as users are being affected: https://github.com/jmeas/moolah/issues/433.

@jamesplease

jamesplease commented Jul 29, 2016

A few more details, for the record:

I didn't figure the issue out entirely, but it seemed that sometimes git would not be sure whether things were up to date, and would push lots of "objects" up to the repo as I worked on feature branches. An interesting thing to note is that shrinkpack was only in the git history, not in the latest commit. In other words, I had added it and then removed it.

The pushes were about 50mb, which was particularly problematic for me because I:

  • work from my cellular data plan frequently
  • use GitHub's differ to see changes, so I push often

We were evaluating introducing this on some internal apps at Netflix, too, but we're concerned because our apps have many more dependencies than Moolah and other team members also use cell connections sometimes. I have a decent data plan but one night of work with 50mb per push would likely eat through it :)

An example log of pushing a few files containing only text:

Push log:

Jamess-MacBook:moolah jmeas$ git push origin action-creator-tests
Counting objects: 1957, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1778/1778), done.
Writing objects: 100% (1957/1957), 52.62 MiB | 676.00 KiB/s, done.
Total 1957 (delta 448), reused 1206 (delta 62)
To [email protected]:jmeas/moolah.git
 * [new branch]      action-creator-tests -> action-creator-tests

There seemed to be some steps I could take to make it not push every object (perhaps making sure that I had the latest code and rebasing before each push?), but I never figured it out entirely (I'm not an expert with git at that low a level).

I removed shrinkpack from the git history – I'll be working on some feature branches tonight to see if it helps the problem. I'll post an update here.

Anyway, thanks for looking into this, @JamieMason !

@Daniel15
Author

Daniel15 commented Jul 30, 2016

Note to self: I've sketched up this script, which should convert a .tgz to a .tar on Mac and Linux:

This assumes gunzip is on the $PATH (I guess that's a reasonable assumption). Is there no way to natively un-gzip a file with Node.js? Pretty much every other language/framework has archive extraction as part of its core library.

@JamieMason
Owner

Thanks @Daniel15, @jmeas that's really useful. Doing some work on this now.

@DrewML
Contributor

DrewML commented Aug 2, 2016

Is there no way to natively un-gzip a file with Node.js

There definitely is: Node's built-in zlib module. It should be as simple as piping a stream from fs.createReadStream into zlib.createGunzip().
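For reference, a minimal sketch of the native approach described above, using Node's built-in zlib module instead of spawning gunzip (the function and file names are illustrative, not shrinkpack's actual implementation):

const fs = require('fs');
const zlib = require('zlib');

function getTarFromTgz(tgz, tar) {
  return new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(tgz);   // compressed .tgz in
    const gunzip = zlib.createGunzip();            // strip the gzip layer
    const writeStream = fs.createWriteStream(tar); // plain .tar out

    readStream.on('error', reject);
    gunzip.on('error', reject);
    writeStream.on('error', reject);
    writeStream.on('finish', () => resolve(tar));

    readStream.pipe(gunzip).pipe(writeStream);
  });
}

getTarFromTgz('package.tgz', 'package.tar')
  .then(tar => console.log(tar), err => console.error(err));

Because the data stays in binary streams end to end, this avoids both the dependency on gunzip being on the $PATH and the risk of corrupting the tar by re-encoding it.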

@JamieMason
Owner

perfect @DrewML 😎👌

@webuniverseio

@JamieMason you posted a comment with a test branch (npm install -g git://[email protected]:JamieMason/shrinkpack.git#7b2f341408be4f0415714ec57534debfdaaa3fbf). I checked it and I get the following error:

! Please raise an issue at https://github.com/JamieMason/shrinkpack/issues

    TypeError: Path must be a string. Received undefined
        at assertPath (path.js:7:11)
        at Object.join (path.js:466:7)
        at Object.onSuccess (D:\sz\websites\indigo\images-parent\images\node_modules\shrinkpack\src\analyse\get-paths.js:17:19)
        at tryCatchReject (D:\sz\websites\indigo\images-parent\images\node_modules\when\lib\makePromise.js:845:30)
        at runContinuation1 (D:\sz\websites\indigo\images-parent\images\node_modules\when\lib\makePromise.js:804:4)
        at Fulfilled.when (D:\sz\websites\indigo\images-parent\images\node_modules\when\lib\makePromise.js:592:4)
        at Pending.run (D:\sz\websites\indigo\images-parent\images\node_modules\when\lib\makePromise.js:483:13)
        at Scheduler._drain (D:\sz\websites\indigo\images-parent\images\node_modules\when\lib\Scheduler.js:62:19)
        at Scheduler.drain (D:\sz\websites\indigo\images-parent\images\node_modules\when\lib\Scheduler.js:27:9)
        at _combinedTickCallback (internal/process/next_tick.js:67:7)

@JamieMason
Owner

thanks @szarouski, looking into it now

@JamieMason
Owner

I see the problem: running shrinkpack rather than shrinkpack . causes it. I will fix that in a moment.
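The shape of the fix this implies is defaulting the target directory to the current working directory when no path argument is given, along these lines (a sketch only, not necessarily the actual change):

// Treat a bare `shrinkpack` the same as `shrinkpack .` by falling back
// to the current working directory when no path argument is passed.
const path = require('path');
const directory = path.resolve(process.argv[2] || '.');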

@JamieMason
Owner

should be fixed now in npm install -g git://[email protected]:JamieMason/shrinkpack.git#708e7455a3ef0b154fce2cd8bf06bc44fca991a3 @szarouski, sorry about that.

@webuniverseio

webuniverseio commented Aug 5, 2016

@JamieMason np, thanks for looking into that. I get a different message now:

events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: spawn npm ENOENT
    at exports._errnoException (util.js:1012:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:182:32)
    at onErrorNT (internal/child_process.js:348:16)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

Update: sorry, I run shrinkpack through npm scripts; here is a better stack trace:

module.js:442
    throw err;
    ^

Error: Cannot find module 'lodash.assign'
    at Function.Module._resolveFilename (module.js:440:15)
    at Function.Module._load (module.js:388:25)
    at Module.require (module.js:468:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\Users\Sergey\AppData\Roaming\npm\node_modules\shrinkpack\src\analyse\index.js:2:14)
    at Module._compile (module.js:541:32)
    at Object.Module._extensions..js (module.js:550:10)
    at Module.load (module.js:458:32)
    at tryModuleLoad (module.js:417:12)
    at Function.Module._load (module.js:409:3)

Update 2: I tried manually adding lodash.assign to dependencies, and after running shrinkpack I get the error from the top of this comment.

@JamieMason
Owner

Looking into this, there is an issue with child_process.spawn on Windows [1] [2], which I have added a fix for. I've tested commit e54ed6a on Windows and it is now working.

[1] http://stackoverflow.com/questions/27688804/how-do-i-debug-error-spawn-enoent-on-node-js
[2] nodejs/node-v0.x-archive#2318
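For context, the usual workaround for spawn npm ENOENT on Windows (as the links above describe) is to account for npm being a .cmd wrapper there, along these lines (a sketch with illustrative arguments, not necessarily what commit e54ed6a does):

// child_process.spawn will not find plain "npm" on Windows unless a shell
// is used, because the executable is actually npm.cmd.
const spawn = require('child_process').spawn;
const npmCmd = process.platform === 'win32' ? 'npm.cmd' : 'npm';
const child = spawn(npmCmd, ['pack', 'lodash'], { stdio: 'inherit' });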
