
Enable --dev-mod-res switch to better support development-time module storage and installs #9428

Closed
ghost opened this issue Nov 2, 2016 · 14 comments
Labels
feature request Issues that request new features to be added to Node.js. module Issues and PRs related to the module subsystem.

@ghost

ghost commented Nov 2, 2016

  • Version: latest
  • Platform: all
  • Subsystem: Modules

*** CAVEAT ***

The author of this issue recognizes that it involves a subsystem of node that has been locked, and that there is a somewhat heated history around that subsystem. This may lead node decision makers to dismiss this issue out of hand as a non-starter. It is humbly and respectfully requested that the issue be given the same consideration as any other issue, and that it be recognized as an attempt to solve a real problem (one that is fundamentally a result of the amazing success and wide usage of node).

It should also be noted that the implementation this issue recommends in no way breaks the algorithm currently implemented by the affected subsystem: when activated, it is fully interoperable with the current algorithm and conceptually embraces its simplicity; it must be explicitly activated; and it is meant to address problems primarily encountered during development-time activities.

The author has implemented the solution this issue suggests in a fork/branch of node in fewer than 50 lines of code (<35 in JavaScript, <10 in C++) and, should the response to this issue be positive, would immediately create a pull request and fully support it, addressing any issues raised against it.

Problem Statement

It's unlikely anyone would dispute that npm has contributed greatly to the wide adoption and use of node. It seems node itself has recognized npm as the de facto standard package manager, as npm is included with the node installers. This matters for this issue, because the issue really concerns how node and npm work together with respect to how modules are stored and referenced as dependencies of other modules.

Developers within the node ecosystem typically come, eventually (and often quickly), to have dozens of node-based projects on their development machines, each stored in its own project folder. As a consequence of how npm currently manages module installation, this creates two basic problems.

Problem One - Storage Space

Because npm creates physical copies of modules when a developer asks it to install all descendant dependencies rooted in a given package.json file (typically the root package.json of a project for some module, be it a top-level entry-point module or a shared-library module), a given version of a dependency module, along with all of its descendant dependencies, often ends up physically residing on the machine multiple times.

This, combined with the typical number of node-based projects on a developer's machine, often results in gigabytes of storage being consumed (for example, the author has 90+ projects, consuming almost 50 GB in redundant copies of dependency modules alone).

While it could be argued this isn't that big of a problem considering the continuing decline in the cost of storage, especially SSDs, it should not be ignored. Besides the consumed space (which back-of-the-napkin math shows is typically 20-30 times more than would be necessary if each module version were physically installed in only one place and then referenced where needed), there is also the impact on basic activities such as deleting projects, making optimal use of the underlying OS's file cache, and others.

Problem Two - Install Duration

npm spends a significant amount of time unpacking and copying modules when a developer requests an install of a package.json; 'significant' is in comparison to simply referencing an already installed module (back-of-the-napkin math and initial empirical observation show that referencing an existing physical installation can be 30+ times faster than always copying).

Installation time impacts the initial install of a project/package, but there are other instances when full installs occur beyond the initial one. These typically occur for two primary reasons (although there may be others):

  1. During development of a package, one or more of its dependencies need to be upgraded. npm does not have the best history of pulling this off without problems, and at times what's required is deleting the project's entire node_modules folder and re-installing the project's module dependency tree.
  2. At times a developer must fix a bug or implement an enhancement in an earlier version of a project, and that earlier version has a different module@version dependency tree. As with reason 1) above, it's often best to delete the node_modules folder and re-install.

Problem Impact

If the above issues occurred once or twice every few months, they probably wouldn't be 'problems'. However, depending on a developer's involvement in the node ecosystem, they can occur several times a month or more, and they end up becoming a problem akin to 'death by a thousand paper-cuts'.

Root Problem

To the knowledgeable reader, the obvious solution is simply to symlink modules. The documentation for the Modules subsystem implies that, in principle, package managers should be able to implement this.

However, in practice this cannot be achieved with npm, because npm allows the specified version of a module's dependency to be not a single version but effectively a range of possible versions, where the 'highest' matching version, as determined by semver semantics, is chosen at install time. This means that two or more projects that depend on the same module@version may end up installing slightly different versions of the dependencies that the common module@version itself depends on. Because a module's dependencies can be placed in the 'node_modules' subdirectory of the module, the module@version can't be physically installed once and symlinked into every project while still allowing npm to offer reasonable guarantees about the versions in a project's entire module dependency tree; the only way to guarantee them is to physically copy the modules. (Note: npm offers bundling and shrink-wrapping to precisely control dependencies at the version level, but this is a tangent to this issue.)

This is because the only way node currently allows a module's dependencies to be precise is by installing those dependencies in a sub-directory named 'node_modules' underneath the module's directory.
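To make the version drift concrete, here is a small sketch of how a caret range such as "^1.0.0" can resolve to different versions depending on what has been published at install time. It is a hypothetical, deliberately minimal matcher for plain X.Y.Z versions with a major version of at least 1 (npm itself uses the full semver rules), and the registry snapshots are invented for illustration.

```javascript
// Hypothetical, minimal caret-range matcher for plain "X.Y.Z" versions with
// major >= 1. npm itself uses the full semver rules; this is only a sketch.
function parse(version) {
  return version.split(".").map(Number);
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a, b) {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// "^1.0.0" accepts any version >= 1.0.0 that keeps the same major version.
function satisfiesCaret(version, range) {
  const base = range.slice(1);
  return parse(version)[0] === parse(base)[0] && compare(version, base) >= 0;
}

// Pick the highest published version that satisfies the range, mirroring
// the "highest matching version wins" behavior described above.
function resolveHighest(published, range) {
  const matching = published.filter((v) => satisfiesCaret(v, range)).sort(compare);
  return matching.length > 0 ? matching[matching.length - 1] : null;
}

// Invented registry snapshots for modB, one month apart.
const modBLastMonth = ["1.0.0", "1.1.0"];
const modBToday = ["1.0.0", "1.1.0", "1.2.0"];

// The same module@version declares "modB": "^1.0.0" in both projects.
console.log(resolveHighest(modBLastMonth, "^1.0.0")); // 1.1.0
console.log(resolveHighest(modBToday, "^1.0.0")); // 1.2.0
```

The same unchanged dependency declaration yields two different trees, which is why a single shared physical copy that carries its own node_modules cannot serve both projects.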

Solution

Implement a --dev-mod-res switch (and NODE_DEV_MOD_RES environment variable) in node, short for 'development-time module resolution', which, when activated, augments node's behavior in two ways:

  1. Search for a module's dependencies not just in the module's 'node_modules' subdirectory, but then also in an adjacent directory that has the module directory's name plus a '.node_modules' suffix. For example, in node's example under 'Loading from node_modules folders' (where '/home/ry/projects/foo.js' calls require('bar.js')), the list and order of the directory searches becomes:
      • /home/ry/projects/node_modules/bar.js
      • /home/ry/projects.node_modules/bar.js
      • /home/ry/node_modules/bar.js
      • /home/ry.node_modules/bar.js
      • /home/node_modules/bar.js
      • /home.node_modules/bar.js
      • /node_modules/bar.js
  2. Preserve symlinks for all module paths, including the entry module (which, in combination with NODE_DEV_MOD_RES, allows npm to implement its lifecycle steps that depend on installed modules).

With this augmentation, every module can be physically installed once on a machine while it and its dependencies are symlinked, and a module@version that happens to be symlinked into multiple projects can still have its dependencies determined (symlinked) specifically for the project it is being used in.
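The lookup order in item 1 can be sketched as a variant of the per-directory node_modules walk that node documents. This is a hypothetical illustration of the proposal, not code from the author's branch or from node core: the function name `devModResPaths` and its `devModRes` flag are invented here, and POSIX paths are assumed.

```javascript
const path = require("path");

// Hypothetical sketch of the proposed lookup order (not node core's code).
// For every ancestor DIR of the requiring module's directory, the current
// algorithm tries DIR/node_modules; with devModRes enabled, the sibling
// folder DIR + ".node_modules" is tried immediately afterwards.
function devModResPaths(fromDir, devModRes) {
  const dirs = [];
  let dir = path.posix.resolve(fromDir);
  for (;;) {
    // Like node, skip directories that are themselves named node_modules.
    if (path.posix.basename(dir) !== "node_modules") {
      dirs.push(path.posix.join(dir, "node_modules"));
      // The filesystem root has no sibling, so it gets no ".node_modules" entry.
      if (devModRes && dir !== "/") dirs.push(dir + ".node_modules");
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached "/"
    dir = parent;
  }
  return dirs;
}

// Reproduces the search order for require('bar.js') from /home/ry/projects/foo.js.
for (const dir of devModResPaths("/home/ry/projects", true)) {
  console.log(path.posix.join(dir, "bar.js"));
}
```

In this sketch a directory's '.node_modules' sibling is consulted only after that directory's own node_modules but before any ancestor's node_modules, so an existing folder that happens to be named like 'foo.node_modules' could shadow a package further up the tree; that is the edge case raised in the discussion.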

Note: node's Modules documentation warns that preserving symlinks can cause unexpected behavior, specifically citing that node will fail if two different symlinks refer to the same native node module. This can easily be addressed by using the module's specified path to determine whether it should be loaded, but always using its realpath to determine whether it has already been loaded. The current fork/branch does not implement this, but the author would do so should this issue garner positive support.
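The fix suggested in this note (load through the requested, symlink-preserving path, but deduplicate by realpath) might look roughly like the following. The `loadOnce` helper and the `fakeLoad` callback, which stands in for actually loading a native addon, are hypothetical; the sketch uses only node's built-in `fs`, `os`, and `path` modules and assumes a platform where the current user may create symlinks.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical loader cache keyed by realpath: the requested (symlink-preserving)
// path is what gets loaded, but the realpath decides whether it already was.
const loadedByRealpath = new Map();

function loadOnce(requestedPath, load) {
  const key = fs.realpathSync(requestedPath);
  if (!loadedByRealpath.has(key)) {
    loadedByRealpath.set(key, load(requestedPath));
  }
  return loadedByRealpath.get(key);
}

// Demo: one physical "addon" file reached through two different symlinks.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "devmodres-"));
const physical = path.join(root, "addon.node");
fs.writeFileSync(physical, "");
const linkA = path.join(root, "projectA-addon.node");
const linkB = path.join(root, "projectB-addon.node");
fs.symlinkSync(physical, linkA);
fs.symlinkSync(physical, linkB);

// Stand-in for actually loading a native addon; counts how often it runs.
let loads = 0;
const fakeLoad = (p) => ({ loadedFrom: p, id: ++loads });

const a = loadOnce(linkA, fakeLoad);
const b = loadOnce(linkB, fakeLoad);
console.log(a === b, loads); // true 1

fs.rmSync(root, { recursive: true, force: true });
```

The second request returns the instance created by the first, so the physical addon is only ever loaded once even though two symlinked paths point at it.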

In addition to the node fork/branch implementing this issue, the author has implemented a preliminary fork/branch of npm that works in combination with the --dev-mod-res switch. The change adds a new npm command called 'mount', which is conceptually identical to 'install' except that it symlinks modules, and symlinks their dependencies into .node_modules folders. The current implementation of 'mount' makes slight adjustments to npm's 'install' command and is relatively low-impact from an amount-of-code perspective (about 100 lines so far). However, it has so far only been implemented enough to prove the concept.

Depending on the response to this issue, the author would fully implement 'mount' in npm.

Thank you for your valuable time in reading, contemplating, and responding to this issue.

@evanlucas
Contributor

Hi @phestermcs! Thanks for taking the time to write this up. The module system is currently frozen. We have to carefully evaluate any changes to the module system to prevent major ecosystem breakage. I am currently -1 on implementing this change.

@ghost
Author

ghost commented Nov 2, 2016

@evanlucas Thank you kindly for replying. As was described in the original issue, this change is non-breaking and opt-in. In other words, unless the switch is active, the module system behaves identically to how it currently does. With the switch active, it STILL behaves identically to how it currently does with respect to how the current ecosystem expects module resolution to occur solely through hierarchical node_modules folders.

You stated:

We have to carefully evaluate any changes to the module system to prevent major ecosystem breakage

Can you help me understand how this issue was 'carefully evaluated', and how it would create 'major ecosystem breakage'. It seems neither have been established, and articulating how either was done with this specific issue will certainly help me decide how to pursue garnering support for this issue.

Regards

@mscdex mscdex added module Issues and PRs related to the module subsystem. feature request Issues that request new features to be added to Node.js. labels Nov 2, 2016
@rvagg
Member

rvagg commented Nov 3, 2016

https://github.com/nodejs/node-eps might be a better location to file something like this, sticking with the format used there. However, I really think you're not going to have a good time getting any of the core team to agree to changes in this area.

Some thoughts arising from your proposal:

  • You're pointing to limitations in npm, it's not really core's place to fix deficiencies with the package manager, you could take it up there and there are in fact options for how npm could behave differently, although again you're not likely to get buy in for major changes there either.
  • Have you looked at alternative package managers? It's becoming a competitive landscape and you might find solutions to what you're seeing as problems already out there. Off the top of my head I think ied might do some symlinking that could potentially save you a lot of space. if it's not quite right then you could propose a change to that.
  • I've heard of people doing their dev inside a folder named "node_modules" to get some crazy broad resolution happening. Not a strategy I'd recommend, even in dev, I prefer to be explicit (which is another reason I object to the broad proposal here)
  • Have you tried using npm link for dev? A lot of us find it a useful tool for developing across multiple projects, it's an easy way of managing symlinks.
  • Perhaps you should just build a tool to manage symlinks for you in bulk for dev. I reckon your needs could be met by tooling outside of core.
  • Have you played with --preserve-symlinks? Your OP suggests that you might be aware of it, it's something that the team hopes might become default one day but we have too many breaking blockers to doing that at the moment, but there's nothing stopping you from using it if you can make it work for you.

@ghost
Author

ghost commented Nov 3, 2016

@rvagg Thank you kindly for responding; I absolutely appreciate the sensitivities around changes to the Modules subsystem. Please allow me to respond to each of your points.

You're pointing to limitations in npm, it's not really core's place to fix deficiencies with the package manager, you could take it up there and there are in fact options for how npm could behave differently, although again you're not likely to get buy in for major changes there either.

My proposal is orthogonal to all package managers (the ones I'm aware of are npm, yarn, and ied), and the core problem is impossible to fix with any package manager. I would therefore not characterize the problem as a limitation in npm, but rather as a behavior in node that creates a limitation for all package managers. All package managers are fundamentally prevented from symlinking modules within individual projects to single physical copies stored in a machine-wide global cache (regardless of whether each copy is uniquely identified by its name and semver, or by its name and a hash), specifically because it's possible for a module used in a project to have slightly different dependency versions from project to project, as a consequence of each project's entire dependency tree, and because such dependencies can currently only be stored in a node_modules folder underneath the module. For example:

// project A's modules were installed 1 month ago:
projectA
    /node_modules
        /modA //[v1.0.0]
            /node_modules
                /modB //[v1.1.0] from pkg.deps.modB: "^1.0.0"

// project B's modules were installed today.
// between a month ago and today a new modB was released.
projectB
    /node_modules
        /modA //[v1.0.0]
            /node_modules
                /modB //[v1.2.0] from pkg.deps.modB: "^1.0.0"

It's therefore impossible to have a single physical location of modA@1.0.0 that both projects can symlink to while still guaranteeing the versions in the entire dependency tree. There are also other use cases that result in the same problem.

With the proposed change, the above can instead be laid out as follows, and any package manager could benefit from supporting it:

// project A's modules were installed 1 month ago:
projectA
    /node_modules
        /modA //[v1.0.0] symlinked to global copy sans node_modules subfolder
        /modA.node_modules
            /modB //[v1.1.0] from pkg.deps.modB: "^1.0.0" and symlinked to global copy for v1.1.0

// project B's modules were installed today.
// between a month ago and today a new modB was released.
projectB
    /node_modules
        /modA //[v1.0.0] symlinked to same global copy symlinked in projectA
        /modA.node_modules
            /modB //[v1.2.0] from pkg.deps.modB: "^1.0.0" and symlinked to global copy for v1.2.0
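A sketch of how a package manager might build the proposed layout, using node's built-in `fs` module inside a temporary directory. The cache directory naming (`name@version`) and the `addToCache`/`linkPackage` helpers are assumptions made for illustration; they are not the author's npm 'mount' implementation.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical machine-wide cache holding one physical copy per name@version.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "global-cache-demo-"));
const cache = path.join(root, "cache");

function addToCache(name, version) {
  const dir = path.join(cache, `${name}@${version}`);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, "package.json"), JSON.stringify({ name, version }));
}

// Symlink name@version into parentDir, and symlink its already-resolved
// dependencies into the adjacent "<name>.node_modules" folder.
function linkPackage(parentDir, name, version, deps = {}) {
  fs.mkdirSync(parentDir, { recursive: true });
  fs.symlinkSync(path.join(cache, `${name}@${version}`), path.join(parentDir, name), "dir");
  const sibling = path.join(parentDir, `${name}.node_modules`);
  for (const [dep, depVersion] of Object.entries(deps)) {
    fs.mkdirSync(sibling, { recursive: true });
    fs.symlinkSync(path.join(cache, `${dep}@${depVersion}`), path.join(sibling, dep), "dir");
  }
}

addToCache("modA", "1.0.0");
addToCache("modB", "1.1.0");
addToCache("modB", "1.2.0");

// Both projects use modA@1.0.0, but "^1.0.0" for modB resolved differently.
linkPackage(path.join(root, "projectA", "node_modules"), "modA", "1.0.0", { modB: "1.1.0" });
linkPackage(path.join(root, "projectB", "node_modules"), "modA", "1.0.0", { modB: "1.2.0" });

// Read a package's version through the symlinked project layout.
const versionOf = (rel) =>
  JSON.parse(fs.readFileSync(path.join(root, rel, "package.json"), "utf8")).version;

const realA = fs.realpathSync(path.join(root, "projectA", "node_modules", "modA"));
const realB = fs.realpathSync(path.join(root, "projectB", "node_modules", "modA"));
console.log(realA === realB); // true: a single physical modA@1.0.0
console.log(versionOf("projectA/node_modules/modA.node_modules/modB")); // 1.1.0
console.log(versionOf("projectB/node_modules/modA.node_modules/modB")); // 1.2.0
```

The temporary directory is left in place so the links can be inspected; only symlinks live under each project, while the physical copies exist once in the cache.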

Have you looked at alternative package managers? It's becoming a competitive landscape and you might find solutions to what you're seeing as problems already out there. Off the top of my head I think ied might do some symlinking that could potentially save you a lot of space. if it's not quite right then you could propose a change to that.

ied does use symlinks, but only after all modules have been physically copied into a project's node_modules folder; the symlinks are only used to point back to those copies from within a given module's node_modules.

I've heard of people doing their dev inside a folder named "node_modules" to get some crazy broad resolution happening. Not a strategy I'd recommend, even in dev, I prefer to be explicit (which is another reason I object to the broad proposal here)

I researched this approach, and quickly dismissed it because it fundamentally does not let a single physical copy of a module be used in many different places and is entirely unmanageable in any deterministic way. Nothing in my proposal prevents being explicit!

Have you tried using npm link for dev? A lot of us find it a useful tool for developing across multiple projects, it's an easy way of managing symlinks.

I have used npm link, but its purpose is to symlink a single dependency to another location in the file system in order to support developing two or more modules at the same time, as separate entities. It would in no way support what my proposal does: a single physical copy of a module symlinked wherever it is referenced across multiple projects.

Perhaps you should just build a tool to manage symlinks for you in bulk for dev. I reckon your needs could be met by tooling outside of core.

I looked at this approach too, and again, it's not possible because node's logic creates a fundamental limitation.

Have you played with --preserve-symlinks? Your OP suggests that you might be aware of it, it's something that the team hopes might become default one day but we have too many breaking blockers to doing that at the moment, but there's nothing stopping you from using it if you can make it work for you.

Yes, I've certainly played with that. In fact, if you read my proposal carefully, it preserves all symlinks, including that of the entry module (which node does not do; it always uses the realpath of the entry module, which prevents any package manager from running certain lifecycle scripts that depend on modules symlinked into the project).

I'm starting to wonder who I have to pay, and how much, to have this issue seriously considered, especially as it doesn't break the current algorithm. :(

@ghost
Author

ghost commented Nov 3, 2016

@rvagg Just wanted to add that the proposal is entirely interoperable with the current logic, and gives precedence to the current logic, and doesn't break the current logic!!!

Regarding your suggestion of submitting through node-eps: this proposal is so small and non-breaking that it didn't seem to warrant a node-ep; I believe it is only a perception that this proposal would be some massive breaking change to a fundamental, locked subsystem! I can't stress this enough: It does not break existing logic in any way. The only edge case I can think of would be if someone actually gave a module the full name 'myMod.node_modules' and then opted in to the --dev-mod-res switch; a minuscule probability.

An alternative approach would be for all package managers to require dependencies to always specify precisely one version, rather than a range of versions (e.g. ^1.0.0). However, implementing this proposal would be much easier than requiring the developers of hundreds of thousands of modules to change the dependency version specifiers in their package.json files.

@evanlucas
Contributor

Can you help me understand how this issue was 'carefully evaluated', and how it would create 'major ecosystem breakage'. It seems neither have been established, and articulating how either was done with this specific issue will certainly help me decide how to pursue garnering support for this issue.

You haven't submitted a pull request, so we can't carefully evaluate whether or not it would break anything.

It does not break existing logic in any way.

My biggest concern with this is unintended consequences from the change (which have happened before).


People use node in a variety of ways. This would be a breaking change if someone already has a directory that is in the format you are suggesting (foo.node_modules). That would be a breaking change. Although that may (or may not) exist currently in the ecosystem, we don't have a reliable way to determine if there are private packages/applications that depend on this specific behavior.

This proposal extends the lookup paths for modules which in turn changes the module resolution algorithm.
I'm -1 for extending the lookup paths.

@Fishrock123
Contributor

While I have not read through the wall of text, I suspect this will be impossible with ES Modules regardless, making this less favorable for the future.

@ghost
Author

ghost commented Nov 3, 2016

@evanlucas I do appreciate you taking the time to respond.

I can submit a pull request, but given the sensitivities around changes to Modules, I'm trying to gauge whether there is any support. What I intend to do is add a little more to the npm side, then provide links to the forked branches so things can at least be looked at and experimented with.

People use node in a variety of ways. This would be a breaking change if someone already has a directory that is in the format you are suggesting (foo.node_modules). That would be a breaking change. Although that may (or may not) exist currently in the ecosystem, we don't have a reliable way to determine if there are private packages/applications that depend on this specific behavior.

Regarding any changes made to node, what reliable ways does the node team currently use to analyze if there are private packages/applications that depend on a particular behavior?

How does the node team balance implementing a change that can be empirically shown to demonstrate obvious benefits to many (in this case most everyone using node/npm, with respect to storage space and install times) against its potential for breaking some users in a theoretical edge case (someone naming their module "myMod.node_modules")?

This proposal extends the lookup paths for modules which in turn changes the module resolution algorithm.

True, but first it's opt-in by an explicit switch, meaning nothing changes unless the switch is activated. When the switch is active, then the user is requesting the change, and even in that case the change is in all probability not going to be a major breaking change. Node seemed to get along just fine with the --harmony switch. How would this be different?

Does the node team recognize that, because a depending module (via package.json's dependencies property(s)) can specify ranges of versions for the modules it depends on, and because node only lets a module be precise about the modules it depends on if they are placed in a node_modules subdirectory, it's impossible to symlink modules to a global cache? No package manager can work around that.

Your time in helping understand the issues is greatly and genuinely appreciated.

@ghost
Author

ghost commented Nov 3, 2016

@Fishrock123 I do appreciate you at least attempting to read the wall of text. I've read section 5.2 of the ES6 Module Interoperability EP, and it seems this proposal would still be applicable and functional in that case. Can you provide any more insight as to why you suspect it would be impossible?

@sam-github
Contributor

Fwiw, I read the wall of text, and didn't understand what problem you are trying to solve until I saw the top example in #9428 (comment). I think this is a lot of fragile machinery to solve a problem not many people think they have, and npm3's dependency resolution algorithm is already touchy and slow enough compared to npm2; I don't see how a system like this is going to help. It's a really complex layout on disk to share things between apps that should not be shared.

Also, you have not done this:

empirically shown to demonstrate obvious benefits to many (in this case most everyone using node/npm, with respect to storage space and install times)

You have said that you personally would like to use less disk space for your apps, and you claim that resolving and building the links will somehow be faster than what is done now. That's a far cry from universal appeal.

Also, if you can't think of any negative side-effects of this, here are some:

For one, if I have app A that I'm working on using node 4, and app B that I'm working on with node 7, and their addon dependencies are symlinked, then one or the other is perpetually not going to work.

For another, the increased complexity of the module docs and burden on node to maintain, keep backwards compatible, and support for all time this complexity.

I'm -1 on this (whether the module system was locked or not). I think you should make some effort to modify or create a package manager that has most of the properties you want (though I agree it appears there is one thing you can't solve without changes to node core), demonstrate its appeal by its massive user base, and then come back and make the case it would be even better with some tweaks to module's resolution algorithm.

@ghost
Author

ghost commented Nov 3, 2016

@sam-github Thanks for replying. It's becoming obvious my attempts to be clear instead created a 'wall-of-text' that's probably inhibiting in some way. My sincere apologies for putting you and others through that.

The docs for Module suggest a way package managers could use symlinking to modules; an excerpt:

When the code in the foo package does require('bar'), it will get the version that is symlinked into /usr/lib/node/foo/1.2.3/node_modules/bar

I took this to mean that at one point the node caretakers did envision leveraging symlinks in principle. But in practice it's not possible, because a module does not specify a precise version number for each of the modules it depends on in its package.json.

Today modules logically 'share' modules; it's just that the current install/runtime physically represents that sharing as copies everywhere. So I'm not quite sure what you mean by saying things should not be shared between apps; that reasoning would imply that node applications should not share node.exe. Is that what you mean?

The actual layout on disk is no more complex than how things work today from node's perspective; the only change is that, rather than having only a folder underneath a module containing its dependencies, there can also be a folder adjacent to the module holding its dependencies.

With this very simple change, all package managers could symlink modules with the same version-resolution guarantees as today, but they wouldn't be required to symlink anything. Without it, symlinking can't really be done for the purposes I'm proposing. This is the one thing I'm trying to solve, and it's impossible for any package manager to solve, be it npm or one I would create; so your suggestion of making my own package manager to generate massive appeal isn't possible, because the constraint I'm attempting to overcome is in node itself.

You have a valid point regarding addons, but the --preserve-symlinks switch creates similar problems today. I have a simple solution that would address this, but haven't implemented it yet, as I'm trying to determine whether anyone understands the what and why of this and would support it.

I have implemented this change in my own node/npm branches, enough so that I could have some initial evidence, and it was compelling to me; much reduced disk space, dramatically faster installs, on the order of 20-30 times.

Given that the changes in node were only ~40 lines of code, I'm surprised it seems there would be a significant increase in the burden of maintaining docs, backwards compatibility, and support.

When I've done a little more tweaking to npm, would you be willing to try my branches yourself and offer any feedback?

@sam-github
Contributor

I'm not personally very interested in this, for the reasons I've stated, which you don't address (and in fairness, can't address right now, not without support from popular package managers requesting this feature). I'm just one person, but you haven't got anyone supporting this yet, AFAICT.

Btw, it helps to recognise and respond to criticism, instead of reject it.

The actual layout on disk is no more complex than how things work today

This is hyperbole. It is clearly more complex, it adds another directory to the list of places node modules are looked for, which is the definition of increased complexity. You think the complexity increase is minor for a great gain in usability, that's your opinion, but it is more complex.

@evanlucas already requested you PR your code since you have it implemented. Even if it doesn't get accepted, it would at least have the code attached for the record, and code might be more convincing, and a surge of support for this on an npm PR might sway people.

@ghost
Author

ghost commented Nov 3, 2016

Your feedback is truly appreciated, and I clearly have even less than no support; you and others seem quite strongly opposed. Compared to other highly complicated algorithms I've had to implement, adding another search path seemed simple; I sincerely apologize if I seemed to reject your valuable critiques, as that was not at all my intent.

@ghost
Author

ghost commented Nov 3, 2016

To all who have provided feedback on this issue, thank you kindly.

Going in, I knew this would be a controversial issue, and I think my attempt to soften that by being somewhat verbose backfired.

I will spare others and close this issue. However, I have taken your feedback in stride, and will shortly create a much more straightforward issue with links to versions of node and npm that prove the concept.

This issue was closed.

5 participants