Consider hardlinks rather than separate copy of packages per app #499
Yarn initially used symlinks, and we changed that because our internal tooling (Watchman etc.) doesn't work well with symlinks. Are hardlinks different in this case? If so, that might be worth doing. I think the initial release should continue to use the copy approach; it is more consistent with the rest of the ecosystem, and we should evaluate this behavior for a future major release.
Upon thinking about it further, another issue that might come up is that people may modify their local node_modules for debugging or testing purposes, and not expect that they're actually modifying the module linked to everywhere else. I don't know how often others do this, but I've definitely done it (though rarely) in the past. Apart from that, hardlinks seem to make sense. I'd guess that the tooling would be fine, since a hardlinked file should look the same as any other file. The primary issue this was intended to address was the cache causing issues with hardcoded paths that result from building packages (#480).
Not sure, might be worth asking @wez whether Watchman can handle hardlinks.
I think this is the use case for
I totally agree with @dxu and actually wanted to write the same thing. I do this often: I manually add some debugging code to a random module in node_modules (one that I don't have checked out locally). Once I'm done, I wipe it away and do
Yeah, that's a use case I didn't really think about... Hmm... Oh well, we can still hold on to this idea. Maybe it could be an optional configuration setting for people who don't directly edit
Going to close this since we decided long ago to move away from symlinks. That's required for compatibility with the existing ecosystem, as even projects like ESLint rely on this directory structure to load rules etc. There are also a lot of problems with existing tooling not supporting them. For example, when Yarn initially used Jest, it would fail and produce extremely long paths. Jest is much better now and the bug is likely fixed, but small issues like this exist in a lot of tools.
Sebastian, this task is for hardlinks, not symlinks. Hardlinks shouldn't
Hardlinks have the exact same problems and are semantically the same in this scenario. Why do you think they don't have any of the same issues?
@kittens I haven't really tested hardlinks. But once you hardlink a file, in theory, from the filesystem's perspective it should be exactly the same as the original file: you can remove the original file and the hardlinked file will still work. This is different from symlinks, whose content is just a pointer to the original file.
You can have cycles, though, which is extremely problematic if tools aren't designed to handle them (most JavaScript tools aren't, and how would they be?). Hardlinks and symlinks on Windows both require admin privileges (NTFS junctions don't, but they're closer to symlinks), which is a non-starter for a lot of environments.
Good point about Windows. We could maybe have platform-specific logic if we decide to go down this path. How do you create a cycle with hardlinks? Note that there are no hardlinks for directories.
Going to reopen this for tracking purposes. It should be doable, as hardlinked files look identical to the file system. I might prototype it.
@Daniel15 one thing to keep in mind is that since hardlinks pretend to be the file system so well, deleting them usually deletes way more files than you're expecting. Since I remember unexpected deletions hitting users of
Good point, I remember Steam on Linux accidentally running Maybe we need a safer "clean" function rather than just doing
Are issues with hard links and * On macOS you can, but you shouldn't
Symlinking to a global cache is essential. The copying approach is very slow for large projects (which I would argue are very common), and extremely slow on VMs, and very slow on Windows, and insanely slow on a virtualized Windows VM running on a macOS host in Parallels/VMware. I have a relatively simple frontend/backend project and the With a warm global cache, the "Linking dependencies..." step takes about 5 minutes. Symlinking would take a couple of seconds.
So when I am building my Docker images, it's taking me 5 minutes every time when it could be seconds. It seems every package manager's authors flatly ignore real-world performance. Is there a plan to support symlinking any time soon? I feel like it would be a simple implementation and just add a
I wonder how long hardlinking would take. Definitely longer than symlinking, as you need to hardlink each individual file, but it should be faster than copying the files over while avoiding some of the disadvantages of symlinks. I think it's worth having both a hardlink and a symlink mode, both of them opt-in.
We could also use a symlink or hardlink feature when doing builds on our build server, as copying node modules is by far the slowest part of the build. For example, our build time drops from 3 minutes with npm install (1:45 with Yarn) to 15 seconds if we cache and symlink the node_modules folder between builds (we hash the package.json to know when to invalidate the cache). A raw copy with cp takes 45 seconds.
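A minimal sketch of the cache-invalidation step described in the comment above, assuming the cache is keyed by a SHA-256 hash of package.json (and, as an assumption beyond the comment, yarn.lock when present). The helper name cacheKey is hypothetical; this is not part of Yarn.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: derive a cache key for node_modules from the manifest
// (and the lockfile, if any). Any change to these files yields a new key,
// which invalidates the cached node_modules for that project.
function cacheKey(projectDir) {
  const hash = crypto.createHash('sha256');
  for (const name of ['package.json', 'yarn.lock']) {
    const file = path.join(projectDir, name);
    if (fs.existsSync(file)) {
      hash.update(name); // include the file name so the inputs can't be confused
      hash.update(fs.readFileSync(file));
    }
  }
  return hash.digest('hex');
}
```

A build script could then look up a cached node_modules stored under that key, link it into the workspace on a hit, and only run an install (and populate the cache) on a miss.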
Lack of symlink support in Watchman blocks more than Yarn: facebook/react-native#637. I develop React Native, browser and Electron apps, and I only had problems with symlinks in React Native, and that was because of Watchman. The reason we can't have symlinking in Yarn shouldn't be Watchman or some other internal Facebook tooling. The rest of the ecosystem appears to support it well. Symlinking should be opt-out.
Hardlinks should work fine with Watchman, and any other tool, since they look identical to "regular" files. That's one reason I suggested trying hardlinks rather than symlinks.
Therein lies the problem. A good module system is not simply about importing files and paths. However, as you say, "this issue is about Yarn", so problems with node/javascript are out of scope; sorry to have brought it up. As the title of this thread is about hard links rather than copy,
There seems to be a lot of confusion about what hardlinks actually are.

Most filesystems store files in two parts: the filename, which points to the storage location, and the actual data. A symlink is a special filename that tells you to go look for another file. A hardlink is a second (or third, or ...) filename that points to the same data location. Therefore hardlinked files do not suffer the same problems as symlinks, because they truly look like copies of the original files. Also, assuming I only hardlink files and not directories, then if I do_ Basically, unless you are examining inode numbers, hardlinks look exactly like files copied from the originals, but they share the same storage location as the original files. So we will need to warn people not to modify the contents of their node_modules directories.

Another potential problem is that on Linux you can't make a hardlink across filesystem boundaries. I don't know about Windows or macOS. So we would need to fall back on true copying when hardlinking doesn't work.

Until something like this is implemented, I am going with the following approach...
Where
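A minimal sketch of the "hardlink, else fall back to a real copy" idea from the comment above, using Node's fs.linkSync and fs.copyFileSync. The helper names linkOrCopy and sharesInode are hypothetical. The fallback only triggers on EXDEV (a link across filesystem boundaries); any other error is rethrown rather than silently hidden.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Create `dest` as a hardlink to `src`. If the two paths live on different
// filesystems (EXDEV), fall back to an ordinary copy instead.
function linkOrCopy(src, dest) {
  try {
    fs.linkSync(src, dest);
    return 'hardlink';
  } catch (err) {
    if (err.code !== 'EXDEV') throw err;
    fs.copyFileSync(src, dest);
    return 'copy';
  }
}

// A hardlink is a second directory entry for the same inode, so both names
// report the same device and inode numbers.
function sharesInode(a, b) {
  const sa = fs.statSync(a);
  const sb = fs.statSync(b);
  return sa.dev === sb.dev && sa.ino === sb.ino;
}

// Demo: both paths are in the same temp directory, so a real hardlink is expected.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'link-demo-'));
const original = path.join(dir, 'index.js');
fs.writeFileSync(original, 'module.exports = 42;\n');
const linked = path.join(dir, 'index-link.js');
const mode = linkOrCopy(original, linked);
console.log(mode, sharesInode(original, linked), fs.statSync(original).nlink);
```

Only inode numbers and the link count give the hardlink away; reading either path returns the same bytes.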
@jpeg729 one problem that causes, though, is that you're supposed to be able to edit any file inside node_modules and see that change when you run your program, and if you have two places in node_modules that point to the same data location, editing one will end up editing the other, which might not be desired.
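To illustrate the concern above, here is a small sketch (file names are hypothetical) that installs the same cached file into two "apps", one as a hardlink and one as an independent copy, then edits app A's file in place.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'edit-demo-'));
const cached = path.join(dir, 'cache-lodash.js');
fs.writeFileSync(cached, 'original\n');

const appA = path.join(dir, 'appA-lodash.js'); // hardlinked to the cache entry
const appB = path.join(dir, 'appB-lodash.js'); // independent copy
fs.linkSync(cached, appA);
fs.copyFileSync(cached, appB);

// "Local debugging": append a line to app A's file, modifying it in place.
fs.appendFileSync(appA, 'console.log("debug");\n');

// The edit shows up through every name of the shared inode (the cache entry),
// but not in the copy.
const cacheNow = fs.readFileSync(cached, 'utf8');
const copyNow = fs.readFileSync(appB, 'utf8');
console.log(JSON.stringify(cacheNow), JSON.stringify(copyNow));
```

One caveat: editors that save by writing a temporary file and renaming it over the original replace the directory entry instead of writing into the shared inode, so with those editors the edit would not propagate, and the link is silently broken instead.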
@KSXGitHub "you are not supposed to": where does that rule come from? It's always been both possible and something node and npm explicitly support. As for being configurable, the problem is that users aren't going to know that this normal node ecosystem behavior behaves differently, and they could end up silently getting surprising behavior.
If the default is to not use hard-links, and the user has to manually enable it, then that's not a problem: they know they're using weird Yarn-specific behavior.
APFS supports clonefiles, which work like copy-on-write hardlinks. Even though I like RubyGems/Bundler's package management more, node_modules is here to stay for a while. The upside of using clonefiles is that if a malicious or misbehaving package / script (or user!) tries to update a file, it gets copied. This is a safe way to enable the space / time saving feature, and it can be enabled by default.
@jrz - Btrfs and ZFS both support Copy-on-Write too. Unfortunately, very few people are using CoW filesystems at the moment. Apple users are a relatively small proportion of the population, and many are still on HFS+. On Linux, ext3/4 is still much more common than Btrfs and ZFS. I think there's an issue somewhere to support CoW copies in Yarn, but I really think the "plug'n'play" functionality makes this obsolete anyway: yarnpkg/rfcs#101
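For the CoW copies discussed above, Node exposes clone requests through fs.copyFileSync flags: fs.constants.COPYFILE_FICLONE asks for a copy-on-write clone and falls back to an ordinary copy when the platform or filesystem can't clone, while COPYFILE_FICLONE_FORCE makes the call fail instead. The helpers cloneOrCopy and clonesSupported below are hypothetical; whether a real clone happens depends on the platform and filesystem.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Request a copy-on-write clone; silently degrades to a regular copy.
function cloneOrCopy(src, dest) {
  fs.copyFileSync(src, dest, fs.constants.COPYFILE_FICLONE);
}

// Probe whether true clones are available in `dir` by forcing a clone.
function clonesSupported(dir) {
  const probeSrc = path.join(dir, '.clone-probe-src');
  const probeDest = path.join(dir, '.clone-probe-dest');
  fs.writeFileSync(probeSrc, 'probe');
  try {
    fs.copyFileSync(probeSrc, probeDest, fs.constants.COPYFILE_FICLONE_FORCE);
    return true;
  } catch {
    return false;
  } finally {
    fs.rmSync(probeSrc, { force: true });
    fs.rmSync(probeDest, { force: true });
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'clone-demo-'));
const src = path.join(dir, 'a.js');
const dest = path.join(dir, 'b.js');
fs.writeFileSync(src, 'export default 1;\n');
cloneOrCopy(src, dest);
// Unlike a hardlink, writing to the clone never changes the original.
fs.appendFileSync(dest, '// edited\n');
console.log('clones supported here:', clonesSupported(dir));
```

Either way the result behaves like a copy from the program's point of view, which is why clones can be a safe default where hardlinks can't.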
Alternate solution: nodejs/node#25581
The solution is to use a virtual filesystem.
Temporary workaround:

```diff
--- a/src/package-linker.js
+++ b/src/package-linker.js
@@ -232,6 +232,7 @@ export default class PackageLinker {
     const copyQueue: Map<string, CopyQueueItem> = new Map();
     const hardlinkQueue: Map<string, CopyQueueItem> = new Map();
     const hardlinksEnabled = linkDuplicates && (await fs.hardlinksWork(this.config.cwd));
+    const forceLinks = true;
     const copiedSrcs: Map<string, string> = new Map();
     const symlinkPaths: Map<string, string> = new Map();
@@ -302,7 +303,7 @@ export default class PackageLinker {
       }
       const copiedDest = copiedSrcs.get(src);
-      if (!copiedDest) {
+      if (!forceLinks && !copiedDest) {
         // no point to hardlink to a symlink
         if (hardlinksEnabled && type !== 'symlink') {
           copiedSrcs.set(src, dest);
@@ -319,7 +320,7 @@ export default class PackageLinker {
         });
       } else {
         hardlinkQueue.set(dest, {
-          src: copiedDest,
+          src: forceLinks ? src : copiedDest,
           dest,
           onFresh() {
             if (ref) {
```
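To make the effect of the workaround patch easier to see, here is a hypothetical, heavily simplified model of the queueing logic it touches. The names mirror the diff, but this is not Yarn's implementation: symlink handling, the onFresh callbacks and the actual file operations are all left out.

```javascript
// Hypothetical model: decide, per destination file, whether it is copied
// or hardlinked, and what the hardlink points at.
function planLinks(files, { hardlinksEnabled, forceLinks }) {
  const copyQueue = new Map();
  const hardlinkQueue = new Map();
  const copiedSrcs = new Map(); // cache file -> first copy inside the project

  for (const { src, dest } of files) {
    const copiedDest = copiedSrcs.get(src);
    if (!forceLinks && !copiedDest) {
      // First occurrence of this cache file: copy it into node_modules and
      // remember the copy so later duplicates can hardlink to it.
      if (hardlinksEnabled) copiedSrcs.set(src, dest);
      copyQueue.set(dest, { src, dest });
    } else {
      // Duplicates hardlink to the first in-project copy; with forceLinks,
      // every file hardlinks straight to the cache file instead.
      hardlinkQueue.set(dest, { src: forceLinks ? src : copiedDest, dest });
    }
  }
  return { copyQueue, hardlinkQueue };
}

const files = [
  { src: '/cache/lodash/index.js', dest: '/app/node_modules/lodash/index.js' },
  { src: '/cache/lodash/index.js', dest: '/app/node_modules/foo/node_modules/lodash/index.js' },
];
const normal = planLinks(files, { hardlinksEnabled: true, forceLinks: false });
const forced = planLinks(files, { hardlinksEnabled: true, forceLinks: true });
console.log(normal.copyQueue.size, forced.copyQueue.size);
```

Note that with forceLinks the `!forceLinks && !copiedDest` test is always false, so the patch bypasses the hardlinksEnabled check entirely; it presumably only works when the cache and the project are on the same filesystem.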
It seems like everyone came to a consensus on this years ago:
What's blocking this? Are we just waiting for someone to submit a PR?
There is an even better option. Use a virtual filesystem. The virtual FS can map the real directories to whatever location is needed, as many times as wanted with multiple versions. The modules exist once on the real FS.
The copy-on-write thing is getting better with APFS.
I'm not sure if Yarn already uses clonefiles, but it looks like it:
Yeah, that thing gets passed down to libuv, but it only supports Linux for now. And oops, I have two stale PRs (libuv/libuv#2577, libuv/libuv#2578)... maybe I should check the reviewer comments and stuff.
Any news on whether this (hard links) is a thing in Yarn? While I like some of Yarn's features, there's no way I can move away from pnpm on my laptop, given the amount of disk space that ends up being used otherwise.
That wouldn't be a viable option for a lot of people, or at the very least it would be very awkward to set up. If you're developing under Windows with an NTFS filesystem, for example, or Linux with an ext4 filesystem, going through the hassle of mounting a block of storage as a "special" filesystem just for development wouldn't be practical for a lot of folks. Most people tend to follow the easiest path to a solution. I'm thinking of giving Plug’n’Play a go to see if there are any compatibility issues with the code I'm running.
I don't think hardlink support was ever implemented in Yarn. Plug'n'Play should work in many situations, though!
I've just been playing around with PnP under Yarn and I do like it.
I was not suggesting it as a user solution. It is something that should be implemented inside Yarn.
In a way, this is sort of what they have done with Yarn PnP.

pnpm's approach is to have a single global directory on the disk to store all the libs, then create a node_modules directory in the project and create hardlinks from there to the global store to save on disk space. This works, but I've found it can be a bit slower than Yarn's PnP approach.

The Yarn PnP approach is to download the libs as compressed zips into either a single global directory store (like pnpm) or locally in the project (.yarn\cache). You then have a wrapper that reads in a .pnp.js file generated by Yarn. The type of wrapper depends on the tool in use.
I think there may be some edge cases where it doesn't work, but I always have pnpm for that.
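A minimal sketch of the global-store-plus-hardlinks approach described above: each file's bytes are stored once under the SHA-256 of its contents, and every project gets a hardlink into that store. This is loosely modeled on the description, not pnpm's code; the helper names and the store layout (`<store>/<first two hex chars>/<rest of hash>`) are assumptions.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Store a file once, addressed by the hash of its contents.
function addToStore(storeDir, filePath) {
  const data = fs.readFileSync(filePath);
  const digest = crypto.createHash('sha256').update(data).digest('hex');
  const stored = path.join(storeDir, digest.slice(0, 2), digest.slice(2));
  if (!fs.existsSync(stored)) {
    fs.mkdirSync(path.dirname(stored), { recursive: true });
    fs.writeFileSync(stored, data);
  }
  return stored;
}

// Materialize a store entry inside a project's node_modules as a hardlink.
function linkFromStore(stored, dest) {
  fs.mkdirSync(path.dirname(dest), { recursive: true });
  fs.linkSync(stored, dest);
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'cas-demo-'));
const store = path.join(root, 'store');
const downloaded = path.join(root, 'download', 'index.js');
fs.mkdirSync(path.dirname(downloaded), { recursive: true });
fs.writeFileSync(downloaded, 'module.exports = "left-pad";\n');

const stored = addToStore(store, downloaded);
for (const app of ['app1', 'app2']) {
  linkFromStore(stored, path.join(root, app, 'node_modules', 'left-pad', 'index.js'));
}
// One copy of the bytes on disk, three names for it (store + two projects).
console.log(fs.statSync(stored).nlink);
```

Because the store and the projects must share a filesystem for linkSync to succeed, a real tool would also need a copy fallback.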
Hard links work well to reduce disk usage, and they work fine on Linux, macOS, and Windows. I haven't found any disadvantages of using them. I currently use a package I wrote called pkglink to create them in my node_modules directories, but it would sure be nice if this were integrated into Yarn so one didn't have to run things separately. But you can start using them today with pkglink: https://github.com/jeffbski/pkglink. Just run it on your JS repos, or even give it the folder above all of your repos, and it will create hard links for the duplicate node_modules files. It verifies versions, file sizes, and dates before linking to make sure the files are the same and can be linked.
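A minimal sketch of deduplicating existing files with hardlinks, in the spirit of the tool described above. This is not pkglink's implementation: per the comment, pkglink checks versions, sizes and dates, whereas this sketch groups files by device, size and SHA-256 of their contents. The function name dedupe is hypothetical.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

function sha256(file) {
  return crypto.createHash('sha256').update(fs.readFileSync(file)).digest('hex');
}

// Replace byte-identical duplicates with hardlinks to one canonical file.
// Only files on the same device can be linked, so the device is part of the key.
function dedupe(files) {
  const canonical = new Map();
  let linked = 0;
  for (const file of files) {
    const st = fs.statSync(file);
    const key = `${st.dev}:${st.size}:${sha256(file)}`;
    const first = canonical.get(key);
    if (!first) {
      canonical.set(key, file);
      continue;
    }
    if (fs.statSync(first).ino === st.ino) continue; // already linked
    // Link under a temporary name, then rename it over the duplicate, so the
    // path always exists (old or new contents) even if we stop midway.
    const tmp = `${file}.dedupe-tmp`;
    fs.linkSync(first, tmp);
    fs.renameSync(tmp, file);
    linked++;
  }
  return linked;
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'dedupe-demo-'));
const a = path.join(root, 'repo-a.js');
const b = path.join(root, 'repo-b.js');
const c = path.join(root, 'repo-c.js');
fs.writeFileSync(a, 'same\n');
fs.writeFileSync(b, 'same\n');
fs.writeFileSync(c, 'different\n');
const linkedCount = dedupe([a, b, c]);
console.log('files replaced by hardlinks:', linkedCount);
```

Hashing every file is wasteful on large trees; grouping by size first and hashing only the candidates that collide would be the obvious optimization.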
I know I am replying to something very old, but just for the record (since @also is only saying it vaguely): hardlinks should not cause accidental deletions at all. When anything is "deleted" on a filesystem with an inode table (including NTFS with its MFT), all that happens is that the inode's reference count drops by one; the data is only removed when the count reaches 0. The only case where accidental deletion can happen is with directory hard links. Almost nobody besides macOS supports those, so I haven't bothered to test it. For the record, the
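A small sketch of the reference-counting behavior described above, using Node's fs.Stats nlink field: deleting one name of a hardlinked file only decrements the link count, and the data stays reachable through the remaining name. File names are illustrative.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'unlink-demo-'));
const cacheFile = path.join(dir, 'cache.js');
const projectFile = path.join(dir, 'project.js');
fs.writeFileSync(cacheFile, 'module.exports = "still here";\n');
fs.linkSync(cacheFile, projectFile);

const before = fs.statSync(projectFile).nlink; // two names, one inode
fs.unlinkSync(cacheFile);                      // "delete the cache"
const after = fs.statSync(projectFile).nlink;  // one name left
const contents = fs.readFileSync(projectFile, 'utf8'); // data is untouched
console.log(before, after, contents.trim());
```

This is also why the original proposal expects that deleting the cache directory would not break the installed packages.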
Typically, pnpm uses junctions instead of hardlinks for directory pointers/links. So I suspect that if you deleted the files inside a junction directory, it would also delete them from the directory the junction pointed to (similar to a symlinked directory under Linux).
@grbd, yes, that's how junctions work.
Closing, as PnP is a thing now and is the most space-efficient option. If this were to be implemented, it would happen in v2, and it is tracked here: yarnpkg/berry#1845
Update. This feature is supported starting from Yarn 3, via When enabled, the project files inside
@larixer Using
Right now my .yarnrc.yml file looks like this:

```yaml
enableGlobalCache: true
nmMode: hardlinks-global
nodeLinker: pnpm
```

Is there something I'm missing? It appears to be linking to a hidden ".store" directory within my local node_modules folder, which isn't what I expected.
@callaginn The information I provided was for
This was touched on in a comment on #480, but I thought it's worth pulling into its own separate issue.
Currently, each app that uses Yarn (or npm) has its own node_modules directory with its own copies of all the modules. This results in a lot of duplicate files across the filesystem. If I have 10 sites that use the same version of Jest or React or Lodash or whatever else you want to install from npm, why do I need 10 identical copies of that package's contents on my system?

We should instead consider extracting packages into a central location (e.g. ~/.yarn/cache) and hardlinking them. Note that this would be a hardlink rather than a symlink, so that deleting the cache directory does not break the packages.