Time-based resolution mode #2
Conversation
LGTM. We should implement a PoC and see how it goes and if we run into issues during development.
Resolution will be divided into two stages. The first stage will resolve all the direct dependencies of all the workspace projects. This stage may work the same way as it works now (the highest version is picked). When all the direct dependencies are resolved, we check when the picked versions were released. This information is present in the package metadata, in the "time" field. For example, if we install webpack and eslint, we get webpack resolved to v5.74.0 and eslint resolved to v8.22.0. `webpack@5.74.0` was released at "2022-07-25T08:00:33.823Z" and `eslint@8.22.0` was released at "2022-08-14T01:23:41.730Z". Now we compare the release dates of the picked packages and take the most recent one. In this case, that is the date eslint was released: "2022-08-14T01:23:41.730Z". Let's call this date T.
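To make the first stage concrete, here is a minimal TypeScript sketch (names and types are illustrative, not pnpm's actual implementation) of deriving T from the "time" field of the already-fetched metadata:

```ts
// Illustrative sketch: derive T once the direct dependencies are resolved.
interface PackageMetadata {
  // The full npm registry metadata has a "time" field mapping every
  // published version (plus "created" and "modified") to an ISO 8601 date.
  time: Record<string, string>
}

function resolutionTime (
  directDeps: Array<{ metadata: PackageMetadata, version: string }>
): Date {
  // T is the most recent publish date among the resolved direct dependencies.
  const publishTimes = directDeps.map(
    ({ metadata, version }) => new Date(metadata.time[version]).getTime()
  )
  return new Date(Math.max(...publishTimes))
}
```

In the webpack/eslint example above, this would return eslint's publish date.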
At the second stage, we resolve all the subdependencies. At this stage, instead of resolving a range to the highest available version, we filter out any versions that were released after T and pick the highest version from the rest. Let's say we need to resolve `ajv@^6.10.0`. We already have the metadata of ajv in the cache, and it was saved after T, so we don't need to redownload it. Of the versions of ajv that match `^6.10.0`, only those released at or before T remain candidates, and the highest of them is picked.
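A hedged sketch of the second stage, assuming the `semver` package and the same illustrative metadata shape as above (again, not pnpm's actual implementation):

```ts
import { maxSatisfying, satisfies, valid } from 'semver'

interface PackageMetadata {
  time: Record<string, string> // version (or "created"/"modified") -> ISO date
}

// Resolve a range such as "^6.10.0" to the highest version that both
// satisfies the range and was published no later than T.
function resolveRange (metadata: PackageMetadata, range: string, t: Date): string | null {
  const candidates = Object.keys(metadata.time).filter((version) =>
    valid(version) !== null &&
    satisfies(version, range) &&
    new Date(metadata.time[version]).getTime() <= t.getTime()
  )
  return maxSatisfying(candidates, range)
}
```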
In some monorepos, packages are published in random order, so it might happen that a subdependency was released later than its parent dependency. Maybe we should add 10 minutes to T.
Yeah, adding some delta might be a good idea. In the most common cases it will still avoid fetching fresh metadata for the subdependencies.
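As a sketch, the delta could simply be applied when computing the cutoff (the 10-minute default is just the number suggested above):

```ts
// Tolerate monorepos that publish their packages in arbitrary order by
// accepting subdependency versions published up to `deltaMinutes` after T.
function withDelta (t: Date, deltaMinutes = 10): Date {
  return new Date(t.getTime() + deltaMinutes * 60_000)
}
```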
Can we get a config-level setting to prefer the minimum version that matches the set of merged specifiers, rather than the maximum, for the sake of stability? With how most monorepos operate, this would also provide the same guarantee, because it is common to publish with `other-workspace-package: ^current` in dependencies.
I tried this. It doesn't work the way you would expect. Many projects update their dependencies but keep the old ranges, so the "current" version doesn't really work with the project anymore.
Also, it is hard or even impossible to implement with pnpm, as I have stated in the alternatives section.
I have implemented a PoC for this resolution mode: pnpm/pnpm#5238. It works well but there is a bottleneck. The npm registry currently allows fetching either the full metadata of a package or the compact metadata. The compact metadata doesn't contain the "time" field, hence we can only implement this resolution mode using the full metadata. But that slows down installation almost twofold. I have tested this resolution mode with a hot cache (a cache populated with the full metadata after download), and in that case it is proven to be faster than the resolution mode currently used by pnpm.

So it definitely makes sense to ship such an option, but in order to make it the default one, we will have to cooperate with a registry to include the "time" field in the compact metadata.

npm has a `before` option, and when it is used, the npm CLI requests the full metadata, making installation slower. So it may be in npm's interest to support a new metadata format that includes "time".
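For reference, the two metadata formats are selected via the `Accept` header; a rough sketch (the function is illustrative) of the difference:

```ts
// The registry serves abbreviated ("corgi") metadata when asked for it via
// the Accept header. That document currently lacks the "time" field, which is
// why time-based resolution has to fall back to the full document.
const FULL_METADATA = 'application/json'
const ABBREVIATED_METADATA = 'application/vnd.npm.install-v1+json'

async function fetchPackageMetadata (pkgName: string, needsTime: boolean): Promise<unknown> {
  const response = await fetch(`https://registry.npmjs.org/${pkgName}`, {
    headers: { accept: needsTime ? FULL_METADATA : ABBREVIATED_METADATA },
  })
  return response.json()
}
```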
cc @pnpm/collaborators
@juanpicado has created a branch in the Verdaccio repository where the "abbreviated" (non-full) metadata contains the "time" field. This has allowed me to test the new resolution mode and run some benchmarks. With the new resolution mode, installation is a bit slower with a cold cache. However, installation is a LOT faster with a hot cache.
Suppose a company is concerned about NPM versions that might contain malware, maybe because a publishing token was compromised. In such an incident, eventually someone from the community will discover the malware and report it, and then the bad version will get deprecated/removed from the registry. This suggests a simple strategy to reduce exposure to malware: introduce a "waiting period" before new versions are installed. An NPM registry proxy can implement such a policy. But could PNPM provide this feature using logic similar to time-based resolution?
@octogonz we may add something like a config setting for that.
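Purely as an illustration of the idea (the helper and the notion of a configured waiting period are hypothetical here, not an existing pnpm option), such a delay could reuse the same publish-date filter as time-based resolution:

```ts
// Hypothetical: with a configured waiting period (e.g. 7 days), versions
// published after this cutoff would be filtered out, reusing the same
// publish-date check that time-based resolution already performs.
function waitingPeriodCutoff (minimumReleaseAgeDays: number, now = new Date()): Date {
  return new Date(now.getTime() - minimumReleaseAgeDays * 24 * 60 * 60 * 1000)
}
```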
This is released in v7.10.
Worth mentioning that the resolution strategy here (resolving transitives to be "under" the time of the most-recent direct) will constrain the ability to fix-forward a release by updating a package that would be a transitive dependency of the installing project.

my-app/package.json:

```json
{
  "dependencies": {
    "ember-cli": "^3.1.0"
  }
}
```

ember-cli/package.json:

```json
{
  "version": "3.1.0",
  "dependencies": {
    "ember-cli-broccoli": "^1.1.0"
  }
}
```

Use-case: this type of issue matters most particularly for project generators.
Why? If a fix is important, that is what should happen. Even now there is no guarantee that a transitive dependency will be updated, as the deduplication algorithm might pick a lower version.
The way in which this issue will manifest itself is a particular problem for code generators (which is why my examples are of a code generator). Extending the previous example, if a user runs:

```sh
pnpx create-react-app@latest my-app
cd my-app
pnpm install --use-time-mode
```

the local cache will create an "accidentally locked all projects created on this box to an old version of a transitive" situation until one of the direct dependencies updates, triggering an update cascade. This is especially problematic for scaffolding tools like the one above.

This would be bonus "fun" in CI for the authors of code generators; we'd need to opt out of the metadata cache in order to ensure that CI pulls up-to-date versions of everything. Setting aside whether or not you believe the strategy is reasonable, bumping just the transitive has been an explicit strategy for fix-forward releases within the Ember community for nearly a decade, especially as it doesn't require any user action.
I have been using this feature for a week in several projects and have had no issues yet. It is opt-in for now. I understand your points, but I think you don't realize how rarely dependencies are updated in enterprise applications. The workflow you describe was already "broken" by the introduction of lockfiles. If this resolution mode causes issues, we may add some minimum time threshold, e.g. metadata files should not be older than a week (or even a day; we can make it configurable).
It would be nice to see lockfiles abandoned altogether and superseded by the time-based resolution feature. Does that sound realistic?
I think a lockfile is still useful for a few reasons. A lockfile makes installation faster. A lockfile also stores the checksum of each package, so if someone publishes new content under the same version, you will get an error; this is more secure than having no lockfile. Also, this resolution mode doesn't cover all dependency sources. For instance, a lockfile also locks git-hosted dependencies.
This workflow is also not broken by lockfiles, because these generators do not emit one. Subsequent installations on the same machine using this time-based resolution strategy will "lock" those transitive dependencies via the metadata cache, even though the generators explicitly opted against a lockfile and specified floating versions. Yes, the package resolution is still valid, but it does not allow users to get security/bugfix updates when generating a new project, and it breaks established workflows.

This issue is explicitly not about dependency updates or enterprise usage patterns (though a failure scenario could be constructed for those). The issue presents most significantly with project generators that emit floating versions without a lockfile. It could become the responsibility of every single generator application to help people avoid this trap, but first installs are where the metadata cache gets you the largest wins. The entire community of generators (including two that I maintain) could end up guiding people against using this.

Template repositories that do not include lockfiles suffer from the same problem, and they generally do not execute installation for their users, so they don't have a hook to bypass the metadata cache, which increases the teaching costs.
Clearing the metadata cache is not enough. One of the direct dependencies needs to be updated, and if the new version of the direct dependency has a newer publish date, then the transitive dependency might be updated (the transitive dep must still have an older publish date than the publish date of the newest direct dependency).

I understand the disadvantages, but I still think the advantages outweigh them. Speed is not the only reason for this change; decentralization and security are also two important points. With good caching, a third-party registry like Verdaccio can work more effectively, and we can also implement effective server-side resolution. From a security perspective, as you may have noticed, in the past year there were several incidents where the maintainers of transitive dependencies published malware in their packages. This doesn't protect users completely from such updates, but it reduces the chance by delaying the update. We could make an exception for project generators.
Oh, this is stricter than I originally understood. It's not just a cache; it's an extension of the resolution algorithm. This resolution mode entirely prevents floating versions of transitive dependencies from being upgraded beyond the most-recent direct dependency's release date. It prevents leaf-node packages from having any hope of keeping their package up to date, and it relies on a direct dependency (e.g. one of those seven dependencies installed by the generator) releasing a new version first.

The current state is actually decent, only two days out of date. But if you dive into the full set of release times, you'll find that none of the direct dependencies updated between August 4 and August 23, or between June 26 and July 14. No package I know of releases a new version every time a transitive dependency updates, and with 1394 packages in a standard install, we need to weigh the costs of this behavior against its benefits. Without data I can't be certain which of these is more likely to be worse.
To read it in rendered view: https://github.com/pnpm/rfcs/blob/time-base-resolution-mode/text/0002-time-based-resolution-mode.md
Created a discussion in Verdaccio to support a new metadata format: verdaccio/verdaccio#3315