RFC: registry: dependency specifiers #314

Closed
wants to merge 3 commits into from
200 changes: 200 additions & 0 deletions accepted/0000-registry-spec.md
@@ -0,0 +1,200 @@
# The `registry:` Dependency Specifier

## Summary

Add a dependency specifier that defines a registry base URL, a package name,
and optionally a SemVer range or dist-tag.

## Motivation

Occasionally, users wish to use multiple npm registries. For example, they
may have some packages hosted on the public npm registry, and others within
a private registry on their company's intranet, or provided by a company
like GitHub, JFrog, Sonatype, or others.

Currently, it is possible to map a scope to a given registry, and all
packages starting with that scope will be published to and installed from
the defined registry:

```ini
; .npmrc file
@company:registry = https://npm-registry.my-company.com
```

However, this does not address the following use cases:

- Users have a set of unscoped package dependencies, some of which come
from the public registry, and others which have patches applied to them
(either to the code, or to the packument, for example to add warnings via
the `deprecated` field). This can be handled by having the patching
registry proxy any packages that it does not patch. However, it becomes
challenging when using more than one such registry, each serving a
different purpose.

- Alias package specifiers cannot point to any registries other than the
primary `--registry` configuration. It would be useful in some scenarios
to be able to alias a package to a copy found on a different registry, or
to use aliases to multiple different registries at the same time.

- Migrating packages from one registry to another can be challenging,
requiring downloading the tarball locally and then re-uploading it. It
would be much simpler to script such migrations by being able to do `npm
publish registry:https://source#pkgname --registry=https://destination/`.

- A tarball or git URL is sometimes the last resort for fetching a
dependency. However, tarball urls cannot support SemVer ranges, and
their dependencies will be fetched from the user's configured registry.
By specifying a registry where a specific dependency should be found, it
is possible to _also_ fetch transitive dependencies from the same source.

## Detailed Explanation

A new dependency specifier is added:

```
registry:<registry url>#<package name>[@<specifier>]
```

**zkochan** (Feb 8, 2021):

Why not leverage the current syntax? The `<source>:` syntax is already used for GitHub and some other sources.

If there are multiple packages from the same registry, will the URL be duplicated? Maybe it should be moved out to a separate field. For instance:

    "dependencies": {
      "foo": "corp:^1.0.0",
      "bar": "corp:^1.0.0"
    },
    "registries": {
      "corp": "https://registry.corp.com"
    }

Also, if such a package is published to the public registry, are all the third-party registries trusted by default?

**Contributor Author:**

Defining a spec name to point to a registry is a good idea, and there are a number of ways we could go with it, each with different trade-offs. It is worth doing as a subsequent RFC that builds on this one, since `corp:[email protected]` would presumably desugar to `registry:https://registry.corp.com#[email protected]`.

**Reviewer:**

It's not only a matter of syntactic sugar: if we support and implement this, then we need to be sure the syntax meets our standards. If fully qualified URLs lead to a subpar developer experience, then it'll be very confusing to later say "our bad, now you can use this other syntax that works better but isn't as well supported".

**Contributor Author:**

I don't really understand the objection here. If we want to support a custom registry URL in the package manifest, as suggested here, how is that made any more difficult by also supporting a full registry URL as part of the dependency spec? In fact, it seems like it would be somewhat easier implementation-wise, because most of the code that consumes specs would be able to remain agnostic as to whether the alias spec was `corp:[email protected]`, `npm:[email protected]`, or `registry:https://registry.npmjs.org/#[email protected]`, and we would have a verbose canonical way to save it that doesn't rely on having the rest of the `package.json` file in order to parse it.

**Reviewer:**

> If we want to support a custom registry url in the package manifest as suggested here, how is that made any more difficult by also supporting a full registry url as part of the dependency spec?

I think our point is that we're not convinced we want to support both, since that has a cost in terms of documentation and has the potential to confuse our users. Should they use a URL or a name? In which context? Etc. I'd much prefer having a single consistent syntax, and that makes it important to discuss what this syntax would be.

> we would have a verbose canonical way to save it that doesn't rely on having the rest of the package.json file in order to parse it

IMO, registry names shouldn't be mapped to URLs via the `package.json`, but rather by our respective configuration files, just like scope URLs.

Where:

- `<registry url>` is a fully qualified URL to an npm registry, which must
not contain a hash (`#`) portion,
Comment on lines +60 to +61

**Reviewer:**

As @zkochan mentioned, the RFC doesn't make it clear why fully qualified URLs are the right choice. They have various drawbacks (they are hard to reconfigure, strongly vulnerable to hosts going down, and syntactically ambiguous), so I think it's important to have the discussion about this before the fact, not after.

**Reviewer:**

Also, the only reason the RFC gives for using URLs seems to be this one, but it could be expanded (as is, I'm not very convinced the pros outweigh the cons).

- `<package name>` is the (scoped or unscoped) name of the package to
resolve on the registry, and
- `<specifier>` is an optional dist-tag, version, or range.
**Reviewer** (suggested change), replacing

    - `<specifier>` is an optional dist-tag, version, or range.

with

    - `<specifier>` is an optional dist-tag or semver range.

Versions are valid ranges.

**Contributor Author:**

Yes, they are. But every time we rely on that fact in our documentation, I get someone asking whether it has to be a range or whether a single version is allowed, so I just got in the habit of being a little extra verbose about it.


If `<specifier>` is omitted, then it defaults to the `tag` config (or the
`defaultTag` internal option), which defaults to `latest`.
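The grammar above can be parsed in a few lines. The following TypeScript is a minimal sketch, not `npm-package-arg`'s actual implementation. The `RegistrySpec` shape and the `parseRegistrySpec` and `splitNamed` names are hypothetical. The sketch relies on the rule that the registry URL contains no `#`, so the first `#` always ends the URL. It also treats a leading `@` in the package name as a scope marker rather than the start of the specifier.

```typescript
// Hypothetical parsed form of a `registry:` specifier (not npm-package-arg's API).
interface RegistrySpec {
  registry: string; // registry base URL; by the rule above it contains no '#'
  name: string;     // scoped or unscoped package name
  spec: string;     // dist-tag, version, or range
}

// Split "name[@specifier]"; a leading '@' marks a scope, not a specifier.
function splitNamed(named: string, defaultTag: string): { name: string; spec: string } {
  const at = named.indexOf("@", named.charAt(0) === "@" ? 1 : 0);
  if (at === -1) {
    return { name: named, spec: defaultTag };
  }
  // Assumption: an empty specifier ("foo@") also falls back to the default tag.
  return { name: named.slice(0, at), spec: named.slice(at + 1) || defaultTag };
}

function parseRegistrySpec(raw: string, defaultTag: string = "latest"): RegistrySpec {
  const prefix = "registry:";
  if (raw.indexOf(prefix) !== 0) {
    throw new Error(`not a registry: specifier: ${raw}`);
  }
  const rest = raw.slice(prefix.length);
  // The first '#' always terminates the URL, since the URL may not contain one.
  const hash = rest.indexOf("#");
  if (hash === -1) {
    throw new Error(`missing #<package name>: ${raw}`);
  }
  const registry = rest.slice(0, hash);
  // Simplified check for a fully qualified http(s) URL.
  if (!/^https?:\/\/[^\s\/]+/.test(registry)) {
    throw new Error(`registry must be a fully qualified URL: ${registry}`);
  }
  const { name, spec } = splitNamed(rest.slice(hash + 1), defaultTag);
  if (name === "") {
    throw new Error(`package name is required: ${raw}`);
  }
  return { registry, name, spec };
}
```

For example, `registry:https://internal.local#forked` parses to the registry `https://internal.local`, the name `forked`, and the spec `latest`. `registry:https://url.com#1.x` parses to a package *named* `1.x`, because the hash portion is always read as a named specifier.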

### Saving

When a package is installed using a registry specifier, it *must* be saved
using a registry specifier.
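One way to honour this rule is to rebuild the saved specifier from its parts, always keeping the original registry URL and the package name. This TypeScript sketch is hypothetical: the name `formatSavedSpec` is illustrative only. Prefixing the resolved version with `^` mirrors npm's usual `save-prefix` default, which this RFC does not itself specify.

```typescript
// Hypothetical helper: serialize a dependency installed via `registry:` for package.json.
// The registry URL is kept verbatim, so later installs fetch from the same registry
// rather than from the default configured one.
function formatSavedSpec(
  registry: string,
  name: string,
  resolvedVersion: string,
  // Assumption: mirrors npm's default save-prefix.
  savePrefix: string = "^"
): string {
  if (registry.indexOf("#") !== -1) {
    throw new Error("registry URL may not contain a hash portion");
  }
  // The package name is always written out, even when it matches the dependency key.
  return `registry:${registry}#${name}@${savePrefix}${resolvedVersion}`;
}
```

With this approach, `npm install foo@registry:https://registry.foo.com/#foo` resolving to `1.2.3` would be saved as `registry:https://registry.foo.com/#foo@^1.2.3`.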
Comment on lines +71 to +72

**Reviewer:**

This is another argument in favor of not making them URLs. A common complaint back in Yarn 1 was that storing the registry URLs within the lockfile caused annoying issues when switching from one registry to another. That is a common use case for users in China, for instance, who frequently need to use Taobao.

**Contributor Author:**

If I run `npm install foo@registry:https://registry.foo.com/#[email protected]`, and it is saved in such a way that future `npm install` invocations do not fetch from `https://registry.foo.com/`, then that is a clear violation of expectation and intent.

**Reviewer:**

Precisely; and if you run `npm install foo@registry:openjs#[email protected]` then there's no such expectation, and users will instead assume they can swap the `openjs` registry URL for another one if they need to. That is a reasonable feature, considering that it's already an established use case (cf. the Taobao example).

**Contributor Author:**

What this is saying is that we have to save it to the `package.json` so that if you run this:

    npm install foo@registry:https://registry.foo.com/#foo
    rm -rf node_modules
    npm install

then the second `npm install` gets it from the `https://registry.foo.com` registry, and not from the default configured registry.

If you agree, then I don't understand the objection.


### Alias Specifiers

Alias specifiers starting with `npm:` desugar into registry specifiers with
the default configured registry url.

For example, the alias dependency spec `npm:foo@latest` will be equivalent
to `registry:https://registry.npmjs.org#foo@latest`.
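Because the part after `npm:` already has the `<package name>[@<specifier>]` shape, the desugaring is a plain string rewrite. The TypeScript below is a minimal sketch. The function name `desugarNpmAlias` is hypothetical, and the default registry is passed in explicitly rather than read from npm's configuration.

```typescript
// Hypothetical helper: rewrite an `npm:` alias as the equivalent `registry:` specifier,
// using whatever registry the user has configured as the default.
function desugarNpmAlias(alias: string, defaultRegistry: string): string {
  const prefix = "npm:";
  if (alias.indexOf(prefix) !== 0) {
    throw new Error(`not an npm: alias: ${alias}`);
  }
  if (defaultRegistry.indexOf("#") !== -1) {
    throw new Error("registry URL may not contain a hash portion");
  }
  // Everything after "npm:" is already "<name>[@<specifier>]", which is exactly
  // the part that follows '#' in a registry specifier.
  return `registry:${defaultRegistry}#${alias.slice(prefix.length)}`;
}
```

An alias without a specifier, such as `npm:@scope/bar`, desugars to `registry:<default>#@scope/bar`, and then falls back to the default tag like any other registry specifier.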

### Deduping

Two packages with the same name and version that come from different
registries *must not* be deduplicated against one another, except as
follows:

- If either has a defined `integrity` value, they may be deduplicated only
if their `integrity` values match.
- If neither has a defined `integrity` value, they may be deduplicated only
if their `resolved` values match (for example, when `registry-a` lists the
tarball in `registry-b` as its `dist.tarball` URL).
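The deduplication rule can be expressed as a single predicate. This TypeScript sketch is illustrative only. The `ResolvedPackage` shape and the `mayDedupe` name are hypothetical. Integrity strings are compared exactly, which ignores the multi-hash forms that Subresource Integrity allows. Packages from the *same* registry are assumed to fall back to npm's existing dedupe logic, which this rule does not change.

```typescript
// Hypothetical shape of a resolved package; field names follow package-lock conventions.
interface ResolvedPackage {
  name: string;
  version: string;
  registry: string;   // registry the package was resolved from
  integrity?: string; // e.g. "sha512-..."
  resolved?: string;  // tarball URL
}

// Decide whether two resolved packages may be deduplicated under the rule above.
function mayDedupe(a: ResolvedPackage, b: ResolvedPackage): boolean {
  if (a.name !== b.name || a.version !== b.version) {
    return false;
  }
  if (a.registry === b.registry) {
    // The cross-registry rule does not apply; assume existing dedupe logic allows it.
    return true;
  }
  if (a.integrity !== undefined || b.integrity !== undefined) {
    // If only one side has integrity, the strings cannot match, so this returns false.
    return a.integrity === b.integrity;
  }
  // Neither has integrity: fall back to comparing the resolved tarball URLs.
  return a.resolved !== undefined && a.resolved === b.resolved;
}
```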

### Specifying Package Name

The `<package name>` portion is always required, even when it would match
the `name` portion of a complete named specifier.

For example, `foo@registry:https://url.com#[email protected]` is acceptable.
`foo@registry:https://url.com#1.x` is not valid as a version range: it will
instead attempt to alias `foo` to a package named `1.x`.
isaacs marked this conversation as resolved.

This avoids the hazards of attempting to infer whether the `hash` portion
of the URL is a SemVer range, dist-tag, or package name. It is always a
named specifier.

### Meta-Dependency Resolution

When a package is installed from a `registry` specifier, its dependencies
should in turn also be fetched from the registry in the specifier.

In most cases, a package will be published to a given registry with the
expectation that its dependencies will be found in the same registry, i.e.,
that consumers will install it by running
`npm install pkgname --registry=https://internal-registry.com`.

If a package's dependencies are instead fetched from the default configured
registry, then this expectation would be contradicted.

Thus, any package resolved via a `registry` specifier _must_ have its
dependencies in turn resolved against the same registry that it came from.
Note that they _may_ still be deduplicated against packages with the same
name from other registries, but only if their integrity values match
(indicating that the content is identical).
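One reading of this rule is that the "specifier registry" is inherited down the tree until another `registry:` specifier overrides it. The TypeScript below is a minimal sketch of that reading, not Arborist's `buildIdealTree` logic. The `DepNode` tree and the `assignRegistries` function are hypothetical. The sketch assumes that only plain registry ranges and `registry:` specifiers appear, so git, tarball, and `npm:` alias specs are not modelled. It also leaves out the integrity-based deduplication exception.

```typescript
// Hypothetical, simplified dependency tree: each node records the raw spec its parent used.
interface DepNode {
  name: string;
  spec: string; // e.g. "^1.0.0" or "registry:https://internal.com#a@1"
  children: DepNode[];
}

// Assign the registry each node is fetched from. A `registry:` spec sets the registry
// for that node; every other spec inherits the registry its parent came from.
function assignRegistries(root: DepNode, defaultRegistry: string): Record<string, string> {
  const result: Record<string, string> = {};
  const prefix = "registry:";
  const visit = (node: DepNode, inherited: string, path: string): void => {
    let registry = inherited;
    if (node.spec.indexOf(prefix) === 0) {
      // The URL contains no '#', so everything before the first '#' is the registry.
      registry = node.spec.slice(prefix.length).split("#")[0];
    }
    result[path] = registry;
    for (const child of node.children) {
      visit(child, registry, `${path}/${child.name}`);
    }
  };
  // Top-level dependencies start from the default configured registry.
  for (const child of root.children) {
    visit(child, defaultRegistry, child.name);
  }
  return result;
}
```

In a tree where `a` comes from `registry:https://internal.com#a@1` and depends on `b@^2.0.0`, this sketch fetches `b` from `https://internal.com` rather than from the default registry.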

### Examples:

- on the command line:

```bash
# the name may be specified
npm install forked@registry:https://internal.local#forked
# but is not required, as with other specifier types
npm install registry:https://internal.local#[email protected]
```

- in a `package.json` file

```json
{
  "dependencies": {
    "aliased": "registry:https://internal.com#[email protected]",
    "forked": "registry:https://other-internal.com#[email protected]",
    "patched": "registry:https://security-provider.com#patched@^1.4 || 2"
  }
}
```

## Rationale and Alternatives

The use cases described above are challenging to address in any other way.

The initial proposal used a `:` character to delimit the URL from the package
specifier, but this is a poor choice, since `:` appears in registry URLs.

[RFC PR #217](https://github.com/npm/rfcs/pull/217) addressed some of the
use cases described by defining a registry per _package_ underneath a
scope. However, analysis and discussion uncovered security concerns that
would make that approach unwise to implement. By contrast, packages with
`registry:` specifiers in their dependencies will fail to install on older
npm versions that do not support the new spec type, so there is no chance
of fetching from the _wrong_ registry.

Tarball URLs can be used as dependency specifiers, however:

- They do not support SemVer ranges or dist-tags.
- The dependencies _of_ a package fetched via a tarball url specifier will
be fetched from the configured registry, creating a name collision
vulnerability.

The main hazard imposed by this proposal is that, if the specified registry
is unreachable, the package cannot be installed. Packages may be published
to the public registry that reference a registry only accessible to certain
people or at certain times. However, this is no worse than the existing
support for tarball and git URLs. Compared with those, `registry:`
specifiers add support for version ranges and dist-tags, and avoid the
hazard of fetching meta-dependencies from the wrong registry.

## Implementation

- Add support for `registry:` specifiers in the `npm-package-arg` module.
**This is a breaking change** for that module, but adding `registry:`
specifier support to npm/cli is SemVer-minor.
- Upgrade all modules depending on `npm-package-arg` to ensure that they
will behave properly with `registry:` specifiers. (Note: this is most of
npm.)
- Track the "specifier registry" in Arborist's `buildIdealTree`
implementation, so that subsequent dependencies are fetched from the
appropriate registry.

## Prior Art

Alias specifiers already present in npm.

URL and git specifiers.

## Future Work

A subsequent RFC may add support for mapping registry names to full URLs,
either in `package.json` or in npm configuration, using a shorter syntax
that desugars to `registry:` specifiers in much the same way as the `npm:`
alias specifier. Registry short names are out of scope for this proposal.