fix(npm): stale metadata cache issue (#6101)
**What's the problem this PR addresses?**

We now keep the package metadata in cache. To avoid missing new packages
being released we have a check so that we only accept the cached
metadata if 1/ the request asks for a semver version (not a range), and
2/ the requested version is found inside the cached metadata. In theory
this means that whenever a dependency asks for a version we didn't
cache, we assume something new got published, and we refetch it.
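
The acceptance check described above can be sketched as follows. This is a minimal illustration, not the plugin's code: `canUseCachedMetadata` and the simplified `PackageMetadata` / `CachedMetadata` shapes are hypothetical names introduced here, and the sketch assumes `version` is only set when the request is for an exact version.

```typescript
// Hypothetical, simplified shapes; the plugin's real types carry more fields.
type PackageMetadata = {versions: Record<string, unknown>};
type CachedMetadata = {metadata: PackageMetadata};

// Returns true when the cached metadata may be used without asking the registry.
function canUseCachedMetadata(cached: CachedMetadata | null, version: string | undefined): boolean {
  // Nothing on disk: the registry has to be queried.
  if (cached === null)
    return false;

  // Assumption: `version` is undefined for ranges, so ranges never short-circuit.
  if (typeof version === `undefined`)
    return false;

  // The exact version must already be listed in the cached metadata.
  return typeof cached.metadata.versions[version] !== `undefined`;
}

const cachedNoDeps: CachedMetadata = {metadata: {versions: {[`1.0.0`]: {}}}};

console.log(canUseCachedMetadata(cachedNoDeps, `1.0.0`)); // true
console.log(canUseCachedMetadata(cachedNoDeps, `2.0.0`)); // false
console.log(canUseCachedMetadata(cachedNoDeps, undefined)); // false
```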

However, to avoid fetching the package metadata many times for many
different versions or ranges, we also have an in-memory metadata cache
where we store the metadata once we have extracted it from either
the disk or the network.

This can corrupt the in-memory cache when two versions of the same
package are resolved and one exists in the cached metadata but the other
doesn't. In that case, the first request passes the "is this version
inside the cached metadata" check, its metadata gets stored in the
in-memory cache, and that entry is reused for further resolutions (even
those that would have failed the check). This happens because the
in-memory cache doesn't distinguish metadata read from disk from
metadata fetched from the network.
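
A reduced reproduction of the failure mode, under stated assumptions: `sharedMemo`, `diskStore`, `registryStore`, and `getMetadataBuggy` are hypothetical stand-ins for the single in-memory cache, the on-disk cache, the registry, and the lookup. The real code is asynchronous and keyed differently, but the shape of the bug should be the same.

```typescript
type PackageMetadata = {versions: Record<string, unknown>};

// One memory cache shared by the disk path and the network path (simplified).
const sharedMemo = new Map<string, PackageMetadata>();

// Hypothetical stand-ins: the disk only knows 1.0.0, the registry also has 2.0.0.
const diskStore = new Map<string, PackageMetadata>([
  [`no-deps`, {versions: {[`1.0.0`]: {}}}],
]);
const registryStore = new Map<string, PackageMetadata>([
  [`no-deps`, {versions: {[`1.0.0`]: {}, [`2.0.0`]: {}}}],
]);

function getMetadataBuggy(name: string, version: string): PackageMetadata {
  // A memoized entry is returned as-is: the version check below is never re-run.
  const memoized = sharedMemo.get(name);
  if (typeof memoized !== `undefined`)
    return memoized;

  const onDisk = diskStore.get(name);
  const diskIsEnough = typeof onDisk !== `undefined` && typeof onDisk.versions[version] !== `undefined`;

  const result = diskIsEnough ? onDisk : registryStore.get(name);
  if (typeof result === `undefined`)
    throw new Error(`Unknown package ${name}`);

  sharedMemo.set(name, result);
  return result;
}

const first = getMetadataBuggy(`no-deps`, `1.0.0`); // served from disk, then memoized
const second = getMetadataBuggy(`no-deps`, `2.0.0`); // reuses the stale disk entry

console.log(Object.keys(second.versions)); // [ '1.0.0' ]
```

The second lookup never reaches the registry, so version `2.0.0` stays invisible for the rest of the process.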

Fixes #5989

**How did you fix it?**

I separated the in-memory cache into two buckets: the disk cache and
the network cache. This ensures that the disk cache gets properly
ignored when retrieving versions we don't know, rather than being
mistakenly assumed to be what the network fetched.
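
The two-bucket idea can be sketched like this. It is a synchronous simplification with hypothetical names (`diskMemo`, `networkMemo`, `memoize`, `Sources`, `getMetadata`); the diff below memoizes promises keyed by the cache file path, whereas this sketch keys by package name for brevity.

```typescript
type PackageMetadata = {versions: Record<string, unknown>};

// Hypothetical hooks standing in for the disk read and the registry request.
type Sources = {
  readDisk: (name: string) => PackageMetadata | null;
  fetchNetwork: (name: string) => PackageMetadata;
};

// Two separate buckets: what was read from disk, and what the network returned.
const diskMemo = new Map<string, PackageMetadata | null>();
const networkMemo = new Map<string, PackageMetadata>();

// Runs the factory once per key and reuses its result afterwards.
function memoize<K, V>(memo: Map<K, V>, key: K, factory: () => V): V {
  if (memo.has(key))
    return memo.get(key) as V;

  const value = factory();
  memo.set(key, value);
  return value;
}

function getMetadata(name: string, version: string | undefined, sources: Sources): PackageMetadata {
  const cached = memoize(diskMemo, name, () => sources.readDisk(name));

  // The version check runs on every call, so a disk entry is only trusted
  // for the exact versions it actually lists.
  if (cached !== null && typeof version !== `undefined` && typeof cached.versions[version] !== `undefined`)
    return cached;

  return memoize(networkMemo, name, () => sources.fetchNetwork(name));
}

let networkFetches = 0;
const sources: Sources = {
  readDisk: () => ({versions: {[`1.0.0`]: {}}}),
  fetchNetwork: () => {
    networkFetches += 1;
    return {versions: {[`1.0.0`]: {}, [`2.0.0`]: {}}};
  },
};

const fromDisk = getMetadata(`no-deps`, `1.0.0`, sources);
const fromNetwork = getMetadata(`no-deps`, `2.0.0`, sources);
```

Here the request for `1.0.0` is served from the disk bucket while the request for `2.0.0` falls through to the network bucket, which is fetched at most once per name. The diff additionally refreshes the disk bucket after a successful network fetch; the sketch leaves that out.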

**Checklist**
- [x] I have read the [Contributing
Guide](https://yarnpkg.com/advanced/contributing).

- [x] I have set the packages that need to be released for my changes to
be effective.

- [x] I will check that all automated PR checks pass before the PR gets
reviewed.
arcanis authored Jan 24, 2024
1 parent 3d99d6b commit 6db7b21
Showing 3 changed files with 202 additions and 79 deletions.
24 changes: 24 additions & 0 deletions .yarn/versions/0fa4c1b4.yml
@@ -0,0 +1,24 @@
releases:
"@yarnpkg/cli": patch
"@yarnpkg/plugin-npm": patch

declined:
- "@yarnpkg/plugin-compat"
- "@yarnpkg/plugin-constraints"
- "@yarnpkg/plugin-dlx"
- "@yarnpkg/plugin-essentials"
- "@yarnpkg/plugin-init"
- "@yarnpkg/plugin-interactive-tools"
- "@yarnpkg/plugin-nm"
- "@yarnpkg/plugin-npm-cli"
- "@yarnpkg/plugin-pack"
- "@yarnpkg/plugin-patch"
- "@yarnpkg/plugin-pnp"
- "@yarnpkg/plugin-pnpm"
- "@yarnpkg/plugin-stage"
- "@yarnpkg/plugin-typescript"
- "@yarnpkg/plugin-version"
- "@yarnpkg/plugin-workspace-tools"
- "@yarnpkg/builder"
- "@yarnpkg/core"
- "@yarnpkg/doctor"
@@ -0,0 +1,59 @@
import {Filename, ppath, xfs} from '@yarnpkg/fslib';
import {tests} from 'pkg-tests-core';

describe(`Features`, () => {
describe(`Resolution cache`, () => {
test(
`it should use a cache metadata when resolving fixed versions`,
makeTemporaryEnv({}, {
enableGlobalCache: false,
}, async ({path, run, source}) => {
await run(`add`, `no-deps@1.0.0`);

await xfs.removePromise(ppath.join(path, Filename.lockfile));

// We now hide any version other than 1.0.0 from the registry. If the
// install passes, it means that Yarn read the metadata from the cache rather
// than the registry, as we wanted.

await tests.setPackageWhitelist(new Map([
[`no-deps`, new Set([`1.0.0`])],
]), async () => {
await run(`install`);
});
}),
);

test(
`it should properly separate the disk metadata cache from the network metadata cache`,
makeTemporaryEnv({}, {
enableGlobalCache: false,
}, async ({path, run, source}) => {
await tests.setPackageWhitelist(new Map([
[`no-deps`, new Set([`1.0.0`])],
]), async () => {
await run(`add`, `no-deps@1.0.0`);
});

await xfs.removePromise(ppath.join(path, Filename.lockfile));

// At this point, no-deps has been added into the metadata cache, but only
// with the 1.0.0 version. The metadata cache isn't aware of other versions.

// Now, we need a way to force the resolution cache to be used before resolving
// a version that it isn't aware of. To that end, we create a package.json with
// a dependency on one-fixed-dep@2, and we run 'yarn add no-deps@1.0.0'. This
// ensures that Yarn will run getCandidate on no-deps@1.0.0 first (because it's
// required before adding it to the package.json), and no-deps@2.0.0 later.

await xfs.writeFilePromise(ppath.join(path, Filename.manifest), JSON.stringify({
dependencies: {
[`one-fixed-dep`]: `2.0.0`,
},
}));

await run(`add`, `no-deps@1.0.0`);
}),
);
});
});
198 changes: 119 additions & 79 deletions packages/plugin-npm/sources/npmHttpUtils.ts
@@ -1,13 +1,13 @@
import {Configuration, Ident, formatUtils, httpUtils, nodeUtils, StreamReport, structUtils, IdentHash, hashUtils, Project, miscUtils, Cache} from '@yarnpkg/core';
import {MessageName, ReportError} from '@yarnpkg/core';
import {Filename, PortablePath, ppath, xfs} from '@yarnpkg/fslib';
import {prompt} from 'enquirer';
import pick from 'lodash/pick';
import semver from 'semver';
import {Configuration, Ident, formatUtils, httpUtils, nodeUtils, StreamReport, structUtils, hashUtils, Project, miscUtils, Cache} from '@yarnpkg/core';
import {MessageName, ReportError} from '@yarnpkg/core';
import {Filename, PortablePath, ppath, xfs} from '@yarnpkg/fslib';
import {prompt} from 'enquirer';
import pick from 'lodash/pick';
import semver from 'semver';

import {Hooks} from './index';
import * as npmConfigUtils from './npmConfigUtils';
import {MapLike} from './npmConfigUtils';
import {Hooks} from './index';
import * as npmConfigUtils from './npmConfigUtils';
import {MapLike} from './npmConfigUtils';

export enum AuthType {
NO_AUTH,
@@ -80,74 +80,31 @@ export type GetPackageMetadataOptions = Omit<Options, 'ident' | 'configuration'>
// - an in-memory cache, to avoid hitting the disk and the network more than once per process for each package
// - an on-disk cache, for exact version matches and to avoid refetching the metadata if the resource hasn't changed on the server

const PACKAGE_METADATA_CACHE = new Map<IdentHash, Promise<PackageMetadata> | PackageMetadata>();

/**
* Caches and returns the package metadata for the given ident.
*
* Note: This function only caches and returns specific fields from the metadata.
* If you need other fields, use the uncached {@link get} or consider whether it would make more sense to extract
* the fields from the on-disk packages using the linkers or from the fetch results using the fetchers.
*/
export async function getPackageMetadata(ident: Ident, {cache, project, registry, headers, version, ...rest}: GetPackageMetadataOptions): Promise<PackageMetadata> {
return await miscUtils.getFactoryWithDefault(PACKAGE_METADATA_CACHE, ident.identHash, async () => {
const {configuration} = project;

registry = normalizeRegistry(configuration, {ident, registry});

const registryFolder = getRegistryFolder(configuration, registry);
const identPath = ppath.join(registryFolder, `${structUtils.slugifyIdent(ident)}.json`);
const PACKAGE_DISK_METADATA_CACHE = new Map<PortablePath, Promise<CachedMetadata | null>>();
const PACKAGE_NETWORK_METADATA_CACHE = new Map<PortablePath, Promise<CachedMetadata | null>>();

async function loadPackageMetadataInfoFromDisk(identPath: PortablePath) {
return await miscUtils.getFactoryWithDefault(PACKAGE_DISK_METADATA_CACHE, identPath, async () => {
let cached: CachedMetadata | null = null;

// We bypass the on-disk cache for security reasons if the lockfile needs to be refreshed,
// since most likely the user is trying to validate the metadata using hardened mode.
if (!project.lockfileNeedsRefresh) {
try {
cached = await xfs.readJsonPromise(identPath) as CachedMetadata;
} catch {}

if (cached) {
if (typeof version !== `undefined` && typeof cached.metadata.versions[version] !== `undefined`)
return cached.metadata;

if (configuration.get(`enableOfflineMode`)) {
const copy = structuredClone(cached.metadata);
const deleted = new Set();

if (cache) {
for (const version of Object.keys(copy.versions)) {
const locator = structUtils.makeLocator(ident, `npm:${version}`);
const mirrorPath = cache.getLocatorMirrorPath(locator);

if (!mirrorPath || !xfs.existsSync(mirrorPath)) {
delete copy.versions[version];
deleted.add(version);
}
}

const latest = copy[`dist-tags`].latest;
if (deleted.has(latest)) {
const allVersions = Object.keys(cached.metadata.versions)
.sort(semver.compare);

let latestIndex = allVersions.indexOf(latest);
while (deleted.has(allVersions[latestIndex]) && latestIndex >= 0)
latestIndex -= 1;
try {
cached = await xfs.readJsonPromise(identPath) as CachedMetadata;
} catch {}

if (latestIndex >= 0) {
copy[`dist-tags`].latest = allVersions[latestIndex];
} else {
delete copy[`dist-tags`].latest;
}
}
}
return cached;
});
}

return copy;
}
}
}
type LoadPackageMetadataInfoFromNetworkOptions = {
configuration: Configuration;
cached: CachedMetadata | null;
registry: string;
headers?: {[key: string]: string | undefined};
version?: string;
};

async function loadPackageMetadataInfoFromNetwork(identPath: PortablePath, ident: Ident, {configuration, cached, registry, headers, version, ...rest}: LoadPackageMetadataInfoFromNetworkOptions) {
return await miscUtils.getFactoryWithDefault(PACKAGE_NETWORK_METADATA_CACHE, identPath, async () => {
return await get(getIdentUrl(ident), {
...rest,
customErrorMessage: customPackageError,
@@ -175,22 +132,28 @@ export async function getPackageMetadata(ident: Ident, {cache, project, registry

const packageMetadata = pickPackageMetadata(JSON.parse(response.body.toString()));

PACKAGE_METADATA_CACHE.set(ident.identHash, packageMetadata);

const metadata: CachedMetadata = {
metadata: packageMetadata,
etag: response.headers.etag,
lastModified: response.headers[`last-modified`],
};

// We append the PID because it is guaranteed that this code is only run once per process for a given ident
const identPathTemp = `${identPath}-${process.pid}.tmp` as PortablePath;
PACKAGE_DISK_METADATA_CACHE.set(identPath, Promise.resolve(metadata));

await xfs.mkdirPromise(registryFolder, {recursive: true});
await xfs.writeJsonPromise(identPathTemp, metadata, {compact: true});
// We don't need the cache in this process anymore (since we stored everything in both memory caches),
// so we can run the part that writes the cache to disk in the background.
Promise.resolve().then(async () => {
// We append the PID because it is guaranteed that this code is only run once per process for a given ident
const identPathTemp = `${identPath}-${process.pid}.tmp` as PortablePath;

// Doing a rename is important to ensure the cache is atomic
await xfs.renamePromise(identPathTemp, identPath);
await xfs.mkdirPromise(ppath.dirname(identPathTemp), {recursive: true});
await xfs.writeJsonPromise(identPathTemp, metadata, {compact: true});

// Doing a rename is important to ensure the cache is atomic
await xfs.renamePromise(identPathTemp, identPath);
}).catch(() => {
// It's not dramatic if the cache can't be written, so we just ignore the error
});

return {
...response,
@@ -201,6 +164,83 @@ export async function getPackageMetadata(ident: Ident, {cache, project, registry
});
}

/**
* Caches and returns the package metadata for the given ident.
*
* Note: This function only caches and returns specific fields from the metadata.
* If you need other fields, use the uncached {@link get} or consider whether it would make more sense to extract
* the fields from the on-disk packages using the linkers or from the fetch results using the fetchers.
*/
export async function getPackageMetadata(ident: Ident, {cache, project, registry, headers, version, ...rest}: GetPackageMetadataOptions): Promise<PackageMetadata> {
const {configuration} = project;

registry = normalizeRegistry(configuration, {ident, registry});

const registryFolder = getRegistryFolder(configuration, registry);
const identPath = ppath.join(registryFolder, `${structUtils.slugifyIdent(ident)}.json`);

let cached: CachedMetadata | null = null;

// We bypass the on-disk cache for security reasons if the lockfile needs to be refreshed,
// since most likely the user is trying to validate the metadata using hardened mode.
if (!project.lockfileNeedsRefresh) {
cached = await loadPackageMetadataInfoFromDisk(identPath);

if (cached) {
if (typeof version !== `undefined` && typeof cached.metadata.versions[version] !== `undefined`)
return cached.metadata;


// If in offline mode, we change the metadata to pretend that the only versions available
// on the registry are the ones currently stored in our cache. This prevents the resolver
// from trying to resolve to a version that we wouldn't be able to download.
if (configuration.get(`enableOfflineMode`)) {
const copy = structuredClone(cached.metadata);
const deleted = new Set();

if (cache) {
for (const version of Object.keys(copy.versions)) {
const locator = structUtils.makeLocator(ident, `npm:${version}`);
const mirrorPath = cache.getLocatorMirrorPath(locator);

if (!mirrorPath || !xfs.existsSync(mirrorPath)) {
delete copy.versions[version];
deleted.add(version);
}
}

const latest = copy[`dist-tags`].latest;
if (deleted.has(latest)) {
const allVersions = Object.keys(cached.metadata.versions)
.sort(semver.compare);

let latestIndex = allVersions.indexOf(latest);
while (deleted.has(allVersions[latestIndex]) && latestIndex >= 0)
latestIndex -= 1;

if (latestIndex >= 0) {
copy[`dist-tags`].latest = allVersions[latestIndex];
} else {
delete copy[`dist-tags`].latest;
}
}
}

return copy;
}
}
}

return await loadPackageMetadataInfoFromNetwork(identPath, ident, {
...rest,
configuration,
cached,
registry,
headers,
version,
});
}

type CachedMetadata = {
metadata: PackageMetadata;
etag?: string;