Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MODULE.bazel.lock file "reads through" already-locked package manager #20272

Closed
alexeagle opened this issue Nov 20, 2023 · 13 comments
Closed

MODULE.bazel.lock file "reads through" already-locked package manager #20272

alexeagle opened this issue Nov 20, 2023 · 13 comments
Assignees
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug

Comments

@alexeagle
Copy link
Contributor

alexeagle commented Nov 20, 2023

Description of the bug:

MODULE.bazel.lock repeats content from package manager lockfiles, leading to serious developer experience regressions.

Desired behavior

As an example, in https://github.com/bazelbuild/examples/tree/611827d1bacfe83408cba2bbdcca4a801687d814/frontend

Bazel (via npm_translate_lock) starts from pnpm-lock.yaml [1] which fully specifies all the fields (URL, integrity) needed for rctx.download calls in rules_js.
When developers have a merge conflict in that file, they can just run the install again, because the tool is aware of git merge conflict markers. https://pnpm.io/git#merge-conflicts (the same is true for npm and yarn, the other popular JS package managers)

Note that in a big monorepo, it's very common to have lockfile merge conflicts because a PR that updates versions take a while to land (you have to green up all the usages) and in the meantime you need to rebase over other commits that have landed which also updated a dependency.

Also, when tooling like Dependabot updates their dependency versions, the lockfile is updated as well.

So, when the lockfile is updated, I would expect this entry in the MODULE.bazel.lock file to be trivially updated:

 "accumulatedFileDigests": {
    "@@//:.npmrc": "25afec638a4a1dc4a7277d8babd30464e9d942cf93d3e47505a4677adad1a901",
    "@@//:pnpm-lock.yaml": "656fb164bf8539b6945f307477907a7087d4b7b74db0e525065d1ba70898ebd5"
}

Actual behavior

However, when upgrading that repo to Bazel 7.0, we now get a MASSIVE Bazel lockfile, much larger than the original:

$ bazel query //...
$ wc -l MODULE.bazel.lock pnpm-lock.yaml 
 172470 MODULE.bazel.lock
  12893 pnpm-lock.yaml

Of these lines, nearly all are in this block:

"moduleExtensions": {
    "@@aspect_rules_js~1.32.2//npm:extensions.bzl%npm": {
      "general": {
        "bzlTransitiveDigest": "EkNX7WbLkG1/zKMTRsHgYFdQKwWxWuZMReEve7JbvqQ=",
        "accumulatedFileDigests": {
          "@@//:.npmrc": "25afec638a4a1dc4a7277d8babd30464e9d942cf93d3e47505a4677adad1a901",
          "@@//:pnpm-lock.yaml": "656fb164bf8539b6945f307477907a7087d4b7b74db0e525065d1ba70898ebd5"
        },
        "envVariables": {},
        "generatedRepoSpecs": {
... [170,990 lines elided] ...
        }
      }
    }
  }

Any merge conflicts in this file will present the developer with a massive headache, and they'll likely just delete-and-recreate the lockfile to get past it. Of course that's a bad practice as you've accidentally lost the pinning of all other Bazel modules.

Also, PRs from Dependabot will cause Bazel's lockfile to be out-of-date. Dependabot PRs are commonly security-related, so engineers have a great sense of urgency in merging them, and will again just work around the Bazel lockfile checking on CI or disable the feature.

[1] https://github.com/bazelbuild/examples/blob/main/frontend/pnpm-lock.yaml

Which category does this issue belong to?

External Dependency

What is the output of bazel info release?

7.0.0rc4

@alexeagle
Copy link
Contributor Author

alexeagle commented Nov 20, 2023

Proposal: there should be an API for npm_translate_lock to indicate that we believe all the repositories generated by our repository rule calls already meet the desired properties in https://bazel.build/external/lockfile#lockfile-benefits, and omit the generatedRepoSpecs altogether.

@Wyverald Wyverald added P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. area-Bzlmod Bzlmod-specific PRs, issues, and feature requests and removed untriaged labels Nov 20, 2023
@Wyverald Wyverald added this to Bzlmod Nov 20, 2023
@github-project-automation github-project-automation bot moved this to Todo in Bzlmod Nov 20, 2023
@Wyverald
Copy link
Member

Proposal: there should be an API for npm_translate_lock to indicate that we believe all the repositories generated by our repository rule calls already meet the desired properties in https://bazel.build/external/lockfile#lockfile-benefits, and omit the generatedRepoSpecs altogether.

The key action item here is to do exactly this -- in other words, allow a module extension to say "I don't want to be locked" in some manner. I've discussed this topic with the rules_python maintainers in some depth, with some more sophisticated ideas thrown around, but this is the first step.

The only question to address here would be whether we want this to be a per-extension setting (the extension is either always locked or never locked, à la my_ext = module_extension(..., local=True)), or a per-extension-evaluation setting (ie. have the extension impl function return a thing saying "given what has happened this time, I want the result to be locked/not locked"). I'm leaning towards the latter as it gives us more flexibility to introduce stuff like "custom lockfile content" further on.

@Wyverald Wyverald added this to the 7.1.0 release blockers milestone Nov 20, 2023
@alexeagle
Copy link
Contributor Author

Thanks! FWIW I don't have a preference between those two methods of "opting out" of having the extension results be locked. I could imagine there might be "custom lockfile content" issues with npm_translate_lock especially when "importing" from prior lockfile formats.

@fmeum
Copy link
Collaborator

fmeum commented Nov 20, 2023

I find it important to consider these two different benefits of the lockfile separately:

  1. reproducibility: Ensure that two builds using the lockfile create the exact same repository rule instances even if the implementation functions of the locked extensions depend on information other than that encoded in the tags.
  2. performance: No need to rerun the extension implementation function on server restarts or when making trivial modifications to MODULE.bazel (e.g. comments or use_repos).

Even when module extensions such as those provided by rules_js and rules_go are already reproducible in the sense that their lockfile representation does not contain any information that isn't already a pure function of the extension tags, file deps and implementation function, they may still benefit from 2. In particular in the case of rules_js this is noticeable since it has to parse a potentially large yaml file in Starlark, which can take on the order of 10 seconds for large projects. Ideally, a solution to this issue would resolve this duplication while still maintaining the performance benefit of the lockfile.

My idea is a combination of two features that actually share a lot of their implementation complexity:

  • Let Bazel automatically resolve merge conflicts in the lockfile, similar to what yarn does. This essentially just means parsing both "sides" of the conflict and keeping all the extension results whose tags still match the current state of the module files. This way, a merge conflict in the lockfile, which is very common due to it including a hash of the root module file, allows the user to preserve the results of unmodified and potentially non-hermetic module extensions.
  • Allow module extensions to declare that their current evaluation is reproducible: given the same tags, file deps and implementation function, they will always produce the same repository rule instantiations. Such an extension will not benefit from 1., but may still benefit from 2. and we should aim to support this in our design. I think that this can be achieved by maintaining a second lockfile in the output base which stores the results of only the module extensions marked as reproducible and that is always in update mode. Crucially, these results can still be used to speed up the loading phase on server restarts and changes to the root module file, but they will not be contained in the regular MODULE.bazel.lock and thus not be committed (and not cause merge conflicts or duplication).

Both of these features require generalizing the Java (not the JSON) representation of the lockfile to an (ordered) list of lockfiles. I prototyped this and found it to add a modest amount of complexity - it even makes some parts of the existing code cleaner and error messages more precise (e.g. the handling of a missing lockfile).

The only question to address here would be whether we want this to be a per-extension setting (the extension is either always locked or never locked, à la my_ext = module_extension(..., local=True)), or a per-extension-evaluation setting (ie. have the extension impl function return a thing saying "given what has happened this time, I want the result to be locked/not locked"). I'm leaning towards the latter as it gives us more flexibility to introduce stuff like "custom lockfile content" further on.

I also think that there is no reason to not make this decision per-evaluation. We already have the API for this in the form of module_ctx.extension_metadata, we just need to add a boolean parameter to it. Given the above, I would rather call this parameter reproducible instead of local.

The key action item here is to do exactly this -- in other words, allow a module extension to say "I don't want to be locked" in some manner.

I haven't been able to come up with an example of an extension that would rightfully want to say "I don't want to be locked", but isn't reproducible. Knowing that the extension is reproducible allows further optimizations as described above, so I would find this bit safer to add than a catch-all local.

@fmeum
Copy link
Collaborator

fmeum commented Nov 20, 2023

Also, PRs from Dependabot will cause Bazel's lockfile to be out-of-date.

Can we teach Dependabot to run bazel mod deps in projects with MODULE.bazel.lock files checked in?

(sorry for the unintended close, keyboard shortcuts can be dangerous)

@fmeum fmeum closed this as completed Nov 20, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Bzlmod Nov 20, 2023
@fmeum fmeum reopened this Nov 20, 2023
@github-project-automation github-project-automation bot moved this from Done to In Progress in Bzlmod Nov 20, 2023
@iancha1992 iancha1992 removed this from the 7.1.0 release blockers milestone Nov 29, 2023
@iancha1992
Copy link
Member

@bazel-io fork 7.1.0

@alexeagle
Copy link
Contributor Author

@dzbarsky observed another more serious consequence of this, is that an auth token for your private npm registry will leak into the lockfile as well. I imagine the same could be true for other package managers.

@SalmaSamy
Copy link
Contributor

@alexeagle Can you explain more how/where would this token appear in the lockfile?
I believe this would happen if someone puts it in a repo attribute, which is not a good practice in general?

@alexeagle
Copy link
Contributor Author

@SalmaSamy yup, looks like that's here https://github.com/aspect-build/rules_js/blob/d4f2068adc2b97c3c97fdf588a765562591f410e/npm/extensions.bzl#L125
@fmeum suggested this could probably set a repo_env to pass this info along to avoid the locking mechanism seeing it.

@alexeagle
Copy link
Contributor Author

Note @fmeum I blogged about using the git merge driver feature to make the lockfile update less annoying: https://blog.aspect.dev/easier-merges-on-lockfiles

I can't tell if your feature of having bazel auto-resolve the conflicts is making progress? Or perhaps we should recommend this approach for Bazel users who have the lockfile enabled?

@fmeum
Copy link
Collaborator

fmeum commented Jan 22, 2024

@alexeagle I plan to write the design doc this week. In the meantime, the custom merge driver is a decent workaround.

@SalmaSamy
Copy link
Contributor

SalmaSamy commented Jan 25, 2024

@alexeagle I attached a PR to fix this, but @fmeum has a different idea that we're discussing. Do you think this is an urgent fix for 7.1.0? or we can wait to find the most suitable solution?

@github-project-automation github-project-automation bot moved this from In Progress to Done in Bzlmod Feb 12, 2024
SalmaSamy added a commit that referenced this issue Feb 12, 2024
… the lockfile (fixes #20272)

Adding 'reproducible' to the module extension metadata

PiperOrigin-RevId: 606288112
Change-Id: Ieca9c5eeb560e3556ba2b7e3f0a29c362bfa4f4b

# Conflicts:
#	src/main/java/com/google/devtools/build/lib/bazel/bzlmod/ModuleExtensionContext.java
#	src/main/java/com/google/devtools/build/lib/bazel/bzlmod/ModuleExtensionMetadata.java
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

8 participants