-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make a rust ruleset to wrap cargo and build the native engine #6891
make a rust ruleset to wrap cargo and build the native engine #6891
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really exciting, thanks for putting it together! I think there are ~2 pieces of work we should do before we can merge...
We definitely shouldn't be exposing PathGlobsAndRoot
to @rule
s. There are two reasons this is happening at the moment, as far as I can see:
- BinUtils - modifying BinUtil so that it uses
UrlToFetch
means the engine will do proper hermetic downloading and caching. - Grabbing
Snapshot
s out ofExecuteProcessResult
s - we should be able to futz with paths enough that we don't need global shared state that side-steps the caching, as long as everything is relative to thepwd
.
|
||
@memoized_property | ||
def pants_owned_rustup_globs(self): | ||
return PathGlobsAndRoot(PathGlobs(['rustup']), self._cargo_bin_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need a /**
at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the compiled binary I think.
self._cmake.bin_dir(), | ||
self._go_dist.bin_dir(), | ||
self._protoc.bin_dir(), | ||
# TODO: execute this in a sealed path! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this means drop the env
, let's definitely do this before landing :)
|
||
@rule(RustupExe, [Select(Rustup)]) | ||
def unpack_rustup(rustup): | ||
if not is_executable(rustup.pants_owned_rustup_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be building additional non-hermetic caching mechanisms on top of the engine's actual caching...
I suspect the steps we need to follow to make this actually work are:
- Modify
BinUtil
sufficiently that it can under the hood use theUrlToFetch -> Snapshot
rule, and so that we can get thatSnapshot
rather than materialize to a known constant path. TheUrlToFetch -> Snapshot
rule is cached both in the engine, and in the store, so if it's been run once on a machine, it won't re-download, even if you don't have a daemon. - Have rustup.sh write to (a child of) the pwd, rather than a system global path.
1 doesn't sound too horrible to do as a small separate PR, and then rebase this onto that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@illicitonion : One important missing bit here is that we do not have a local process execution cache, which means that invoking a process to extract a tarball isn't cached, and would run every time.
I think that with that support, a lot more of this would be cacheable in-engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #6898 about this: it's worth consider whether that is worth doing as a blocker here... and if not, whether we think it would be useful to allow for lowering-latency/increasing parallelism in the rsc
task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a reasonable middle-ground to get this in would be for the is_executable
checks to exist in the v1 Task
to decide whether to make a product request for a RustupExe
from a version, or make a RustupExe
from a Snapshot/directory path/something.
Then we'd either block migrating from a v1 Task
to a @console_rule
on having a local process cache, or acknowledge that this bootstrapping should be rare and not worry about the performance hit of re-extracting a tar file.
|
||
|
||
class RustUpdateRequest(datatype([ | ||
('toolchain_root', text_type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make sure toolchain_root
is removed before we we merge :)
env=which_cargo_env, | ||
) | ||
which_cargo_res = yield Get(ExecuteProcessResult, ExecuteProcessRequest, which_cargo_req) | ||
cargo_bin_path = which_cargo_res.stdout.strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stdout
is a binary_type
not a text_type
, so for py3-ness there should be a .decode("utf-8")
before the .strip()
yield Get(ExecuteProcessResult, ExecuteProcessRequest, | ||
rustup_update_request(['component', 'add', '--toolchain', version] + list(components), | ||
'add components {} to the toolchain'.format(components))) | ||
# NB: We don't use the symlink here, but we keep in the fs for backwards compat for now (see how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Messing with the real filesystem should happen in @console_rule
s or a v1 Task
s, not in @rule
s.
Let's pull this into the Task
for now, so that from there we get the CargoBin
and materialize it, then pass the CargoBin
into the next rule we run.
|
||
@rule(CargoInstallation, [Select(Rustup), Select(CargoBin)]) | ||
def ensure_cargo_installed(rustup, cargo_bin): | ||
if not is_executable(rustup.cargo_ensure_installed_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@@ -89,6 +89,10 @@ def load_plugins(build_configuration, plugins, working_set): | |||
subsystems = entries['global_subsystems'].load()() | |||
build_configuration.register_subsystems(subsystems) | |||
|
|||
if 'rules' in entries: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's pull this change into a separate PR, and add a test to show it works.
'**/*.rs', | ||
exclude=[ | ||
zglobs('**/target/*'), | ||
'process_execution/bazel_protos' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need a /**
?
|
||
# TODO: check that the engine is output to the dist dir, check that the engine works | ||
# somehow? | ||
self.assert_success(pants_run) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking that we've produced a .so
and that nm
or similar claims it has a symbol called _initnative_engine
would probably be a reasonable test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty awesome, thanks!
I think that it currently violates a few too many of the Engine's "laws" to be safe to land. The big pieces are: 1) getting things into Snapshots
from arbitrary paths, 2) touching the filesystem in @rules
3) checking things out of Snapshots
into the workspace.
We should definitely have tickets/design for 1 and 3 before landing this. We should also most likely have implemented one or both of those things.
2 is trickier... we should definitely not be doing this, but we don't currently have a way to enforce that. Design of enforcement is ... larger, and thus probably not a blocker for landing. But we definitely can't land @rules
that use import os
APIs to access files, so the code just needs changes there. As Daniel said, actually using/expanding the Engine's caching to make poking paths unnecessary for caching feels like a path forward here.
# coding=utf-8 | ||
# Copyright 2018 Pants project contributors (see CONTRIBUTORS.md). | ||
# Licensed under the Apache License, Version 2.0 (see LICENSE). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be defined under pants-plugins
, which would send the signal that this is a backend that is only used in the pantsbuild/pants repo, and not actually shipped (and we shouldn't ship it!).
|
||
@memoized_property | ||
def pants_owned_rustup_globs(self): | ||
return PathGlobsAndRoot(PathGlobs(['rustup']), self._cargo_bin_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the compiled binary I think.
description='execute rustup', | ||
env=rustup.rustup_exec_env(), | ||
) | ||
# TODO: this isn't remotable -- it will modify the local filesystem at CARGO_HOME and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a blocker... although nothing is preventing access to CWD
here, we definitely should not be relying on that access.
So the big pieces here (and I think that they're important to plan/design/hopefully-implement before landing this in order to actually prove out the assumptions of the model):
- Using
PathGlobsAndRoot
from within an@rule
is not currently kosher because we cannot invalidate dependencies on files outside of the workdir. - The right way to get files out of a
Snapshot
and into a local working copy is via https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/scheduler.py#L332-L347... but that is not exposed to@rules
, and shouldn't be. It could probably be exposed to@console_rules
, with a bit of design work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 2, we can just do this in the v1 Task
for now, and worry about the design work later :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with that.
|
||
@rule(RustupExe, [Select(Rustup)]) | ||
def unpack_rustup(rustup): | ||
if not is_executable(rustup.pants_owned_rustup_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@illicitonion : One important missing bit here is that we do not have a local process execution cache, which means that invoking a process to extract a tarball isn't cached, and would run every time.
I think that with that support, a lot more of this would be cacheable in-engine.
### Problem When working on #6892 (and later #6891) I wasn't sure whether I was seeing an issue in the engine relating to requesting products from transitively linked rules -- I was able to figure out that the engine did not have any such issue, using the `--native-engine-visualize-to` option. However, these tests will help me to avoid worrying about that case in the future. ### Solution - Add the most basic of testing to `test_scheduler.py`.
I believe this is superseded by #9781. |
Problem
It would be great if we could use pants to build the native engine instead of a bunch of (very nice) shell scripts. See the "Result" section for how I plan to obtain a pair of pants in order to run this ruleset.
Solution
src/python/pants/backend/cargo_bootstrap/
with acargo_dist()
target and aBootstrapCargo
task.rustup.py
which attempts to directly translate the logic inbuild-support/bin/native/bootstrap_rust.sh
.yield Get(Snapshot, PathGlobsAndRoot, ...)
within the body of an@rule
.Result
After discussion with @stuhood, we think one possible way to use these rules is to modify the
./pants
runner script to download the most recent pants release pex, invoke that pex with this backend to build the native engine, then continue as normal (replacing the line that invokesbootstrap_code.sh
). We can't do that in this PR because we need the previous release to contain the native engine changes required to support usage ofPathGlobsAndRoot
in rule bodies.I could probably use an
@console_rule
instead of theBootstrapCargo
task post #6880.There are a lot of
yield Get(ExecuteProcessResponse, ...)
calls which aren't assigned to any variable -- this feels like a code smell and could probably be fixed in a followup PR.