Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make a rust ruleset to wrap cargo and build the native engine #6891

Conversation

cosmicexplorer
Copy link
Contributor

Problem

It would be great if we could use pants to build the native engine instead of a bunch of (very nice) shell scripts. See the "Result" section for how I plan to obtain a pair of pants in order to run this ruleset.

Solution

  • Create a new backend in src/python/pants/backend/cargo_bootstrap/ with a cargo_dist() target and a BootstrapCargo task.
  • Create a ruleset in rustup.py which attempts to directly translate the logic in build-support/bin/native/bootstrap_rust.sh.
  • Make some changes to the rust code to support doing a yield Get(Snapshot, PathGlobsAndRoot, ...) within the body of an @rule.
  • Add an integration test that builds the native engine using the new ruleset.

Result

After discussion with @stuhood, we think one possible way to use these rules is to modify the ./pants runner script to download the most recent pants release pex, invoke that pex with this backend to build the native engine, then continue as normal (replacing the line that invokes bootstrap_code.sh). We can't do that in this PR because we need the previous release to contain the native engine changes required to support usage of PathGlobsAndRoot in rule bodies.

I could probably use an @console_rule instead of the BootstrapCargo task post #6880.

There are a lot of yield Get(ExecuteProcessResponse, ...) calls which aren't assigned to any variable -- this feels like a code smell and could probably be fixed in a followup PR.

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really exciting, thanks for putting it together! I think there are ~2 pieces of work we should do before we can merge...

We definitely shouldn't be exposing PathGlobsAndRoot to @rules. There are two reasons this is happening at the moment, as far as I can see:

  1. BinUtils - modifying BinUtil so that it uses UrlToFetch means the engine will do proper hermetic downloading and caching.
  2. Grabbing Snapshots out of ExecuteProcessResults - we should be able to futz with paths enough that we don't need global shared state that side-steps the caching, as long as everything is relative to the pwd.


@memoized_property
def pants_owned_rustup_globs(self):
return PathGlobsAndRoot(PathGlobs(['rustup']), self._cargo_bin_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need a /** at the end?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the compiled binary I think.

self._cmake.bin_dir(),
self._go_dist.bin_dir(),
self._protoc.bin_dir(),
# TODO: execute this in a sealed path!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this means drop the env, let's definitely do this before landing :)


@rule(RustupExe, [Select(Rustup)])
def unpack_rustup(rustup):
if not is_executable(rustup.pants_owned_rustup_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be building additional non-hermetic caching mechanisms on top of the engine's actual caching...

I suspect the steps we need to follow to make this actually work are:

  1. Modify BinUtil sufficiently that it can under the hood use the UrlToFetch -> Snapshot rule, and so that we can get that Snapshot rather than materialize to a known constant path. The UrlToFetch -> Snapshot rule is cached both in the engine, and in the store, so if it's been run once on a machine, it won't re-download, even if you don't have a daemon.
  2. Have rustup.sh write to (a child of) the pwd, rather than a system global path.

1 doesn't sound too horrible to do as a small separate PR, and then rebase this onto that one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illicitonion : One important missing bit here is that we do not have a local process execution cache, which means that invoking a process to extract a tarball isn't cached, and would run every time.

I think that with that support, a lot more of this would be cacheable in-engine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #6898 about this: it's worth consider whether that is worth doing as a blocker here... and if not, whether we think it would be useful to allow for lowering-latency/increasing parallelism in the rsc task.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a reasonable middle-ground to get this in would be for the is_executable checks to exist in the v1 Task to decide whether to make a product request for a RustupExe from a version, or make a RustupExe from a Snapshot/directory path/something.

Then we'd either block migrating from a v1 Task to a @console_rule on having a local process cache, or acknowledge that this bootstrapping should be rare and not worry about the performance hit of re-extracting a tar file.



class RustUpdateRequest(datatype([
('toolchain_root', text_type),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make sure toolchain_root is removed before we we merge :)

env=which_cargo_env,
)
which_cargo_res = yield Get(ExecuteProcessResult, ExecuteProcessRequest, which_cargo_req)
cargo_bin_path = which_cargo_res.stdout.strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdout is a binary_type not a text_type, so for py3-ness there should be a .decode("utf-8") before the .strip()

yield Get(ExecuteProcessResult, ExecuteProcessRequest,
rustup_update_request(['component', 'add', '--toolchain', version] + list(components),
'add components {} to the toolchain'.format(components)))
# NB: We don't use the symlink here, but we keep in the fs for backwards compat for now (see how
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Messing with the real filesystem should happen in @console_rules or a v1 Tasks, not in @rules.

Let's pull this into the Task for now, so that from there we get the CargoBin and materialize it, then pass the CargoBin into the next rule we run.


@rule(CargoInstallation, [Select(Rustup), Select(CargoBin)])
def ensure_cargo_installed(rustup, cargo_bin):
if not is_executable(rustup.cargo_ensure_installed_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -89,6 +89,10 @@ def load_plugins(build_configuration, plugins, working_set):
subsystems = entries['global_subsystems'].load()()
build_configuration.register_subsystems(subsystems)

if 'rules' in entries:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's pull this change into a separate PR, and add a test to show it works.

'**/*.rs',
exclude=[
zglobs('**/target/*'),
'process_execution/bazel_protos'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need a /**?


# TODO: check that the engine is output to the dist dir, check that the engine works
# somehow?
self.assert_success(pants_run)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking that we've produced a .so and that nm or similar claims it has a symbol called _initnative_engine would probably be a reasonable test?

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty awesome, thanks!

I think that it currently violates a few too many of the Engine's "laws" to be safe to land. The big pieces are: 1) getting things into Snapshots from arbitrary paths, 2) touching the filesystem in @rules 3) checking things out of Snapshots into the workspace.

We should definitely have tickets/design for 1 and 3 before landing this. We should also most likely have implemented one or both of those things.

2 is trickier... we should definitely not be doing this, but we don't currently have a way to enforce that. Design of enforcement is ... larger, and thus probably not a blocker for landing. But we definitely can't land @rules that use import os APIs to access files, so the code just needs changes there. As Daniel said, actually using/expanding the Engine's caching to make poking paths unnecessary for caching feels like a path forward here.

# coding=utf-8
# Copyright 2018 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be defined under pants-plugins, which would send the signal that this is a backend that is only used in the pantsbuild/pants repo, and not actually shipped (and we shouldn't ship it!).


@memoized_property
def pants_owned_rustup_globs(self):
return PathGlobsAndRoot(PathGlobs(['rustup']), self._cargo_bin_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the compiled binary I think.

description='execute rustup',
env=rustup.rustup_exec_env(),
)
# TODO: this isn't remotable -- it will modify the local filesystem at CARGO_HOME and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a blocker... although nothing is preventing access to CWD here, we definitely should not be relying on that access.

So the big pieces here (and I think that they're important to plan/design/hopefully-implement before landing this in order to actually prove out the assumptions of the model):

  1. Using PathGlobsAndRoot from within an @rule is not currently kosher because we cannot invalidate dependencies on files outside of the workdir.
  2. The right way to get files out of a Snapshot and into a local working copy is via https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/scheduler.py#L332-L347... but that is not exposed to @rules, and shouldn't be. It could probably be exposed to @console_rules, with a bit of design work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2, we can just do this in the v1 Task for now, and worry about the design work later :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with that.


@rule(RustupExe, [Select(Rustup)])
def unpack_rustup(rustup):
if not is_executable(rustup.pants_owned_rustup_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illicitonion : One important missing bit here is that we do not have a local process execution cache, which means that invoking a process to extract a tarball isn't cached, and would run every time.

I think that with that support, a lot more of this would be cacheable in-engine.

cosmicexplorer added a commit that referenced this pull request Jan 3, 2019
### Problem

When working on #6892 (and later #6891) I wasn't sure whether I was seeing an issue in the engine relating to requesting products from transitively linked rules -- I was able to figure out that the engine did not have any such issue, using the `--native-engine-visualize-to` option. However, these tests will help me to avoid worrying about that case in the future.

### Solution

- Add the most basic of testing to `test_scheduler.py`.
@Eric-Arellano
Copy link
Contributor

I believe this is superseded by #9781.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants