make a rust ruleset to wrap cargo and build the native engine #6891

cosmicexplorer · 2018-12-10T01:27:39Z

Problem

It would be great if we could use pants to build the native engine instead of a bunch of (very nice) shell scripts. See the "Result" section for how I plan to obtain a pair of pants in order to run this ruleset.

Solution

Create a new backend in src/python/pants/backend/cargo_bootstrap/ with a cargo_dist() target and a BootstrapCargo task.
Create a ruleset in rustup.py which attempts to directly translate the logic in build-support/bin/native/bootstrap_rust.sh.
Make some changes to the rust code to support doing a yield Get(Snapshot, PathGlobsAndRoot, ...) within the body of an @rule.
Add an integration test that builds the native engine using the new ruleset.

Result

After discussion with @stuhood, we think one possible way to use these rules is to modify the ./pants runner script to download the most recent pants release pex, invoke that pex with this backend to build the native engine, then continue as normal (replacing the line that invokes bootstrap_code.sh). We can't do that in this PR because we need the previous release to contain the native engine changes required to support usage of PathGlobsAndRoot in rule bodies.

I could probably use an @console_rule instead of the BootstrapCargo task post #6880.

There are a lot of yield Get(ExecuteProcessResponse, ...) calls which aren't assigned to any variable -- this feels like a code smell and could probably be fixed in a followup PR.

…e:new-cargo`

…orks!

illicitonion

This is really exciting, thanks for putting it together! I think there are ~2 pieces of work we should do before we can merge...

We definitely shouldn't be exposing PathGlobsAndRoot to @rules. There are two reasons this is happening at the moment, as far as I can see:

BinUtils - modifying BinUtil so that it uses UrlToFetch means the engine will do proper hermetic downloading and caching.
Grabbing Snapshots out of ExecuteProcessResults - we should be able to futz with paths enough that we don't need global shared state that side-steps the caching, as long as everything is relative to the pwd.

illicitonion · 2018-12-10T09:36:09Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+
+  @memoized_property
+  def pants_owned_rustup_globs(self):
+    return PathGlobsAndRoot(PathGlobs(['rustup']), self._cargo_bin_dir)


Doesn't this need a /** at the end?

This is the compiled binary I think.

illicitonion · 2018-12-10T09:37:29Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+        self._cmake.bin_dir(),
+        self._go_dist.bin_dir(),
+        self._protoc.bin_dir(),
+        # TODO: execute this in a sealed path!


Assuming this means drop the env, let's definitely do this before landing :)

illicitonion · 2018-12-10T09:45:26Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+
+@rule(RustupExe, [Select(Rustup)])
+def unpack_rustup(rustup):
+  if not is_executable(rustup.pants_owned_rustup_path):


We shouldn't be building additional non-hermetic caching mechanisms on top of the engine's actual caching...

I suspect the steps we need to follow to make this actually work are:

Modify BinUtil sufficiently that it can under the hood use the UrlToFetch -> Snapshot rule, and so that we can get that Snapshot rather than materialize to a known constant path. The UrlToFetch -> Snapshot rule is cached both in the engine, and in the store, so if it's been run once on a machine, it won't re-download, even if you don't have a daemon.

Have rustup.sh write to (a child of) the pwd, rather than a system global path.

1 doesn't sound too horrible to do as a small separate PR, and then rebase this onto that one.

@illicitonion : One important missing bit here is that we do not have a local process execution cache, which means that invoking a process to extract a tarball isn't cached, and would run every time.

I think that with that support, a lot more of this would be cacheable in-engine.

Opened #6898 about this: it's worth consider whether that is worth doing as a blocker here... and if not, whether we think it would be useful to allow for lowering-latency/increasing parallelism in the rsc task.

I think a reasonable middle-ground to get this in would be for the is_executable checks to exist in the v1 Task to decide whether to make a product request for a RustupExe from a version, or make a RustupExe from a Snapshot/directory path/something.

Then we'd either block migrating from a v1 Task to a @console_rule on having a local process cache, or acknowledge that this bootstrapping should be rare and not worry about the performance hit of re-extracting a tar file.

illicitonion · 2018-12-10T09:47:29Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+
+
+class RustUpdateRequest(datatype([
+    ('toolchain_root', text_type),


We should make sure toolchain_root is removed before we we merge :)

illicitonion · 2018-12-10T09:53:02Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+    env=which_cargo_env,
+  )
+  which_cargo_res = yield Get(ExecuteProcessResult, ExecuteProcessRequest, which_cargo_req)
+  cargo_bin_path = which_cargo_res.stdout.strip()


stdout is a binary_type not a text_type, so for py3-ness there should be a .decode("utf-8") before the .strip()

illicitonion · 2018-12-10T09:58:56Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+    yield Get(ExecuteProcessResult, ExecuteProcessRequest,
+              rustup_update_request(['component', 'add', '--toolchain', version] + list(components),
+                                    'add components {} to the toolchain'.format(components)))
+    # NB: We don't use the symlink here, but we keep in the fs for backwards compat for now (see how


Messing with the real filesystem should happen in @console_rules or a v1 Tasks, not in @rules.

Let's pull this into the Task for now, so that from there we get the CargoBin and materialize it, then pass the CargoBin into the next rule we run.

illicitonion · 2018-12-10T10:00:02Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+
+@rule(CargoInstallation, [Select(Rustup), Select(CargoBin)])
+def ensure_cargo_installed(rustup, cargo_bin):
+  if not is_executable(rustup.cargo_ensure_installed_path):


illicitonion · 2018-12-10T10:07:49Z

src/python/pants/init/extension_loader.py

@@ -89,6 +89,10 @@ def load_plugins(build_configuration, plugins, working_set):
      subsystems = entries['global_subsystems'].load()()
      build_configuration.register_subsystems(subsystems)

+    if 'rules' in entries:


Let's pull this change into a separate PR, and add a test to show it works.

illicitonion · 2018-12-10T10:08:58Z

src/rust/engine/BUILD

+    '**/*.rs',
+    exclude=[
+      zglobs('**/target/*'),
+      'process_execution/bazel_protos'


Doesn't this need a /**?

illicitonion · 2018-12-10T10:12:24Z

tests/python/pants_test/backend/cargo_bootstrap/test_cargo_bootstrap_integration.py

+
+    # TODO: check that the engine is output to the dist dir, check that the engine works
+    # somehow?
+    self.assert_success(pants_run)


Checking that we've produced a .so and that nm or similar claims it has a symbol called _initnative_engine would probably be a reasonable test?

stuhood

This is pretty awesome, thanks!

I think that it currently violates a few too many of the Engine's "laws" to be safe to land. The big pieces are: 1) getting things into Snapshots from arbitrary paths, 2) touching the filesystem in @rules 3) checking things out of Snapshots into the workspace.

We should definitely have tickets/design for 1 and 3 before landing this. We should also most likely have implemented one or both of those things.

2 is trickier... we should definitely not be doing this, but we don't currently have a way to enforce that. Design of enforcement is ... larger, and thus probably not a blocker for landing. But we definitely can't land @rules that use import os APIs to access files, so the code just needs changes there. As Daniel said, actually using/expanding the Engine's caching to make poking paths unnecessary for caching feels like a path forward here.

stuhood · 2018-12-10T20:29:53Z

src/python/pants/backend/cargo_bootstrap/register.py

+# coding=utf-8
+# Copyright 2018 Pants project contributors (see CONTRIBUTORS.md).
+# Licensed under the Apache License, Version 2.0 (see LICENSE).
+


This should probably be defined under pants-plugins, which would send the signal that this is a backend that is only used in the pantsbuild/pants repo, and not actually shipped (and we shouldn't ship it!).

stuhood · 2018-12-10T20:31:04Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+
+  @memoized_property
+  def pants_owned_rustup_globs(self):
+    return PathGlobsAndRoot(PathGlobs(['rustup']), self._cargo_bin_dir)


This is the compiled binary I think.

stuhood · 2018-12-10T20:33:12Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+      description='execute rustup',
+      env=rustup.rustup_exec_env(),
+    )
+    # TODO: this isn't remotable -- it will modify the local filesystem at CARGO_HOME and


This is a blocker... although nothing is preventing access to CWD here, we definitely should not be relying on that access.

So the big pieces here (and I think that they're important to plan/design/hopefully-implement before landing this in order to actually prove out the assumptions of the model):

Using PathGlobsAndRoot from within an @rule is not currently kosher because we cannot invalidate dependencies on files outside of the workdir.

The right way to get files out of a Snapshot and into a local working copy is via https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/scheduler.py#L332-L347... but that is not exposed to @rules, and shouldn't be. It could probably be exposed to @console_rules, with a bit of design work.

For 2, we can just do this in the v1 Task for now, and worry about the design work later :)

Fine with that.

stuhood · 2018-12-10T20:36:21Z

src/python/pants/backend/cargo_bootstrap/subsystems/rustup.py

+
+@rule(RustupExe, [Select(Rustup)])
+def unpack_rustup(rustup):
+  if not is_executable(rustup.pants_owned_rustup_path):


@illicitonion : One important missing bit here is that we do not have a local process execution cache, which means that invoking a process to extract a tarball isn't cached, and would run every time.

I think that with that support, a lot more of this would be cacheable in-engine.

### Problem When working on #6892 (and later #6891) I wasn't sure whether I was seeing an issue in the engine relating to requesting products from transitively linked rules -- I was able to figure out that the engine did not have any such issue, using the `--native-engine-visualize-to` option. However, these tests will help me to avoid worrying about that case in the future. ### Solution - Add the most basic of testing to `test_scheduler.py`.

Eric-Arellano · 2020-05-17T21:47:41Z

I believe this is superseded by #9781.

cosmicexplorer added 18 commits December 9, 2018 10:56

try

7bd586e

add rules to plugins and add a python_library()

5edccfa

save

111e28b

bootstrap

450c722

we did some more things

676bf28

undo

4bae4a2

comment out the bootstrap code for now

0813caa

move everything

e9b8a6e

run this with `./pants -ldebug bootstrap-native-engine src/rust/engin…

def6252

…e:new-cargo`

./pants -ldebug bootstrap-native-engine src/rust/engine:new-cargo w…

60801bf

…orks!

remove the commenting-out!

3c34a84

keep the bootstrapping part commented out

58c0a3f

add testing!!!

d6eeacf

simplify the testing a bit

b6b09a4

remove changes made for bootstrapping, as we have made rust changes

377ef3d

add the integration test tag

b26f821

remove TODO and ???

5ef2e97

fix formatting

ac335cc

cosmicexplorer requested review from stuhood and illicitonion December 10, 2018 01:27

fix mismatched types

e6285b9

illicitonion requested changes Dec 10, 2018

View reviewed changes

stuhood reviewed Dec 10, 2018

View reviewed changes

stuhood mentioned this pull request Dec 11, 2018

Local caching for ExecuteProcess #6898

Closed

cosmicexplorer mentioned this pull request Jan 1, 2019

add some testing for transitive v2 product requests #7015

Merged

cosmicexplorer mentioned this pull request Jan 17, 2019

use the v2 engine to implement bazel skyframe evaluation to use rulesets ~verbatim #7093

Closed

6 tasks

Eric-Arellano closed this May 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make a rust ruleset to wrap cargo and build the native engine #6891

make a rust ruleset to wrap cargo and build the native engine #6891

cosmicexplorer commented Dec 10, 2018

illicitonion left a comment

illicitonion Dec 10, 2018

stuhood Dec 10, 2018

illicitonion Dec 10, 2018

illicitonion Dec 10, 2018

stuhood Dec 10, 2018

stuhood Dec 11, 2018

illicitonion Dec 11, 2018

illicitonion Dec 10, 2018

illicitonion Dec 10, 2018

illicitonion Dec 10, 2018

illicitonion Dec 10, 2018

illicitonion Dec 10, 2018

illicitonion Dec 10, 2018

illicitonion Dec 10, 2018

stuhood left a comment

stuhood Dec 10, 2018

stuhood Dec 10, 2018

stuhood Dec 10, 2018

illicitonion Dec 11, 2018

stuhood Dec 11, 2018

stuhood Dec 10, 2018

Eric-Arellano commented May 17, 2020



		class RustUpdateRequest(datatype([
		('toolchain_root', text_type),

make a rust ruleset to wrap cargo and build the native engine #6891

make a rust ruleset to wrap cargo and build the native engine #6891

Conversation

cosmicexplorer commented Dec 10, 2018

Problem

Solution

Result

illicitonion left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Eric-Arellano commented May 17, 2020