Update engine README for Params (#7600)

### Problem

The engine `README.md` had not been updated for the merge of "subjects" and `Variants` into `Params` (#6170).

### Solution

Overhaul the explanations to refer to `Params`, and expand on how `Params` are propagated from dependents to dependencies. Additionally, simplify and document the codepath in rule graph construction that selects which `@rule` dependencies to use.
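
As a rough illustration of the `Params` propagation described above, here is a toy model in which a set of `Params` is a mapping keyed by type. It is only a sketch: `JavaVersion`, `Target`, and `extend_params` are hypothetical names invented for this example, and the engine's real representation is not shown in this commit.

```python
from collections import namedtuple

# Hypothetical Param types, invented for this illustration.
JavaVersion = namedtuple("JavaVersion", ["major"])
Target = namedtuple("Target", ["address"])


def extend_params(params, value):
    """Return a copy of `params` with `value` added, keyed by its type.

    Keying by type models "each Param value is unique by type": adding a
    second value of an existing type replaces the first.
    """
    updated = dict(params)
    updated[type(value)] = value
    return updated


# A dependent high in the graph provides a JavaVersion.
root_params = extend_params({}, JavaVersion(11))
# Deeper requests introduce a Target, then replace it with another Target.
lib_params = extend_params(root_params, Target("src/java/com/example/lib"))
util_params = extend_params(lib_params, Target("src/java/com/example/util"))
```

In this model `util_params` still carries the `JavaVersion` provided at the root, while its `Target` is the most recently introduced one. That mirrors the README's claim that a `Param` can be provided much higher in the graph without intermediate `@rule`s being aware of it.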
pantsbuild · Aug 2, 2019 · cdb2207
1 parent 24e5f1a
Showing 1 changed file with 91 additions and 100 deletions.
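
The updated README in the diff below describes how rule graph construction picks a dependency: prefer the `@rule` that needs the fewest `Params`, and fail when two candidates tie. A minimal sketch of that ordering follows. `Candidate`, `AmbiguousRulesError`, and `choose_rule` are hypothetical names, not the engine's API, and the sketch leaves out the case where an input `Param` directly satisfies the requested type.

```python
from collections import namedtuple

# Hypothetical stand-in for a registered rule: its name, the product type it
# returns, and the Param types it needs in order to run.
Candidate = namedtuple("Candidate", ["name", "product", "param_types"])


class AmbiguousRulesError(Exception):
    """Raised when several rules tie for the fewest Params."""


def choose_rule(product, candidates, available_param_types):
    """Pick the satisfiable rule for `product` that needs the fewest Params."""
    available = set(available_param_types)
    satisfiable = [
        c for c in candidates
        if c.product is product and set(c.param_types) <= available
    ]
    if not satisfiable:
        raise LookupError("no rule can produce {}".format(product.__name__))
    fewest = min(len(c.param_types) for c in satisfiable)
    best = [c for c in satisfiable if len(c.param_types) == fewest]
    if len(best) > 1:
        names = ", ".join(sorted(c.name for c in best))
        raise AmbiguousRulesError("ambiguous rules: " + names)
    return best[0]


rules = [
    Candidate("singleton_int", int, ()),
    Candidate("int_from_str", int, (str,)),
    Candidate("int_from_bytes", int, (bytes,)),
]
chosen = choose_rule(int, rules, {str, bytes})
```

Here `chosen` is the singleton rule, because it consumes no `Params`. Dropping it from `rules` while keeping both `str` and `bytes` available leaves two one-`Param` candidates, which this sketch reports as an ambiguity rather than picking one arbitrarily.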
diff --git a/src/python/pants/engine/README.md b/src/python/pants/engine/README.md
@@ -17,63 +17,74 @@ Once the engine is instantiated with a valid set of `@rule`s, a caller can synch
 computation of any of the product types provided by those `@rule`s by calling:
 
 ```python
-# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`).
+# Request a ThingINeed (a `Product`) for a thing_i_have (a `Param`).
 thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have])
 ```
 
 The engine then takes care of concurrently executing all dependencies of the matched `@rule`s to
 produce the requested value.
 
-### Products and Subjects
+### Products and Params
 
 The engine executes your `@rule`s in order to (recursively) compute a `Product` of the requested
-type for a given `Subject`. This recursive type search leads to a very loosely coupled (and yet
+type for a set of `Param`s. This recursive type search leads to a loosely coupled (and yet
 still statically checked) form of dependency injection.
 
-When an `@rule` runs, it runs for a particular `Subject` value, which is part of the unique
-identity for that instance of the `@rule`. An `@rule` can request dependencies for different
-`Subject` values as it runs (see the section on `Get` requests below). Because the subject for
-an `@rule` is chosen by callers, a `Subject` can be of any (hashable) type that a user might want
-to compute a product for.
+When an `@rule` runs, it requires a set of `Param`s that the engine has determined are needed
+to compute its transitive `@rule` dependencies. So although an `@rule` might not have a particular
+`Param` type in its signature, it might depend on another `@rule` that does need that `Param`, and
+would thus need that `Param` in order to run. To see which `Params` the engine needs to run each
+`@rule`, refer to the `Visualization` section below.
 
-The return value of an `@rule` for a particular `Subject` is known as a `Product`. At some level,
-you can think of (`subject_value`, `product_type`) as a "key" that uniquely identifies a particular
-Product value and `@rule` execution.
+Any hashable type with useful equality may be used as a `Param`, and additional `Params` can be
+provided to an `@rule`'s dependencies via `Get` requests (see below). Each `Param` value in a set
+of `Params` is unique by type, so if `@rules` recursively introduce a particular `Param` type,
+there will still only be one value for that type in each `@rule`, but it will change as you move
+deeper into the dependency graph.
+
+The return value of an `@rule` is known as a `Product`. At some level, you can think
+of `(product_type, params_set)` as a "key" that uniquely identifies a particular `Product` value
+and `@rule` execution. If an `@rule` is able to produce a `Product` without consuming any `Params`,
+then the `@rule` will run exactly once, and the value that it produces will be a singleton.
 
 #### Example
 
 As a very simple example, you might register the following `@rule` that can compute a `String`
-Product given a single `Int` input.
+Product given a single `Int` argument.
 
 ```python
-@rule(StringType, [IntType])
+@rule(str, [int])
 def int_to_str(an_int):
-  return '{}'.format(an_int)
+  return str(an_int)
 ```
 
-The first argument to the `@rule` decorator is the Product (ie, return) type for the `@rule`. The
-second argument is a list of parameter selectors that declare the types of the input parameters for
-the `@rule`. In this case, because the Product type is `StringType` and there is one parameter
-selector (for `IntType`), this `@rule` represents a conversion from `IntType` to `StrType`, with no
-other inputs.
+The first argument to the `@rule` decorator is the `Product` (ie, return) type for the `@rule`. The
+second argument is a list of "parameter selectors" that declare the types of the input parameters for
+the `@rule`. In this case, because the `Product` type is `str` and there is one parameter
+selector (for `int`), this `@rule` represents a conversion from `int` to `str`, with no other inputs.
+
+When the engine encounters this `@rule` while compiling the rule graph for `str`-producing-`@rules`,
+it will next go hunting for the dependency `@rule` that can produce an `int` using the fewest number
+of `Params`. For example, if there was an `@rule` that could produce an `int` without consuming any
+`Params` at all (ie, a singleton), then that `@rule` would always be chosen first. If all `@rules` to
+produce `int`s required at least one `Param`, then the engine would next see whether the input `Params`
+contained an `int`, or whether there were any `@rules` that required only one `Param`, then two
+`Params`, and so on.
 
-When the engine statically checks whether it can use this `@rule` to create a string for a
-Subject, it will first see whether there are any ways to get an IntType for that Subject. If
-the subject is already of `type(subject) == IntType`, then the `@rule` will be satisfiable without
-any other dependencies. On the other hand, if the type _doesn't_ match, the engine doesn't give up:
-it will next look for any other registered `@rule`s that can compute an IntType Product for the
-Subject (and so on, recursively).
+In cases where this search detects any ambiguity (generally because there are two or more `@rules` that
+can provide the same product with the same number of parameters), rule graph compilation will fail with
+a useful error message.
 
 ### Datatypes
 
-In practical use, using basic types like `StringType` or `IntType` does not provide enough
-information to disambiguate between various types of data. So declaring small `datatype`
-definitions to provide a unique and descriptive type is strongly recommended:
+In practical use, builtin types like `str` or `int` do not provide enough information to disambiguate
+between various types of data in `@rule` signatures, so declaring small `datatype` definitions to
+provide a unique and descriptive type is highly recommended:
 
 ```python
 class FormattedInt(datatype(['content'])): pass
 
-@rule(FormattedInt, [IntType])
+@rule(FormattedInt, [int])
 def int_to_str(an_int):
   return FormattedInt('{}'.format(an_int))
 
@@ -105,29 +116,32 @@ class TypedDatatype(datatype([('field_name', Exactly(str, int))])):
 ```
 
 Assigning a specific type to a field can be somewhat unidiomatic in Python, and may be unexpected or
-unnatural to use. Additionally, the engine already applies a form of implicit type checking by
-ensuring there is a unique path from subject to product when a product request is made. However,
-regardless of whether the object is created directly with type-checked fields or whether it's
-produced from a set of rules by the engine's dependency injection, it is extremely useful to
-formalize the assumptions made about the value of an object into a specific type, even if the type
-just wraps a single field. The `datatype()` function makes it simple and efficient to apply that
-strategy.
+unnatural to use. However, regardless of whether the object is created directly with type-checked
+fields or whether it's produced from a set of rules by the engine's dependency injection, it is
+extremely useful to formalize the assumptions made about the value of an object into a specific type,
+even if the type just wraps a single field. The `datatype()` function makes it simple and efficient
+to apply that strategy.
 
-### Parameter selectors and Gets
+### Gets and RootRules
 
-As demonstrated above, parameter selectors select `@rule` inputs in the context of a particular
-`Subject` (and its `Variants`: discussed below). But it is frequently necessary to "change" the
-subject and request products for subjects other than the one that the `@rule` is running for.
+As demonstrated above, parameter selectors select `@rule` arguments in the context of a set of `Params`.
+But where do `Params` come from?
 
-In cases where this is necessary, `@rule`s may be written as coroutines (ie, using the python
-`yield` statement) that yield "`Get` requests" that request products for other subjects. Just like
-`@rule` parameter selectors, `Get` requests instantiated in the body of an `@rule` are statically
-checked to be satisfiable in the set of installed `@rule`s.
+One source of `Params` is the root of a request, where a `Param` type that may be provided by a caller
+of the engine can be declared using a `RootRule`. Installing a `RootRule` is sometimes necessary to
+seal the rule graph in cases where a `Param` could only possibly be computed outside of the rule graph
+and then passed in.
+
+The second case for introducing new `Params` occurs within the running graph when an `@rule` needs
+to pass values to its dependencies that are necessary to compute a product. In this case, `@rule`s may
+be written as coroutines (ie, using the python `yield` statement) that yield "`Get` requests" that request
+products for other `Params`. Just like `@rule` parameter selectors, `Get` requests instantiated in the
+body of an `@rule` are statically checked to be satisfiable in the set of installed `@rule`s.
 
 #### Example
 
 For example, you could declare an `@rule` that requests FileContent for each entry in a Files list,
-and then concatenates that content into a (typed) string:
+and then concatenates that content into a (datatype-wrapped) string:
 
 ```python
 @rule(ConcattedFiles, [Files])
@@ -136,27 +150,27 @@ def concat(files):
   yield ConcattedFiles(''.join(fc.content for fc in file_content_list))
 ```
 
-This `@rule` declares that: "for any Subject for which we can compute `Files`, we can also compute
-`ConcattedFiles`". Each yielded `Get` request results in FileContent for a different File Subject
-from the Files list.
+This `@rule` declares that: "for any `Params` for which we can compute `Files`, we can also compute
+`ConcattedFiles`". Each yielded `Get` request results in FileContent for a different File `Param`
+from the Files list. And, happily, all of these requests can proceed in parallel.
+
+### Advanced Param Usage
 
-### Variants
+Sometimes `@rule`s will need to consume multiple `Params` in order to tailor their output Products
+to their consumers.
 
-Certain `@rule`s will also need parameters provided by their dependents in order to tailor their output
-Products to their consumers.  For example, a javac `@rule` might need to know the version of the java
-platform for a given dependent binary target (say Java 9), or an ivy `@rule` might need to identify a
-globally consistent ivy resolve for a test target.  To allow for this the engine introduces the
-concept of `Variants`, which are passed recursively from dependents to dependencies.
+For example, a javac `@rule` might need to know the version of the java platform for a given
+dependent binary target, or an ivy `@rule` might need to identify a globally consistent ivy resolve
+for a test target. In both of these cases, the `@rule` requires two `Params` to be in scope. But
+due to the fact that `Params` are implicitly propagated from dependents to dependencies, it's possible
+for these `Params` to be provided much higher in the graph, without intermediate `@rules` needing to
+be aware of them.
 
-If a Rule uses a `SelectVariants` Selector to indicate that a variant is required, consumers can use
-a `@[type]=[name]` address syntax extension to pass a variant that matches a particular configuration
-for a `@rule`. A dependency declared as `src/java/com/example/lib:lib` specifies no particular variant, but
-`src/java/com/example/lib:lib@java=java8` asks for the configured variant of the lib named "java8".
+The result would be that any subgraph that transitively consumed a `Param` to produce Java 11 (for
+example) would be safely isolated and distinct from one that produced Java 9.
 
-Additionally, it is possible to specify the "default" variants for an Address by installing an `@rule`
-function that can provide `Variants(default=..)`. Since the purpose of variants is to collect
-information from dependents, only default variant values which have not been set by a dependent
-will be used.
+_(This section needs an example, but that will have to wait for
+[#7490](https://github.com/pantsbuild/pants/issues/7490)!)_
 
 ## Internal API
 
@@ -168,44 +182,32 @@ To compute a value for a Node, the engine uses the `Node.run` method starting fr
 roots. If a Node needs more inputs, it requests them via `Context.get`, which will declare a
 dependency, and memoize the computation represented by the requested `Node`.
 
-The initial Nodes are [launched by the engine](https://github.com/pantsbuild/pants/blob/16d43a06ba3751e22fdc7f69f009faeb59a33930/src/rust/engine/src/scheduler.rs#L116-L126),
-but the rest of execution is driven by Nodes recursively calling `Context.get` to request their
-dependencies.
+This recorded `Graph` tracks all dependencies between `@rules` and builtin "intrinsic" rules that
+provide filesystem and network access. That dependency tracking allows for invalidation and dirtying
+of `Nodes` as their dependencies change.
 
-### Registering Rules
+## Registering Rules
 
-Currently, it is only possible to load rules into the pants scheduler in two ways: by importing and
-using them in `src/python/pants/bin/engine_initializer.py`, or by adding them to the list returned
-by a `rules()` method defined in `src/python/backend/<backend_name>/register.py`. Plugins cannot add
-new rules yet. Unit tests, however, can mix in `TestBase` from
-`tests/python/pants_test/test_base.py` to generate and execute a scheduler from a given set of
-rules.
+The recommended way to install `@rules` is to return them as a list from a `def rules()` definition
+in a plugin's `register.py` file. Unit tests can either invoke `@rules` with fully mocked
+dependencies via `pants_test.engine.util.run_rule`, or extend `pants_test.test_base.TestBase` to
+construct and execute a scheduler for a given set of rules.
 
 In general, there are two types of rules that you can define:
 
 1. an `@rule`, which has a single product type and selects its inputs as described above.
-2. a `RootRule`, which declares a type that can be used as a *subject*, which means it can be
-   provided as an input to a `product_request()`.
-
-In more depth, a `RootRule` for some type is required when no other rule might provide that
-type (i.e. it is not provided as the product of any `@rule`) in some context. In the absence of a
-`RootRule`, any subject type involved in a request "at runtime" (i.e. via `product_request()`),
-would show up as an an unused or impossible path in the rule graph. Another potential name for
-`RootRule` might be `ParamRule`, or something similar, as it can be thought of as saying that the
-type represents a sort of "public API entrypoint" via a `product_request()`.
-
-Note that `Get` requests do not require a `RootRule`, as their requests are statically verified when
-the `@rule` definition is parsed, so we know before runtime that they might be requested.
+2. a `RootRule`, which declares a type that a caller of the engine may provide as a `Param` in a
+   call to `Scheduler.product_request(..)` (ie, at the "root" of the graph).
 
 This interface is being actively developed at this time and this documentation may be out of
 date. Please feel free to file an issue or pull request if you notice any outdated or incorrect
 information in this document!
 
-## Execution
+## Visualization
 
-The engine executes work concurrently wherever possible; to help visualize executions, a visualization
-tool is provided that, after executing a `Graph`, generates a `dot` file that can be rendered using
-Graphviz:
+To help visualize executions, the engine can render both the static rule graph that is compiled
+on startup, and also the content of the `Graph` that is produced while `@rules` run. This generates
+`dot` files that can be rendered using Graphviz:
 
 ```console
 $ mkdir viz
@@ -214,17 +216,6 @@ $ ls viz
 run.0.dot
 ```
 
-## Native Engine
-
-The native engine is integrated into the pants codebase via `native.py` in
-this directory along with `build-support/bin/native/bootstrap.sh` which ensures a
-pants native engine library is built and available for linking. The glue is the
-sha1 hash of the native engine source code used as its version by the `Native`
-class. This hash is maintained by `build-support/bin/native/bootstrap.sh` and
-output to the `native_engine_version` file in this directory. Any modification
-to this resource file's location will need adjustments in
-`build-support/bin/native/bootstrap.sh` to ensure the linking continues to work.
-
 ## History
 
 The need for an engine that could schedule all work as a result of linking required products to
@@ -240,7 +231,7 @@ Work stalled on the later phases of the `RoundEngine` and talks re-booted about
 it stood and proposed the idea of a "tuple-engine".  With some license taken in representation, this
 idea took the `RoundEngine` to the extreme of generating a round for each target-task pair.  The
 pair formed the tuple of schedulable work and this concept combined with others to form the design
-[here][tuple-design].
+[here](https://docs.google.com/document/d/1OARyIZSnw6XQiPlMydi57l_tS_JbFTJH6KLX61kPInI/edit?usp=sharing).
 
 Meanwhile, need for fine-grained parallelism was acute to help speed up jvm compilation, especially
 in the context of scala and mixed scala & java builds.  Twitter spiked on a project to implement