Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add core abstractions for the V2 Target API #9284

Merged
merged 14 commits into from
Mar 13, 2020

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Mar 12, 2020

Problem

See #4535 and the recent design doc.

This design implements the main goals of the Target API:

  1. Extensibility - add new target types.
  2. Extensibility - add new fields to pre-existing target types.
  3. Typed fields.
  4. All fields are lazily hydrated/validated.
  5. Has a utility for filtering based on required fields (see test_has_fields())
    • This is important to how rules will consume the Target API. python_test_runner.py might say something like if my_tgt.has_fields([PythonSources, Timeout]).
    • See Proposed Design for the importance of using Pure Python for this filtering, rather than engine code.
  6. Works 100% with MyPy.
  7. Nice repr and dir functions to allow for easy debugging.

Solution

Add Target and Field types. A Target type is a combination of several Fields that are valid together.

Given a target, the two main expected API calls will be:

my_tgt.has_fields([PythonSources, Compatibility])
compatibility: Compatibility = my_tgt.get(Compatibility)

MyPy understands both of these calls, including Target.get thanks to generics.

Lazy hydration - primitive vs. async fields

About 5% of fields require the engine for hydration, e.g. sources and bundles. The other 95% of fields, though, are nothing more than a str, List[str], bool, etc.

We do not want to complicate accessing the majority of fields. While some field values will need to be accessed via await Get, the majority should not be this complex.

So, we introduce PrimitiveField and AsyncField to distinguish between the two.

Thanks to MyPy and autocomplete, rule authors will quickly know when working with an AsyncField that they cannot call my_field.value like they normally could.

Extensibility - add new fields on pre-existing targets

This is a new feature that we want as a better version of tags. To add a new field, plugin authors simply need to create the new field then register, for example, UnionRule(PythonLibrary, TypeChecked).

The Target constructor will recognize this new field and treat it as first-class. Core Pants code can safely ignore the field, while plugins can now use the new field.

Followups

  1. Add a way to go from HydratedStruct -> Target, so that we can actually use this with production code.
  2. Add syntactic sugar for declaring new fields, e.g. add BoolField and StringField.
  3. Add a mechanism to actually register new target types and their aliases, e.g. a new backend registration hook
def targets() -> Sequence[Type[Target]]:
   return (PythonLibrary, PythonTests)
  1. Add implementations for Python targets as a real-world use case.
  2. Use the new Python target types in some rules like Black and Isort.
  3. Allow going from a V2 Target to a V1 Target.
  4. Improve error messages for invalid fields and targets, e.g. point to the actual target upon failure.
  5. Remove TargetAdaptor.

TODOs:
* How do we handle the 10% of fields that need the engine for hydration?
* Given a target, how do you actually get its fields?
* Can we simplify hooking up Targets to custom new fields?
This makes things more explicit when declaring a new target type. It moves the fields used closer to the actual Target definition (principle of proximity).

We still use unions to allow adding custom fields.
We must store all the unhydrated values at parse time. Once we parse a BUILD file, we never want to re-parse it, so we instead store all that information in the most minimally hydrated form possible.

We also calculate all default values when the fields are not explicitly defined.

We add Target.get(Type[Field]) -> Field as a way to get MyPy working with the target adaptor. Through generics, MyPy will know exactly which Field was returned!

The definition of Target is now mostly complete!
This method is going to be used _a lot_ by call sites. `.hydrated` is too technical.

It's fine to keep `unhydrated`, because that's more of an implementation detail that should not be referenced by your typical use case.
* Add __repr__ and __str__
* Remove @DataClass where it wasn't needed. It was overriding the superclass's __repr__ implementation. And this means that call sites have less imports and less code. The subclasses are still dataclasses!
* Rename unhydrated to raw_value. This fits better with terminology understood by the average plugin developer and pairs nicely with `value`
This will allow us to ergonomically filter out irrelevant targets! Call sites can simply say `my_target.has_fields()` with the fields they care about.
…aw_value

Field.raw_value should always be None if it was not configured. We shouldn't allow default values.
Copy link
Contributor Author

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello. I recommend starting by reading the PR description, then jumping straight to target_test.py to see what the goals of this project look like in practice.

This is a paradigm shift from V1, so it might look weird at first glance. Particularly different is the idea of Typed Fields and of having a unified mechanism for hydration and/or validation of fields.

As mentioned in the follow-ups section, there is much room for syntactic sugar, as well, such as reducing the verbosity of declaring new fields by adding sugar, like BoolField and StringField.

"""


_F = TypeVar("_F", bound=Field)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is important. It's what gets MyPy to understand everything, like that my_tgt.get(Compatibility).value would be List[str] but my_tgt.get(ZipSafe).value would be bool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a magical thing, but also this line is totally inscrutable. Maybe add this comment to the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, I think probably not to adding a comment. We use TypeVars pretty liberally in Pants and MyPy has good documentation on it: https://mypy.readthedocs.io/en/stable/generics.html

Our usage here is very idiomatic for how you do generics in Python.

I only wanted to call attention because this one line of code has much more power than you might think at first glance.

# These get calculated in the constructor
plugin_fields: Tuple[Type[Field], ...]
field_values: Dict[Type[Field], Any]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely, we'll want to add an Address property. I believe every Target has an Address. I saved it for a followup because this PR is only focused on adding the core abstractions.

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really exciting step, congrats!

I left a bunch of open questions, but most of them are idle thoughts about the future, rather than blockers for this. Hopefully which is which is obvious! :)_

def test_invalid_fields_rejected() -> None:
with pytest.raises(ValueError) as exc:
HaskellTarget({"invalid_field": True})
assert "Unrecognized field `invalid_field=True` for target type `haskell`." in str(exc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important follow-up is going to be attributing this to a particular target in a particular file (and ideally a particular line in that file, but I suspect that'll be pretty hard with our current parsing model)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely agreed. At a minimum, we should point to the owning target. This is possible in a followup change.


class HaskellSources(AsyncField):
alias: ClassVar = "sources"
raw_value: Optional[List[str]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are default values modelled here? Is it just by adding = ["*.hs"] to the end of this line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as a follow-up, is there a way to set post-hydration defaults? i.e. is there a way for me to provide a default Snapshot rather than a default PathGlobs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, default values would be applied in the corresponding rule @hydrate_haskell_sources.

Almost certainly, what will actually end up happening is that we create a Sources(AsyncField) class and have PythonLibrarySources, PythonTestsSources, etc all subclass it. That superclass would expect a field like default_globs and would use that to correctly apply them. We have total flexibility here because these are simple dataclasses that work with inheritance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as a follow-up, is there a way to set post-hydration defaults? i.e. is there a way for me to provide a default Snapshot rather than a default PathGlobs?

Not yet, but we could easily add that because these are simple Python dataclasses that respect inheritance.

(For now, I'm trying to avoid adding any sugar until we know we actually need it.)

# Test pre-hydration validation
with pytest.raises(ValueError) as exc:
hydrate_field(raw_source_files=[0, 1, 2], hydrated_source_files=()) # type: ignore[call-arg]
assert "Not all elements of the iterable have type" in str(exc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again user-facing error-wise ideally with will be a lot more guiding; I'd hope it would read something like:
"java_library target some:target defined in some/BUILD:15 has an invalid sources argument. Expected a list with all elements of type string, but found number element(s)"

Fine with this as-is today, but we should make sure we have a path to something like the above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. There's a path for this. The first change is (likely) storing an Address on each Field so that they can refer to it in any error messages.

assert tgt.has_fields([HaskellGhcExtensions, UnrelatedField]) is False


def test_field_hydration_is_lazy() -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idle question: How would you feel about a ./pants validate-build-files goal or similar? People used to use ./pants list to make sure BUILD files mostly look right, and it feels like something to force pre- and post-hydration validation, without all of the cost of actually compiling and running tests, would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to have that!

return False
return self.raw_value

union_membership = UnionMembership(OrderedDict({HaskellField: OrderedSet([CustomField])}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea here that every Target type will have its own union? There are no union parent-child relationships, so will we end up with separate PythonBinaryField and PythonLibraryField and PythonTestsField unions which are 90% the same but each has a few extra elements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although, those unions are only for plugin fields, meaning users are adding additional custom fields to pre-existing target types.

So, a user's register.py would likely look like this:

def rules():
  return [UnionRule(python_tgt, TypeChecked) for python_tgt in [PythonBinary, PythonLibrary, Pythontests])

return False
return self.raw_value

union_membership = UnionMembership(OrderedDict({HaskellField: OrderedSet([CustomField])}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the case that each UnionMembership call from different plugins will merge? So if I have a plugin adding coverage, and a plugin adding a formatter, and both want to add fields, we'll end up with one OrderedSet with some arbitrary order from each plugin merged together? Or something different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding a new UnionRule appends to the list, rather than overwriting. This is how we are able to have multiple Linters at the same time, where each registers UnionRule(Linter, BanditLinter), UnionRule(Linter, IsortFormatter), etc.

The value will possibly be first hydrated and/or validated, such as using a default value
if the field was not defined or ensuring that an int value is positive.

This property is memoized because hydration and validation can often be costly. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting... v2 rules are a mechanism we've built for caching things which may be costly, and which we already need to use for other field types.

What's the argument for not just making all of this caching happen in the v2 graph, and having people access all fields the same, whether or not the engine is just caching or also doing other heavy lifting? Just that the await Get feels heavy to have to write?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only is the await Get heavy, but adding a rule for every single Field is extremely heavy-weight, both with boilerplate and with blowing up the rule graph. For example, we'd need to introduce a Result class for each of the PrimitiveFields now so that the graph has a product - we couldn't use a generic wrapper like FieldResult because then MyPy couldn't infer the underlying type, and 100% MyPy support is a crucial requirement of this project.

Compare:

compatibility: List[str] = my_tgt.get(Compatibility).value

vs.

compatibility_result = await Get[CompatibilityResult](Compatibility, my_tgt.get(Compatiblity)
compatibility: List[str] = compatibility_result.value

Also, it's worth noting that the majority of the hydration/validation for PrimitiveField will be exceptionally simple. E.g. if the field was not defined, use this default value. Many fields will simply return self.raw_value

Thanks Daniel!
We can simplify have plugin authors add a `UnionRule` against the `Target` itself, rather than needing some sentinel `PluginField` type.

This greatly reduces the boilerplate needed to define a new target. Now, we avoid creating an additional class for every target and an additional property for every target.

This also means there are fewer concepts to understand the Target API. This extensibility is a neat feature that you get automatically and can generally ignore, whereas previously you had to add boilerplate to get it working.
Comment on lines +69 to +71
class HaskellTarget(Target):
alias: ClassVar = "haskell"
core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all it will ever take to define a new target type! 3 lines of declarative code 🎉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

So... while I recognize the desire to keep the Target declaration site very small, having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain.

It's ok to roll with this for now though I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain.

Generally, users will always want to access self.field_types. This is what everything like Target.get() and Target.has_sources() uses. So, plugin fields do behave as first-class citizens.

We only differentiate between core fields vs. plugin fields at the constructor time so that:

  1. We have better debugging. You can see in repr() which fields were added by you.
  2. We can declare core_fields as a class var, whereas plugin_types must be an instance var because we have to inspect UnionRule at runtime. It's generally easier to work with class vars so we should use that where possible.

--

Why not unify plugin and core fields to always be set up via UnionMembership? Originally, I did take this approach. See the first few commits. But, it resulted in horrible verbosity and violated the principle of proximity.

Instead of core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources) in the class definition, we'd need [UnionRule(HaskellTargetField, HaskellGhcExtensions), UnionRule(HaskellTargetField, HaskellSources).

There's not a very compelling reason to have that verbosity + reduced proximity.

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I love where this is heading.

Comment on lines +148 to +156
def __repr__(self) -> str:
return (
f"{self.__class__}("
f"alias={repr(self.alias)}, "
f"core_fields={list(self.core_fields)}, "
f"plugin_fields={list(self.plugin_fields)}, "
f"raw_field_values={list(self.field_values.values())}"
f")"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Thank you for making this so much easier to debug!

Copy link
Contributor

@TansyArron TansyArron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome, Eric! Thank you for all the super documentation and for making the effort to make Targets easier to debug! ❤️

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Eric-Arellano ! Looks great.

Actually having @rules in place in the test to demonstrate usage would have been good... I'm unsure about how the AsyncField vs Field split is going to look in practice. But not a blocker.

alias: ClassVar = "sources"
raw_value: Optional[List[str]]

def validate_pre_hydration(self) -> None:
Copy link
Member

@stuhood stuhood Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, validate_pre_hydration and validate_post_hydration feel like they can/should go in the @rule that hydrates the field, otherwise your field definition API requires declaring/extending three different methods.

Was the idea here that someone might want to change the rule that hydrates a field, while keeping the validation of the field the same? That feels like a pattern that we could wait to see happening in the wild before adding support for... or one that we should explicitly label here as something that we're anticipating (and should remove complexity for if we don't see the pattern in practice.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed over video chat, there is no super compelling reason to have these methods, other than the principle of proximity and trying to give some structure to AsyncField. I'll remove them for now to avoid premature optimization.

Comment on lines +69 to +71
class HaskellTarget(Target):
alias: ClassVar = "haskell"
core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

So... while I recognize the desire to keep the Target declaration site very small, having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain.

It's ok to roll with this for now though I think.

return False
return self.raw_value

union_membership = UnionMembership(OrderedDict({HaskellTarget: OrderedSet([CustomField])}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a confusing union, I think.

In general, union relationships (like Rust enums and Python unions) represent an "is-a" relationship (similar to subclassing.) CustomField "is-not-a" HaskellTarget.

I recognize that changing this will likely require a bit more boilerplate. But violating the is-a pattern is going to make unions more challenging to explain, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the commit 8f32955. Originally, we more correctly modeled this as is-a by requiring that every class define a PluginField like PythonLibraryField. But, this both resulted in lots of boilerplate and was confusing, as Daniel first pointed out in #9284 (comment).

I'm willing to go back to having PluginField, but don't think it's worth it. Beyond extra boilerplate, it forces target authors to consider the concept of additional fields being added, even though they are by definition never going to consume those extra fields in their rules. (They wouldn't know at the time of writing the rules that those fields exist, so they wouldn't know to consume them). Instead, now, all targets work automatically with plugin fields for no additional work.

We could do some type of generated field, like HaskellTarget.PluginField, but that seems even more confusing to me because it's not discoverable and it could trip up MyPy (not certain).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. But I still strongly believe that this union construction is problematic. If it is completely hidden from users, it might be less of an issue, but using unions uniformly will make them easier to teach, and make it easier to generate documentation for them later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If our focus is avoiding this violation of is-a, then I think I'd advocate the inner field HaskellTarget.PluginField (or something like that).

I like that approach the most because it avoids the verbosity and forcing target authors to think about plugin fields, which, they never will end up actually consuming as explained in the above comment.

It's easy for us to add that inner class in a followup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that would be much better. Thanks.

Comment on lines +180 to +182
assert tgt.field_types == (HaskellGhcExtensions, HaskellSources, CustomField)
assert tgt.core_fields == (HaskellGhcExtensions, HaskellSources)
assert tgt.plugin_fields == (CustomField,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Re: naming: plugin is slightly ill-defined, and a good plugin model mostly erases the boundaries between plugins and non-plugins, making everything feel first class. That might affect naming here, but you might also want to hide the plugin_fields accessor and just expose "default_fields" and "fields".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/pantsbuild/pants/pull/9284/files#r391961580 on why I think there's value in still having plugin_fields. tl;dr, we expect to almost always use tgt.field_types, and the Target code will always use that. But, it helps with debugging to know what was set up by default vs. through your plugin.

hydration.
"""

def validate_post_hydration(self, result: Any) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the helpful docstrings!

There was no need for this premature abstraction. Instead, all validation should happen in the corresponding hydrate rule.
@Eric-Arellano Eric-Arellano merged commit 8ed6795 into pantsbuild:master Mar 13, 2020
@Eric-Arellano Eric-Arellano deleted the target-api-prototype branch March 13, 2020 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants