-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add core abstractions for the V2 Target API #9284
Add core abstractions for the V2 Target API #9284
Conversation
TODOs: * How do we handle the 10% of fields that need the engine for hydration? * Given a target, how do you actually get its fields? * Can we simplify hooking up Targets to custom new fields?
This makes things more explicit when declaring a new target type. It moves the fields used closer to the actual Target definition (principle of proximity). We still use unions to allow adding custom fields.
We must store all the unhydrated values at parse time. Once we parse a BUILD file, we never want to re-parse it, so we instead store all that information in the most minimally hydrated form possible. We also calculate all default values when the fields are not explicitly defined. We add Target.get(Type[Field]) -> Field as a way to get MyPy working with the target adaptor. Through generics, MyPy will know exactly which Field was returned! The definition of Target is now mostly complete!
This method is going to be used _a lot_ by call sites. `.hydrated` is too technical. It's fine to keep `unhydrated`, because that's more of an implementation detail that should not be referenced by your typical use case.
* Add __repr__ and __str__ * Remove @DataClass where it wasn't needed. It was overriding the superclass's __repr__ implementation. And this means that call sites have less imports and less code. The subclasses are still dataclasses! * Rename unhydrated to raw_value. This fits better with terminology understood by the average plugin developer and pairs nicely with `value`
This will allow us to ergonomically filter out irrelevant targets! Call sites can simply say `my_target.has_fields()` with the fields they care about.
…aw_value Field.raw_value should always be None if it was not configured. We shouldn't allow default values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello. I recommend starting by reading the PR description, then jumping straight to target_test.py
to see what the goals of this project look like in practice.
This is a paradigm shift from V1, so it might look weird at first glance. Particularly different is the idea of Typed Fields and of having a unified mechanism for hydration and/or validation of fields.
As mentioned in the follow-ups section, there is much room for syntactic sugar, as well, such as reducing the verbosity of declaring new fields by adding sugar, like BoolField
and StringField
.
""" | ||
|
||
|
||
_F = TypeVar("_F", bound=Field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is important. It's what gets MyPy to understand everything, like that my_tgt.get(Compatibility).value
would be List[str]
but my_tgt.get(ZipSafe).value
would be bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a magical thing, but also this line is totally inscrutable. Maybe add this comment to the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, I think probably not to adding a comment. We use TypeVars pretty liberally in Pants and MyPy has good documentation on it: https://mypy.readthedocs.io/en/stable/generics.html
Our usage here is very idiomatic for how you do generics in Python.
I only wanted to call attention because this one line of code has much more power than you might think at first glance.
# These get calculated in the constructor | ||
plugin_fields: Tuple[Type[Field], ...] | ||
field_values: Dict[Type[Field], Any] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely, we'll want to add an Address
property. I believe every Target
has an Address
. I saved it for a followup because this PR is only focused on adding the core abstractions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really exciting step, congrats!
I left a bunch of open questions, but most of them are idle thoughts about the future, rather than blockers for this. Hopefully which is which is obvious! :)_
def test_invalid_fields_rejected() -> None: | ||
with pytest.raises(ValueError) as exc: | ||
HaskellTarget({"invalid_field": True}) | ||
assert "Unrecognized field `invalid_field=True` for target type `haskell`." in str(exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An important follow-up is going to be attributing this to a particular target in a particular file (and ideally a particular line in that file, but I suspect that'll be pretty hard with our current parsing model)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely agreed. At a minimum, we should point to the owning target. This is possible in a followup change.
|
||
class HaskellSources(AsyncField): | ||
alias: ClassVar = "sources" | ||
raw_value: Optional[List[str]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are default values modelled here? Is it just by adding = ["*.hs"]
to the end of this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And as a follow-up, is there a way to set post-hydration defaults? i.e. is there a way for me to provide a default Snapshot
rather than a default PathGlobs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, default values would be applied in the corresponding rule @hydrate_haskell_sources
.
Almost certainly, what will actually end up happening is that we create a Sources(AsyncField)
class and have PythonLibrarySources
, PythonTestsSources
, etc all subclass it. That superclass would expect a field like default_globs
and would use that to correctly apply them. We have total flexibility here because these are simple dataclasses that work with inheritance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And as a follow-up, is there a way to set post-hydration defaults? i.e. is there a way for me to provide a default Snapshot rather than a default PathGlobs?
Not yet, but we could easily add that because these are simple Python dataclasses that respect inheritance.
(For now, I'm trying to avoid adding any sugar until we know we actually need it.)
# Test pre-hydration validation | ||
with pytest.raises(ValueError) as exc: | ||
hydrate_field(raw_source_files=[0, 1, 2], hydrated_source_files=()) # type: ignore[call-arg] | ||
assert "Not all elements of the iterable have type" in str(exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again user-facing error-wise ideally with will be a lot more guiding; I'd hope it would read something like:
"java_library
target some:target
defined in some/BUILD:15
has an invalid sources
argument. Expected a list with all elements of type string, but found number element(s)"
Fine with this as-is today, but we should make sure we have a path to something like the above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. There's a path for this. The first change is (likely) storing an Address
on each Field
so that they can refer to it in any error messages.
assert tgt.has_fields([HaskellGhcExtensions, UnrelatedField]) is False | ||
|
||
|
||
def test_field_hydration_is_lazy() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idle question: How would you feel about a ./pants validate-build-files
goal or similar? People used to use ./pants list
to make sure BUILD files mostly look right, and it feels like something to force pre- and post-hydration validation, without all of the cost of actually compiling and running tests, would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to have that!
return False | ||
return self.raw_value | ||
|
||
union_membership = UnionMembership(OrderedDict({HaskellField: OrderedSet([CustomField])})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea here that every Target type will have its own union? There are no union parent-child relationships, so will we end up with separate PythonBinaryField
and PythonLibraryField
and PythonTestsField
unions which are 90% the same but each has a few extra elements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, although, those unions are only for plugin fields, meaning users are adding additional custom fields to pre-existing target types.
So, a user's register.py
would likely look like this:
def rules():
return [UnionRule(python_tgt, TypeChecked) for python_tgt in [PythonBinary, PythonLibrary, Pythontests])
return False | ||
return self.raw_value | ||
|
||
union_membership = UnionMembership(OrderedDict({HaskellField: OrderedSet([CustomField])})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that each UnionMembership
call from different plugins will merge? So if I have a plugin adding coverage, and a plugin adding a formatter, and both want to add fields, we'll end up with one OrderedSet
with some arbitrary order from each plugin merged together? Or something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, adding a new UnionRule
appends to the list, rather than overwriting. This is how we are able to have multiple Linters
at the same time, where each registers UnionRule(Linter, BanditLinter)
, UnionRule(Linter, IsortFormatter)
, etc.
The value will possibly be first hydrated and/or validated, such as using a default value | ||
if the field was not defined or ensuring that an int value is positive. | ||
|
||
This property is memoized because hydration and validation can often be costly. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting... v2 rules are a mechanism we've built for caching things which may be costly, and which we already need to use for other field types.
What's the argument for not just making all of this caching happen in the v2 graph, and having people access all fields the same, whether or not the engine is just caching or also doing other heavy lifting? Just that the await Get
feels heavy to have to write?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only is the await Get
heavy, but adding a rule for every single Field
is extremely heavy-weight, both with boilerplate and with blowing up the rule graph. For example, we'd need to introduce a Result
class for each of the PrimitiveField
s now so that the graph has a product - we couldn't use a generic wrapper like FieldResult
because then MyPy couldn't infer the underlying type, and 100% MyPy support is a crucial requirement of this project.
Compare:
compatibility: List[str] = my_tgt.get(Compatibility).value
vs.
compatibility_result = await Get[CompatibilityResult](Compatibility, my_tgt.get(Compatiblity)
compatibility: List[str] = compatibility_result.value
Also, it's worth noting that the majority of the hydration/validation for PrimitiveField
will be exceptionally simple. E.g. if the field was not defined, use this default value.
Many fields will simply return self.raw_value
Thanks Daniel!
We can simplify have plugin authors add a `UnionRule` against the `Target` itself, rather than needing some sentinel `PluginField` type. This greatly reduces the boilerplate needed to define a new target. Now, we avoid creating an additional class for every target and an additional property for every target. This also means there are fewer concepts to understand the Target API. This extensibility is a neat feature that you get automatically and can generally ignore, whereas previously you had to add boilerplate to get it working.
class HaskellTarget(Target): | ||
alias: ClassVar = "haskell" | ||
core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all it will ever take to define a new target type! 3 lines of declarative code 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
So... while I recognize the desire to keep the Target declaration site very small, having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain.
It's ok to roll with this for now though I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain.
Generally, users will always want to access self.field_types
. This is what everything like Target.get()
and Target.has_sources()
uses. So, plugin fields do behave as first-class citizens.
We only differentiate between core fields vs. plugin fields at the constructor time so that:
- We have better debugging. You can see in
repr()
which fields were added by you. - We can declare
core_fields
as a class var, whereasplugin_types
must be an instance var because we have to inspectUnionRule
at runtime. It's generally easier to work with class vars so we should use that where possible.
--
Why not unify plugin and core fields to always be set up via UnionMembership
? Originally, I did take this approach. See the first few commits. But, it resulted in horrible verbosity and violated the principle of proximity.
Instead of core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources)
in the class definition, we'd need [UnionRule(HaskellTargetField, HaskellGhcExtensions), UnionRule(HaskellTargetField, HaskellSources)
.
There's not a very compelling reason to have that verbosity + reduced proximity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I love where this is heading.
def __repr__(self) -> str: | ||
return ( | ||
f"{self.__class__}(" | ||
f"alias={repr(self.alias)}, " | ||
f"core_fields={list(self.core_fields)}, " | ||
f"plugin_fields={list(self.plugin_fields)}, " | ||
f"raw_field_values={list(self.field_values.values())}" | ||
f")" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Thank you for making this so much easier to debug!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome, Eric! Thank you for all the super documentation and for making the effort to make Targets easier to debug! ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Eric-Arellano ! Looks great.
Actually having @rules
in place in the test to demonstrate usage would have been good... I'm unsure about how the AsyncField
vs Field
split is going to look in practice. But not a blocker.
alias: ClassVar = "sources" | ||
raw_value: Optional[List[str]] | ||
|
||
def validate_pre_hydration(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, validate_pre_hydration
and validate_post_hydration
feel like they can/should go in the @rule
that hydrates the field, otherwise your field definition API requires declaring/extending three different methods.
Was the idea here that someone might want to change the rule that hydrates a field, while keeping the validation of the field the same? That feels like a pattern that we could wait to see happening in the wild before adding support for... or one that we should explicitly label here as something that we're anticipating (and should remove complexity for if we don't see the pattern in practice.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed over video chat, there is no super compelling reason to have these methods, other than the principle of proximity and trying to give some structure to AsyncField
. I'll remove them for now to avoid premature optimization.
class HaskellTarget(Target): | ||
alias: ClassVar = "haskell" | ||
core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
So... while I recognize the desire to keep the Target declaration site very small, having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain.
It's ok to roll with this for now though I think.
return False | ||
return self.raw_value | ||
|
||
union_membership = UnionMembership(OrderedDict({HaskellTarget: OrderedSet([CustomField])})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a confusing union, I think.
In general, union relationships (like Rust enums and Python unions) represent an "is-a" relationship (similar to subclassing.) CustomField
"is-not-a" HaskellTarget
.
I recognize that changing this will likely require a bit more boilerplate. But violating the is-a pattern is going to make unions more challenging to explain, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the commit 8f32955. Originally, we more correctly modeled this as is-a
by requiring that every class define a PluginField
like PythonLibraryField
. But, this both resulted in lots of boilerplate and was confusing, as Daniel first pointed out in #9284 (comment).
I'm willing to go back to having PluginField
, but don't think it's worth it. Beyond extra boilerplate, it forces target authors to consider the concept of additional fields being added, even though they are by definition never going to consume those extra fields in their rules. (They wouldn't know at the time of writing the rules that those fields exist, so they wouldn't know to consume them). Instead, now, all targets work automatically with plugin fields for no additional work.
We could do some type of generated field, like HaskellTarget.PluginField
, but that seems even more confusing to me because it's not discoverable and it could trip up MyPy (not certain).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. But I still strongly believe that this union construction is problematic. If it is completely hidden from users, it might be less of an issue, but using unions uniformly will make them easier to teach, and make it easier to generate documentation for them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If our focus is avoiding this violation of is-a
, then I think I'd advocate the inner field HaskellTarget.PluginField
(or something like that).
I like that approach the most because it avoids the verbosity and forcing target authors to think about plugin fields, which, they never will end up actually consuming as explained in the above comment.
It's easy for us to add that inner class in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, that would be much better. Thanks.
assert tgt.field_types == (HaskellGhcExtensions, HaskellSources, CustomField) | ||
assert tgt.core_fields == (HaskellGhcExtensions, HaskellSources) | ||
assert tgt.plugin_fields == (CustomField,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
Re: naming: plugin
is slightly ill-defined, and a good plugin model mostly erases the boundaries between plugins and non-plugins, making everything feel first class. That might affect naming here, but you might also want to hide the plugin_fields
accessor and just expose "default_fields" and "fields".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/pantsbuild/pants/pull/9284/files#r391961580 on why I think there's value in still having plugin_fields
. tl;dr, we expect to almost always use tgt.field_types
, and the Target code will always use that. But, it helps with debugging to know what was set up by default vs. through your plugin.
src/python/pants/engine/target.py
Outdated
hydration. | ||
""" | ||
|
||
def validate_post_hydration(self, result: Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the helpful docstrings!
There was no need for this premature abstraction. Instead, all validation should happen in the corresponding hydrate rule.
f159da0
to
71800ba
Compare
Problem
See #4535 and the recent design doc.
This design implements the main goals of the Target API:
test_has_fields()
)python_test_runner.py
might say something likeif my_tgt.has_fields([PythonSources, Timeout])
.repr
anddir
functions to allow for easy debugging.Solution
Add
Target
andField
types. ATarget
type is a combination of severalField
s that are valid together.Given a target, the two main expected API calls will be:
MyPy understands both of these calls, including
Target.get
thanks to generics.Lazy hydration - primitive vs. async fields
About 5% of fields require the engine for hydration, e.g.
sources
andbundles
. The other 95% of fields, though, are nothing more than astr
,List[str]
,bool
, etc.We do not want to complicate accessing the majority of fields. While some field values will need to be accessed via
await Get
, the majority should not be this complex.So, we introduce
PrimitiveField
andAsyncField
to distinguish between the two.Thanks to MyPy and autocomplete, rule authors will quickly know when working with an
AsyncField
that they cannot callmy_field.value
like they normally could.Extensibility - add new fields on pre-existing targets
This is a new feature that we want as a better version of
tags
. To add a new field, plugin authors simply need to create the new field then register, for example,UnionRule(PythonLibrary, TypeChecked)
.The Target constructor will recognize this new field and treat it as first-class. Core Pants code can safely ignore the field, while plugins can now use the new field.
Followups
HydratedStruct -> Target
, so that we can actually use this with production code.BoolField
andStringField
.Target
to a V1Target
.TargetAdaptor
.