-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add core abstractions for the V2 Target API #9284
Changes from all commits
37dabed
b628608
111d9d0
81ea8ce
60554c3
3ace7d3
f07e488
1e1aafe
5bfeeb7
c0ede40
4373970
071fff3
8f32955
71800ba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,159 @@ | ||
# Copyright 2020 Pants project contributors (see CONTRIBUTORS.md). | ||
# Licensed under the Apache License, Version 2.0 (see LICENSE). | ||
|
||
from abc import ABC, ABCMeta, abstractmethod | ||
from dataclasses import dataclass | ||
from typing import Any, ClassVar, Dict, Iterable, Optional, Tuple, Type, TypeVar, cast | ||
|
||
from pants.engine.rules import UnionMembership | ||
from pants.util.memo import memoized_property | ||
from pants.util.meta import frozen_after_init | ||
|
||
|
||
@dataclass(frozen=True) | ||
class Field(ABC): | ||
alias: ClassVar[str] | ||
raw_value: Optional[Any] # None indicates that the field was not explicitly defined | ||
|
||
def __repr__(self) -> str: | ||
return f"{self.__class__}(alias={repr(self.alias)}, raw_value={self.raw_value})" | ||
|
||
|
||
class PrimitiveField(Field, metaclass=ABCMeta): | ||
"""A Field that does not need the engine in order to be hydrated. | ||
|
||
This should be subclassed by the majority of fields. | ||
""" | ||
|
||
def __str__(self) -> str: | ||
return f"{self.alias}={self.value}" | ||
|
||
@memoized_property | ||
@abstractmethod | ||
def value(self) -> Any: | ||
"""Get the field's value. | ||
|
||
The value will possibly be first hydrated and/or validated, such as using a default value | ||
if the field was not defined or ensuring that an int value is positive. | ||
|
||
This property is memoized because hydration and validation can often be costly. This | ||
hydration is lazy, i.e. it will only happen when a downstream rule explicitly requests this | ||
field. | ||
""" | ||
|
||
|
||
class AsyncField(Field, metaclass=ABCMeta): | ||
"""A field that needs the engine in order to be hydrated. | ||
|
||
You should create a corresponding Result class and define a rule to go from this AsyncField to | ||
the Result. For example: | ||
|
||
class Sources(AsyncField): | ||
alias: ClassVar = "sources" | ||
raw_value: Optional[List[str]] | ||
|
||
|
||
@dataclass(frozen=True) | ||
class SourcesResult: | ||
snapshot: Snapshot | ||
|
||
|
||
@rule | ||
def hydrate_sources(sources: Sources) -> SourcesResult: | ||
# possibly validate `sources.raw_value` | ||
... | ||
result = await Get[Snapshot](PathGlobs(sources.raw_value)) | ||
# possibly validate `result` | ||
... | ||
return SourcesResult(result) | ||
|
||
|
||
def rules(): | ||
return [hydrate_sources] | ||
|
||
Then, call sites can `await Get` if they need to hydrate the field: | ||
|
||
sources = await Get[SourcesResult](Sources, my_tgt.get(Sources)) | ||
""" | ||
|
||
|
||
_F = TypeVar("_F", bound=Field) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This line is important. It's what gets MyPy to understand everything, like that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is a magical thing, but also this line is totally inscrutable. Maybe add this comment to the code? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Eh, I think probably not to adding a comment. We use TypeVars pretty liberally in Pants and MyPy has good documentation on it: https://mypy.readthedocs.io/en/stable/generics.html Our usage here is very idiomatic for how you do generics in Python. I only wanted to call attention because this one line of code has much more power than you might think at first glance. |
||
|
||
|
||
@frozen_after_init | ||
@dataclass(unsafe_hash=True) | ||
class Target(ABC): | ||
"""A Target represents a combination of fields that are valid _together_. | ||
|
||
Plugin authors may add additional fields to pre-existing target types by simply registering | ||
UnionRules between the `Target` type and the custom field, e.g. `UnionRule(PythonLibrary, | ||
TypeChecked)`. The `Target` will then treat `TypeChecked` as a first-class citizen and plugins | ||
can use that Field like any other Field. | ||
""" | ||
|
||
# Subclasses must define these | ||
alias: ClassVar[str] | ||
core_fields: ClassVar[Tuple[Type[Field], ...]] | ||
|
||
# These get calculated in the constructor | ||
plugin_fields: Tuple[Type[Field], ...] | ||
field_values: Dict[Type[Field], Any] | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Likely, we'll want to add an |
||
def __init__( | ||
self, | ||
unhydrated_values: Dict[str, Any], | ||
*, | ||
union_membership: Optional[UnionMembership] = None, | ||
) -> None: | ||
self.plugin_fields = cast( | ||
Tuple[Type[Field], ...], | ||
( | ||
() | ||
if union_membership is None | ||
else tuple(union_membership.union_rules.get(self.__class__, ())) | ||
), | ||
) | ||
|
||
self.field_values = {} | ||
aliases_to_field_types = {field_type.alias: field_type for field_type in self.field_types} | ||
for alias, value in unhydrated_values.items(): | ||
if alias not in aliases_to_field_types: | ||
raise ValueError( | ||
f"Unrecognized field `{alias}={value}` for target type `{self.alias}`." | ||
) | ||
field_type = aliases_to_field_types[alias] | ||
self.field_values[field_type] = field_type(value) | ||
# For undefined fields, mark the raw value as None. | ||
for field_type in set(self.field_types) - set(self.field_values.keys()): | ||
self.field_values[field_type] = field_type(raw_value=None) | ||
|
||
@property | ||
def field_types(self) -> Tuple[Type[Field], ...]: | ||
return (*self.core_fields, *self.plugin_fields) | ||
|
||
def __repr__(self) -> str: | ||
return ( | ||
f"{self.__class__}(" | ||
f"alias={repr(self.alias)}, " | ||
f"core_fields={list(self.core_fields)}, " | ||
f"plugin_fields={list(self.plugin_fields)}, " | ||
f"raw_field_values={list(self.field_values.values())}" | ||
f")" | ||
) | ||
Comment on lines
+134
to
+142
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💯 Thank you for making this so much easier to debug! |
||
|
||
def __str__(self) -> str: | ||
fields = ", ".join(str(field) for field in self.field_values.values()) | ||
return f"{self.alias}({fields})" | ||
|
||
def get(self, field: Type[_F]) -> _F: | ||
return cast(_F, self.field_values[field]) | ||
|
||
def has_fields(self, fields: Iterable[Type[Field]]) -> bool: | ||
# TODO: consider if this should support subclasses. For example, if a target has a | ||
# field PythonSources(Sources), then .has_fields(Sources) should still return True. Why? | ||
# This allows overriding how fields behave for custom target types, e.g. a `python3_library` | ||
# subclassing the Field Compatibility with its own custom Python3Compatibility field. | ||
# When adding this, be sure to update `.get()` to allow looking up by subclass, too. | ||
# (Is it possible to do that in a performant way, i.e. w/o having to iterate over every | ||
# self.field_type (O(n) vs the current O(1))?) | ||
return all(field in self.field_types for field in fields) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
# Copyright 2020 Pants project contributors (see CONTRIBUTORS.md). | ||
# Licensed under the Apache License, Version 2.0 (see LICENSE). | ||
|
||
from collections import OrderedDict | ||
from dataclasses import dataclass | ||
from pathlib import PurePath | ||
from typing import ClassVar, List, Optional, Tuple | ||
|
||
import pytest | ||
|
||
from pants.engine.fs import EMPTY_DIRECTORY_DIGEST, PathGlobs, Snapshot | ||
from pants.engine.rules import UnionMembership, rule | ||
from pants.engine.selectors import Get | ||
from pants.engine.target import AsyncField, PrimitiveField, Target | ||
from pants.testutil.engine.util import MockGet, run_rule | ||
from pants.util.collections import ensure_str_list | ||
from pants.util.memo import memoized_property | ||
from pants.util.ordered_set import OrderedSet | ||
|
||
|
||
class HaskellGhcExtensions(PrimitiveField): | ||
alias: ClassVar = "ghc_extensions" | ||
raw_value: Optional[List[str]] | ||
|
||
@memoized_property | ||
def value(self) -> List[str]: | ||
if self.raw_value is None: | ||
return [] | ||
# Add some arbitrary validation to test that hydration works properly. | ||
bad_extensions = [ | ||
extension for extension in self.raw_value if not extension.startswith("Ghc") | ||
] | ||
if bad_extensions: | ||
raise ValueError( | ||
f"All elements of `{self.alias}` must be prefixed by `Ghc`. Received " | ||
f"{bad_extensions}." | ||
) | ||
return self.raw_value | ||
|
||
|
||
class HaskellSources(AsyncField): | ||
alias: ClassVar = "sources" | ||
raw_value: Optional[List[str]] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How are default values modelled here? Is it just by adding There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And as a follow-up, is there a way to set post-hydration defaults? i.e. is there a way for me to provide a default There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently, default values would be applied in the corresponding rule Almost certainly, what will actually end up happening is that we create a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not yet, but we could easily add that because these are simple Python dataclasses that respect inheritance. (For now, I'm trying to avoid adding any sugar until we know we actually need it.) |
||
|
||
|
||
@dataclass(frozen=True) | ||
class HaskellSourcesResult: | ||
snapshot: Snapshot | ||
|
||
|
||
@rule | ||
async def hydrate_haskell_sources(sources: HaskellSources) -> HaskellSourcesResult: | ||
# Validate before hydration | ||
ensure_str_list(sources.raw_value) | ||
result = await Get[Snapshot](PathGlobs(sources.raw_value)) | ||
# Validate after hydration | ||
non_haskell_sources = [fp for fp in result.files if PurePath(fp).suffix != ".hs"] | ||
if non_haskell_sources: | ||
raise ValueError(f"Received non-Haskell sources in {sources.alias}: {non_haskell_sources}.") | ||
return HaskellSourcesResult(result) | ||
|
||
|
||
class HaskellTarget(Target): | ||
alias: ClassVar = "haskell" | ||
core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources) | ||
Comment on lines
+63
to
+65
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is all it will ever take to define a new target type! 3 lines of declarative code 🎉 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice! So... while I recognize the desire to keep the Target declaration site very small, having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain. It's ok to roll with this for now though I think. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Generally, users will always want to access We only differentiate between core fields vs. plugin fields at the constructor time so that:
-- Why not unify plugin and core fields to always be set up via Instead of There's not a very compelling reason to have that verbosity + reduced proximity. |
||
|
||
|
||
def test_invalid_fields_rejected() -> None: | ||
with pytest.raises(ValueError) as exc: | ||
HaskellTarget({"invalid_field": True}) | ||
assert "Unrecognized field `invalid_field=True` for target type `haskell`." in str(exc) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An important follow-up is going to be attributing this to a particular target in a particular file (and ideally a particular line in that file, but I suspect that'll be pretty hard with our current parsing model) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Absolutely agreed. At a minimum, we should point to the owning target. This is possible in a followup change. |
||
|
||
|
||
def test_get_primitive_field() -> None: | ||
extensions = ["GhcExistentialQuantification"] | ||
extensions_field = HaskellTarget({HaskellGhcExtensions.alias: extensions}).get( | ||
HaskellGhcExtensions | ||
) | ||
assert extensions_field.raw_value == extensions | ||
assert extensions_field.value == extensions | ||
|
||
default_extensions_field = HaskellTarget({}).get(HaskellGhcExtensions) | ||
assert default_extensions_field.raw_value is None | ||
assert default_extensions_field.value == [] | ||
|
||
|
||
def test_get_async_field() -> None: | ||
def hydrate_field( | ||
*, raw_source_files: List[str], hydrated_source_files: Tuple[str, ...] | ||
) -> HaskellSourcesResult: | ||
sources_field = HaskellTarget({HaskellSources.alias: raw_source_files}).get(HaskellSources) | ||
assert sources_field.raw_value == raw_source_files | ||
result: HaskellSourcesResult = run_rule( | ||
hydrate_haskell_sources, | ||
rule_args=[sources_field], | ||
mock_gets=[ | ||
MockGet( | ||
product_type=Snapshot, | ||
subject_type=PathGlobs, | ||
mock=lambda _: Snapshot( | ||
directory_digest=EMPTY_DIRECTORY_DIGEST, | ||
files=hydrated_source_files, | ||
dirs=(), | ||
), | ||
) | ||
], | ||
) | ||
return result | ||
|
||
# Normal field | ||
expected_files = ("monad.hs", "abstract_art.hs", "abstract_algebra.hs") | ||
assert ( | ||
hydrate_field( | ||
raw_source_files=["monad.hs", "abstract_*.hs"], hydrated_source_files=expected_files | ||
).snapshot.files | ||
== expected_files | ||
) | ||
|
||
# Test pre-hydration validation | ||
with pytest.raises(ValueError) as exc: | ||
hydrate_field(raw_source_files=[0, 1, 2], hydrated_source_files=()) # type: ignore[call-arg] | ||
assert "Not all elements of the iterable have type" in str(exc) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Again user-facing error-wise ideally with will be a lot more guiding; I'd hope it would read something like: Fine with this as-is today, but we should make sure we have a path to something like the above There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. There's a path for this. The first change is (likely) storing an |
||
|
||
# Test post-hydration validation | ||
with pytest.raises(ValueError) as exc: | ||
hydrate_field(raw_source_files=["*.js"], hydrated_source_files=("not_haskell.js",)) | ||
assert "Received non-Haskell sources" in str(exc) | ||
|
||
|
||
def test_has_fields() -> None: | ||
class UnrelatedField(PrimitiveField): | ||
alias: ClassVar = "unrelated" | ||
raw_value: Optional[bool] | ||
|
||
@memoized_property | ||
def value(self) -> bool: | ||
if self.raw_value is None: | ||
return False | ||
return self.raw_value | ||
|
||
tgt = HaskellTarget({}) | ||
assert tgt.has_fields([]) is True | ||
assert tgt.has_fields([HaskellGhcExtensions]) is True | ||
assert tgt.has_fields([UnrelatedField]) is False | ||
assert tgt.has_fields([HaskellGhcExtensions, UnrelatedField]) is False | ||
|
||
|
||
def test_field_hydration_is_lazy() -> None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Idle question: How would you feel about a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd love to have that! |
||
# No error upon creating the Target because validation does not happen until a call site | ||
# hydrates the specific field. | ||
tgt = HaskellTarget( | ||
{HaskellGhcExtensions.alias: ["GhcExistentialQuantification", "DoesNotStartWithGhc"]} | ||
) | ||
# When hydrating, we expect a failure. | ||
with pytest.raises(ValueError) as exc: | ||
tgt.get(HaskellGhcExtensions).value | ||
assert "must be prefixed by `Ghc`" in str(exc) | ||
|
||
|
||
def test_add_custom_fields() -> None: | ||
class CustomField(PrimitiveField): | ||
alias: ClassVar = "custom_field" | ||
raw_value: Optional[bool] | ||
|
||
@memoized_property | ||
def value(self) -> bool: | ||
if self.raw_value is None: | ||
return False | ||
return self.raw_value | ||
|
||
union_membership = UnionMembership(OrderedDict({HaskellTarget: OrderedSet([CustomField])})) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a confusing union, I think. In general, union relationships (like Rust enums and Python unions) represent an "is-a" relationship (similar to subclassing.) I recognize that changing this will likely require a bit more boilerplate. But violating the is-a pattern is going to make unions more challenging to explain, I think. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See the commit 8f32955. Originally, we more correctly modeled this as I'm willing to go back to having We could do some type of generated field, like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. But I still strongly believe that this union construction is problematic. If it is completely hidden from users, it might be less of an issue, but using unions uniformly will make them easier to teach, and make it easier to generate documentation for them later. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If our focus is avoiding this violation of I like that approach the most because it avoids the verbosity and forcing target authors to think about plugin fields, which, they never will end up actually consuming as explained in the above comment. It's easy for us to add that inner class in a followup. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, that would be much better. Thanks. |
||
tgt_values = {CustomField.alias: True} | ||
tgt = HaskellTarget(tgt_values, union_membership=union_membership) | ||
assert tgt.field_types == (HaskellGhcExtensions, HaskellSources, CustomField) | ||
assert tgt.core_fields == (HaskellGhcExtensions, HaskellSources) | ||
assert tgt.plugin_fields == (CustomField,) | ||
Comment on lines
+174
to
+176
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice. Re: naming: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See https://github.com/pantsbuild/pants/pull/9284/files#r391961580 on why I think there's value in still having |
||
assert tgt.get(CustomField).value is True | ||
|
||
default_tgt = HaskellTarget({}, union_membership=union_membership) | ||
assert default_tgt.get(CustomField).value is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting... v2 rules are a mechanism we've built for caching things which may be costly, and which we already need to use for other field types.
What's the argument for not just making all of this caching happen in the v2 graph, and having people access all fields the same, whether or not the engine is just caching or also doing other heavy lifting? Just that the
await Get
feels heavy to have to write?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only is the
await Get
heavy, but adding a rule for every singleField
is extremely heavy-weight, both with boilerplate and with blowing up the rule graph. For example, we'd need to introduce aResult
class for each of thePrimitiveField
s now so that the graph has a product - we couldn't use a generic wrapper likeFieldResult
because then MyPy couldn't infer the underlying type, and 100% MyPy support is a crucial requirement of this project.Compare:
vs.
Also, it's worth noting that the majority of the hydration/validation for
PrimitiveField
will be exceptionally simple. E.g.if the field was not defined, use this default value.
Many fields will simply returnself.raw_value