Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add core abstractions for the V2 Target API #9284

Merged
merged 14 commits into from
Mar 13, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions src/python/pants/engine/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ python_tests(
':goal',
':platform',
':rules',
':target',
'src/python/pants/testutil:test_base',
'src/python/pants/testutil/engine:util',
],
Expand Down Expand Up @@ -261,6 +262,18 @@ python_library(
tags = {'partially_type_checked'},
)

python_library(
name='target',
sources=['target.py'],
dependencies=[
':rules',
'src/python/pants/util:collections',
'src/python/pants/util:meta',
'src/python/pants/util:memo',
],
tags = {'type_checked'},
)

resources(
name='native_engine_shared_library',
sources=['native_engine.so']
Expand Down
159 changes: 159 additions & 0 deletions src/python/pants/engine/target.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
# Copyright 2020 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

from abc import ABC, ABCMeta, abstractmethod
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, Iterable, Optional, Tuple, Type, TypeVar, cast

from pants.engine.rules import UnionMembership
from pants.util.memo import memoized_property
from pants.util.meta import frozen_after_init


@dataclass(frozen=True)
class Field(ABC):
alias: ClassVar[str]
raw_value: Optional[Any] # None indicates that the field was not explicitly defined

def __repr__(self) -> str:
return f"{self.__class__}(alias={repr(self.alias)}, raw_value={self.raw_value})"


class PrimitiveField(Field, metaclass=ABCMeta):
"""A Field that does not need the engine in order to be hydrated.

This should be subclassed by the majority of fields.
"""

def __str__(self) -> str:
return f"{self.alias}={self.value}"

@memoized_property
@abstractmethod
def value(self) -> Any:
"""Get the field's value.

The value will possibly be first hydrated and/or validated, such as using a default value
if the field was not defined or ensuring that an int value is positive.

This property is memoized because hydration and validation can often be costly. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting... v2 rules are a mechanism we've built for caching things which may be costly, and which we already need to use for other field types.

What's the argument for not just making all of this caching happen in the v2 graph, and having people access all fields the same, whether or not the engine is just caching or also doing other heavy lifting? Just that the await Get feels heavy to have to write?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only is the await Get heavy, but adding a rule for every single Field is extremely heavy-weight, both with boilerplate and with blowing up the rule graph. For example, we'd need to introduce a Result class for each of the PrimitiveFields now so that the graph has a product - we couldn't use a generic wrapper like FieldResult because then MyPy couldn't infer the underlying type, and 100% MyPy support is a crucial requirement of this project.

Compare:

compatibility: List[str] = my_tgt.get(Compatibility).value

vs.

compatibility_result = await Get[CompatibilityResult](Compatibility, my_tgt.get(Compatiblity)
compatibility: List[str] = compatibility_result.value

Also, it's worth noting that the majority of the hydration/validation for PrimitiveField will be exceptionally simple. E.g. if the field was not defined, use this default value. Many fields will simply return self.raw_value

hydration is lazy, i.e. it will only happen when a downstream rule explicitly requests this
field.
"""


class AsyncField(Field, metaclass=ABCMeta):
"""A field that needs the engine in order to be hydrated.

You should create a corresponding Result class and define a rule to go from this AsyncField to
the Result. For example:

class Sources(AsyncField):
alias: ClassVar = "sources"
raw_value: Optional[List[str]]


@dataclass(frozen=True)
class SourcesResult:
snapshot: Snapshot


@rule
def hydrate_sources(sources: Sources) -> SourcesResult:
# possibly validate `sources.raw_value`
...
result = await Get[Snapshot](PathGlobs(sources.raw_value))
# possibly validate `result`
...
return SourcesResult(result)


def rules():
return [hydrate_sources]

Then, call sites can `await Get` if they need to hydrate the field:

sources = await Get[SourcesResult](Sources, my_tgt.get(Sources))
"""


_F = TypeVar("_F", bound=Field)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is important. It's what gets MyPy to understand everything, like that my_tgt.get(Compatibility).value would be List[str] but my_tgt.get(ZipSafe).value would be bool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a magical thing, but also this line is totally inscrutable. Maybe add this comment to the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, I think probably not to adding a comment. We use TypeVars pretty liberally in Pants and MyPy has good documentation on it: https://mypy.readthedocs.io/en/stable/generics.html

Our usage here is very idiomatic for how you do generics in Python.

I only wanted to call attention because this one line of code has much more power than you might think at first glance.



@frozen_after_init
@dataclass(unsafe_hash=True)
class Target(ABC):
"""A Target represents a combination of fields that are valid _together_.

Plugin authors may add additional fields to pre-existing target types by simply registering
UnionRules between the `Target` type and the custom field, e.g. `UnionRule(PythonLibrary,
TypeChecked)`. The `Target` will then treat `TypeChecked` as a first-class citizen and plugins
can use that Field like any other Field.
"""

# Subclasses must define these
alias: ClassVar[str]
core_fields: ClassVar[Tuple[Type[Field], ...]]

# These get calculated in the constructor
plugin_fields: Tuple[Type[Field], ...]
field_values: Dict[Type[Field], Any]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely, we'll want to add an Address property. I believe every Target has an Address. I saved it for a followup because this PR is only focused on adding the core abstractions.

def __init__(
self,
unhydrated_values: Dict[str, Any],
*,
union_membership: Optional[UnionMembership] = None,
) -> None:
self.plugin_fields = cast(
Tuple[Type[Field], ...],
(
()
if union_membership is None
else tuple(union_membership.union_rules.get(self.__class__, ()))
),
)

self.field_values = {}
aliases_to_field_types = {field_type.alias: field_type for field_type in self.field_types}
for alias, value in unhydrated_values.items():
if alias not in aliases_to_field_types:
raise ValueError(
f"Unrecognized field `{alias}={value}` for target type `{self.alias}`."
)
field_type = aliases_to_field_types[alias]
self.field_values[field_type] = field_type(value)
# For undefined fields, mark the raw value as None.
for field_type in set(self.field_types) - set(self.field_values.keys()):
self.field_values[field_type] = field_type(raw_value=None)

@property
def field_types(self) -> Tuple[Type[Field], ...]:
return (*self.core_fields, *self.plugin_fields)

def __repr__(self) -> str:
return (
f"{self.__class__}("
f"alias={repr(self.alias)}, "
f"core_fields={list(self.core_fields)}, "
f"plugin_fields={list(self.plugin_fields)}, "
f"raw_field_values={list(self.field_values.values())}"
f")"
)
Comment on lines +134 to +142
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Thank you for making this so much easier to debug!


def __str__(self) -> str:
fields = ", ".join(str(field) for field in self.field_values.values())
return f"{self.alias}({fields})"

def get(self, field: Type[_F]) -> _F:
return cast(_F, self.field_values[field])

def has_fields(self, fields: Iterable[Type[Field]]) -> bool:
# TODO: consider if this should support subclasses. For example, if a target has a
# field PythonSources(Sources), then .has_fields(Sources) should still return True. Why?
# This allows overriding how fields behave for custom target types, e.g. a `python3_library`
# subclassing the Field Compatibility with its own custom Python3Compatibility field.
# When adding this, be sure to update `.get()` to allow looking up by subclass, too.
# (Is it possible to do that in a performant way, i.e. w/o having to iterate over every
# self.field_type (O(n) vs the current O(1))?)
return all(field in self.field_types for field in fields)
180 changes: 180 additions & 0 deletions src/python/pants/engine/target_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
# Copyright 2020 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

from collections import OrderedDict
from dataclasses import dataclass
from pathlib import PurePath
from typing import ClassVar, List, Optional, Tuple

import pytest

from pants.engine.fs import EMPTY_DIRECTORY_DIGEST, PathGlobs, Snapshot
from pants.engine.rules import UnionMembership, rule
from pants.engine.selectors import Get
from pants.engine.target import AsyncField, PrimitiveField, Target
from pants.testutil.engine.util import MockGet, run_rule
from pants.util.collections import ensure_str_list
from pants.util.memo import memoized_property
from pants.util.ordered_set import OrderedSet


class HaskellGhcExtensions(PrimitiveField):
alias: ClassVar = "ghc_extensions"
raw_value: Optional[List[str]]

@memoized_property
def value(self) -> List[str]:
if self.raw_value is None:
return []
# Add some arbitrary validation to test that hydration works properly.
bad_extensions = [
extension for extension in self.raw_value if not extension.startswith("Ghc")
]
if bad_extensions:
raise ValueError(
f"All elements of `{self.alias}` must be prefixed by `Ghc`. Received "
f"{bad_extensions}."
)
return self.raw_value


class HaskellSources(AsyncField):
alias: ClassVar = "sources"
raw_value: Optional[List[str]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are default values modelled here? Is it just by adding = ["*.hs"] to the end of this line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as a follow-up, is there a way to set post-hydration defaults? i.e. is there a way for me to provide a default Snapshot rather than a default PathGlobs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, default values would be applied in the corresponding rule @hydrate_haskell_sources.

Almost certainly, what will actually end up happening is that we create a Sources(AsyncField) class and have PythonLibrarySources, PythonTestsSources, etc all subclass it. That superclass would expect a field like default_globs and would use that to correctly apply them. We have total flexibility here because these are simple dataclasses that work with inheritance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as a follow-up, is there a way to set post-hydration defaults? i.e. is there a way for me to provide a default Snapshot rather than a default PathGlobs?

Not yet, but we could easily add that because these are simple Python dataclasses that respect inheritance.

(For now, I'm trying to avoid adding any sugar until we know we actually need it.)



@dataclass(frozen=True)
class HaskellSourcesResult:
snapshot: Snapshot


@rule
async def hydrate_haskell_sources(sources: HaskellSources) -> HaskellSourcesResult:
# Validate before hydration
ensure_str_list(sources.raw_value)
result = await Get[Snapshot](PathGlobs(sources.raw_value))
# Validate after hydration
non_haskell_sources = [fp for fp in result.files if PurePath(fp).suffix != ".hs"]
if non_haskell_sources:
raise ValueError(f"Received non-Haskell sources in {sources.alias}: {non_haskell_sources}.")
return HaskellSourcesResult(result)


class HaskellTarget(Target):
alias: ClassVar = "haskell"
core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources)
Comment on lines +63 to +65
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all it will ever take to define a new target type! 3 lines of declarative code 🎉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

So... while I recognize the desire to keep the Target declaration site very small, having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain.

It's ok to roll with this for now though I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having a different pattern for "core" (better name might be "default"?) fields than for extension fields might actually make things more challenging to document and explain.

Generally, users will always want to access self.field_types. This is what everything like Target.get() and Target.has_sources() uses. So, plugin fields do behave as first-class citizens.

We only differentiate between core fields vs. plugin fields at the constructor time so that:

  1. We have better debugging. You can see in repr() which fields were added by you.
  2. We can declare core_fields as a class var, whereas plugin_types must be an instance var because we have to inspect UnionRule at runtime. It's generally easier to work with class vars so we should use that where possible.

--

Why not unify plugin and core fields to always be set up via UnionMembership? Originally, I did take this approach. See the first few commits. But, it resulted in horrible verbosity and violated the principle of proximity.

Instead of core_fields: ClassVar = (HaskellGhcExtensions, HaskellSources) in the class definition, we'd need [UnionRule(HaskellTargetField, HaskellGhcExtensions), UnionRule(HaskellTargetField, HaskellSources).

There's not a very compelling reason to have that verbosity + reduced proximity.



def test_invalid_fields_rejected() -> None:
with pytest.raises(ValueError) as exc:
HaskellTarget({"invalid_field": True})
assert "Unrecognized field `invalid_field=True` for target type `haskell`." in str(exc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important follow-up is going to be attributing this to a particular target in a particular file (and ideally a particular line in that file, but I suspect that'll be pretty hard with our current parsing model)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely agreed. At a minimum, we should point to the owning target. This is possible in a followup change.



def test_get_primitive_field() -> None:
extensions = ["GhcExistentialQuantification"]
extensions_field = HaskellTarget({HaskellGhcExtensions.alias: extensions}).get(
HaskellGhcExtensions
)
assert extensions_field.raw_value == extensions
assert extensions_field.value == extensions

default_extensions_field = HaskellTarget({}).get(HaskellGhcExtensions)
assert default_extensions_field.raw_value is None
assert default_extensions_field.value == []


def test_get_async_field() -> None:
def hydrate_field(
*, raw_source_files: List[str], hydrated_source_files: Tuple[str, ...]
) -> HaskellSourcesResult:
sources_field = HaskellTarget({HaskellSources.alias: raw_source_files}).get(HaskellSources)
assert sources_field.raw_value == raw_source_files
result: HaskellSourcesResult = run_rule(
hydrate_haskell_sources,
rule_args=[sources_field],
mock_gets=[
MockGet(
product_type=Snapshot,
subject_type=PathGlobs,
mock=lambda _: Snapshot(
directory_digest=EMPTY_DIRECTORY_DIGEST,
files=hydrated_source_files,
dirs=(),
),
)
],
)
return result

# Normal field
expected_files = ("monad.hs", "abstract_art.hs", "abstract_algebra.hs")
assert (
hydrate_field(
raw_source_files=["monad.hs", "abstract_*.hs"], hydrated_source_files=expected_files
).snapshot.files
== expected_files
)

# Test pre-hydration validation
with pytest.raises(ValueError) as exc:
hydrate_field(raw_source_files=[0, 1, 2], hydrated_source_files=()) # type: ignore[call-arg]
assert "Not all elements of the iterable have type" in str(exc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again user-facing error-wise ideally with will be a lot more guiding; I'd hope it would read something like:
"java_library target some:target defined in some/BUILD:15 has an invalid sources argument. Expected a list with all elements of type string, but found number element(s)"

Fine with this as-is today, but we should make sure we have a path to something like the above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. There's a path for this. The first change is (likely) storing an Address on each Field so that they can refer to it in any error messages.


# Test post-hydration validation
with pytest.raises(ValueError) as exc:
hydrate_field(raw_source_files=["*.js"], hydrated_source_files=("not_haskell.js",))
assert "Received non-Haskell sources" in str(exc)


def test_has_fields() -> None:
class UnrelatedField(PrimitiveField):
alias: ClassVar = "unrelated"
raw_value: Optional[bool]

@memoized_property
def value(self) -> bool:
if self.raw_value is None:
return False
return self.raw_value

tgt = HaskellTarget({})
assert tgt.has_fields([]) is True
assert tgt.has_fields([HaskellGhcExtensions]) is True
assert tgt.has_fields([UnrelatedField]) is False
assert tgt.has_fields([HaskellGhcExtensions, UnrelatedField]) is False


def test_field_hydration_is_lazy() -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idle question: How would you feel about a ./pants validate-build-files goal or similar? People used to use ./pants list to make sure BUILD files mostly look right, and it feels like something to force pre- and post-hydration validation, without all of the cost of actually compiling and running tests, would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to have that!

# No error upon creating the Target because validation does not happen until a call site
# hydrates the specific field.
tgt = HaskellTarget(
{HaskellGhcExtensions.alias: ["GhcExistentialQuantification", "DoesNotStartWithGhc"]}
)
# When hydrating, we expect a failure.
with pytest.raises(ValueError) as exc:
tgt.get(HaskellGhcExtensions).value
assert "must be prefixed by `Ghc`" in str(exc)


def test_add_custom_fields() -> None:
class CustomField(PrimitiveField):
alias: ClassVar = "custom_field"
raw_value: Optional[bool]

@memoized_property
def value(self) -> bool:
if self.raw_value is None:
return False
return self.raw_value

union_membership = UnionMembership(OrderedDict({HaskellTarget: OrderedSet([CustomField])}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a confusing union, I think.

In general, union relationships (like Rust enums and Python unions) represent an "is-a" relationship (similar to subclassing.) CustomField "is-not-a" HaskellTarget.

I recognize that changing this will likely require a bit more boilerplate. But violating the is-a pattern is going to make unions more challenging to explain, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the commit 8f32955. Originally, we more correctly modeled this as is-a by requiring that every class define a PluginField like PythonLibraryField. But, this both resulted in lots of boilerplate and was confusing, as Daniel first pointed out in #9284 (comment).

I'm willing to go back to having PluginField, but don't think it's worth it. Beyond extra boilerplate, it forces target authors to consider the concept of additional fields being added, even though they are by definition never going to consume those extra fields in their rules. (They wouldn't know at the time of writing the rules that those fields exist, so they wouldn't know to consume them). Instead, now, all targets work automatically with plugin fields for no additional work.

We could do some type of generated field, like HaskellTarget.PluginField, but that seems even more confusing to me because it's not discoverable and it could trip up MyPy (not certain).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. But I still strongly believe that this union construction is problematic. If it is completely hidden from users, it might be less of an issue, but using unions uniformly will make them easier to teach, and make it easier to generate documentation for them later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If our focus is avoiding this violation of is-a, then I think I'd advocate the inner field HaskellTarget.PluginField (or something like that).

I like that approach the most because it avoids the verbosity and forcing target authors to think about plugin fields, which, they never will end up actually consuming as explained in the above comment.

It's easy for us to add that inner class in a followup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that would be much better. Thanks.

tgt_values = {CustomField.alias: True}
tgt = HaskellTarget(tgt_values, union_membership=union_membership)
assert tgt.field_types == (HaskellGhcExtensions, HaskellSources, CustomField)
assert tgt.core_fields == (HaskellGhcExtensions, HaskellSources)
assert tgt.plugin_fields == (CustomField,)
Comment on lines +174 to +176
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Re: naming: plugin is slightly ill-defined, and a good plugin model mostly erases the boundaries between plugins and non-plugins, making everything feel first class. That might affect naming here, but you might also want to hide the plugin_fields accessor and just expose "default_fields" and "fields".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/pantsbuild/pants/pull/9284/files#r391961580 on why I think there's value in still having plugin_fields. tl;dr, we expect to almost always use tgt.field_types, and the Target code will always use that. But, it helps with debugging to know what was set up by default vs. through your plugin.

assert tgt.get(CustomField).value is True

default_tgt = HaskellTarget({}, union_membership=union_membership)
assert default_tgt.get(CustomField).value is False