-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Packages can inherit fields from their root workspace #8664
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
45b8db9
to
b582eba
Compare
Defined(T), | ||
} | ||
|
||
impl<'de, T> de::Deserialize<'de> for MaybeWorkspace<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to move all the impl Deserialize
and Serialize
to its own file - maybe src/cargo/util/toml/serde.rs
? I think it would help make mod.rs
more skimmable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me! Might be best to defer that to a second PR though
This will need more tests once we determine the desired behavior.
b582eba
to
0170d00
Compare
0170d00
to
9bddf28
Compare
This is now ready for a review. (Posting this in case changing GH draft status didn't trigger a notification.) |
Thanks for this again @yaymukund! I'll set aside some time tomorrow to review this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok here's an initial round of comments, thanks so much again for helping to take this on! My main thoughts can largely be summarized with:
-
This is introducing a number of duplicated types in TOML parsing where one has workspace stuff in it and the other doesn't. This ends up having a lot of extra boilerplate to convert back-and-forth, but I was wondering if it would be possible to avoid doing that? Ideally we would only have one type which is perhaps mutated in-place after being deserialized to remove all the internal workspace references.
-
The frequency of parsing manifests and finding the workspace root is a little worrisome to me. It looks like
find_workspace_root
is called many more times than previously, and I'm a little worried about the performance. I largely just want to make sure that we're not parsing toml twice nor are we traversing the filesystem more than necessary. I think that the toml parsing cache should be sufficient for this, but a cache may also need to be introduced for finding the workspace root. This could actually be more efficient than it is today where you could, for all levels of a path, insert a cache entry saying where the root was found (or that a root wasn't found), so sub-crates wouldn't have to do as many filesystem operations.Another thing that I noticed is that each manifest, when parsed, independently finds the workspace root and parses it. Could this instead be refactored a bit? Ideally we'd find the workspace root only once and then pass that as a parameter to parsing so each member doesn't have to independently find the root, and that also ensures that we only parse the root once.
I've only skimmed the tests lightly for now, but I'm hoping that the internals can be simplified a bit, but I'm curious what you think!
@@ -0,0 +1,102 @@ | |||
use std::collections::{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah mod.rs
is pretty huge as-is, so I think it's fine for this to be its own file.
} | ||
} | ||
|
||
pub fn parse_manifest<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some comments to what this is doing? I didn't immediately understand what you meant by there was a circular dependency in manifest parsing in the PR description. I originally thought this was a performance-related cache but it sounds like this is required for correctness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a more succinct comment describing this but here is an explanation of the correctness issue:
do_read_manifest()
used to be self-contained:
do_read_manifest()
Now, do_read_manifest()
potentially needs the workspace root in order to process MaybeWorkspace
values:
do_read_manifest() -> find_root()
find_root()
in turn tries to read the root workspace. A manifest can be its own root, so it forms a cycle:
do_read_manifest() -> find_root() -> do_read_manifest() (-> find_root() -> do_read_manifest() -> ...)
To break this cycle, I added the ManifestCache
which lets you shallow parse (parse_manifest
) - i.e. deserializes & gives you access to the toml - without calling all of do_read_manifest()
.
do_read_manifest() -> find_workspace_root() -> parse_manifest()
This breaks the cycle while also ensuring we never parse the same manifest twice.
use crate::util::{short_hash, Config, Filesystem}; | ||
|
||
pub enum EitherManifest { | ||
Real(Manifest), | ||
Virtual(VirtualManifest), | ||
} | ||
|
||
impl EitherManifest { | ||
pub fn workspace(&self) -> Rc<WorkspaceConfig> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I see a number of methods here returning a bare Rc<T>
, but if possible I think it would be best to return &Rc<T>
since it defers the clone to the point at which it's actually necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really a perf concern since the reference counting is almost nothing in Cargo, mostly just a stylistic thing.
} | ||
|
||
ws.load_current()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this was added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this, ws.current()
& ws.current_mut()
fail.
We were previously relying on ws.find_root()
implicitly loading the current package into ws.packages
. Since find_workspace_root
does a shallow parse, we now have to do it explicitly.
.workspace_config(manifest_path.parent().unwrap(), config) | ||
} | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW nowadays this extra block isn't necessary
Defined(T), | ||
} | ||
|
||
impl<'de, T> de::Deserialize<'de> for MaybeWorkspace<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me! Might be best to defer that to a second PR though
#[derive(Serialize, Clone, Debug)] | ||
#[serde(untagged)] | ||
pub enum MaybeWorkspace<T> { | ||
Workspace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this leverage serde instead of writing a custom Deserialize
by defining this as Workspace { workspace: bool }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or something like Workspace(TomlWorkspaceField)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change it to one of those forms, but we also need the Deserializer
for its error message because the default "untagged enum" error messages are opaque. (Remember? 😋 😅 )
There are a couple open issues about this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW with a custom Deserialize
implementation this should be able to customize error messages as well, I think we have one or two implementations that do that already?
|
||
#[derive(Debug, Deserialize)] | ||
#[serde(untagged)] | ||
enum MaybeWorkspaceBadge { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible to be MaybeWorkspacee<BTreeMap<...>>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enum also exists for the error messaging when deserialize
fails. How would we implement MaybeWorkspace
's visit_map()
to give an adequate error message for MaybeWorkspace<BTreeMap<_>>
fields?
(Some(MaybeWorkspace::Defined(value)), _) => Ok(Some(value)), | ||
(Some(MaybeWorkspace::Workspace), Some(ws)) => f(ws) | ||
.clone() | ||
.ok_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this is an error? If a value isn't defined in a package or in a workspace, then the final value is None
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC says (emphasis mine):
Cargo will now accept a table definition of
package.$key
which defines thepackage.$key.workspace
key as a boolean. For example you can specify:[package] name = "foo" license = { workspace = true }This directive indicates that the license of foo is the same as workspace.license. If workspace.license isn't defined then this generates an error.
This code corresponds to the section I bolded, unless I misunderstood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see nah that makse sense, thanks!
} | ||
|
||
#[derive(Clone, Debug)] | ||
struct DefinedTomlPackage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to avoid duplicate type definitions like TomlProject
and DefinedTomlPackage
? I was hoping that we could get by where after some point in Cargo it's just statically known that the type here doesn't have workspace = true
anywhere, and otherwise the "real" deserialized forms of things like Manifest
and Summary
in Cargo don't have to deal with the toml details at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to avoid duplicate type definitions like
TomlProject
andDefinedTomlPackage
?
TomlManifest
,TomlProject
: These are needed for their#[derive(Deserialize)]
.DefinedTomlManifest
,DefinedTomlPackage
: These provide the static guarantees since the types don't have anyMaybeWorkspace
.
If we're willing to relax the second requirement, I could do something like:
struct TomlManifest {
version: MaybeWorkspace<Version>,
}
impl TomlManifest {
// parse manifest if required & define all `MaybeWorkspace::Workspace` fields.
fn define_workspace_fields(&mut self) {
self.version = MaybeWorkspace::Defined(...);
}
fn unwrap_version(&self) -> semver::Version {
// "unwrap()" a `MaybeWorkspace`, panicking if it's { workspace = true }
self.version.unwrap();
}
}
This still requires us to always call define_workspace_fields()
before we try to unwrap_version()
- so it's not as strong of a guarantee - but it would allow us to eliminate the duplication between TomlManifest
and TomlProject
. Is that what you had in mind?
I was hoping that we could get by where after some point in Cargo it's just statically known that the type here doesn't have workspace = true anywhere
DefinedTomlManifest
was my attempt at doing this. It's returned by prepare_for_publish()
and stored in Manifest
. Aside from parse_manifest()
- which needs to be shallow to avoid a circular dependency* - the rest of Cargo doesn't have access to MaybeWorkspace<_>
fields.
* see this comment for details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks v much for the review! I tried to address some of your higher level concerns & will work on the smaller fixes starting next week.
This is introducing a number of duplicated types in TOML parsing where one has workspace stuff in it and the other doesn't.
I have posted more details in this inline comment, so maybe we can discuss there.
I think that the toml parsing cache should be sufficient for this, but a cache may also need to be introduced for finding the workspace root. This could actually be more efficient than it is today where you could, for all levels of a path, insert a cache entry saying where the root was found (or that a root wasn't found), so sub-crates wouldn't have to do as many filesystem operations.
Another thing that I noticed is that each manifest, when parsed, independently finds the workspace root and parses it. Could this instead be refactored a bit? Ideally we'd find the workspace root only once and then pass that as a parameter to parsing so each member doesn't have to independently find the root, and that also ensures that we only parse the root once.
I really like both of these ideas. The first should be relatively straightforward. The second might be more involved, but I'll have a look.
} | ||
|
||
ws.load_current()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this, ws.current()
& ws.current_mut()
fail.
We were previously relying on ws.find_root()
implicitly loading the current package into ws.packages
. Since find_workspace_root
does a shallow parse, we now have to do it explicitly.
#[derive(Serialize, Clone, Debug)] | ||
#[serde(untagged)] | ||
pub enum MaybeWorkspace<T> { | ||
Workspace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change it to one of those forms, but we also need the Deserializer
for its error message because the default "untagged enum" error messages are opaque. (Remember? 😋 😅 )
There are a couple open issues about this:
|
||
#[derive(Debug, Deserialize)] | ||
#[serde(untagged)] | ||
enum MaybeWorkspaceBadge { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enum also exists for the error messaging when deserialize
fails. How would we implement MaybeWorkspace
's visit_map()
to give an adequate error message for MaybeWorkspace<BTreeMap<_>>
fields?
(Some(MaybeWorkspace::Defined(value)), _) => Ok(Some(value)), | ||
(Some(MaybeWorkspace::Workspace), Some(ws)) => f(ws) | ||
.clone() | ||
.ok_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC says (emphasis mine):
Cargo will now accept a table definition of
package.$key
which defines thepackage.$key.workspace
key as a boolean. For example you can specify:[package] name = "foo" license = { workspace = true }This directive indicates that the license of foo is the same as workspace.license. If workspace.license isn't defined then this generates an error.
This code corresponds to the section I bolded, unless I misunderstood.
} | ||
|
||
#[derive(Clone, Debug)] | ||
struct DefinedTomlPackage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to avoid duplicate type definitions like
TomlProject
andDefinedTomlPackage
?
TomlManifest
,TomlProject
: These are needed for their#[derive(Deserialize)]
.DefinedTomlManifest
,DefinedTomlPackage
: These provide the static guarantees since the types don't have anyMaybeWorkspace
.
If we're willing to relax the second requirement, I could do something like:
struct TomlManifest {
version: MaybeWorkspace<Version>,
}
impl TomlManifest {
// parse manifest if required & define all `MaybeWorkspace::Workspace` fields.
fn define_workspace_fields(&mut self) {
self.version = MaybeWorkspace::Defined(...);
}
fn unwrap_version(&self) -> semver::Version {
// "unwrap()" a `MaybeWorkspace`, panicking if it's { workspace = true }
self.version.unwrap();
}
}
This still requires us to always call define_workspace_fields()
before we try to unwrap_version()
- so it's not as strong of a guarantee - but it would allow us to eliminate the duplication between TomlManifest
and TomlProject
. Is that what you had in mind?
I was hoping that we could get by where after some point in Cargo it's just statically known that the type here doesn't have workspace = true anywhere
DefinedTomlManifest
was my attempt at doing this. It's returned by prepare_for_publish()
and stored in Manifest
. Aside from parse_manifest()
- which needs to be shallow to avoid a circular dependency* - the rest of Cargo doesn't have access to MaybeWorkspace<_>
fields.
* see this comment for details
@@ -134,7 +134,7 @@ fn verify_dependencies( | |||
registry_src: SourceId, | |||
) -> CargoResult<()> { | |||
for dep in pkg.dependencies().iter() { | |||
if dep.source_id().is_path() || dep.source_id().is_git() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, good catch, I shall fix.
} | ||
} | ||
|
||
pub fn parse_manifest<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a more succinct comment describing this but here is an explanation of the correctness issue:
do_read_manifest()
used to be self-contained:
do_read_manifest()
Now, do_read_manifest()
potentially needs the workspace root in order to process MaybeWorkspace
values:
do_read_manifest() -> find_root()
find_root()
in turn tries to read the root workspace. A manifest can be its own root, so it forms a cycle:
do_read_manifest() -> find_root() -> do_read_manifest() (-> find_root() -> do_read_manifest() -> ...)
To break this cycle, I added the ManifestCache
which lets you shallow parse (parse_manifest
) - i.e. deserializes & gives you access to the toml - without calling all of do_read_manifest()
.
do_read_manifest() -> find_workspace_root() -> parse_manifest()
This breaks the cycle while also ensuring we never parse the same manifest twice.
☔ The latest upstream changes (presumably #8165) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
|
Sorry again for the slow response :( In any case though responding to this thread because it was probably the meatiest thing, I wonder if we could hit a few birds with one stone. Could we perhaps change the return value of the toml parsing of manifests to not be a For example we could take the raw |
I'll double back and try your new approach probably sometime this weekend. |
Sorry, my life got suddenly busy but I'll put in some hours tomorrow to see how far I can get. |
@yaymukund let me know if there is anyway I can help with this, more than happy to 😊 |
Hello, I apologize that this is taking a minute. I think the requested changes would necessitate rewriting much of the branch, and I haven't found an uninterrupted length of time to commit to that. @ParkMyCar Thanks for your offer! I can take a look over the weekend and try to split off some meaningful subset of work? Or if you're itching to get started, feel free to message me before that (my github username at gmail or twitter dm). |
@yaymukund No problem at all, life comes before open source 🙂 There seems to be a lot of comments/context for this task that I still need to read. I'll spend some time catching up on that and this weekend or later we can figure out how to split up the work, sound good? |
After taking another look at this, I've realized I don't have the time to continue working on it. Feel free to pick this up if you're interested, @ParkMyCar . I think you'd basically need to start from scratch with Alex's revised approach, but I think many parts (e.g. the serde logic) could be reused if you so choose. I've tried to document my decisions as I went along in the form of PR comments, but am also happy to answer any questions to save future contributors from duplicating my efforts. |
@yaymukund Sounds good, I'll get started on this sometime this week or next weekend. Thanks for getting us this far and leaving detailed comments 😊 |
Just a note: serde-rs/serde#1883 has been resolved, so you should be able to customize the error message without having to do all the work of |
@ParkMyCar Hey, I was wondering if you were still interested in contributing this feature to cargo? 🙂 |
Use serde's error message option to avoid implementing `Deserialize`. This is a cleanup based on serde-rs/serde#1883, which fell out of #8664 It looks like this changes the resulting error messages. I'll leave it up to you to decide if the tradeoff makes sense here. Maybe the correct answer here is to make `serde`'s error messages include more details (e.g. `invalid type: int`).
How's the outlook on merging this PR? |
I'm trying to take a look at it and see if I can make the changes talked about |
I'm hoping to open a new PR in the next few days with the changes above to try and get them merged, I will update with a link once its ready |
Sorry for the delay ran into a few problems with grabbing the right files for publishing and I'm trying to clean that up |
New pull request is here! |
For now, I'm going to close this in favor of #9684 ; if the situation changes we can always reopen this. |
Part 1 of RFC2906 - Packages can inherit fields from their root workspace Tracking issue: #8415 RFC: rust-lang/rfcs#2906 All changes were heavily inspired by #8664 and #9684 A big thanks to both `@yaymukund` and `@ParkMyCar.` I pulled a lot of inspiration from their branches. I would also like to thank `@alexcrichton` for all the feedback on the first attempt. It has brought this into a much better state. All changes have been made according to the RFC as well as `@alexcrichton's` [comment](#8664 (comment)). This is part 1 of many, as described by [this comment](#9684 (comment)), [this comment](#9684 (review)) and redefined [by this one](#10497 (comment)). This PR focuses on inheriting in root package, including: - Add `MaybeWorkspace<T>` to allow for `{ workspace = true }` - Add a new variant to `TomlDependency` to allow inheriting dependencies from a workspace - Add `InheritableFields` so that we have somewhere to get a value from when we `resolve` a `MaybeWorkspace<T>` - `license_file` and `readme` are in `InheritableFields` in this part but are not `MaybeWorkspace` for reasons [described here](#10497 (comment)) - Add a method to `resolve` a `MaybeWorkspace<T>` into a `T` that can fail if we have nowhere to pull from or the workspace does not define that field - Disallow inheriting from a different `Cargo.toml` - This means that you can only inherit from inside the same `Cargo.toml`, i.e. it has both a `[workspace]` and a `[package]` - Forcing this requirement allows us to test the implementations of `MaybeWorkspace<T>` and the new `TomlDependency` variant without having to add searching for a workspace root and the complexity with it Remaining implementation work for the RFC - Support inheriting in any workspace member - Correctly inherit fields that are relative paths - Including adding support for inheriting `license_file`, `readme`, and path-dependencies - Path dependencies infer version directive - Lock workspace dependencies and warn when unused - Optimizations, as needed - Evaluate any new fields for being inheritable (e.g. `rust-version`) - Evaluate making users of `{ workspace = true }` define the workspace they pull from in `[package]` Areas of concern: - `TomlDependency` Deserialization is a mess, and I could not figure out how to do it in a cleaner way without significant headaches. I ended up having to do the same thing as last time [which was an issue](#9684 (comment)). - Resolving on a `MaybeWorkspace` feels extremely verbose currently: ```rust project.homepage.clone().map(|mw| mw.resolve(&features, "homepage", || get_ws(inheritable)?.homepage())).transpose()?, ``` This way allows for lazy resolution of inheritable fields and finding a workspace (in part 2) so you do not pay a penalty for not using this feature. The other bit of good news is `&features` will go away as it is just feature checking currently. - This feature requires some level of duplication of code as well as remaking the original `TomlManifest` which adds extra length to the changes. - This feature also takes a linear process and makes it potentially non-linear which adds lots of complexity (which will get worse in Part 2) Please let me know if you have any feedback or suggestions!
Tracking issue: #8415
RFC: rust-lang/rfcs#2906
This is not finished yet, but I think there's enough to review.9/9: I've removed the draft status. There are still some items left, but I don't think they will change the high level design.
9/13: I think I am blocked on the remaining tasks.
[workspace]
{ workspace = true }
for non-dependency fieldsimpl Serialize
andDeserialize
WorkspaceRootConfig
before they are referencedlicense-file
,readme
relative to the workspace{ workspace = true }
for dependenciesCargo.lock
with[workspace.dependencies]
(even if unreferenced) (Blocked, see Note 2 below)parse_manifest()
fails - normalize paths, smarter contextNotes
read_manifest()
needs to find the root workspace, which in turn needs toread_manifest()
. To break this circular dependency, I added aManifestCache
which allowsfind_workspace_root()
to do its job without parsing it all the way through. This is the biggest change in the PR.DefinedTomlManifest
which is used by the rest of the app. This is where the bulk of the interesting logic lives (seeDefinedTomlManifest::from_toml_manifest()
)