-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Packages can inherit fields from their root workspace #9684
Conversation
…has been found and added IntermediateManifest
- Added inheriting dependencies from Workspaces - Added tests for inheriting TomlProject fields and dependencies - Fixed tests as needed Signed-off-by: Scott Schafer <[email protected]>
Signed-off-by: Scott Schafer <[email protected]>
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. Please see the contribution instructions for more information. |
Signed-off-by: Scott Schafer <[email protected]>
Thanks so much for helping to push this forward! I've done a once-over of the PR and here's some thoughts I've got so far:
I'm not 100% certain if it'll work out to pass this context around in fewer places, but if you have difficult trying it out I can try to dig in more as well. |
|
Oh for two workspaces I mean a crate depending on a crate in a separate workspace. For example you could use a git dependency which is using these workspace directives, so for that dependency to work we'd have to elaborate the manifest of the git dependency. Similarly if you're using a path dependency (or, e.g. a [patch] pointing to a path) which is a member of a separate workspace then you'd also have to do workspace manifest elaboration on that second workspace. For caches and such, I think that the exact shape of the solution doesn't matter too too much, but I think the main priorities should be:
I suspect that this implementation as-is might parse TOML files multiple times, and while the I haven't tried to implement this myself so I don't know exactly what a solution might look like. I naively envisioned in #8664 that a cache wouldn't be necessary since in theory TOML parsing might be done within the context of "here's your workspace fields already for you" or something like that. That may not be true and a cache may be the easiest to implement. I don't want to give the impression that I'm expecting a specific shape of PR, though! If a cache works best I don't think there's any need to avoid it, I mostly just think the above two points are the main things to consider for implementing this. |
I know you mentioned how in |
I think a cache will work best here because workspaces can show up in lots of various locations. With path dependencies Cargo could have multiple path dependencies into a different workspace, and we wouldn't want to re-parse the workspace each time we discover that. Similarly if we have multiple dependencies on a git repository we wouldn't want to re-traverse and re-parse the workspace manifest in a git repo each time as well. To handle these things I think a cache buried in I do think that we'll need to be careful to handle workspace root discovery in a somewhat clever fashion as well or otherwise all path and git dependencies will trigger their own search for a workspace root which may be inefficient. |
Currently, in src/cargo/util/toml/mod.rs, we search for the root workspace if needed, is this what you mean we should be doing? One solution to searching each time for the root workspace could be to keep a list of a workspace's members so when you go to look for a root manifest you would first check if it's in a map of something identifying the member to the root manifest path. You could use that path to find the |
Indeed yeah that's one location that I think may be multi-parsing manifests and multi-searching for a workspace root. I agree though that a cache can help the searching problem as well since we can record various paths in the cache as Cargo works its way upward in the filesystem |
☔ The latest upstream changes (presumably #9632) made this pull request unmergeable. Please resolve the merge conflicts. |
…true Signed-off-by: Scott Schafer <[email protected]>
# Conflicts: # src/cargo/core/source/source_id.rs # src/cargo/ops/cargo_package.rs
☔ The latest upstream changes (presumably #9793) made this pull request unmergeable. Please resolve the merge conflicts. |
# Conflicts: # src/cargo/ops/cargo_install.rs
Hey @Muscraft, I just wanted to check in and see where you're at with this PR. I was just curious if you had any questions or major blockers. |
I've been busy with school and work but those things should resolve this week to allow me time to work on this. I have the initial implementation for the inheritable fields cache done. While doing that it appeared to me that with how things were being done currently there are a few more places that will still cause multiple parses of a toml file. I have found those areas and am hoping to be able to push all of the changes in the next week or so. |
- Changed parsing of manifests, so they only get parsed once - Removed InheritableFields from places it does not belong - Updated tests Signed-off-by: Scott Schafer <[email protected]>
# Conflicts: # tests/testsuite/build_script.rs
Signed-off-by: Scott Schafer <[email protected]>
Signed-off-by: Scott Schafer <[email protected]>
Everything should only be parsed once now. I did not add the cache for workspace member lookup but I can add that if it is needed. There are a few areas that may need a little touching up but I figured it was better to get the core ideas reviewed than touch up what was needed. I purged most places of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update here! In the interest of making progress, I'd like to get your thoughts on the possibility of splitting this up. This is a many-months-old PR and is quite difficult to reboot all the context all the time each time I come back to this. Additionally this is a huge change at nearly 3k lines and it's very difficult to productively give feedback on such a large review because I'm sure it can feel overwhelming on your end as well.
My suggestion would be to probably close this PR and start with a fresh one. I think that we should sequence the implementation of this RFC incrementally so it doesn't have to go in all-at-once. This I think will provide a better opportunity to review this in detail and make sure we're building on solid foundations. I in general want to be quite careful with the implementation of this RFC since it touches a lot of very core parts of Cargo and we want to make sure that it's all on solid ground for the future.
Specifically I think this could at least be split up into:
- First updates to the TOML parsing code. Somewhere a method would take
&InheritableFields
(or similar) but that's actually a blank struct. There's some method (I recommended aresolve
method here) which would always returnNone
(as if it weren't configured in the workspace) which would always return an error. This means that the change isn't actually test-able yet but it will help get a lot of the structure for TOML parsing in place and how we handle elaboration. Notably this change would not include actually searching for the workspace root or creation ofInheritableFields
. - Next would be a PR to actually create and pass in an
InheritableFields
value. This would involve managing all the workspace changes and figuring out where to put all that. The main goal here is that the TOML parsing is all "taken care of" and all we really need to do is figure out how to calculate the parameter to pass into "The Function" which does the parsing. - Finally would be other miscellaneous things like fixing up paths on publish, making sure published things are all in order, etc.
I don't think that this can be split up into a ton of parts, but I do really think it will be beneficial to split this up at least into toml parsing and workspace-field-finding because I find it difficult to keep it all in my head right now and I don't want to overwhelm you with review feedback as well.
@@ -203,7 +215,35 @@ impl<'de, P: Deserialize<'de>> de::Deserialize<'de> for TomlDependency<P> { | |||
V: de::MapAccess<'de>, | |||
{ | |||
let mvd = de::value::MapAccessDeserializer::new(map); | |||
DetailedTomlDependency::deserialize(mvd).map(TomlDependency::Detailed) | |||
let details: IntermediateDependency<P> = IntermediateDependency::deserialize(mvd)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the only purpose for this custom Deserialize
implementation is to provide a better error string with the expecting
method. Having an entirely duplicated IntermediateDependency
below I think puts this a bit over the top to the point that we should figure out an alternative method of doing so. Ideally we could go back to #[serde(untagged)]
on TomlDependency
and just go from there.
Would you be up for seeing if serde has a better way to customize the error string nowadays or whether there's an alternative way to do this without duplicating so much?
} | ||
|
||
/// Parses an optional field, defaulting to the workspace's value. | ||
fn ws_default<T, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I think would be a good candidate to be a method on MaybeWorkspace
perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think that the signature of this function should be:
impl<T> MaybeWorkspace<T> {
fn resolve(self, label: &str, get_ws: impl FnOnce() -> &T) -> CargoResult<T>;
}
Basically by having this as a method it removes the Option
layer, and I don't think that the return value should be MaybeWorkspace
but rather always simply T
because the purpose of this method is to extract the T
from the MaybeWorkspace
one way or another.
|
||
#[derive(Debug, Deserialize)] | ||
#[serde(untagged)] | ||
enum MaybeWorkspaceBadge { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come badges needed separate handling for this? If it's due to some odd bug somewhere else or something like that I think it'd be fine to not include badges in the first implementation here and to instead have this in a follow-up PR (this one's already quite large)
match (value, workspace) { | ||
(None, _) => Ok(None), | ||
(Some(MaybeWorkspace::Defined(value)), _) => Ok(Some(MaybeWorkspace::Defined(value))), | ||
(Some(MaybeWorkspace::Workspace(TomlWorkspaceField { workspace: true })), ws) => f(ws) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this is roughly the location I would expect some feature-checking to happen along the lines of "you need to enable this nightly feature in your manifest to use this"
@@ -817,14 +894,144 @@ impl<'de> de::Deserialize<'de> for VecStringOrBool { | |||
} | |||
} | |||
|
|||
fn version_trim_whitespace<'de, D>(deserializer: D) -> Result<semver::Version, D::Error> | |||
pub fn map_deps( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this moved around?
workspace: true, | ||
})) | ||
} else { | ||
Err(de::Error::custom("workspace cannot be false")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is already checked in other locations, so does it need to be checked here as well?
|
||
/// This type is used to deserialize `Cargo.toml` files. | ||
#[derive(Debug, Clone)] | ||
pub struct IntermediateManifest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this structure may want to be defined in the toml parsing area, otherwise it's pretty back-and-forth where it's created here but all the guts of processing happen later with TomlManifest::to_real_manifest
.
Also, would it be possible for TomlManifest
to itself be an "intermediate" manifest?
let project = me.project.clone().or_else(|| me.package.clone()); | ||
let project = &mut project.ok_or_else(|| anyhow!("no `package` section found"))?; | ||
|
||
let workspace_config = match (me.workspace.as_ref(), project.workspace.as_ref()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks duplicated with the to_intermediate
method?
} | ||
}; | ||
|
||
TomlManifest::parse_toml_project(project, &inheritable)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the primary locations where I don't think this is quite the right model to implement this. I think it would be best to leave the fields as MaybeWorkspace<T>
within the manifest and then whenever they're read in this function they call the resolve
function I suggested above instead of altering the structure of the manifest itself and then reading all the Defined
states.
|
||
TomlManifest::parse_toml_project(project, &inheritable)?; | ||
|
||
project.license_file = match ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can be done I think it would be best to represent this path-handling in the type of the field in the TomlManifest
itself. That would remove the duplication here with license/readmes and encode more information in the type structure of the deserialized type.
I am going to leave the final decision of splitting this up or not to you. That being said I agree that this needs to be done in a careful way as it touches many of the core parts of cargo. That is one of the main problems with this RFC. It takes a system that works in a linear fashion and adds components of nonlinearity. This makes changes to many cores areas and causes a lot of the headache. I am willing to deal with the changes but I understand that it is a lot to comment on so I will be happy to split this up to make it easier to review. If we were to split this up I would like more of a description of what is to be done in each part. My current understanding is in part 1 we would
I think this might be able to be expanded upon so that
I think it makes this whole process a bit more testable as we can test the parsing of This makes it so all that is needed in part 2 is to set up finding the workspace root, set up the cache of I may have oversimplified things a bit but I am happy to discuss this more either on here, in Zulip, on a call, or another communication method. |
That sounds reasonable to me, yeah. I'm ok with that split! My goal is to basically to trim this down to being more manageable to land in incremental pieces so review/feedback can be more effective. |
☔ The latest upstream changes (presumably #10081) made this pull request unmergeable. Please resolve the merge conflicts. |
Is this PR still being worked on? |
Yes, I am hoping after the holiday season I can open the new PR |
Thanks for the PR, but I'm going to be stepping down from the Cargo team so I'm going to un-assign myself from this. The Cargo team will help review this when they get a chance. |
Closed in favor of This PR |
Part 1 of RFC2906 - Packages can inherit fields from their root workspace Tracking issue: #8415 RFC: rust-lang/rfcs#2906 All changes were heavily inspired by #8664 and #9684 A big thanks to both `@yaymukund` and `@ParkMyCar.` I pulled a lot of inspiration from their branches. I would also like to thank `@alexcrichton` for all the feedback on the first attempt. It has brought this into a much better state. All changes have been made according to the RFC as well as `@alexcrichton's` [comment](#8664 (comment)). This is part 1 of many, as described by [this comment](#9684 (comment)), [this comment](#9684 (review)) and redefined [by this one](#10497 (comment)). This PR focuses on inheriting in root package, including: - Add `MaybeWorkspace<T>` to allow for `{ workspace = true }` - Add a new variant to `TomlDependency` to allow inheriting dependencies from a workspace - Add `InheritableFields` so that we have somewhere to get a value from when we `resolve` a `MaybeWorkspace<T>` - `license_file` and `readme` are in `InheritableFields` in this part but are not `MaybeWorkspace` for reasons [described here](#10497 (comment)) - Add a method to `resolve` a `MaybeWorkspace<T>` into a `T` that can fail if we have nowhere to pull from or the workspace does not define that field - Disallow inheriting from a different `Cargo.toml` - This means that you can only inherit from inside the same `Cargo.toml`, i.e. it has both a `[workspace]` and a `[package]` - Forcing this requirement allows us to test the implementations of `MaybeWorkspace<T>` and the new `TomlDependency` variant without having to add searching for a workspace root and the complexity with it Remaining implementation work for the RFC - Support inheriting in any workspace member - Correctly inherit fields that are relative paths - Including adding support for inheriting `license_file`, `readme`, and path-dependencies - Path dependencies infer version directive - Lock workspace dependencies and warn when unused - Optimizations, as needed - Evaluate any new fields for being inheritable (e.g. `rust-version`) - Evaluate making users of `{ workspace = true }` define the workspace they pull from in `[package]` Areas of concern: - `TomlDependency` Deserialization is a mess, and I could not figure out how to do it in a cleaner way without significant headaches. I ended up having to do the same thing as last time [which was an issue](#9684 (comment)). - Resolving on a `MaybeWorkspace` feels extremely verbose currently: ```rust project.homepage.clone().map(|mw| mw.resolve(&features, "homepage", || get_ws(inheritable)?.homepage())).transpose()?, ``` This way allows for lazy resolution of inheritable fields and finding a workspace (in part 2) so you do not pay a penalty for not using this feature. The other bit of good news is `&features` will go away as it is just feature checking currently. - This feature requires some level of duplication of code as well as remaking the original `TomlManifest` which adds extra length to the changes. - This feature also takes a linear process and makes it potentially non-linear which adds lots of complexity (which will get worse in Part 2) Please let me know if you have any feedback or suggestions!
Tracking issue: #8415
RFC: rust-lang/rfcs#2906
All changes were heavily inspired by #8664
A big thanks to both @yaymukund and @ParkMyCar. I pulled a lot of changes from their branches.
All changes have been made according to the RFC as well as @alexcrichton's comment.
This makes it so that typically a workspace is found before parsing any manifests into packages. Allowing for reduced searching for a root workspace to pull things from. When looking for a path dependency's version (if not specified) there are situations where you have to find the root workspace to find the version but this appears to be very rare.
There are a few problem areas in cargo/src/cargo/util/toml/mod.rs where things get a bit clunky and verbose. I wasn't quite sure how to fix those spots and any help to those areas is appreciated. I also pulled how to join relative paths from this repo. I couldn't find a way within Cargo to join the relative paths correctly so I used that.
Please let me know if this isn't how you expected the changes to be made. I'll be happy to rework anything necessary.