-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop doing manual string formatting of output #655
Conversation
Use tera for all rendering. There are things we will want to change, both about the output, and how it's computed, but this is an incremental step in more legible, testable code. Also, add some hermetic unit tests for rendering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems good to me.
One thing I'd like to see is a more declarative implementation of each component. I quite like how cargo-raze is shaking out to be.
This may be my own lack of rust experience but I've definitely found imperative styles to be difficult to test and harder to follow. No action for this PR but something I'd love to either learn more about (if you have some recommended links) or keep in mind for future changes.
let output = String::from_utf8(output).expect("Non-UTF8 output"); | ||
|
||
// TODO: Have a single function with kwargs to enable each kind of dep, rather than multiple functions. | ||
let expected_repository_rule = r###"CRATE_TARGET_NAMES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feeling nervous about the volume of dumped text in the tests right now. I don't think this is a blocker for this PR, but I think it'd be better to implement #653 and have more asserts around data instead of massive string comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I think that everything except the renderer should be tested using much more data-y ways, but I think the renderer fundamentally kind of has to be checking raw strings against each other.
For the renderer tests themselves, making the things being tested be much more targeted (e.g. single crates), and hermetic (hard-coded inputs), and potentially using better assertions libraries, helps here, but at the end of the day these tests test what we're really outputting the right thing. I also find it pretty useful to be able to look at the diffs in PRs to see what "really" changed. I'm hoping that these small number of better tests will be sufficiently maintainable, and that we'll delete the huge unhermetic ones when we have suitable targeted replacements for each pipeline stage :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, sorry, I didn't realize I'd put that comment in the renderer code. I agree the renderer should be the place to do these kinds of tests. This is my first time really looking at the tests so just wanted to reiterate that issue.
I suspect we have similar aims and ideals here! If you ignore a bit of the noise (which we should totally clean up), the core of let opt = Opt::from_args();
let config = ...;
let mut resolver = config.preprocess()?;
if lockfile_up_to_date {
copy_lockfile(...);
return Ok(());
}
let consolidator = resolver.resolve()?;
let renderer = consolidator.consolidate()?;
renderer.render(&mut output_file)?; The real difference compared with the I imagine you (and I!) may prefer this to be written as: let opt = Opt::from_args();
let config = ...;
let versions = resolve(&config);
let consolidated_versions = consolidate(&config, &versions);
render(&config, &consolidated_version, &mut output_file); The main reason we ended up with the current structure, rather than returning values that we pipe into the next function manually, is that we wanted to ensure the hash generated in As we've discussed elsewhere (and it's worth writing down!), I'm very happy to change how lockfiles work, which would make me feel a lot more comfortable changing that code structure to be more "return a value and pass that as an argument to another function" rather than "return an object with a no-argument function to call", as long as we preserve the existing properties of the system - that if you're using a lockfile:
I'm imagining that will probably look like changing from "copy lockfile and early return" to encapsulating the cache key checking into the |
I need to digest your previous comment but one thing that stands out to me is that there are a bunch of environment variables going into the hash? This seems pretty undesirable and I feel effectively makes the lockfile un-sharable. rules_rust/crate_universe/src/resolver.rs Lines 187 to 193 in e744b93
Not sure if this is the appropriate place to discuss this but why is this the case? |
Probably not, but we don't have a better one at the moment, and we're here, so... 😁
Yeah, this is not super desirable, but... Anything which may affect the resolve needs to go in the hash. So imagine a (fictional) Cargo env var There are a few approaches we could take here:
There are probably other approaches too. We sided on the last of these options because:
This was the best trade-off we came up with, but as with everything, very open to other suggestions/ideas! |
Again, still need to take the time to read your thorough responses but |
If we agree (I don't know if we do) that some env vars may need to be passed to cargo, then it comes down to a user freedom trade-off: We can either allow by default (unblocking users, but potentially making cache keys over-sensitive), or deny by default (meaning some users won't be able to use |
What I mean is we should only be including environment variables that we can guarantee have the same impact on the lockfile no matter what machine/environment they're used in. I don't even think |
Yeah, I think that's reasonable. I've been taking the approach of "If you decided to run Bazel with it, and decided you didn't care about the warning, you opted in to it being included", but maybe we should switch to an explicit allow-list. Whether we should allow the user to specify it, or hard-code it in |
My goal is just to be able to share a lockfile across machines and operating systems. Consider the case of building |
The lockfile is currently cross-platform. I'm using the same lockfile across Linux and Mac machines today. Environment variables are, as far as I know, the only weird corner, and even those are only problematic if you do things like set paths in them... (Though, as you note, stripping out the |
Nice, I didn't really understand how that can be the case but glad that's the case. I think the smarter approach would still be to not include all environment variables by default if there's a chance someone could unknowingly affect the generated lockfile. I'd imagine it'd be super frustrating to not realize something was set in your environment and commit a lockfile that no one else on your team can use. |
Ok, so having understood everything written now, my general response to it all (and reiteration of things I've already said) is we should only force things to be included that affect the outputs and allow for users to explicitly request that additional environment variables get used. I don't want to have anyone be in a position where they need to open a PR to get some environment variable on the deny-list. This may be my lack of familiarity with cargo's dependency resolution but I feel it's not quite correct to have the version factor into the hash of the lockfile. Can two versions of cargo not generate the same set of dependencies when given the same lockfile? If this is true, then we shouldn't track the version. I feel this should be the case for any other tool/input to the resolver. Even if it's used for generating the current lockfile, if a version were to variable were to change but all the same dependencies get defined, then it should not be factored into the hash. and re control flow of |
I think that's a reasonable approach - given that, what should this actually look like for env vars? Should we have an allowlist in
Given the same
Yes, I agree with that. And definitely trimming down to the lockfile only containing the resolved version graph, and not the whole generated rules, will help trim down what needs to be factored into the hash a lot!
Yeah, when we stop the lockfile containing the whole rendered output, we can pass a lot less to the resolver.
I think those things should all be very easy to notice in code review at the moment. Currently the rule is very simple: Don't open any sources of external data (e.g. file paths, network addresses) unless you were constructed with them. So if someone started reading a hard-coded file path, that would be bad, but easily noticed. The two places that are allowed to violate this are:
The two corners which are trickier are:
I am very supportive of making our tests hermetic! The only real requirements we have that scrape at the edges of hermeticity are:
There's always a tension between "how accurate are my mocks" vs "how reliable are my tests", and going all the way in either direction has issues :) |
@illicitonion I split the conversation into two threads since it seems to be getting quite lengthy and don't want it to distract from this PR 😄 |
Very good call - thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving rubber stamp approval from vacation in case you want to merge quickly.
Use tera for all rendering. There are things we will want to change, both about the output, and how it's computed, but this is an incremental step in more legible, testable code. Also, add some hermetic unit tests for rendering.
Use tera for all rendering.
There are things we will want to change, both about the output, and how
it's computed, but this is an incremental step in more legible, testable
code.
Also, add some hermetic unit tests for rendering.