Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce that Task.Sources cannot have upstream task dependencies #4121

Open
lihaoyi opened this issue Dec 12, 2024 · 11 comments
Open

Enforce that Task.Sources cannot have upstream task dependencies #4121

lihaoyi opened this issue Dec 12, 2024 · 11 comments
Milestone

Comments

@lihaoyi
Copy link
Member

lihaoyi commented Dec 12, 2024

This would simplify the data model considerably, make things easier to reason about internally, and avoid common mistakes (e.g. #4120). e.g. for watch-and-rerun or selective execution, it would mean we can just inspect the hashes of the input tasks in isolation, rather than running through an entire evaluation workflow to get their values

I wanted to do this a while back, but in the past the fact that T and T.sources were different types meant that you had to override T.sources with T.sources, so sources needed to support depending on tasks to allow the common override-and-add workflow. Now, we can simply override Task.Sources with Task, so we no longer need Task.Sources to support upstream task dependencies

This is a breaking change and would need to wait for 0.13.0

@lihaoyi lihaoyi added this to the 0.13.0 milestone Dec 12, 2024
@lefou
Copy link
Member

lefou commented Dec 13, 2024

So how do I add new source dirs (that should be watched) to an inherited sources task then?

object Common {
  def sources = Task.Sources(millSourcePath / "src" )
}
object foo extends Common {
  def sources = Task.Sources {
    super.sources() ++ Seq(PathRef(millSourcePath / "src-foo"))
  }
}

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 13, 2024

Something like this:

object Common {
  def sources = Task.Sources(millSourcePath / "src" )
}
object foo extends Common {
  def newSources = Task.sources(millSourcePath / "src-foo")
  def sources = Task { super.sources() ++ newSources() }
}

A bit more verbose than the current setup, but not too bad

@lefou
Copy link
Member

lefou commented Dec 13, 2024

That's really inconvenient. Since this change is only motivated by the implementation impact, maybe we can restrict the super call to when super is a Task.Sources, so that the outcome can be merged easily?

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 13, 2024

We may be able to make it easier. The fundamental thing here is I want to get rid of the Task.Sources{ <arbitrary-code> } style of calling these things, and instead use Task.Sources(<static-paths>). Maybe we could add an API specifically for extending sources?

def sources = Task.Sources(super.sources().paths ++ Seq(millSourcePath / "src-foo"))

@lefou
Copy link
Member

lefou commented Dec 13, 2024

I think the whole logic behind dynamic calculation of the source dirs, e.g. src-<scala-version>-<platform-version> where those versions are setup via targets will no longer be possible then. We need <arbitrary-code> in various places, e.g. CrossScalaVersionRanges.scalaVerionDirectoryNames and ZincWorkerUtil.versionRanges.

Otherwise, users start to switch to use Task.Input and you gain exactly nothing.

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 13, 2024

The dynamism required for those can still be done with the API above I think. They just require crossValue/crossScalaVersion and the millSourcePath, all of which can be computed at resolution time without needing to evaluate any tasks.

Not sure if there are other use cases that do require task-results to do their computation, but at least those discussed here do not.

I'd like to remove the ability of Task.Inputs to depend on upstream tasks as well.

Overall these issues we'll need to sort out during implementation to see if it hits any blockers, but at least for now it still seems feasible

@lefou
Copy link
Member

lefou commented Dec 13, 2024

I'm not convinced crippling Input is a wise thing to do. Isn't there any other option to fix the watch and re-evaluate a command issue? Any setup that captures external tools output in an Input which also requires some configuration via some tasks won't work anymore.

Should we really compromise some well working existing capabilities for the sake of a seldomly used complex use case, which could be easily worked around by other means?

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 14, 2024

I think there are two possible ways forward:

  1. We remove Task.Sources(values: Result[Seq[PathRef]]) in favor of Task.Sources(values: Result[os.Path]*), and leave Task.Inputs in place. The PathRef overload of Task.Sources that takes arbitrary code is isomorphic to Task.Inputs anyway: anyone who needs it can still use that functionality, but people wont accidentally use it (which is common today, even in Mill's own codebase). This would just be tidying up without really reducing the power exposed by the Mill API, and would help reduce the error-prone-ness without allowing internal simplifications

  2. We remove Task.Sources(values: Result[Seq[PathRef]]) and prevent Task.Inputs from taking upstream task dependencies. This is stricter than above, but it's possible. We do have one callsite in MillBuildRootModule that currently relies on an input with upstream dependencies, so we would need to see if that is really necessary, and also see if there are other parts in the third-party ecosystem that rely on this functionality. This would reduce the power of the Mill API but also allow internal simplifications

We don't need to come to a decision yet, just things to think about in the run up to 0.13.0

@lefou
Copy link
Member

lefou commented Dec 14, 2024

Giving the user the choice to use a more powerful but maybe less performant solution, Task.Input, when needed, but promote a less powerful but more performant Task.Sources as the preferred solution for most of the trivial cases, sounds best to me.

I think there should be an easy way to add static paths (aka overriding a Task.Sources) while keeping nice performance characteristics.

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 14, 2024

One interesting place I found where we have a source -> source dependency is in JavaTests#sources, where we rely on the outer.sources() to take its relative path so we can mirror the relative path layout in JavaTest#sources

override def sources = Task.Sources {
for (src <- outer.sources()) yield {
PathRef(this.millSourcePath / src.path.relativeTo(outer.millSourcePath))
}
}

@lefou
Copy link
Member

lefou commented Dec 14, 2024

I think changing the user facing API to always accept os.Path in Task.Sources and Task.Source, and just handle the PathRef creation behind the scenes would make it a bit easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants