Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNG-7038] Introducing project.topdir #840

Closed

Conversation

nielsbasjes
Copy link
Contributor

MNG-7038: Introduce public property to point to a root directory of (multi-module) project.

This pull request simply introduces the new variable project.topdir which has been documented and points to the top directory of a (multi module) project.

Please double check if you agree with both this variable name and the way I have implemented it.


Following this checklist to help us incorporate your
contribution quickly and easily:

  • JIRA issue MNG-7038.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [MNG-XXX] SUMMARY, where you replace MNG-XXX
    and SUMMARY with the appropriate JIRA issue. Best practice is to use the JIRA issue
    title in the pull request title and in the first line of the commit message.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will
    be performed on your pull request automatically.
  • You have run the Core IT successfully.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@nielsbasjes
Copy link
Contributor Author

Sidenote: I found in the init script there is something called basedir that seems to be very similar to this project.topdir (it is different if you have a .mvn somewhere half way your project tree) yet it is very different from project.basedir.
Confusing!

@kwin
Copy link
Member

kwin commented Oct 21, 2022

I found in the init script there is something called basedir

IIUC then the property value of the newly introduced project.topdir relies on exactly this value (

MAVEN_PROJECTBASEDIR="`find_maven_basedir "$@"`"
and f7ad2cd#diff-5ac9f422973753d652e14ce805f3882b3de8ad6bbe158f93251c19ff1ee06a52R19). I am not sure this fuzzy logic is good enough for this purpose.

Probably this newly introduced property should rather be set from Java code based on the following logic:

  • check parent of current (non-effective) pom
  • if it has relative path being non empty, check the given path
  • check recursively until no more relative path is set/not found

@nielsbasjes
Copy link
Contributor Author

@kwin You are right.

That is probably a lot better, also because this code in init will travel outside of the project and even the homedirectory of the user to find the .mvn directory.

however...
The problem I see is that for some things, like reading the ${project.topdir}/.mvn/settings.xml (See MNG-5659) you'll need to have that available before the pom.xml files are read.
This needs more investigation on how to implement!

@nielsbasjes nielsbasjes marked this pull request as draft October 21, 2022 08:13
@nielsbasjes
Copy link
Contributor Author

This also means the (internal) maven.multiModuleProjectDirectory suffers from the same problem: It may traverse outside of the project root. This will also cause problems if people checkout a different project in a subdirectory of their current project.

@kwin
Copy link
Member

kwin commented Oct 21, 2022

This also means the (internal) maven.multiModuleProjectDirectory suffers from the same problem

This is internal for a good reason and should only be used for finding the .mvn directory belonging to the current project (and even there may fail)

@gnodet
Copy link
Contributor

gnodet commented Oct 21, 2022

I fully agree to have a new ${project.topdir} that won't go outside of the tree.
For the ${project.topdir}/.mvn/settings.xml use case, I'm not sure how that could be handled if multiple projects have different parameters. If those projects are included in a bigger build, everything inside .mvn/ becomes irrelevant I think, and all access to this directory should be done using ${maven.multiModuleDirectory}.
Note that we also have the executionRootDir which seems to reflect the upper parent path being built (mainly when using -f xxx option).
All those properties should be clearly defined and made public.

@gnodet
Copy link
Contributor

gnodet commented Oct 21, 2022

  • check parent of current (non-effective) pom
  • if it has relative path being non empty, check the given path
  • check recursively until no more relative path is set/not found

One question comes to my mind. Are all those properties system properties ? It would make sense to have some of them being computed relative to a given project (in the pom.xml or MavenProject sense). Especially if we want to support aggregating multiple projects in a single build, each subproject should have a clearly to identify its own topdir.
Given I don't think there's a simple way to find the logical root, especially when projects are used as subproject in a bigger build, we may need to actually use the fact that there is a .mvn/ directory somewhere and stop there. Also children may have a parent outside the build itself, and thus we can not only going through the parents to find the topdir.
I think that's how the current maven.multiModuleDirectory is computed, but we need one per-project, in case there's an aggregation.

Also I would keep project.xxx for things that needs to be computed per-project (especially in case the project is used inside a bigger project), and maven. for things that are unmodified during the build.

@nielsbasjes
Copy link
Contributor Author

nielsbasjes commented Oct 22, 2022

I think I ran into a chicken-egg type of problem.

  • I would like to have the project.topdir available as a variable that can be used in the pom.xml to pin point things. This implies it must be available before that point.
  • Also I would like to base loading a project specific settings.xml opon this variable (MNG-5659)
  • Yet the only viable way to determine it is after the pom.xml has been parsed and at least the parents are available.

I would like to have some suggestions on how to handle this is.

@gnodet
Copy link
Contributor

gnodet commented Oct 22, 2022

I think I ran into a chicken-egg type of problem.

  • I would like to have the project.topdir available as a variable that can be used in the pom.xml to pin point things. This implies it must be available before that point.

This would have to be computed inside the DefaultModelBuilder. The models are built in two phases. After the first phase is done, all models have been read and the needed information should be available. So the best place may be (without much thoughts) in the DefaultModelBuilder, just before the model interpolation. The parents (and all models from the reactor) should be available on the model building request. The relevant code is here.

  • Also I would like to base loading a project specific settings.xml opon this variable (MNG-5659)

For the per-project settings.xml, I don't think they should use the topdir, at least not using the parent hierarchy. I think the way the multiModelDirectory is currently computed would better fit the need. Especially if the settings.xml is to be located in the .mvnd/ directory.

  • Yet the only viable way to determine it is after the pom.xml has been parsed and at least the parents are available.

I would like to have some suggestions on how to handle this is.

@michael-o
Copy link
Member

michael-o commented Oct 23, 2022

I think before anything can be decided, I'd expect a full documentation of all possible cases where such a directory can be calculated. I think that a project property should not mutate during the build since it is bound to the current module being executed.

@gnodet
Copy link
Contributor

gnodet commented Oct 23, 2022

I think before anything can be decided, I'd expect a full documentation of all possible cases where such a directory can be calculated. It hink that a project property should not mutate during the build since it is bound to the current module being executed.

Agreed on all points. For the project specific properties, it could be either:

  • be made available for interpolation during the project building
  • or added as project property in addition to user properties defined in the pom

@nielsbasjes
Copy link
Contributor Author

@gnodet Thanks for the code pointer. This really helps in my understanding the code.

For the per-project settings.xml, I don't think they should use the topdir, at least not using the parent hierarchy. I think the way the multiModelDirectory is currently computed would better fit the need. Especially if the settings.xml is to be located in the .mvnd/ directory.

I disagree with this because if you want the settings.xml in a .mvnd then you currently must also create a .mvn with a dummy file or it will not find the correct directory.
Also the absense of a .mvn in a project can lead to unexpected effects.

@michael-o Yes, good point.

I agree with the property being immutable from an as early moment in the build as possible.

My current thoughts about the meaning of the topdir:

  • It is the filesystem directory of the top module within the project. So I intend to walk up the project/module tree instead of the filesystem directory tree.
  • Walking up stops if there is no parent defined or the parent is outside the project (relative path is empty).
  • So if there is a pom.xml without a parent, yet in the parent filesystem directory there is a pom.xml, walking up will still terminate because there is no parent relation defined.
  • This also means that in a single module project it is the same as the project.basedir.

The important difference with the multiModelDirectory is that that one walks up the file system directories until it finds a .mvn subdirectory (or /). This has a real possibility of walking out of the project. So the same project on the same machine in a directory under the same user with all environment variables the same may still be built differently only because it was checked out in a different subdirectory where one of the parent folders contains a .mvn with some kind of config file that is picked up.

So the topdir definition I have in mind does not depend on the existence of .mvn or .mvnd or anything else and as such can be used more reliably in all projects.

@gnodet
Copy link
Contributor

gnodet commented Oct 23, 2022

@gnodet Thanks for the code pointer. This really helps in my understanding the code.

For the per-project settings.xml, I don't think they should use the topdir, at least not using the parent hierarchy. I think the way the multiModelDirectory is currently computed would better fit the need. Especially if the settings.xml is to be located in the .mvnd/ directory.

I disagree with this because if you want the settings.xml in a .mvnd then you currently must also create a .mvn with a dummy file or it will not find the correct directory. Also the absense of a .mvn in a project can lead to unexpected effects.

@michael-o Yes, good point.

I agree with the property being immutable from an as early moment in the build as possible.

My current thoughts about the meaning of the topdir:

  • It is the filesystem directory of the top module within the project. So I intend to walk up the project/module tree instead of the filesystem directory tree.
  • Walking up stops if there is no parent defined or the parent is outside the project (relative path is empty).

Note that with build poms, the relative path can be kept empty and it will be inferred from the reactor. But it should be available after the first phase of the model building IIRC.

  • So if there is a pom.xml without a parent, yet in the parent filesystem directory there is a pom.xml, walking up will still terminate because there is no parent relation defined.
  • This also means that in a single module project it is the same as the project.basedir.

The important difference with the multiModelDirectory is that that one walks up the file system directories until it finds a .mvn subdirectory (or /). This has a real possibility of walking out of the project. So the same project on the same machine in a directory under the same user with all environment variables the same may still be built differently only because it was checked out in a different subdirectory where one of the parent folders contains a .mvn with some kind of config file that is picked up.

So the topdir definition I have in mind does not depend on the existence of .mvn or .mvnd or anything else and as such can be used more reliably in all projects.

It really depends on the structure of the project. Some projects may not have their ancestor as the topmost directory. We need to cover those use cases too. Also, it needs to work during a project-aggregation (or we need some properties for that). I think we need to think in terms of use-cases, not properties.

What's your first use-case for the topdir thing ?

@nielsbasjes
Copy link
Contributor Author

Cases to think about I just thought of:
Context: Multi module project, all parent modules have pointers towards their child modules.

  • A child module points to the correct parent (within the project).
  • A child module points to a different parent within the project.
  • A child module points to a parent outside the project.
  • A child module has no parent.

One of the child mentioned edge cases has settings IN the child module (i.e. ${project.topdir}/mvn/settings.xml is not really the top of the entire project).
In this example the build of such a module will yield different results if built from the real top v.s. build from within the module itself.

@nielsbasjes
Copy link
Contributor Author

@gnodet My first class of usecases is a consistent location for project level settings ( https://issues.apache.org/jira/browse/MNG-5659 ) which will solve a lot of practical problems for many of the projects where I work.

@nielsbasjes
Copy link
Contributor Author

A practical example is in this project https://github.com/nielsbasjes/yauaa I wanted to have a single set of project specific checkstyle config. I ended up with packaging these files in a jar and making that a dependency of the checkstyle plugin.
The only reason is not having a stable topdir which I could refer to.

@gnodet
Copy link
Contributor

gnodet commented Oct 23, 2022

@gnodet My first class of usecases is a consistent location for project level settings ( https://issues.apache.org/jira/browse/MNG-5659 ) which will solve a lot of practical problems for many of the projects where I work.

We need to split between build settings, and project resources I think. By build settings, I mean things that are loaded once for a given maven session. Per-project settings.xml would be one of those.
However, locating project specific resources is different, especially you want to refer to the project root on the filesystem to let maven translate that into an absolute path. I'm thinking about checkstyle rules referred to as build/checkstyles.xml or similar.
When such projects are aggregated into a bigger project, the build settings can't really be loaded per-project. However, project resources have to keep working.

@gnodet
Copy link
Contributor

gnodet commented Oct 23, 2022

A practical example is in this project https://github.com/nielsbasjes/yauaa I wanted to have a single set of project specific checkstyle config. I ended up with packaging these files in a jar and making that a dependency of the checkstyle plugin.
The only reason is not having a stable topdir which I could refer to.

Well, I don't think the topdir as defined above would work. What you need here, is a project resource, so can have a stable absolute path on the file system for the root of the project. It has nothing to do with parent/child relationship imho. The parent/child relationship defines inheritance in the project models, but is completely unrelated to their respective file system location. And what we need here is definitely a file system based property.

This top level directory is used by maven. When you invoke maven from a child, all the POMs from the reactor are loaded. This is done through the multiModuleProjectDirectory. And that one is also used to load various build settings currently, such as .mvn/maven.config and .mvn/extensions.xml.

I think this property works well when the projects are standalone.

Problems arise when projects are aggregated, or when .mvn directory does not exist for the current project, but exists in a parent directory. In which case, the computation is kinda broken because it goes outside the project's root directory.
Not sure yet how to fix that for the aggregation case, but for the second case, we could refine by checking if there's a pom.xml file in the parent folder before allowing to use it.

@gnodet
Copy link
Contributor

gnodet commented Oct 24, 2022

I'm wondering if it would make sense to add a new element in the build POM model (which would be removed in the consumer pom) to specify that a given pom is the top level pom. The heuristic to obtain the multiModuleDirectory would then be update to go up in parent directories until we find a pom flagged as root. This would allow not forcing the use of an empty .mvn directory.

I do think that this need to be thought with projects aggregation in mind.

This could be either an attribute <project type="root" ... or an element <type>root</type>. We may need another value for specifying aggregate poms if we end up having to specify them explicitly to change some behaviours. Aggregators could be identified by having one pom.xml in a parent directory flagged with type='aggregator' in which case, the multiModuleDirectory would point to the aggregator root, while project specific resources (to locate file in the source tree) would not go upper than the root project.

@merikan
Copy link

merikan commented Oct 24, 2022

I've been following this thread and I just thought I'd throw in my two cents. This is what my use case looks like:

Over the years I have built a number of multi-modular projects that have spring-boot as a parent. I have a number of artefacts in the root (folder) that I want to access from my sub-modules (pmd, spotbug, spotless, dependency-check, etc).

root
├── etc
├── module1
├── module2
├── module3
└── pom.xml

Initially I was using a build-module but since it was used as a dependency and was not always available I gave up using that approach.

The solution I've been using for the last few years is the build-helper-maven-plugin and the rootlocation goal. I don't know how it resolves the root location, but it has worked in all my projects without any problems. (The only problem with this solution is that a plugin is bound to a phase and since there is no default phase to bind to, I always have to run validate before if I want to execute the plugin goal from the command line, like this: $ mvn validate spotless:check. But that is an entirely different issue)

<configuration>
  <printFailingErrors>true</printFailingErrors>
  <failOnViolation>true</failOnViolation>
  <rulesets>
    <ruleset>${rootlocation}/etc/pmd/pmd-ruleset.xml</ruleset>
  </rulesets>
</configuration>

I guess most Maven projects probably use, for the sake of simplicity and probably the most intuitive, a direct hierarchical structure which is also reflected on the file system. So do I, with a few exceptions where I used a parent module that was next to the child modules. When using the relativePath element, we can have all sorts of parent-child relationships where submodules might have different parents and it can get very complicated. Most people probably like to think of a common ancestor in a project but it is not, as I understand it, something that maven imposes.

I understand that people can have quite complex project structures and maybe also in a mono-repo and that Maven should be able to support it. To me it sounds reasonable to assume a .mvn directory as the root when using a subfolder structured project but there may be use cases where that is not correct either.

For me, regardless of which submodule I'm building from, I want to be able to find my shared artefacts up in the hierarchy (file structure). This is what many tools does and most people are used to it and it is quite accepted. I do think that mixing it with pom parent relation makes it a lot more complicated.

One solution could be to use different strategies for resolving the project root (topdir) and one strategy, the default one, could be to traverse the directory tree searching for a marker, e.g. .mvn directory. To make it more flexible when traversing the tree, then maybe it should be possibility for the user to override the marker, for what is considered a project root, with another directory or filename.

@gnodet
Copy link
Contributor

gnodet commented Oct 24, 2022

The solution I've been using for the last few years is the build-helper-maven-plugin and the rootlocation goal. I don't know how it resolves the root location, but it has worked in all my projects without any problems. (The only problem with this solution is that a plugin is bound to a phase and since there is no default phase to bind to, I always have to run validate before if I want to execute the plugin goal from the command line, like this: $ mvn validate spotless:check. But that is an entirely different issue)

Fwiw, the build-helper-maven-plugin uses the following heuristic: it goes up the list of parent directories and at each step, checks if there's a pomx.ml, loads it and checks if the loaded project has the current project as a child module (so not in the sense of parent/child inheritance, but using a <module> element that is used to define the reactor). The benefit is that it works with a traditional project, even if the model's parent is not located in the parent's directory. The particular point is that if a project is aggregated in a bigger one, it will go up the chain, so may go out of the project tree. And it won't stop when there's a .mvn project. Another downside is that it uses maven to load the parent project so that heuristic can not be used from early shell scripts.

@merikan
Copy link

merikan commented Oct 24, 2022

Fwiw, the build-helper-maven-plugin uses the following heuristic: it goes up the list of parent directories and at each step, checks if there's a pomx.ml, loads it and checks if the loaded project has the current project as a child module (so not in the sense of parent/child inheritance, but using a <module> element that is used to define the reactor). The benefit is that it works with a traditional project, even if the model's parent is not located in the parent's directory. The particular point is that if a project is aggregated in a bigger one, it will go up the chain, so may go out of the project tree. And it won't stop when there's a .mvn project. Another downside is that it uses maven to load the parent project so that heuristic can not be used from early shell scripts.

thanks for the update. I didn't know the build-helper-maven plugin worked like that, although I probably suspected it. Good to know what the reason is, if it ever goes outside my project tree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants