Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability for registry code generation to be out of source #1975

Merged
merged 2 commits into from
Jan 17, 2024

Conversation

islas
Copy link
Collaborator

@islas islas commented Jan 5, 2024

TYPE: enhancement

KEYWORDS: registry, code generation

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
The current registry generated code assumes generating the autogenerated code should be placed within the WRF repository alongside true source code that is committed to the repository. This leads to a cluttered source tree with hundreds of files that were autogenerated, and relies on various stopgaps (unintuitive ignores, specific path and wildcard rm commands within source folders, run code generation from specific location).

In preparation for the future cmake build, it would be ideal to not inherit these limitations of the registry code registration, allowing the source tree to be pristine while autogenerated code is placed in an alternate build directory.

Solution:
Add the potential for out-of-source code generation of the registry code. This can be achieved by adding the ability search a relative directory for the io_boiler_plate_temporary.inc. This would allow the registry generation to occur in an alternate directory but still source the Regsitry/ config files. This is because the boiler plate is generated relative to the execution path, but Registry config files are sourced via the provided command line path.

Without this change, while the makefile system would run as before registry code generation outside the WRF directory structure would place the boiler plate and Registry config files in two different locations and thus fail to find the core template file. Since the relative directory of the boiler plate output is ./Registry, by searching for that as the alternate relative directory we would find both Registry config files and the generated boiler plate. This would be able to accommodate different locations while for the makefile build system there would be no change to the way code is generated and searched as ./Registry and path to the Registry config files are the same when at the top-level WRF directory.

LIST OF MODIFIED FILES:
M tools/reg_parse.c

TESTS CONDUCTED:

  1. Out-of-source builds work for cmake (experimental at the moment) and no impact on make builds

RELEASE NOTE:
Add the ability for registry code generation to be out of source

@islas islas requested a review from a team as a code owner January 5, 2024 22:42
@weiwangncar
Copy link
Collaborator

@islas What happened to these files before this change when doing 'clean -a'? Will these files be removed after this change with 'clean -a' or else?

@weiwangncar
Copy link
Collaborator

The regression test results:

Test Type              | Expected  | Received |  Failed
= = = = = = = = = = = = = = = = = = = = = = = =  = = = =
Number of Tests        : 23           24
Number of Builds       : 60           57
Number of Simulations  : 158           150        0
Number of Comparisons  : 95           86        0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

@islas
Copy link
Collaborator Author

islas commented Jan 6, 2024

@weiwangncar ./clean -a removes these files. These changes do not affect the results of the current system (i.e. everything works exactly the same)*

This previously was done by hard-coding Registry as the directory, cd .. to top-level WRF directory and then [after some copying] pointing to Registry/Registry as the file. This functionality was entirely preserved.
A detailed explanation of how the code works is detailed below:

To roughly emulate this without changing too much, the io_boiler_plate_temporary.inc is generated at the top-level (from gen_streams.c). Since registry include files are all written as if relative to there despite the code being executed one directory up, the expectation is to search with dir + filename, dir being the split out directory of the provided Registry file (in this case now Registry/). The change to use include_file_name_dir uses the exact same logic as before prepending the dir path, but now it will first try any local include files (say for instance registry.chem exists at top-level and is being included - it would grab that first before the Registry/ located one). Since we never keep registry files outside of Registry/ this first check always fails for registry includes and basically uses the old system, save for the boiler plate code which is now in ./. I found this to be the best balance of minimally invasive and most succinct way to modify the registry code generation to get the desired result (out-of-source build possible but almost entirely leave the make build unchanged).

*io_boiler_plate_temporary.inc is currently not being deleted but if necessary I can add that to the clean script. The reason it was moved is slightly difficult to piece together due to how fragile the Registry code is but the reasoning is as follows:

  • To avoid hard-coded search paths requiring always looking in Registry (keep in mind io_boiler_plate_temporary.inc is a generated file and the point of this PR is to have the ability to avoid putting any files like that into the source code tree) we make it output in ./
  • Now all gen_* calls which operate on relative directories and the boiler plate inc will put files wherever we run the registry code - For makefiles this doesn't change, but for cmake this location is outside the WRF repo
  • Then to balance the fact that we moved the boiler plate but still need to include files from under Registry/ we add a precursory check in our current directory with the edits in reg_parse. This previously was not a problem when the boiler plate was put directly under Registry/ so the dir + file name always looked there

@weiwangncar
Copy link
Collaborator

@islas Thank you very much for the detailed explanation! Really appreciate it!

@mgduda
Copy link
Collaborator

mgduda commented Jan 10, 2024

It seems less than ideal to dump the io_boilerplate_temporary.inc file in the top-level WRF directory. And thinking about the modifications to pre_parse() to search two locations for include files, I wonder whether there may not be an opportunity for generalization here?

@islas What do you think about the idea of adding a new optional argument to registry to specify the location where io_boilerplate_temporary.inc is written (and searched for)? It seems like it shouldn't be overly difficult to add some logic to parse a new option around L.123 of registry.c. That directory could then be passed to gen_io_boilerplate() and also to pre_parse().

If the -io_boilerplate_dir option defaults to ./Registry, and if the registry program were invoked with -io_boilerplate_dir=./ when using the CMake build, then I think there would be effectively no change to where files are generated under the current WRF build system and the CMake builds wouldn't have to write to the pristine WRF source directory.

@mgduda
Copy link
Collaborator

mgduda commented Jan 10, 2024

If we were to go the route proposed in my previous comment, I suppose there would be two directory arguments to pre_parse(), which could be distinguished by naming the arguments something like pristine_registry_dir and build_dir (or whatever better names we can think of).

@islas
Copy link
Collaborator Author

islas commented Jan 11, 2024

I wonder whether there may not be an opportunity for generalization here

Do you mean generalizing search to an arbitrary number of search paths? While I think we could, I question whether it would be worth it at this stage in the registry's life where expansion is generally minimal.

adding a new optional argument

I think I had considered similar options, and my original reason was avoiding manipulation of the interfacing while still being more or less reasonable. Having both the CMake and Make builds fundamentally do the same thing (writing to ./ regardless) and not having the two builds provide different modes of operation for the registry was a primary goal.

I do think having the io_boiler_plate_temporary generate at top-level is not the neatest, but in my opinion was the best balancing of marrying the two builds seamlessly.

@mgduda
Copy link
Collaborator

mgduda commented Jan 11, 2024

I think "generalization" could be as restricted as allowing the alternate directory (i.e., non-Registry/ directory) to be provided as a command-line option rather than being hard-coded to ./. Anyway, I'd like to see if we can find a solution that doesn't place generated files in the top-level WRF directory. When building out of source, we're doing something that's fundamentally different, so I don't think we need to worry about whether the registry tool is being invoked with different options.

To start some further discussion, where are other files that are generated by the registry being written in out-of-source builds using CMake? I assume this is an inc directory; but is that inc directory created in the CMake build directory?

@islas
Copy link
Collaborator Author

islas commented Jan 11, 2024

Structurally different, but in binary execution they are 1-to-1 the same so I wouldn't necessarily say fundamentally different. I think this is a worthwhile aspect, as the concept of introducing various modes of operation to achieve the same effect that a single mode could can lead to overly complex systems.

What we could do - if leaving the boiler plate at the top-level is the primary concern - is change the secondary search path to instead be ./Registry. This would entail

  • generate boiler at ./Registry
  • still search two paths, current directory Registry (./Registry) and location of Registry config file (for Make build now the same)

This second search is still needed because registry generation of files is all relative, but the search paths are whatever comes before the Registry file. For CMake this is an absolute path and a different path than the boiler plate which is now in <cmakeBuildDir>/Registry/io_boiler_plate_temporary.inc for this example.

Benefits:

  • registry binary still runs the same between the two, only run location changes
  • command for execution only changes location and doesn't require situational flags
  • no boiler plate at top-level
  • defaulting / hard-coding to search + output boiler to ./Registry follows the model of the other generated files rather than generating at ./Registry but searching in <whatever came before Registry config>/ which may be different
    • Added benefit of removing the assumption that this was run in the top-level WRF directory altogether

Other files are written out in the same relative locations as before, ./inc, ./frame, and so on. So they do end up now in the CMake build directory which mirrors the generated paths you would see in a Make build without the original source code

@mgduda
Copy link
Collaborator

mgduda commented Jan 11, 2024

I think that sounds reasonable. The placement of generated code in the top-level WRF directory was the primary concern.

If I'm understanding correctly, then, there will be effective no modifications to gen_streams.c, and pre_parse() will check either the dir provided in its argument list or ./Registry?

@islas
Copy link
Collaborator Author

islas commented Jan 11, 2024

Yes, that will be the end result. I'll go ahead and push the changes after checking that it does in fact do what we want for both builds just to be sure.

@weiwangncar
Copy link
Collaborator

The last change passed the regression test.

@mgduda
Copy link
Collaborator

mgduda commented Jan 12, 2024

@islas The new code diff looks good to me. Can you take another look at the PR description to see if we can more clearly explain the purpose and motivation for the changes? A few points that would be worth capturing include:

  1. The registry program writes the io_boilerplate_temporary.inc file into a path relative to where it is being run
  2. Although not available in WRF at the moment, future out of source builds should ideally not modify the WRF source directory
  3. The registry program previously looked for include files in directories relative to the path of the main Registry file
  4. The registry program would therefore not find the io_boilerplate_temporary.inc file in out-of-source builds, since it would be looking in the WRF source directory
  5. The solution is to modify the registry to look for include files in a ./Registry directory relative to working directory where the registry program is being run and also in directories relative to the location of the Registry file.

(Note: I may not have gotten the above details correct, which is all the more reason to document them thoroughly in the PR description.)

@islas
Copy link
Collaborator Author

islas commented Jan 12, 2024

I updated the PR description previously, which may have gotten some of those points already, but aside from point 3, I think the further additional edits I just made should capture the rest. Point 3 is not correct as the search path is whatever is before the Registry file provided on the command line, which could be relative or absolute.

@mgduda
Copy link
Collaborator

mgduda commented Jan 12, 2024

Thanks for the edits. I think the "Problem" description (i.e., "The registry generated code clutters the source development environment with hundreds of files, requiring many .gitignore entries that seem counterintuitive (e,g. ignoring all *.f90 source files)") still seems unrelated to any of the actual changes in the PR. Could you take another pass at that?

@islas
Copy link
Collaborator Author

islas commented Jan 12, 2024

@mgduda Should flow a bit better now

@mgduda
Copy link
Collaborator

mgduda commented Jan 16, 2024

@mgduda Should flow a bit better now

Thanks!

@islas islas merged commit 715df80 into wrf-model:develop Jan 17, 2024
1 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants