cosmosis-campaign overriding ini file settings for output filenames #115

Closed
jessmuir opened this issue Jan 3, 2024 · 7 comments

Comments

@jessmuir
Contributor

jessmuir commented Jan 3, 2024

I'm just starting to experiment with using the cosmosis-campaign, so correct me if I'm missing something!

Am I correct in understanding that if I use cosmosis-campaign, the output filename settings in the [output], [test], etc. sections of an ini file will be overridden? If possible, it'd be nice to retain an option to not do that. Sometimes it's helpful to have some additional info included in chain filenames, e.g. the name of the fits file containing a 2pt data vector, but I wouldn't necessarily want to include this in every run name in the campaign yaml file.

Having the option to use campaigns while still manually formatting output filenames would make it a bit easier to adapt existing workflows to the campaign setup.

@joezuntz
Owner

joezuntz commented Jan 3, 2024

Hi Jessie - thanks for being an early user of this!

You're right that currently the output file names will be overridden, yes.

I would really recommend adapting to that if possible - would a feature to add arbitrary additional metadata in the output file header serve the same purpose?

Otherwise it's a fairly easy feature to add, though you'd need to be careful to avoid clashing names.

One other option would be a prefix to the auto-generated name that could be included - how about that?

@jessmuir
Contributor Author

jessmuir commented Jan 3, 2024

I like the idea of a prefix or something like that. An ideal scenario could be to have an optional setting in the campaign file header (where the output directory is specified now?) giving a placeholder string (e.g. RUNNAME) that is replaced with the run name when generating output files. That is to say, if the ini file sets the output filename or test directory name to prefix_RUNNAME_suffix, and the campaign run is called des3x2_lcdm, you'd end up with prefix_des3x2_lcdm_suffix instead of just des3x2_lcdm.

That being said, I think it makes sense to keep the automatically generated names as the default, e.g. if the substitution string isn't provided. My main thought here is that if in a project one were to run some chains using a campaign and some outside one (I know, not ideal, but it could reasonably happen), it'd be nice to be able to keep naming styles consistent. (E.g. for some DES tests we're transitioning to using the campaign setups, but have already run a bunch of things without them.)

@joezuntz
Owner

joezuntz commented Jan 4, 2024

I've added this feature to version 3.5 of cosmosis. This is out on PyPI now and will be uploaded to conda soon. Then I can set up a NERSC version.

You can now do something like this at the top level of the campaign yaml file:

output_name: xxx_{name}_yyy

then if you have a run called baseline the output file will be called xxx_baseline_yyy.txt. It will still be in the output_dir directory.
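
As a rough illustration of the naming rule described here (not the actual cosmosis code), the template can be thought of as a Python format string in which `{name}` is replaced by the run name. The function `expand_output_name`, its arguments, and the `output` directory are hypothetical; the `.txt` extension and the fallback to the bare run name are assumptions based on the behaviour described in this thread.

```python
import os

def expand_output_name(run_name, output_dir, output_name=None):
    """Return the chain file path for one run (illustrative only)."""
    # Assumption: with no template, the file is named after the run itself.
    template = output_name if output_name is not None else "{name}"
    # Substitute the run name for the {name} placeholder.
    stem = template.format(name=run_name)
    # The file still lives inside the campaign's output directory.
    return os.path.join(output_dir, stem + ".txt")

print(expand_output_name("baseline", "output", "xxx_{name}_yyy"))
print(expand_output_name("baseline", "output"))
```

Treating the setting as a template rather than a fixed prefix or suffix lets the run name sit anywhere in the filename, which is what makes it possible to match an existing naming style.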

Hope this is useful!

@joezuntz joezuntz closed this as completed Jan 4, 2024
@jessmuir
Contributor Author

jessmuir commented Jan 4, 2024

Awesome, thanks!

If feasible, having a little extra flexibility where this could be overridden for individual runs might be useful.

The use case I'm thinking of is that, for some redundant transparency, we often include the data vector fits filename in output chain filenames. Previously, for a lot of DES analyses, we did this by defining a bash variable $DATA_VECTOR holding the filename in a run-specific .sh script that gets sourced in the job submission script. That variable appears in various places in the ini file, both for setting up the 2pt likelihood and for setting up output filenames.

If one were switching over to a more self-contained campaign-style setup, we'd want to move away from having bash files specifying each run. If I wanted to set up a campaign file that runs chains with the same pipeline but different data vectors, for the actual calculations I assume one can adjust the 2PT_FILE variable in the [DEFAULT] section via lines in the campaign yaml file, and add info about the fits file being used to the run name. If I wanted to have the full fits filename in the output chain name, however, I'd have to include it in the run names, and the campaign file's run names would get unwieldy. It'd be nice to be able to add some additional info to the output chain filename without making the run names really long.

Is it possible to override the output settings when specifying an individual run in a campaign file?

None of this is crucial for running with campaigns, especially if a project is starting fresh. It'd just be helpful in contexts where one is trying to use a campaign to simplify the setup of some tests for an existing project that has already produced outputs without using campaigns.

@joezuntz
Owner

joezuntz commented Jan 8, 2024

Sure, this is a one line change to allow it per-run. Testing it now.

@joezuntz joezuntz reopened this Jan 8, 2024
@joezuntz
Owner

joezuntz commented Jan 9, 2024

Okay, version 3.5.1 now allows per-run output_name settings. It's propagating to conda-forge now.
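
A sketch of how a per-run setting might take precedence over the campaign-wide one, assuming the campaign file has been parsed into plain dictionaries. The dictionary layout, the helper `resolve_output_name`, and the example run names and template strings are hypothetical and not taken from the cosmosis source; only the `output_name` key and the `{name}` placeholder come from this thread.

```python
def resolve_output_name(campaign, run):
    """Pick the naming template for a run: per-run first, then top level."""
    # Assumption: a run-level output_name wins over the campaign-level one,
    # and the bare run name is used when neither is given.
    template = run.get("output_name", campaign.get("output_name", "{name}"))
    return template.format(name=run["name"])

# Hypothetical parsed campaign with one overriding run.
campaign = {
    "output_dir": "output",
    "output_name": "xxx_{name}_yyy",
    "runs": [
        {"name": "baseline"},
        {"name": "alt_data", "output_name": "{name}_2pt_alt"},
    ],
}

for run in campaign["runs"]:
    print(run["name"], "->", resolve_output_name(campaign, run))
```

With this kind of lookup, a short run name can stay short while one run's filename still carries extra detail such as the data vector file.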

@joezuntz joezuntz closed this as completed Jan 9, 2024
@joezuntz
Owner

joezuntz commented Jan 9, 2024

Many thanks for being an early user of this, @jessmuir! Please do let me know of any other issues you hit.
