Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path parameter: raise if path not found on parameter instantiation and add check_exists attribute #800

Merged
merged 11 commits into from
Jul 31, 2023

Conversation

maximlt
Copy link
Member

@maximlt maximlt commented Jul 26, 2023

Fixes #304

Changes introduced:

  • instantiating a Path parameter with a path that can't be resolved raises an error. This was a strange behavior as discussed today with @philippjfr
  • Path gets a new check_exists attribute that is True by default (previous behavior) and that when set to False allows to set a non materialized path

TODO:

  • Add more tests
  • Document
  • Nice to have: more validation of the default/value when check_exists is False? I'm not sure to which extent I can validate whether a path would be a valid materialized path, I've found nothing that can do that in os and this stackoverflow indicates it's not that straightforward.

I'm interested in feedback on naming that attribute and its default. Also I wonder whether __get__ should resolve?

image

@maximlt maximlt requested a review from jbednar July 26, 2023 20:12
@jlstevens
Copy link
Contributor

Having a flag to control whether the path is checked when the parameter is declared makes sense to me. Don't like the name notfound_ok though, so how about something like check_exists?

I think __get__ should resolve the path if the file exists at a later point: this seems to be one of the reasons such a parameter is useful (otherwise it is really just a string with some slashes in it).

@maximlt maximlt changed the title Path parameter: raise if path not found on parameter instantiation and add notfound_ok attribute Path parameter: raise if path not found on parameter instantiation and add check_exists attribute Jul 27, 2023
@maximlt
Copy link
Member Author

maximlt commented Jul 27, 2023

Alright I have renamed the attribute as check_exists with a default of True (backwards compatible behavior) in 0b1825b.

Copy link
Member

@jbednar jbednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I posted this a day ago, but I don't see it, so posting again.

Nice to have: more validation of the default/value when notfound_ok is True? I'm not sure to which extent I can validate whether a path would be a valid materialized path, I've found nothing that can do that in os and this stackoverflow indicates it's not that straightforward.

Yes, it would be nice to error if the provided path is not acceptable, even if it doesn't have to exist. The is_path_exists_or_creatable() function in that SO post seems like it would do it, or we could have a simpler one that simply strips off the filename part and checks that the basepath already exists.

notfound_ok

I find that spelling pretty confusing. What about "allow_not_yet_existing_file"? Also, should we support "allow_existing_file?", for balance? I.e. a user may want the file not to exist (at least not when first checked) so that it can be used as an output file, or they may want the file to exist, so that it can be used as an input file, or they may not care (if it's e.g. an output file that they expect to overwrite each time). With that in mind you could also consider making it a keyword instead (exists='allowed', 'required', 'not_allowed').

I think most people will use this type for input files (since dashboards always need inputs and only sometimes have filesystem outputs), so a default of 'required' makes sense to me.

I wonder whether get should resolve?

The comments in param/__init__.py :

  • use resolve_path(path_to_file=True) for paths to existing files to be read,
  • use resolve_path(path_to_file=False) for paths to existing folders to be read, and normalize_path() for paths to new files to be written.

I think that means __get__ should use 'resolve_path' when exists='required' and otherwise 'normalize_path'.

@maximlt
Copy link
Member Author

maximlt commented Jul 28, 2023

The attribute had been already updated to the better check_exists name, maybe it'll change again!

Thinking about the various options, it seems to me we cover all the cases by adding the new option check_exists=False:

  • with check_exists=True - the default value that preserves backwards compatibility - you can create a Path parameter for an input with a default existing path param.Path('/path/to/existing') or without a default if you don't have any to provide with param.Path().
  • with check_exists=False you can create a Path parameter for an output with a default non-existing path param.Path('/path/to/nonexisting') or without a default if you don't have any to provide with param.Path().

user may want the file not to exist (at least not when first checked) so that it can be used as an output file

Presumably if you declare that you want a file not to exist, that is so that you can create it at some point later, in which case after you create it accessing the Parameter value will fail:

import param

class P(param.Parameterized):

    file = param.Filename('does/not/exist', exists='not_happy_when_it_exists')

    def save(self):
        with open(self.file, 'w') as f:
            f.write('holoviz')
        logger.log('File saved in %s', self.file)  # BOOM!

I don't really want to put more into this PR than what was required in the issue, which was to allow a non existing filename, unless we have some concrete use case in mind (I'm trying to release Param 2.0!). In the case we'd add support for more use cases in the future, we could indeed rename the attribute exists and start with just two options: 'always' (current, avoiding 'required' as Param may introduce the notion of required argument on instantiation #724) and 'ignore/optional'.


Interestingly I have found out that Path actually supports pathlib.Path objects!

@maximlt
Copy link
Member Author

maximlt commented Jul 30, 2023

Well I tried a few things and circled back to Jean-Luc's suggestion with check_exists that handles the two simple cases being 1) the path must exist (True, default and backwards compatible) and 2) the path can exist (False). I think it'd still be possible to extend the values accepted by this attribute in the future if that is requested.

As for validating the path in the check_exists=False case, I'm not super enthusiastic about adding even part of the code shared on the SO post. I guess I can be convinced otherwise...

What I've done though is add some basic type validation (whatever the value of check_exists), expecting the value to be of type str or pathlib.Path (the latter being surprisingly supported!).

This is ready for review again @jbednar and @jlstevens.

param/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@philippjfr philippjfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and thanks for all the tests. I agree with the goal of getting this merged asap and perhaps opening an issue about improving the validation in the check_exists=False case.

@philippjfr philippjfr merged commit f5208aa into main Jul 31, 2023
@philippjfr philippjfr deleted the path_notfoundok branch July 31, 2023 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filename parameter should allow non-existent files
4 participants