-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Runtime network loading of FIELDS_JSON_URL and mixed type enum #400
Comments
@volaya @lossyrob Apologies for not thinking through this more during the PR review, but I think I'm in agreement that we should use a fields.json file that is included with this library by default. Using the remote file means that there's a chance that it could be updated in a way that either breaks or changes functionality in this library (e.g. if it is updated for a new extension version that we have not yet added support for). I think we are also starting to get more users for whom internet connectivity can be an issue. It would be good to allow them to do as much work as possible offline and to make it very transparent where we are making network requests in our code to make debugging easier. |
I thought we were pointing to an immutable release of the file? If not, I think we should ask the project to publish releases and point to a specific released version, as we are with schemas. We're already pulling in json schemas for the validation functionality, so I don't think we'd be buying too much in the case of connectivity by pulling in this file and maintaining it separately - which would increase maintenance burden. I do think being clear about when network access occurs and keeping it out of core functionality is a good practice. Also this is a single file as opposed to the linked set of files for json schemas, so it wouldn't be too bad to bake the released json file into the library. But I'm not sure the duplication and distribution of external files is a good pattern to start using generally. |
The fields schema URL is
We are, and in that case it would be totally unreasonable to try to maintain those files within PySTAC. As you mentioned, it wouldn't be as much of a burden here since it's just one file, but I agree it's not ideal to replicate an external file. I'm not sure there's a clear best path here, both options seem less than ideal for different reasons. I think versioning and minifying the schema would make me +0 on either approach. One other possibility would be to create a local file cache for these requests. This would only work if the fields schema was versioned, but the code could check that the cache version exists and matches the version in the URL and skip the network request if it does. We might also consider using this option for validation to help folks with poor connectivity. Downsides of this are the added complexity and increased local storage, but maybe we could make this optional? |
Maybe I did not understand the PR correctly, but wasn't it implemented in a way that it's simple to replace the online version with a local one? fields.json will be versioned soon, see stac-utils/stac-fields#12, but I need to reach 1.0.0 first. Are the white spaces in the JSON file really an issue, @schwehr? |
I don't care about empty spaces. I do care that the version for this file doesn't move with pystac and the pystac likes to do surprise network access. I was only thinking that keeping the essential parts makes it simpler for folks to follow, but having a full copy in git seems reasonable. I currently patch out the code that does network access in my local copy of pystac. The idea of doing a fetch by hand of the file and patching that into my repo isn't appealing. |
@m-mohr @schwehr My suggestion of minifying the JSON file was based on this point in the original issue description and the fact that removing whitespace reduces the file size from 24089 bytes to 12319 bytes. If that is not a concern, then I do not see a need for removing whitespace and I can strike that from stac-utils/stac-fields#12. |
Back to the loading: Didn't the PR had some code that allowed to override the fields.json with a custom one? In that case @schwehr could just set that and no request would be made and keeping the file up to date would need to be achieved on the user side, which should then avoid surprises. That's how I understood it... |
There are a number of issues not having the file in the pystac gut repo. To pick just one: This doesn't help someone who is trying bisect a bug. If they use even a local cache of the file, they will then have pystac behavior that doesn't change with where they are in the git history. If the bug or testing behavior ends up being dependent on that file, bisect tools may be unable to find when the bug started. |
It would make sense to me to pull this json blob in on release and then package it up as part of pystac. If that sounds like a reasonable approach then I can take a crack at implementation. |
@volaya @duckontheweb @lossyrob and #264
Introduced here pystac/summaries.py#L40, FIELDS_JSON_URL causes the default path to go load the field definitions from a remote file. I am not a fan of having an additional code path (there was already the validation schema fetches) outside of testing:
I would much rather have a script to rebuild a python file that contains just the resulting table that is only the parts used here.
Currently it's this:
Additionally, having an enum of mixed types is likely to bite us down the road. An importer should translate all of those strategies into single character strings. Especially with True vrs 3 other strings that eval to
True
. It's not obvious here what the default even is.Also, the above uses
DONT
fordon't
. That's hard on some users. Please useDO_NOT_SUMMARIZE
or something else more readable. I generally suggest avoiding contractions in code even when it is just in comments.The text was updated successfully, but these errors were encountered: