-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow more configuration with validate_all #345
Comments
Just to circle back here. I opened #1141 which covers the scenario described in this issue of limiting the amount of objects that are iterated over. I ended up taking a slightly different path mostly because I couldn't quite figure out whether count the items contained within a child using the original proposal. So instead I added a single max number of objects and a recursive flag. Here is what it looks like: In [1]: from pystac_client import Client
...:
...: URL = "https://planetarycomputer.microsoft.com/api/stac/v1"
...: catalog = Client.open(URL)
In [2]: catalog.validate_all(max_n=10)
Out[2]: 10
In [3]: catalog.validate_all(recursive=False, max_n=10)
Out[3]: 10 It would be good to get some 👍s on approach though. |
I think Phil's complex suggestion is the best, with one modification -- I don't think there should be a distinction between def validate_all(self, max_children: Optional[int] = None, max_items: Optional[int] = None):
"""Validate this catalog and all of its children and items, recursively.
- If `max_children=None`, all children will be validated
- If `max_children=0`, no children will be validated (they will still be read because they are needed to get the items)
- If `max_items=None`, all items will be resolved and validated
- If `max_items=0`, no items will be resolved and validated
If both `max_children=0` and `max_items=0`, you should be using `validate` instead.
"""
... |
Agree that descendent catalogs and collections should be handled as one count.
I do have a doubt about what the semantics of max_children should be -- should be the max number of C&Cs to validate, or should be the max number to traverse in the DAG? Reason being is that I had mind that the intention of this was to limit the number of network requests that are made (to something lower than "all of them"), rather than limiting the number that get validated after being retrieved (which is relatively wall clock time cheap operation compared to network I/O). |
Is this solved if I keep the def validate_all(self, max_children: Optional[int] = None, max_items: Optional[int] = None, recursive: bool = True):
"""Validate this catalog and all of its children and items, recursively (as long as recursive is True).
- If `max_children=None`, all children will be validated
- If `max_children=0`, no children will be validated (they will still be read because they are needed to get the items)
- If `max_items=None`, all items will be resolved and validated
- If `max_items=0`, no items will be resolved and validated
- If `recursive=False`, children will not be recursed into- only the child object itself will be validated
If both `max_children=0` and `max_items=0`, you should be using `validate` instead.
"""
... |
I'm not convinced that it is valuable to limit the number of |
Good point, so as long as that's clearly documented. Is there any reason to limit the number of c&c's validated at all in that case? |
Good point, maybe not. This seems nice and clean: def validate_all(self, max_items: Optional[int] = None, recursive: bool = True):
"""Validate this catalog and all of its items.
- If `max_items=None`, all items will be resolved and validated.
- If `max_items=0`, no items will be resolved and validated.
- If `max_items=n`, the first `n` items will be resolved and validated.
- If `recursive=True`, `validate_all` will traverse this Catalog's children and validate them and their items as well.
- If `recursive=True` and `max_items` is a non-zero integer, recursion will stop after `max_items` Items have been validated.
""" |
|
Ok, I like that solution (getting closer to my PR 😄), but I think the number of items within the c&cs should count towards the |
Agreed, I think that's what I'm saying with "If |
I think that is clear |
Thanks for the feedback! I adjusted the PR. |
I am wondering now though if the default should be non-null. If the goal is to prevent an accidental total catalog fetch then a non-null default seems necessary. |
We did this with pystac-client and it made folks (incl Tom) unhappy: stac-utils/pystac-client#237 (comment) I think We could have a warning if we hit some sort of limit but that feels like squeezing a bit too much juice out of the orange to me. |
The Catalog method
validate_all
currently attempts to iterate all child catalogs, collections, and items. For large catalogs, this can take a huge amount of time. I think it would be useful to provide additional parameters to this method to allow more configurability.One possibility is to simply allow a user to only validate certain types of objects, e.g., only catalogs and collections, with:
catalog: bool = True
collection: bool = True
items: bool = True
A user could then do
validate_all(catalog = True, collection = True, items = False)
to only recursively validate the catalogs and collections.A more complex option would be to provide a max count of each type to validate:
catalog: Optional[int] = None
collection: Optional[int] = None
items: Optional[int] = None
This way a user could say validate_all(catalog = None, collection = None, items = 100), and it would validate all their catalog and collection objects, but only validate 100 items. This would give some good validation that the small-c catalog is implemented correctly, but actually finish when there are more than thousands of items.
The text was updated successfully, but these errors were encountered: