-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replacement functionality for get_all_items #237
Comments
If a user wants a items = list(item_search.items()) If they want un-PySTAC'd dictionaries: items = list(item_search.items_as_dicts()) After discussion with yourself, @lossyrob, @matthewhanson, and @pjhartzell, it seems like the consensus is that
There isn't a formal policy. I don't know if I agree that
To me, the changes implemented in #206 are API changes, not application changes. To me, an "application change" would be a change to CLI behavior (e.g.). The docs for
While notebooks blur the line between "Python developers" and "application users", you're still writing Python code in a notebook, so I'd say those users are "Python developers." Very happy to be argued out of this position, especially from the "what people actually see via the default warning filters" side of things. |
xref #258, but one thing I didn't appreciate about the 0.4.0 release was that the default And if users are explicitly saying |
@gadomski, @philvarner do either of you have thoughts on the I'm happy to submit a PR updating the behavior of |
I'm still of two minds about which I prefer, but since you have a concrete use-case I'd say a PR would be good, and we can use that PR's discussion to come to a decision (I expect I'll be 👍 because you've got a use-case :-) ). I'll also open PR(s) for my other two proposed changes, and we can use those for discussion on those (since they're a little separate):
and
|
xref stac-utils#237 (comment) Doesn't close that issue, which also deals with the replacement for the deprecated methods. This just reverts the change to the default `max_items` to return all matching items by default.
I think what happened here is we (mostly "I") made two distinct changes that didn't work well together. The overall goal was to change the default behavior seen in get_all_items so that, by default, it wouldn't, for example, try to retrieve millions of items and put them all into memory before returning a value to the caller. I would propose two changes:
|
Thanks for the context @philvarner!
I don't think I agree with this goal: it goes against what I would expect (and presumably what @duckontheweb or @matthewhanson expected when they initially wrote this). If I execute a query I expect to get back all the matching items. Having the ability to easily limit the result set is great, but is a surprising default (to me). If we're worried about the UX of returning a potentially large result set, then there are perhaps other ways to indicate that to the user (logging progress or using tqdm come to mind). If we want to take inspiration from other, perhaps similar contexts, then APIs like https://peps.python.org/pep-0249/ and libraries like https://www.psycopg.org/docs/ come to mind. In that SQL context, the APIs provide separate methods So the analogy to pystac-client would be:
It's worth emphasizing that this library will often be used interactively, and that should inform the design. Doing I'd personally find |
I think fetchall is not a good design. A call to I forgot max_items was in
|
Just to confirm, you're proposing adding a new parameter to
It's not quite the same. |
|
Actually, did the return type of
Close, but to avoid #49 you'd also need to pass In the case of |
get_all_items still returns an ItemCollection, doesn't it? Maybe I'm looking at the wrong code too. So something like:
which does apply clone_items=False. get_all_items would still be deprecated in favor of this method, that's more clearly named and explicit in it's danger. |
Yeah, I was confused.
Makes sense. I can take care of that PR. I really don't think that If I make a Last thing regarding the API That said, perhaps the fact that other parameters to |
Adds an ItemSearch.item_collection method as a replacement for the deprecated `get_all_items()`. A few other notes: 1. I haven't implemented any of the `max_items` changes being worked out in stac-utils#237. That can be done before / after this PR once things are ironed out 2. I changed the `get_all_items` method to use Sphinx's builtin `deprecated` directive. 3. I updated the tests (a bit) I'd like to enable pytest's `-W error` flag to fail on warnings. But it looks like the `cli` is using the (deprecated) `get_all_items_as_dict` method. I haven't touched that in this PR, but I don't think the CLI should be using a deprecated function.
Adds an ItemSearch.item_collection method as a replacement for the deprecated `get_all_items()`. A few other notes: 1. I haven't implemented any of the `max_items` changes being worked out in stac-utils#237. That can be done before / after this PR once things are ironed out 2. I changed the `get_all_items` method to use Sphinx's builtin `deprecated` directive. 3. I updated the tests (a bit) I'd like to enable pytest's `-W error` flag to fail on warnings. But it looks like the `cli` is using the (deprecated) `get_all_items_as_dict` method. I haven't touched that in this PR, but I don't think the CLI should be using a deprecated function.
Voicing strong support for this perspective. The previous behavior matches what users expect; making things unexpected to real use cases in order to protect users from themselves isn't good design pattern to me. Unless there are others who want to voice strong support that we need required protections in this case, I suggest that we move forward with Tom's suggested implementation and revert to the original max_items behavior. |
I’m also in favor of reverting to the old behavior of not limiting the number of Items returned. I got bitten by this recently when I was aggregating response geometries and it took me a long time to figure out why they were not adding up. I think the possibility of using up a bunch of resources by blindly calling this on a large response set it just one of the dangers of working with big data and is best managed by the user. |
(1) use of ItemCollection in odcstac and stackstac Earlier Tom said:
The odc.stac.load method takes an Interable[Item] (https://github.com/opendatacube/odc-stac/blob/1db2b1a7a10fb1a1fbd1bfee0a44b50fee22e911/odc/stac/_load.py#L203):
An ItemCollection is also an Interable[Item] because it implements I checked and there aren't any uses of ItemCollection in odc_stac. In stackstac, the stack() method accepts a lot of types, but not Sequence[pystac.Item]:
I've filed an issue requesting support for that: (2) ItemCollection mangles Items in default constructor Filed an issue to change the default to cloning rather than not, since correct behavior is better than speed by default, and this has bitten nearly everyone who's used it. (3) moving max_items out of search() method One other issue we have with the method is that the search() method has a bunch of parameters that configure things that aren't actually part of the search query. For example, max_items is a behavior on getting the results of the search rather than the query. We should move those out into the accessor methods rather than having them in search |
Adds an ItemSearch.item_collection method as a replacement for the deprecated `get_all_items()`. A few other notes: 1. I haven't implemented any of the `max_items` changes being worked out in #237. That can be done before / after this PR once things are ironed out 2. I changed the `get_all_items` method to use Sphinx's builtin `deprecated` directive. 3. I updated the tests (a bit) I'd like to enable pytest's `-W error` flag to fail on warnings. But it looks like the `cli` is using the (deprecated) `get_all_items_as_dict` method. I haven't touched that in this PR, but I don't think the CLI should be using a deprecated function.
Filed an issue to default max_items back to None |
* Restore default max_items behavior xref #237 (comment) Doesn't close that issue, which also deals with the replacement for the deprecated methods. This just reverts the change to the default `max_items` to return all matching items by default. * vcr * alias
I think everything here is addressed on main. Thanks everyone. |
(Followup to the deprecations from #206 which I haven't tested yet, so apologies if I missed something)
Over in https://github.com/microsoft/planetarycomputerexamples we use
search.get_all_items()
to get a singleItemCollection
, which is then handed off tostackstac
orodc.stac
. Our use case requires having a concrete sequence of items ahead of time, not an iterator / iterable. What's the recommended replacement API? Something likeI have a couple concerns with this replacement, if it's the recommended alternative:
clone_items=False
and now our users will get slower results until they profile things.Given how common this use case is (at least in my corner of the world), I think it makes sense to keep a method on the
search
object that returns a concreteItemCollection
.As an aside, I don't actually see a warning when running the old code on the Planetary Computer Hub using pystac 0.4.0, since pystac-client is emitting a
DeprecationWarning
. It's changed over the years, butDeprecationWarning
isn't always visible to the end user. I don't know if pystac-client has a formal policy, but pandas uses either aFutureWarning
for deprecations that will affect user-code, andDeprecationWarning
for code that will likely affect library code (which is then upgraded to aFutureWarning
in a future release).The text was updated successfully, but these errors were encountered: