default ItemSearch max_items to 100, and make its use more prominent #208

Merged: 3 commits, Jun 2, 2022
68 changes: 38 additions & 30 deletions CHANGELOG.md
@@ -6,6 +6,26 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased] - TBD

### Added

- lru_cache to several methods [#167](https://github.com/stac-utils/pystac-client/pull/167)
- Direct item GET via ogcapi-features, if conformant [#166](https://github.com/stac-utils/pystac-client/pull/166)
- `py.typed` for downstream type checking [#163](https://github.com/stac-utils/pystac-client/pull/163)

### Changed

- Item Search no longer defaults to returning an unlimited number of result Items from
its "items" methods. The `max_items` parameter now defaults to 100 instead of None.
Since the `limit` parameter also defaults to 100, in an ideal situation, only one request
will be made to the server to retrieve all 100 items. Both of these parameters can be
carefully adjusted upwards to align with the server's capabilities and the expected
number of search results. [#208](https://github.com/stac-utils/pystac-client/pull/208)
- Better error message when trying to search a non-item-search-conforming catalog [#164](https://github.com/stac-utils/pystac-client/pull/164)
- Search `filter-lang` defaults to `cql2-json` instead of `cql-json` [#169](https://github.com/stac-utils/pystac-client/pull/169)
- Search `filter-lang` will be set to `cql2-json` if the `filter` is a dict, or `cql2-text` if it is a string [#169](https://github.com/stac-utils/pystac-client/pull/169)
- Search parameter `intersects` is now typed to only accept a str, dict, or object that implements `__geo_interface__` [#169](https://github.com/stac-utils/pystac-client/pull/169)
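
To make the interaction between `max_items` and `limit` described in the entry for #208 concrete, here is a back-of-the-envelope sketch. It is not pystac-client code: `requests_needed` is a hypothetical helper, and it assumes the server honors `limit` exactly and always returns full pages, which real services may not do.

```python
import math
from typing import Optional


def requests_needed(max_items: Optional[int], limit: int, matched: int) -> int:
    """Estimate how many page requests a search would issue.

    Assumption (hypothetical): the server returns exactly ``limit`` items
    per page until the matching results run out.
    """
    # With max_items=None the client keeps paging through every match.
    wanted = matched if max_items is None else min(max_items, matched)
    # At least one request is needed, even to learn that nothing matched.
    return max(1, math.ceil(wanted / limit))


print(requests_needed(max_items=100, limit=100, matched=5000))   # 1
print(requests_needed(max_items=1000, limit=100, matched=5000))  # 10
print(requests_needed(max_items=None, limit=100, matched=5000))  # 50
```

Under these assumptions the new defaults cost a single request, while raising `max_items` without raising `limit` multiplies the number of round trips.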


### Deprecated

- Item Search methods `get_items()` and `get_item_collections()` have been renamed to
@@ -19,29 +39,17 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
exhaustion of all available memory. The iterator methods `items()` or
`item_collections()` should be used instead. [#206](https://github.com/stac-utils/pystac-client/pull/206)

### Added

- lru_cache to several methods [#167](https://github.com/stac-utils/pystac-client/pull/167)
- Direct item GET via ogcapi-features, if conformant [#166](https://github.com/stac-utils/pystac-client/pull/166)
- `py.typed` for downstream type checking [#163](https://github.com/stac-utils/pystac-client/pull/163)

### Changed
### Removed

- Better error message when trying to search a non-item-search-conforming catalog [#164](https://github.com/stac-utils/pystac-client/pull/164)
- Search `filter-lang` defaults to `cql2-json` instead of `cql-json` [#169](https://github.com/stac-utils/pystac-client/pull/169)
- Search `filter-lang` will be set to `cql2-json` if the `filter` is a dict, or `cql2-text` if it is a string [#169](https://github.com/stac-utils/pystac-client/pull/169)
- Search parameter `intersects` is now typed to only accept a str, dict, or object that implements `__geo_interface__` [#169](https://github.com/stac-utils/pystac-client/pull/169)
- Client parameter `require_geojson_link` has been removed. [#169](https://github.com/stac-utils/pystac-client/pull/169)

### Fixed

- Search sortby parameter now has correct typing and correctly handles both GET and POST JSON parameter formats. [#175](https://github.com/stac-utils/pystac-client/pull/175)
- Search fields parameter now has correct typing and correctly handles both GET and POST JSON parameter formats. [#184](https://github.com/stac-utils/pystac-client/pull/184)
- Use pytest configuration to skip benchmarks by default (instead of a `skip` mark) [#168](https://github.com/stac-utils/pystac-client/pull/168)

### Removed

- Client parameter `require_geojson_link` has been removed. [#169](https://github.com/stac-utils/pystac-client/pull/169)

## [v0.3.5] - 2022-05-26

### Fixed
@@ -80,18 +88,18 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [v0.3.1] - 2021-11-17

### Changed
- Update min PySTAC version to 1.2
- Default page size limit set to 100 rather than relying on the server default
- Fetch single collection directly from endpoint in API rather than iterating through children [Issue #114](https://github.com/stac-utils/pystac-client/issues/114)

### Added

- Adds `--block-network` option to all test commands to ensure no network requests are made during unit tests
[#119](https://github.com/stac-utils/pystac-client/pull/119)
- `parameters` argument to `StacApiIO`, `Client.open`, and `Client.from_file` to allow query string parameters to be passed to all requests
[#118](https://github.com/stac-utils/pystac-client/pull/118)

### Changed
- Update min PySTAC version to 1.2
- Default page size limit set to 100 rather than relying on the server default
- Fetch single collection directly from endpoint in API rather than iterating through children [Issue #114](https://github.com/stac-utils/pystac-client/issues/114)

### Fixed

- `Client.get_collections` raised an exception when API did not publish `/collections` conformance class instead of falling back to using child links
@@ -144,13 +152,6 @@ are in a single HTTP session, handle pagination and respects conformance
[#79](https://github.com/stac-utils/pystac-client/pull/79)
- Improved logging for GET requests (prints encoded URL)

### Fixed

- Running `stac-client` with no arguments no longer raises a confusing exception [#52](https://github.com/stac-utils/pystac-client/pull/52)
- `Client.get_collections_list` [#44](https://github.com/stac-utils/pystac-client/issues/44)
- The regular expression used for datetime parsing [#59](https://github.com/stac-utils/pystac-client/pull/59)
- `Client.from_file` now works as expected, using `Client.open` is not required, although it will fetch STAC_URL from an envvar

### Removed

- `get_pages` and `simple_stac_resolver` functions from `pystac_client.stac_io` (The new StacApiIO class understands `Link` objects)
@@ -164,22 +165,29 @@ are in a single HTTP session, handle pagination and respects conformance
- STAC_URL environment variable in Client.open(). url parameter in Client is now required
- STAC_URL environment variable in CLI. CLI now has a required positional argument for the URL

### Fixed

- Running `stac-client` with no arguments no longer raises a confusing exception [#52](https://github.com/stac-utils/pystac-client/pull/52)
- `Client.get_collections_list` [#44](https://github.com/stac-utils/pystac-client/issues/44)
- The regular expression used for datetime parsing [#59](https://github.com/stac-utils/pystac-client/pull/59)
- `Client.from_file` now works as expected, using `Client.open` is not required, although it will fetch STAC_URL from an envvar

## [v0.1.1] - 2021-04-16

### Added

- `ItemSearch.items_as_collection` [#37](https://github.com/stac-utils/pystac-client/pull/37)
- Documentation [published on ReadTheDocs](https://pystac-client.readthedocs.io/en/latest/) [#46](https://github.com/stac-utils/pystac-client/pull/46)

### Changed

- CLI: pass in headers as list of KEY=VALUE pairs

### Fixed

- Include headers in STAC_IO [#38](https://github.com/stac-utils/pystac-client/pull/38)
- README updated to reflect actual CLI behavior

### Changed

- CLI: pass in headers as list of KEY=VALUE pairs

## [v0.1.0] - 2021-04-14

Initial release.
7 changes: 5 additions & 2 deletions docs/quickstart.rst
@@ -97,13 +97,16 @@ specific STAC API (use the root URL):

from pystac_client import Client

catalog_client = Client.open("https://earth-search.aws.element84.com/v0")
client = Client.open("https://earth-search.aws.element84.com/v0")

Create a search:

.. code-block:: python

my_search = catalog_client.search(collections=['sentinel-s2-l2a-cogs'], bbox=[-72.5,40.5,-72,41], max_items=10)
my_search = client.search(
max_items=10,
collections=['sentinel-s2-l2a-cogs'],
bbox=[-72.5,40.5,-72,41])
print(f"{my_search.matched()} items found")

The ``items()`` generator method can be used to iterate through all resulting items.
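
The lazy, capped iteration that a generator method with a `max_items` bound implies can be sketched without any network access. This is an illustration only, not how pystac-client is implemented: `fake_pages` and `capped_items` are hypothetical names, and the pages are generated locally instead of being fetched from a STAC API.

```python
from itertools import chain, islice
from typing import Dict, Iterable, Iterator, List, Optional

Page = List[Dict[str, str]]


def fake_pages(total: int, limit: int) -> Iterator[Page]:
    """Stand-in for a paged API: yields lists of at most ``limit`` item dicts."""
    for start in range(0, total, limit):
        stop = min(start + limit, total)
        yield [{"id": f"item-{i}"} for i in range(start, stop)]


def capped_items(
    pages: Iterable[Page], max_items: Optional[int]
) -> Iterator[Dict[str, str]]:
    """Flatten pages lazily and stop after ``max_items`` items.

    ``islice`` treats ``None`` as "no cap", and the next page is only
    pulled when the current one is exhausted and more items are wanted.
    """
    return islice(chain.from_iterable(pages), max_items)


items = list(capped_items(fake_pages(total=23, limit=5), max_items=10))
print(len(items))       # 10
print(items[-1]["id"])  # item-9
```

Because the flattening is lazy, only the first two pages are consumed in this run; the remaining three are never generated.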
2 changes: 1 addition & 1 deletion docs/tutorials/cql2-filter.ipynb
@@ -111,10 +111,10 @@
"}\n",
"\n",
"params = {\n",
" \"max_items\": 100,\n",
" \"collections\": \"landsat-8-c2-l2\",\n",
" \"intersects\": geom,\n",
" \"datetime\": \"2018-01-01/2020-12-31\",\n",
" \"max_items\": 100,\n",
"}\n",
"\n",
"import hvplot.pandas\n",
8 changes: 3 additions & 5 deletions docs/tutorials/pystac-client-introduction.ipynb
@@ -166,17 +166,15 @@
"\n",
"# limit sets the # of items per page so we can see multiple pages getting fetched\n",
"search = cat.search(\n",
" max_items = 15,\n",
" limit = 5,\n",
" collections = \"aster-l1t\",\n",
" intersects = geom,\n",
" datetime = \"2000-01-01/2010-12-31\",\n",
" max_items = 15,\n",
" limit = 5\n",
")\n",
"\n",
"# PySTAC ItemCollection\n",
"items = search.get_all_items()\n",
"items = list(search.items())\n",
"\n",
"# Dictionary (GeoJSON FeatureCollection)\n",
"item_json = items.to_dict()\n",
"\n",
"len(items)"
4 changes: 2 additions & 2 deletions docs/tutorials/stac-metadata-viz.ipynb
@@ -103,13 +103,13 @@
"\n",
"# limit sets the # of items per page so we can see multiple pages getting fetched\n",
"search = cat.search(\n",
" max_items = 50,\n",
" collections = \"aster-l1t\",\n",
" intersects = geom,\n",
" datetime = \"2000-01-01/2010-12-31\",\n",
" max_items = 50\n",
")\n",
"\n",
"items = search.get_all_items()\n",
"items = list(search.items())\n",
"len(items)"
]
},
2 changes: 1 addition & 1 deletion docs/usage.rst
@@ -151,9 +151,9 @@ requests to a service's "search" endpoint. This method returns a
>>> from pystac_client import Client
>>> api = Client.from_file('https://planetarycomputer.microsoft.com/api/stac/v1')
>>> results = api.search(
... max_items=5,
... bbox=[-73.21, 43.99, -73.12, 44.05],
... datetime=['2019-01-01T00:00:00Z', '2019-01-02T00:00:00Z'],
... max_items=5
... )

Instances of :class:`~pystac_client.ItemSearch` have 2 methods for iterating
93 changes: 47 additions & 46 deletions pystac_client/item_search.py
@@ -67,6 +67,8 @@

OP_MAP = {">=": "gte", "<=": "lte", "=": "eq", ">": "gt", "<": "lt"}

DEFAULT_LIMIT_AND_MAX_ITEMS = 100


# from https://gist.github.com/angstwad/bf22d1822c38a92ec0a9#gistcomment-2622319
def dict_merge(
@@ -108,37 +110,52 @@ class ItemSearch:
`STAC API - Item Search spec
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search>`__.

No request is sent to the API until a function is called to fetch or iterate
through the resulting STAC Items,
either the :meth:`ItemSearch.item_collections` or :meth:`ItemSearch.items`
method is called and iterated over.
No request is sent to the API until a method is called to iterate
through the resulting STAC Items, either :meth:`ItemSearch.item_collections`,
:meth:`ItemSearch.items`, or :meth:`ItemSearch.items_as_dicts`.

All "Parameters", with the exception of ``max_items``, ``method``, and
``url`` correspond to query parameters
All parameters except ``url``, ``method``, ``max_items``, ``stac_io``, and ``client``
correspond to query parameters
described in the `STAC API - Item Search: Query Parameters Table
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search#query-parameter-table>`__
docs. Please refer
to those docs for details on how these parameters filter search results.

Args:
url : The URL to the item-search endpoint
url: The URL to the root / landing page of the STAC API
implementing the Item Search endpoint.
method : The HTTP method to use when making a request to the service.
This must be either ``"GET"``, ``"POST"``, or
``None``. If ``None``, this will default to ``"POST"`` if the
``intersects`` argument is present and ``"GET"``
if not. If a ``"POST"`` request receives a ``405`` status for
the response, it will automatically retry with a
``"GET"`` request for all subsequent requests.
max_items : The maximum number of items to return from the search. *Note
that this is not a STAC API - Item Search
parameter and is instead used by the client to limit the total number
of returned items*.
limit : The maximum number of items to return *per page*. Defaults to
``None``, which falls back to the limit set
by the service.
bbox: May be a list, tuple, or iterator representing a bounding box of 2D
or 3D coordinates. Results will be filtered
``None``. If ``None``, this will default to ``"POST"``.
If a ``"POST"`` request receives a ``405`` status for
the response, it will automatically retry with
``"GET"`` for all subsequent requests.
max_items : The maximum number of items to return from the search, even
if there are more matching results. The client uses this value to limit the
total number of Items returned from the :meth:`items`,
:meth:`item_collections`, and :meth:`items_as_dicts` methods. The client
will continue to request pages of items until the number of max items is
reached. This parameter defaults to 100. Setting this to ``None`` will
allow iteration over a possibly very large number of results.
stac_io: An instance of StacIO for retrieving results. Normally comes
from the Client that returns this ItemSearch.
client: An instance of a root Client used to set the root on resulting Items.
This is normally populated by the client that returns this ItemSearch instance.
limit: A recommendation to the service as to the number of items to return
*per page* of results. Defaults to 100.
ids: List of one or more Item ids to filter on.
collections: List of one or more Collection IDs or :class:`pystac.Collection`
instances. Only Items in one
of the provided Collections will be searched
bbox: A list, tuple, or iterator representing a bounding box of 2D
or 3D coordinates. Results will be filtered
to only those intersecting the bounding box.
intersects: A string or dictionary representing a GeoJSON geometry, or
an object that implements a
``__geo_interface__`` property, as supported by several libraries
including Shapely, ArcPy, PySAL, and
geojson. Results filtered to only those intersecting the geometry.
datetime: Either a single datetime or datetime range used to filter results.
You may express a single datetime using a :class:`datetime.datetime`
instance, a `RFC 3339-compliant <https://tools.ietf.org/html/rfc3339>`__
@@ -174,17 +191,7 @@ class ItemSearch:
``2017-06-01T00:00:00Z/2017-07-31T23:59:59Z``
- ``2017-06-10/2017-06-11`` expands to
``2017-06-10T00:00:00Z/2017-06-11T23:59:59Z``
intersects: A string or dictionary representing a GeoJSON geometry, or
an object that implements a
``__geo_interface__`` property as supported by several libraries
including Shapely, ArcPy, PySAL, and
geojson. Results filtered to only those intersecting the geometry.
ids: List of Item ids to return. All other filter parameters that further
restrict the number of search results
(except ``limit``) are ignored.
collections: List of one or more Collection IDs or :class:`pystac.Collection`
instances. Only Items in one
of the provided Collections will be searched

query: List or JSON of query parameters as per the STAC API `query` extension
filter: JSON of query parameters as per the STAC API `filter` extension
filter_lang: Language variant used in the filter body. If `filter` is a
@@ -194,33 +201,27 @@ class ItemSearch:
fields: A list of fields to include in the response. Note this may
result in invalid STAC objects, as they may not have required fields.
Use `items_as_dicts` to avoid object unmarshalling errors.
max_items: The maximum number of items to get, even if there are more
matched items.
method: The http method, 'GET' or 'POST'
stac_io: An instance of StacIO for retrieving results. Normally comes
from the Client that returns this ItemSearch client: An instance of a
root Client used to set the root on resulting Items
"""

def __init__(
self,
url: str,
*,
limit: Optional[int] = 100,
bbox: Optional[BBoxLike] = None,
datetime: Optional[DatetimeLike] = None,
intersects: Optional[IntersectsLike] = None,
method: Optional[str] = "POST",
max_items: Optional[int] = DEFAULT_LIMIT_AND_MAX_ITEMS,
stac_io: Optional[StacIO] = None,
client: Optional["Client"] = None,
limit: Optional[int] = DEFAULT_LIMIT_AND_MAX_ITEMS,
ids: Optional[IDsLike] = None,
collections: Optional[CollectionsLike] = None,
bbox: Optional[BBoxLike] = None,
intersects: Optional[IntersectsLike] = None,
datetime: Optional[DatetimeLike] = None,
query: Optional[QueryLike] = None,
filter: Optional[FilterLike] = None,
filter_lang: Optional[FilterLangLike] = None,
sortby: Optional[SortbyLike] = None,
fields: Optional[FieldsLike] = None,
max_items: Optional[int] = None,
method: Optional[str] = "POST",
stac_io: Optional[StacIO] = None,
client: Optional["Client"] = None,
):
self.url = url
self.client = client
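
The `method` docstring in this diff describes retrying with `"GET"` for all subsequent requests once a `"POST"` receives a `405`. A minimal sketch of that behavior follows. It is an assumption-laden illustration, not the code in `item_search.py`: `FallbackSearcher`, the `Send` callable type, and `get_only_server` are hypothetical names, and the transport is a local fake in place of real HTTP.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes (method, url, params), returns (status, body).
Send = Callable[[str, str, Dict[str, object]], Tuple[int, Dict[str, object]]]


class FallbackSearcher:
    """Tries POST first and switches to GET for good after a 405."""

    def __init__(self, url: str, send: Send, method: str = "POST") -> None:
        self.url = url
        self.send = send
        self.method = method

    def request(self, params: Dict[str, object]) -> Dict[str, object]:
        status, body = self.send(self.method, self.url, params)
        if status == 405 and self.method == "POST":
            # Remember the downgrade so later pages skip the failing POST.
            self.method = "GET"
            status, body = self.send(self.method, self.url, params)
        if status != 200:
            raise RuntimeError(f"{self.method} {self.url} failed with {status}")
        return body


def get_only_server(
    method: str, url: str, params: Dict[str, object]
) -> Tuple[int, Dict[str, object]]:
    """Fake server that rejects POST, for demonstration only."""
    if method == "POST":
        return 405, {}
    return 200, {"type": "FeatureCollection", "features": []}


searcher = FallbackSearcher("https://example.com/search", get_only_server)
searcher.request({"limit": 100})
print(searcher.method)  # GET
```

Storing the downgraded method on the instance is the design point: a paginated search pays for the rejected `POST` once instead of once per page.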