Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

normalise_hrefs causing unwanted Item asset href changes. #801

Closed
MitchellPaff opened this issue May 2, 2022 · 7 comments · Fixed by #984
Closed

normalise_hrefs causing unwanted Item asset href changes. #801

MitchellPaff opened this issue May 2, 2022 · 7 comments · Fixed by #984
Assignees
Labels
bug Things which are broken
Milestone

Comments

@MitchellPaff
Copy link

Hello.
I am seeing some weird behaviour that I hope you can help me with.

I've made an example in this repository here.
In this example I am attempting to add a STAC catalog 'catalog_1' as a child to the root catalog.json.

If you clone that repo and run main.py, you will see the files are modified and the new child and parent relationship is added to the links object of each STAC file successfully.

However if you look at line 27 of the modified Item STAC file '/example/catalog_1/collection_1/item_1/item_1.json'.
You will see that the href to the asset has been updated and now points to a non-existent asset outside of this directory.

Any help on this issue would be greatly appreciated.
Thanks.

@duckontheweb
Copy link
Contributor

Thanks for the report and the excellent reproducible example @MitchellPaff ! I was able to reproduce this using your example and I'll try to have a fix in the 1.5.0 release that we're putting out soon.

@duckontheweb duckontheweb added the bug Things which are broken label May 12, 2022
@duckontheweb duckontheweb added this to the 1.5.0 milestone May 12, 2022
@duckontheweb duckontheweb self-assigned this May 12, 2022
@duckontheweb
Copy link
Contributor

So, in this case the root cause is an invalid self link in example/catalog_1/collection_1/item_1/item_1.json. The self link in that file is:

{
  "rel": "self",
  "href": "feature.json",
  "type": "application/json"
}

Per the STAC spec docs on Relation Types self links must be an absolute URL to the Item. When PySTAC resolves the link to that Item from the parent collection, it calls Item.set_self_href on the Item. As part of this, it loops over all of the Item's Assets and updates the href property to be relative to the Item's new self HREF, if it is not already absolute. The first thing it does is get an absolute HREF for the Asset based on the Item's self HREF, which is where the problem is coming in here.

The Item has a self HREF of feature.json and PySTAC interprets this as an absolute HREF (e.g. /feature.json on my Mac). PySTAC then constructs an absolute HREF for the Asset based on the existing relative HREF (in this case asset_1). This absolute HREF ends up being something like /asset_1. It then gets the relative HREF to the asset from the Item's new self HREF that it got from the item link here. On my system, that path is something like /Users/my_user/Code/stac_example/example/catalog_1/collection_1/item_1/item1.json, so the relative path from that Item to what PySTAC thinks is the Asset HREF is something like ../../../../../../../../asset_1.

To be honest, I'll need to think a bit more about what the right course of action is. On the one hand, PySTAC is correctly handling that Item's self HREF according to the spec. On the other hand, this kind of issue is really hard to debug and it would be nice if we at least had some mechanism for reporting when the self HREF is clearly incorrect. I'm just not sure how to detect that reliably...

MitchellPaff pushed a commit to linz/geostore that referenced this issue May 26, 2022
In order to fix pystac issue stac-utils/pystac#801

Multi-line description of commit,
feel free to be detailed.

[Ticket: X]
kodiakhq bot pushed a commit to linz/geostore that referenced this issue Jun 1, 2022
* fix: Remove absolute self url's from STAC objects in Geostore

In order to fix pystac issue stac-utils/pystac#801

Multi-line description of commit,
feel free to be detailed.

[Ticket: X]

* refactor: reduce test and optimise copy job

Co-authored-by: Victor Engmark <[email protected]>
@duckontheweb
Copy link
Contributor

@gadomski @lossyrob Do you have any opinions on what the best course of action would be here?

@gadomski
Copy link
Member

gadomski commented Jun 8, 2022

PySTAC interprets this as an absolute HREF (e.g. /feature.json on my Mac)

I think this is incorrect. When creating hrefs, a relative self href should either:

  1. cause an error, or
  2. become absolute via make_absolute_path (which uses the current working directory via abspath), probably with a warning

This goes back to the question of "is PySTAC a strict implementation of the specification, or a tool to work with the specification in a user-friendly way"? If it's a strict implementation, option 1; otherwise, option 2.

Interested in @philvarner's thoughts as well.

@duckontheweb
Copy link
Contributor

This goes back to the question of "is PySTAC a strict implementation of the specification, or a tool to work with the specification in a user-friendly way"? If it's a strict implementation, option 1; otherwise, option 2.

There are a few places in the code (e.g. accepting single lists as bboxes in Collections extents) that suggest to me that we are leaning towards the latter approach. I'll implement approach 2. and see if there are any unintended consequences that we need to work out.

@duckontheweb
Copy link
Contributor

I think I'll need a bit more time to put together a solution and some tests to make sure we're actually getting the behavior we want out of this. I'm going to move this into a future release for now so we don't hold up 1.5.

@gadomski
Copy link
Member

Thanks again for the excellent reproducible example! I used it to confirm that #984 fixes the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Things which are broken
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants