-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update wms harvester to be able to save to geonode #884
Update wms harvester to be able to save to geonode #884
Conversation
@meomancer this work is more complicated than what it may seem, since the ultimate goal here is to replace the current remote services functionality. This means that we will need to modify Please interact with @afabiani and @giohappy in order to plan how this may be done before implementing the code, since @afabiani is likely to have already an idea of how this should be done |
74b5d4e
to
84cdcbb
Compare
) -> typing.Optional[base.HarvestedResourceInfo]: | ||
resource_unique_identifier = harvestable_resource.unique_identifier | ||
data = self._get_data() | ||
for layer in data['layers']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would make the code more readable if instead of this nested for
loop with an if
inside of it (adds a lot of visual noise due to the indentation) you would get the relevant layer first and then process it. Something like:
try:
relevant_layer = [layer for layer in data["layers"] if layer["name"] == resource_unique_identifier][0]
except IndexError:
result = None
logger.exception(f"Could not find resource {resource_unique_identifier!r}")
else:
# now do stuff with the `relevant_layer`
Also, in my opinion (but I guess this is a hot topic) it would be preferable to use a singe return
statement, instead of multiple returns - I think it makes the code easier to read.
root = etree.fromstring(get_capabilities_response.content, parser=XML_PARSER) | ||
nsmap = _get_nsmap(root.nsmap) | ||
useful_layers_elements = [] | ||
|
||
# get layers from element tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this type of comment from the code - it is not adding any value, since it is already clear what is going on below.
Comments are very useful to explain some context about the code or explain some harder to understand particularities. But if they are over used they end up being just visual noise.
bbox_elements = get_xpath_elements( | ||
layer_element, "wms:EX_GeographicBoundingBox", nsmap)[0] | ||
left_x = get_xpath_value( | ||
bbox_elements, "wms:westBoundLongitude", nsmap) | ||
right_x = get_xpath_value( | ||
bbox_elements, "wms:eastBoundLongitude", nsmap) | ||
lower_y = get_xpath_value( | ||
bbox_elements, "wms:southBoundLatitude", nsmap) | ||
upper_y = get_xpath_value( | ||
bbox_elements, "wms:northBoundLatitude", nsmap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XPath expressions can be nested, so you could simply do
left_x = get_xpath_value(
layer_element, "wms:EX_GeographicBoundingBox/wms:westBoundLongitude", nsmap)
# similar for right_x, lower_y, upper_y
It seems there is no need for using the get_xpath_elements()
function in this case
def get_xpath_value( | ||
element: etree.Element, | ||
xpath_expression: str, | ||
nsmap: typing.Optional[dict] = None | ||
) -> typing.Optional[str]: | ||
if not nsmap: | ||
nsmap = element.nsmap | ||
values = element.xpath(f"{xpath_expression}//text()", namespaces=nsmap) | ||
return "".join(values).strip() or None | ||
|
||
|
||
def get_xpath_elements( | ||
element: etree.Element, | ||
xpath_expression: str, | ||
nsmap: typing.Optional[dict] = None | ||
) -> typing.List[etree.Element]: | ||
if not nsmap: | ||
nsmap = element.nsmap | ||
elements = element.xpath(f"{xpath_expression}", namespaces=nsmap) | ||
return elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated above, it seems like the get_xpath_elements()
function may not be necessary in this case. If you are only using it as an intermediary in order to then call get_xpath_value()
on its result, then there is no need for that, as XPath allows querying elements nested deep inside the node tree. Otherwise, if you have a valid use case for it I won't mind having it.
I'm just trying to see if you can remove code instead of adding more - the more code we have, the harder it is for people to maintain it in the future.
keywords = get_xpath_elements( | ||
get_xpath_elements(layer_element, "wms:KeywordList", nsmap)[0], | ||
"wms:Keyword/text()", nsmap | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be made simpler by relying on XPath expressions a bit more (their indexing starts at 1 instead of zero though):
keywords = layer_element.xpath("wms:KeywordList[1]/wms:Keyword/text()", namespaces=nsmap)
It seems there is no need for using the get_xpath_elements()
function in this case
service = get_xpath_elements(element, "wms:Service", nsmap)[0] | ||
contact = get_xpath_elements(service, "wms:ContactInformation", nsmap)[0] | ||
contact_person = get_xpath_elements(contact, "wms:ContactPersonPrimary", nsmap)[0] | ||
address = get_xpath_elements(contact, "wms:ContactAddress", nsmap)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There doesn't seem to be a need to use get_xpath_elements()
here - the simpler element.xpath()
provided by etree
would suffice, i.e.:
service = get_xpath_elements(element, "wms:Service", nsmap)[0] | |
contact = get_xpath_elements(service, "wms:ContactInformation", nsmap)[0] | |
contact_person = get_xpath_elements(contact, "wms:ContactPersonPrimary", nsmap)[0] | |
address = get_xpath_elements(contact, "wms:ContactAddress", nsmap)[0] | |
contact = element.xpath("wms:Service/wms:ContactInformation", nsmap)[0] | |
contact_person = contact.xpath("wms:ContactPersonPrimary", nsmap)[0] | |
address = contact.xpath("wms:ContactAddress", nsmap)[0] |
except (IndexError, KeyError): | ||
legend_url = '' | ||
params = self._base_wms_parameters | ||
wms_url = self.remote_url + '?' + urlencode(params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me yet just what should be the value of wms_url
.
This is something to be discussed together with @afabiani
I think we could probably generate a GetMap URL here, so something like:
f"{self.remote_url}?service=WMS&version=1.3.0&request=GetMap&..."
Alternatively we could just keep the base_url
and then later build the GetMap
URL, when it is time to show the layer.
Personally I think it makes more sense to build the full GetMap URL here and then have a way to store it in the GeoNode layer that is created (perhaps in its blob
property) - this would lead to a more decoupled GeoNode layer, where it would not need to query its related harvestable_resource
for its parent harvester
to retrieve the base URL of the WMS server when the time comes to show it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @ricardogsilva
i checked the geonode and the wms_url that returned by geonode is like that
which is will be same with geonode harvester, i'm not sure
https://somaliland.gep.kartoza.com/api/layers/118/
{"extension": "html", "link_type": "OGC:WMS", "mime": "text/html", "name": "OGC WMS: geonode Service", "url": "https://somaliland.gep.kartoza.com/geoserver/ows"}, {"extension": "html", "link_type": "OGC:WFS", "mime": "text/html", "name": "OGC WFS: geonode Service", "url": "https://somaliland.gep.kartoza.com/geoserver/ows"}
|
Everything updated @ricardogsilva |
this fix geosolutions-it/nexus-geonode#183
There are a lot of data that are not provided by wms.
The most important one is UUID, that wms does not provide it.
Should we generate uuid when harvested and save it to harvestingResource?
Because when update, it will always error "UUID does not match the one found on the remote resource"
CC @ricardogsilva