Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update wms harvester to be able to save to geonode #884

Closed

Conversation

meomancer
Copy link

this fix geosolutions-it/nexus-geonode#183

There are a lot of data that are not provided by wms.
The most important one is UUID, that wms does not provide it.

Should we generate uuid when harvested and save it to harvestingResource?
Because when update, it will always error "UUID does not match the one found on the remote resource"

CC @ricardogsilva

@ricardogsilva
Copy link

@meomancer this work is more complicated than what it may seem, since the ultimate goal here is to replace the current remote services functionality. This means that we will need to modify geonode.layers.models.Layer in order to not depend on services.models.Service and implement the same features in an alternative way.

Please interact with @afabiani and @giohappy in order to plan how this may be done before implementing the code, since @afabiani is likely to have already an idea of how this should be done

@meomancer meomancer force-pushed the 183-wms-harvester branch from 74b5d4e to 84cdcbb Compare July 8, 2021 07:26
) -> typing.Optional[base.HarvestedResourceInfo]:
resource_unique_identifier = harvestable_resource.unique_identifier
data = self._get_data()
for layer in data['layers']:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it would make the code more readable if instead of this nested for loop with an if inside of it (adds a lot of visual noise due to the indentation) you would get the relevant layer first and then process it. Something like:

try:
    relevant_layer = [layer for layer in data["layers"] if layer["name"] == resource_unique_identifier][0]
except IndexError:
    result = None
    logger.exception(f"Could not find resource {resource_unique_identifier!r}")
else:
    # now do stuff with the `relevant_layer`

Also, in my opinion (but I guess this is a hot topic) it would be preferable to use a singe return statement, instead of multiple returns - I think it makes the code easier to read.

root = etree.fromstring(get_capabilities_response.content, parser=XML_PARSER)
nsmap = _get_nsmap(root.nsmap)
useful_layers_elements = []

# get layers from element tree

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this type of comment from the code - it is not adding any value, since it is already clear what is going on below.

Comments are very useful to explain some context about the code or explain some harder to understand particularities. But if they are over used they end up being just visual noise.

Comment on lines +280 to +289
bbox_elements = get_xpath_elements(
layer_element, "wms:EX_GeographicBoundingBox", nsmap)[0]
left_x = get_xpath_value(
bbox_elements, "wms:westBoundLongitude", nsmap)
right_x = get_xpath_value(
bbox_elements, "wms:eastBoundLongitude", nsmap)
lower_y = get_xpath_value(
bbox_elements, "wms:southBoundLatitude", nsmap)
upper_y = get_xpath_value(
bbox_elements, "wms:northBoundLatitude", nsmap)
Copy link

@ricardogsilva ricardogsilva Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XPath expressions can be nested, so you could simply do

left_x = get_xpath_value(
    layer_element, "wms:EX_GeographicBoundingBox/wms:westBoundLongitude", nsmap)
# similar for right_x, lower_y, upper_y

It seems there is no need for using the get_xpath_elements() function in this case

Comment on lines +57 to +76
def get_xpath_value(
element: etree.Element,
xpath_expression: str,
nsmap: typing.Optional[dict] = None
) -> typing.Optional[str]:
if not nsmap:
nsmap = element.nsmap
values = element.xpath(f"{xpath_expression}//text()", namespaces=nsmap)
return "".join(values).strip() or None


def get_xpath_elements(
element: etree.Element,
xpath_expression: str,
nsmap: typing.Optional[dict] = None
) -> typing.List[etree.Element]:
if not nsmap:
nsmap = element.nsmap
elements = element.xpath(f"{xpath_expression}", namespaces=nsmap)
return elements
Copy link

@ricardogsilva ricardogsilva Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated above, it seems like the get_xpath_elements() function may not be necessary in this case. If you are only using it as an intermediary in order to then call get_xpath_value() on its result, then there is no need for that, as XPath allows querying elements nested deep inside the node tree. Otherwise, if you have a valid use case for it I won't mind having it.

I'm just trying to see if you can remove code instead of adding more - the more code we have, the harder it is for people to maintain it in the future.

Comment on lines +254 to +257
keywords = get_xpath_elements(
get_xpath_elements(layer_element, "wms:KeywordList", nsmap)[0],
"wms:Keyword/text()", nsmap
)
Copy link

@ricardogsilva ricardogsilva Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be made simpler by relying on XPath expressions a bit more (their indexing starts at 1 instead of zero though):

keywords = layer_element.xpath("wms:KeywordList[1]/wms:Keyword/text()", namespaces=nsmap)

It seems there is no need for using the get_xpath_elements() function in this case

Comment on lines +224 to +227
service = get_xpath_elements(element, "wms:Service", nsmap)[0]
contact = get_xpath_elements(service, "wms:ContactInformation", nsmap)[0]
contact_person = get_xpath_elements(contact, "wms:ContactPersonPrimary", nsmap)[0]
address = get_xpath_elements(contact, "wms:ContactAddress", nsmap)[0]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be a need to use get_xpath_elements() here - the simpler element.xpath() provided by etree would suffice, i.e.:

Suggested change
service = get_xpath_elements(element, "wms:Service", nsmap)[0]
contact = get_xpath_elements(service, "wms:ContactInformation", nsmap)[0]
contact_person = get_xpath_elements(contact, "wms:ContactPersonPrimary", nsmap)[0]
address = get_xpath_elements(contact, "wms:ContactAddress", nsmap)[0]
contact = element.xpath("wms:Service/wms:ContactInformation", nsmap)[0]
contact_person = contact.xpath("wms:ContactPersonPrimary", nsmap)[0]
address = contact.xpath("wms:ContactAddress", nsmap)[0]

except (IndexError, KeyError):
legend_url = ''
params = self._base_wms_parameters
wms_url = self.remote_url + '?' + urlencode(params)
Copy link

@ricardogsilva ricardogsilva Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me yet just what should be the value of wms_url.

This is something to be discussed together with @afabiani

I think we could probably generate a GetMap URL here, so something like:

f"{self.remote_url}?service=WMS&version=1.3.0&request=GetMap&..."

Alternatively we could just keep the base_url and then later build the GetMap URL, when it is time to show the layer.

Personally I think it makes more sense to build the full GetMap URL here and then have a way to store it in the GeoNode layer that is created (perhaps in its blob property) - this would lead to a more decoupled GeoNode layer, where it would not need to query its related harvestable_resource for its parent harvester to retrieve the base URL of the WMS server when the time comes to show it

Copy link
Author

@meomancer meomancer Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @ricardogsilva
i checked the geonode and the wms_url that returned by geonode is like that
which is will be same with geonode harvester, i'm not sure
https://somaliland.gep.kartoza.com/api/layers/118/
{"extension": "html", "link_type": "OGC:WMS", "mime": "text/html", "name": "OGC WMS: geonode Service", "url": "https://somaliland.gep.kartoza.com/geoserver/ows"}, {"extension": "html", "link_type": "OGC:WFS", "mime": "text/html", "name": "OGC WFS: geonode Service", "url": "https://somaliland.gep.kartoza.com/geoserver/ows"}

@afabiani afabiani closed this Jul 8, 2021
@afabiani afabiani deleted the branch geosolutions-it:nexus_master July 8, 2021 17:25
@afabiani
Copy link
Member

afabiani commented Jul 8, 2021

geosolutions-it:nexus_master deleted, please rebase this PR against GeoNode/master

@meomancer
Copy link
Author

Everything updated @ricardogsilva
please check here : GeoNode#7759

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants