Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add region to default labels if different from city #43

Merged
merged 2 commits into from
Feb 10, 2021

Conversation

orangejulius
Copy link
Member

This adds the region to the default labels, but only if the region name is different from the city name (defined as locality or localadmin name).

The intent is to handle major world cities like Berlin, Sao Paulo, Paris, etc that are contained within an administrative region of the same name, and are so well known that they do not require any additional specifiers.

In the more common case where the region and city names are different, the region abbreviation is preferred, with the region name being returned only if the abbreviation is not available.

@orangejulius
Copy link
Member Author

I think this can serve as a replacement for #41, with a bit of refinement.

It more closely follows our existing logic to use region_a if available, then falling back to region. It also has the logic for skipping the region name (or abbreviation) when it would duplicate the locality name.

@orangejulius orangejulius force-pushed the add-region-to-default-schema branch from d46e1a1 to ea52ea2 Compare February 9, 2021 00:52
@missinglink
Copy link
Member

missinglink commented Feb 9, 2021

A couple of caveats to this approach, just putting them out there in case they are easy fixes:

  • In some parts of the world the WOF data includes suffixes such as "Auckland" being in "Auckland CBD" in "Auckland City" in "Auckland Region", in these cases we wouldn't detect the redundancy. It might be worth checking .startsWith() in some form, to at least catch some easier version of this, such as removing any where the leftmost name is a prefix string of any other names.
  • There is no normalisation being performed. It's an issue with all the string matching in this repo but worth mentioning again, we could perform some basic string normalisation before comparing equality. Something simple like lowercase, trim, remove-punctuation would help a lot, ascii_folding would also be a cherry-on-top.

Happy to punt those topics, just need to add some Issues for them.

@orangejulius orangejulius force-pushed the add-region-to-default-schema branch from ea52ea2 to 5003348 Compare February 9, 2021 17:40
@orangejulius
Copy link
Member Author

Ok, after some testing we definitely needed some basic normalization. I borrowed some code from the API that looks perfect for the task:

https://github.com/pelias/api/blob/0a964c2420dcf2bfc60279916ddbf4fa5d9c4c90/helper/diffPlaces.js#L230-L232

orangejulius added a commit to pelias/api that referenced this pull request Feb 9, 2021
@orangejulius orangejulius force-pushed the add-region-to-default-schema branch from 5003348 to 89c56f5 Compare February 9, 2021 17:55
Tests should verify behavior, not implementation details like the number
of properties on an object
This adds the region to the default labels, but only if the _region
name_ is different from the _city name_ (defined as locality or
localadmin name).

The intent is to handle major world cities like Berlin, Sao Paulo,
Paris, etc that are contained within an administrative region of the
same name, and are so well known that they do not require any additional
specifiers.

Differences such as capitalization and accents are ignored for
comparison purposes, since they often exist in real data.

In the more common case where the region and city names are different,
the region abbreviation is preferred, with the region name being
returned only if the abbreviaton is not available.
@orangejulius orangejulius force-pushed the add-region-to-default-schema branch from 92d8af6 to 85c36e4 Compare February 9, 2021 22:06
orangejulius added a commit that referenced this pull request Feb 10, 2021
As a small country, the regions in Singapore are not particularly
useful, and they do not appear to be used in address labels.

By adding a Singapore specific label schema, we can avoid regions
appearing in Singapore labels after we enable them by default in
#43.
@orangejulius
Copy link
Member Author

orangejulius commented Feb 10, 2021

Some quick examples of how this looks now:
image
Here the well known São Paulo does not have a region in the label, but the others do.

Likewise for Berlin. Some other cities with berlin in the name in Berlin Brandenburg have an abbreviation:
image
I'm not sure this is 100% great, we should check to see what the most common system in Germany is. Looking at Nominatim's list, the region is not included.

orangejulius added a commit that referenced this pull request Feb 10, 2021
As a small country, the regions in Singapore are not particularly
useful, and they do not appear to be used in address labels.

By adding a Singapore specific label schema, we can avoid regions
appearing in Singapore labels after we enable them by default in
#43.
@missinglink
Copy link
Member

The classic testcase for Germany is that Frankfurt am Main and Frankfurt an der Oder produce different labels.

@missinglink
Copy link
Member

I think this is a much better default to have, there may be some improvements to be made on a per-country level but this raises the bar for anything using the generic label generator 👍

@orangejulius orangejulius merged commit 7b26f4e into master Feb 10, 2021
@orangejulius orangejulius deleted the add-region-to-default-schema branch February 10, 2021 14:21
orangejulius added a commit that referenced this pull request Feb 10, 2021
As a small country, the regions in Singapore are not particularly
useful, and they do not appear to be used in address labels.

By adding a Singapore specific label schema, we can avoid regions
appearing in Singapore labels after we enable them by default in
#43.
orangejulius added a commit to pelias/api that referenced this pull request Feb 10, 2021
This includes improvements for:
- the default label format, adding region, including handling major
  cities in a region of the same name (pelias/labels#43
- a new label format for Singapore, handling current and future WOF data
  (pelias/labels#45)
orangejulius added a commit to pelias/api that referenced this pull request Feb 10, 2021
This includes improvements for:
- the default label format, adding region, including handling major
  cities in a region of the same name (pelias/labels#43)
- a new label format for Singapore, handling current and future WOF data
  (pelias/labels#45)
orangejulius added a commit to pelias/acceptance-tests that referenced this pull request Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants