Difference between autocomplete and search #795

Closed
Platzii opened this issue Feb 8, 2017 · 2 comments

@Platzii

Platzii commented Feb 8, 2017

Hi

Why is there such a big difference between certain searches on the autocomplete and search calls?

Example: "Oude Westen, Rotterdam" (a neighbourhood in the Netherlands)
https://search.mapzen.com/v1/search?text=Oude%20Westen%2C%20Rotterdam
https://search.mapzen.com/v1/autocomplete?text=Oude%20Westen%2C%20Rotterdam

Please note that removing 1 letter from the search call returns a 'good' result (https://search.mapzen.com/v1/search?text=Oude%20Westen%2C%20Rotterda)

Any idea why the API's behaviour is like this?

Kind regards
Simon

@missinglink
Member

hey @Platzii

The differences in search results between /v1/search and /v1/autocomplete are due to the two endpoints using different query parsers, different token matching algorithms and even different query logic under the hood!

Autocomplete is a difficult beast to tame: not knowing whether the final token is 'complete' leads to issues that don't arise when you know that what you've been given is (mostly) literally what the user wants.

For instance a search term like 'Londo' could refer to a place in Mozambique or it could be just the start of 'London', so we have to use data structures that are aware of the token prefix permutations.
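
To make the prefix problem concrete, here is a minimal Python sketch. It is not how Pelias does it; it only shows the idea of matching the final token as a prefix rather than as a whole word. The place list and the name `prefix_matches` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical, tiny index of lowercased place names, kept sorted so that
# all names sharing a prefix sit next to each other.
PLACES = sorted(["londo", "london", "londonderry", "rotterdam"])

def prefix_matches(prefix, names=PLACES):
    """Return every name in the sorted list that starts with `prefix`."""
    prefix = prefix.lower()
    start = bisect_left(names, prefix)  # first name >= prefix
    matches = []
    for name in names[start:]:
        if not name.startswith(prefix):
            break  # past the block of names sharing this prefix
        matches.append(name)
    return matches

print(prefix_matches("Londo"))
```

With this toy index, 'Londo' matches both the exact name and the longer names that begin with it, which is the ambiguity autocomplete has to live with.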

For /v1/search we use a query parser called libpostal which is based on dictionaries of known words, and for this reason it's not very good at working with partial token inputs.

For /v1/autocomplete we use a much simpler parser called addressit which is based on regular expressions, so it's not so fussed about tokens being a dictionary word or not.

You can see the results of each parser in the head of the response document:

/v1/search (libpostal)

  "text": "Oude Westen, Rotterdam",
  "parsed_text": {
    "street": "oude westen",
    "city": "rotterdam"
  }

/v1/autocomplete (addressit)

  "text": "Oude Westen, Rotterdam",
  "parsed_text": {
    "name": "Oude Westen",
    "regions": [
      "Oude Westen",
      "Rotterdam"
    ],
    "admin_parts": "Rotterdam"
  }
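
If you want to compare the two parsers programmatically, a small Python sketch along these lines may help. It assumes the decoded response nests the fields shown above under `geocoding` and then `query`; that nesting and the helper name `parsed_text` are assumptions on my part, so adjust the path if your responses look different. The sample dict reuses the /v1/search fragment from above.

```python
def parsed_text(response):
    """Return the parser output from a decoded response body, or None.

    Assumes the layout response["geocoding"]["query"]["parsed_text"].
    """
    query = response.get("geocoding", {}).get("query", {})
    return query.get("parsed_text")

# Hand-written sample mirroring the /v1/search fragment shown above.
sample_search = {
    "geocoding": {
        "query": {
            "text": "Oude Westen, Rotterdam",
            "parsed_text": {"street": "oude westen", "city": "rotterdam"},
        }
    }
}

print(parsed_text(sample_search))
```

A missing `parsed_text` key simply yields `None`, which makes it easy to spot requests where a parser returned nothing.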

So in this case it seems like libpostal has got it wrong: it has decided that the token pair 'oude westen' is the name of a street, when it's actually a neighborhood.

As a result the query builder built an incorrect query, and since we didn't find a street by that name in Rotterdam, we just returned the best thing we could find, which in this case was the city.

For autocomplete the query builder has built the correct query and got the correct result.

The last query (ending in 'Rotterda') would have been sent to libpostal and would have resulted in a failed parse. In this case libpostal would not have found either token in its dictionary, so it would have simply returned nothing.

In this case we fall back to using addressit and so the correct answer is returned :)
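
The fallback behaviour can be sketched roughly like this in Python. Both parsers here are toy stand-ins written for this example, not libpostal or addressit themselves, and the names `dictionary_parse`, `regex_parse`, `parse_with_fallback` and `KNOWN_CITIES` are hypothetical.

```python
import re

# Hypothetical word list standing in for a real parser's dictionaries.
KNOWN_CITIES = {"rotterdam", "london"}

def dictionary_parse(text):
    """Toy dictionary-based parser: fails unless the last part is a known word."""
    parts = [p.strip().lower() for p in text.split(",")]
    if len(parts) == 2 and parts[1] in KNOWN_CITIES:
        return {"street": parts[0], "city": parts[1]}
    return None  # failed parse

def regex_parse(text):
    """Toy regex-based parser: splits on commas, no dictionary lookup."""
    parts = [p for p in re.split(r"\s*,\s*", text.strip()) if p]
    if not parts:
        return None
    parsed = {"name": parts[0]}
    if len(parts) > 1:
        parsed["admin_parts"] = ", ".join(parts[1:])
    return parsed

def parse_with_fallback(text):
    """Try the dictionary parser first; fall back to the regex parser."""
    parsed = dictionary_parse(text)
    if parsed is not None:
        return "dictionary", parsed
    return "regex", regex_parse(text)

print(parse_with_fallback("Oude Westen, Rotterdam"))
print(parse_with_fallback("Oude Westen, Rotterda"))
```

The complete input is handled by the dictionary parser (which mislabels 'oude westen' as a street, as above), while the truncated 'Rotterda' fails the dictionary lookup and ends up with the simpler parser.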

@Platzii
Author

Platzii commented Apr 22, 2017

Seems like this is fixed in libpostal 1.0!
