Difference between autocomplete and search #795

Platzii · 2017-02-08T16:07:35Z

Hi

Why is there such a big difference between certain searches on the autocomplete and search calls?

Example: "Oude Westen, Rotterdam" (= neighbourhood in The Netherlands)
https://search.mapzen.com/v1/search?text=Oude%20Westen%2C%20Rotterdam
https://search.mapzen.com/v1/autocomplete?text=Oude%20Westen%2C%20Rotterdam

Please note that removing 1 letter from the search call returns a 'good' result (https://search.mapzen.com/v1/search?text=Oude%20Westen%2C%20Rotterda)

Any idea why the API's behaviour is like this?

Kind regards
Simon

missinglink · 2017-02-13T17:17:25Z

hey @Platzii

The differences in search results between /v1/search and /v1/autocomplete are due to the two endpoints using different query parsers, different token matching algorithms and even different query logic under the hood!

Autocomplete is a difficult beast to tame: not knowing whether the final token is 'complete' leads to issues that don't arise when you know that what you've been given is (mostly) literally what the user wants.

For instance, a search term like 'Londo' could refer to a place in Mozambique or it could be just the start of 'London', so we have to use data structures that are aware of every prefix a token could be typed as.
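
To make the prefix problem concrete, here is a minimal Python sketch of a prefix index. It is only an illustration, not how Pelias implements autocomplete; the `build_prefix_index` helper and the sample names are assumptions made up for this example.

```python
from collections import defaultdict


def build_prefix_index(names):
    """Map every prefix of every token to the names containing that token."""
    index = defaultdict(set)
    for name in names:
        for token in name.lower().split():
            for end in range(1, len(token) + 1):
                index[token[:end]].add(name)
    return index


# Hypothetical sample data, for illustration only.
index = build_prefix_index(["Londo", "London", "Londonderry"])

# A partial token matches the complete name 'Londo' as well as
# every longer name that starts with the same letters.
matches = sorted(index["londo"])
```

With an index like this, 'londo' is ambiguous by construction: it is both a complete name and a prefix of others, so ranking has to decide which one the user meant.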

For /v1/search we use a query parser called libpostal which is based on dictionaries of known words, and for this reason it's not very good at working with partial token inputs.

For /v1/autocomplete we use a much simpler parser called addressit which is based on regular expressions, so it's not so fussed about tokens being a dictionary word or not.

You can see the results of each parser in the head of the response document:

/v1/search (libpostal)

  "text": "Oude Westen, Rotterdam",
  "parsed_text": {
    "street": "oude westen",
    "city": "rotterdam"
  }

/v1/autocomplete (addressit)

  "text": "Oude Westen, Rotterdam",
  "parsed_text": {
    "name": "Oude Westen",
    "regions": [
      "Oude Westen",
      "Rotterdam"
    ],
    "admin_parts": "Rotterdam"
  }
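
If you want to compare the two parses programmatically, something like the following Python sketch might help. It assumes the `parsed_text` fragment sits under `geocoding.query` in the decoded response, and the `parsed_text` and `differing_fields` helpers are hypothetical names for this example. The two dictionaries are trimmed to the fields quoted above rather than being full API responses.

```python
def parsed_text(response):
    """Return the parser output from a decoded response, or {} if absent.

    Assumption: the fragment lives at geocoding.query.parsed_text.
    """
    return response.get("geocoding", {}).get("query", {}).get("parsed_text", {})


def differing_fields(a, b):
    """List the field names whose values differ between two parses."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Trimmed to the fields quoted in this thread.
search_response = {"geocoding": {"query": {
    "text": "Oude Westen, Rotterdam",
    "parsed_text": {"street": "oude westen", "city": "rotterdam"},
}}}
autocomplete_response = {"geocoding": {"query": {
    "text": "Oude Westen, Rotterdam",
    "parsed_text": {
        "name": "Oude Westen",
        "regions": ["Oude Westen", "Rotterdam"],
        "admin_parts": "Rotterdam",
    },
}}}

fields = differing_fields(parsed_text(search_response),
                          parsed_text(autocomplete_response))
```

Here the two parsers share no field names at all, which is a quick hint that the queries built from them will differ too.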

So in this case it seems like libpostal got it wrong: it decided that the token pair 'oude westen' is the name of a street, when it's actually a neighborhood.

As a result, the query builder built an incorrect query, and since we didn't find a street by that name in Rotterdam we just returned the best thing we could find, which in this case was the city.

For autocomplete the query builder has built the correct query and got the correct result.

The last query (ending in 'Rotterda') would have been sent to libpostal and would have resulted in a failed parse: libpostal would not have found either token in its dictionary, so it would simply have returned nothing.

In this case we fall back to using addressit and so the correct answer is returned :)
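
The fallback can be sketched roughly like this in Python. Both parsers here are toy stand-ins written for this example (the real libpostal and addressit are far more involved), and `KNOWN_CITIES`, `dictionary_parse`, `regex_parse` and `parse` are hypothetical names, not part of either library or of Pelias.

```python
import re

# Hypothetical stand-in for the dictionary behind a parser such as libpostal.
KNOWN_CITIES = {"rotterdam", "london"}


def dictionary_parse(text):
    """Label the input only when the city token is a known dictionary word."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) == 2 and parts[1].lower() in KNOWN_CITIES:
        return {"street": parts[0].lower(), "city": parts[1].lower()}
    return None  # failed parse


def regex_parse(text):
    """Toy regex-based parser: split on the first comma, no dictionary."""
    match = re.match(r"\s*([^,]+?)\s*,\s*(.+?)\s*$", text)
    if match:
        return {"name": match.group(1), "admin_parts": match.group(2)}
    return {"name": text.strip()}


def parse(text):
    """Try the dictionary parser first, fall back to the regex parser."""
    return dictionary_parse(text) or regex_parse(text)


full = parse("Oude Westen, Rotterdam")    # dictionary parser succeeds
partial = parse("Oude Westen, Rotterda")  # falls back to the regex parser
```

The toy dictionary parser deliberately reproduces the 'street' mislabel from this thread, so the complete input gets the worse parse while the truncated one falls through to the simpler parser.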

Platzii · 2017-04-22T17:20:40Z

Seems like this is fixed in libpostal 1.0!

missinglink mentioned this issue Feb 13, 2017

'oude westen' incorrectly classified as a street openvenues/libpostal#162

Closed

dianashk added the outreach label Feb 24, 2017

missinglink mentioned this issue Feb 27, 2017

Autocomplete can't handle apartments, suites, units, etc. pelias/pelias#510

Closed

Platzii closed this as completed Apr 22, 2017

ghost removed the outreach label Apr 22, 2017

missinglink mentioned this issue Mar 10, 2020

Better documentation of what parser is? pelias/parser#82

Open
