KDTree in scipy.spatial uses Minkowski distance, which is not suitable for latitude/longitude #59

Open
zhouchanghai opened this issue Dec 20, 2020 · 2 comments


@zhouchanghai

For example, the real (great-circle) distance between (0°N, 0°E) and (0°N, 1°E) is about 111 km, while the real distance between (80°N, 0°E) and (80°N, 1°E) is only about 19 km (https://www.movable-type.co.uk/scripts/latlong.html). Yet both pairs are the same Minkowski distance apart, because each differs by exactly 1 degree of longitude. Also, latitude ranges from -90 to 90 while longitude ranges from -180 to 180, so the two axes are on different scales.
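To make the mismatch concrete, here is a minimal sketch comparing the haversine (great-circle) distance with the Minkowski distance a KD-tree computes on raw degree coordinates. It assumes a spherical Earth with a mean radius of 6371 km; the helper names `haversine_km` and `minkowski` are illustrative, not part of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius 6371 km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def minkowski(p1, p2, p=2):
    """Minkowski distance on raw (lat, lon) degrees, as a KD-tree would compute it."""
    return sum(abs(a - b) ** p for a, b in zip(p1, p2)) ** (1 / p)


pairs = [((0, 0), (0, 1)), ((80, 0), (80, 1))]
for a, b in pairs:
    # Prints: pair, great-circle km (rounded), Minkowski distance in degrees
    print(a, b, round(haversine_km(*a, *b), 1), minkowski(a, b))
# (0, 0) (0, 1) 111.2 1.0
# (80, 0) (80, 1) 19.3 1.0
```

The Minkowski distance is 1.0 for both pairs, while the great-circle distance shrinks roughly with cos(latitude).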

@BoZenKhaa

Do you have a dataset where this causes issues? I have wondered about this as well. What you are saying is true, but the distance is only used to find the nearest neighbours. I don't have data on how many points this mislabels, but since the labels are approximate anyway (each district is described by only a single point), the slowdown from a more complex metric might make switching to a different metric not worth the effort...
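One way to look for such cases is a brute-force comparison of the two rankings. The sketch below uses two hypothetical reference points near 80°N (not taken from any real dataset) and shows that Euclidean distance on raw degrees (Minkowski with p=2, the default `p` of `scipy.spatial.KDTree.query`) and great-circle distance can pick different nearest neighbours. The helper names are illustrative, and a spherical Earth of radius 6371 km is assumed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius 6371 km


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlam = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def degree_euclidean(p, q):
    """Euclidean distance on raw degrees (Minkowski p=2)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def nearest(query, candidates, dist):
    """Brute-force nearest neighbour: name of the candidate closest to query."""
    return min(candidates, key=lambda name: dist(query, candidates[name]))


# Hypothetical reference points (lat, lon); not from any real dataset
candidates = {"A": (80.0, 2.0), "B": (81.5, 0.0)}
query = (80.0, 0.0)

print(nearest(query, candidates, degree_euclidean))  # B (1.5 deg vs 2.0 deg)
print(nearest(query, candidates, haversine_km))      # A (~38.6 km vs ~166.8 km)
```

Near the equator one degree of longitude and one degree of latitude cover similar distances, so disagreements are likely rare there; how often they occur in practice depends on how far from the equator the data lies.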

@Dobatymo

Dobatymo commented Dec 16, 2022

You could try replacing it with sklearn.neighbors.KDTree(..., metric="haversine").

EDIT: Oh, it seems haversine is not supported:

>>> KDTree.valid_metrics
['euclidean', 'l2', 'minkowski', 'p', 'manhattan', 'cityblock', 'l1', 'chebyshev', 'infinity']
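Since KDTree does not accept haversine, one common workaround that keeps `scipy.spatial.KDTree` is to map each (lat, lon) to a 3D point on the unit sphere. The straight-line (chord) distance between two such points is 2·sin(θ/2), which increases monotonically with the central angle θ, so Euclidean nearest-neighbour results match great-circle ordering exactly, and the chord can be converted back to kilometres. This is a minimal sketch assuming a spherical Earth (R = 6371 km); `to_unit_xyz` and `chord_to_km` are hypothetical helper names and the reference points are made up. As an alternative, scikit-learn's `BallTree` (as opposed to `KDTree`) should accept `metric="haversine"`, with inputs given in radians as [lat, lon].

```python
import numpy as np
from scipy.spatial import KDTree

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius 6371 km


def to_unit_xyz(lat_deg, lon_deg):
    """Map arrays of (lat, lon) in degrees to points on the unit sphere in 3D."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))


def chord_to_km(chord):
    """Convert chord length on the unit sphere to great-circle distance in km."""
    # theta = 2 * asin(chord / 2); clip guards against rounding just above 1
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.clip(chord / 2, 0.0, 1.0))


# Hypothetical reference points (lat, lon)
ref_lat = np.array([80.0, 81.5])
ref_lon = np.array([2.0, 0.0])
tree = KDTree(to_unit_xyz(ref_lat, ref_lon))

# Plain Euclidean query in 3D; no custom metric needed
chord, idx = tree.query(to_unit_xyz(np.array([80.0]), np.array([0.0])), k=1)
print(idx, chord_to_km(chord))  # index 0, about 38.6 km
```

The tree itself still only does Euclidean queries, so the extra cost compared with querying raw degrees is the one-off trigonometric conversion of the inputs.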
