Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplification using a %-like value #11

Open
mattijn opened this issue Apr 28, 2020 · 5 comments
Open

simplification using a %-like value #11

mattijn opened this issue Apr 28, 2020 · 5 comments

Comments

@mattijn
Copy link

mattijn commented Apr 28, 2020

Thanks for your work on these simplification algorithms!

Line simplification can be done using the Douglas-Pecker or Visvalingam-Whyatt algorithm. They both operate through an epsilon/tolerance parameter. Depending on the algorithm it represent distance (DP) or area (VW).
Therefor the required tolerance in each situation is hard to decide since it is implicitly depending on the projection as well (eg. meters or degrees).

Mapshaper provide the option to reduce the linestring by a percentage of points. It would be nice to have this possibility as well for these algorithms.

(you probably also like this blog: http://martinfleischmann.net/line-simplification-algorithms/ which incorporate your package).

@mariuswallraff
Copy link

I second this; for our use case of simplifying time series data down to a given number of points, this would be an extremely useful option to have (either as percentage or absolute number of points). Currently we execute simplify_coords_vw_idx consecutively with increasing tolerance, and either do very many steps, or end up oversimplifying to a large degree.

I had a quick glance at the underlying code and believe it might be possible to "just" add an additional break criterion based on the length of the remaining heap, but to be fair I have no idea about Rust.

If it helps, I wouldn't mind if the implementation is approximate in either number of points or order of the removed points.

Either way, thank you for your work! :)

@mayuri-kulkarni
Copy link

I too have a similar requirement!

@mariuswallraff
Copy link

For those landing here for a similar reason, it might interest you that while this feature would still be very useful, in our use case, running simplify_coords_vw_idx consecutively with increasing tolerance (and applying the index to filter out unneeded data points before the next call) is actually faster than calling the function just once with the final tolerance.

I assume this is because finding the triangles that need to be recomputed in the stack becomes easier when the stack is smaller, but that's just a guess.

@urschrei
Copy link
Owner

urschrei commented May 1, 2024

For those landing here for a similar reason, it might interest you that while this feature would still be very useful, in our use case, running simplify_coords_vw_idx consecutively with increasing tolerance (and applying the index to filter out unneeded data points before the next call) is actually faster than calling the function just once with the final tolerance.

I assume this is because finding the triangles that need to be recomputed in the stack becomes easier when the stack is smaller, but that's just a guess.

That's very interesting. I'll have to dig into it in geo and see whether we can verify that guess. Do you have some example parameters?

@mariuswallraff
Copy link

mariuswallraff commented May 2, 2024

Sure, here's a small example, although it's not as good as with our real-life data; below I attached a slide illustrating my previous results.

from timeit import Timer

import numpy as np
from simplification.cutil import simplify_coords_vw_idx

np.random.seed(1337)

a = np.ascontiguousarray(
    np.vstack(
        [
            np.arange(10000),
            np.concatenate([np.random.normal(0.0, 0.1, 5000), np.random.normal(1.0, 0.1, 5000)]),
        ]
    ).T
)

# target: less than or equal 100 points
max_points = 100

epsilon = 20.48
out = simplify_coords_vw_idx(a, epsilon)  # => 91 points

print(Timer("simplify_coords_vw_idx(a, epsilon)", globals=locals()).repeat(3, number=1000))
# [4.4360130999994, 4.323799300007522, 4.425885300006485]


def iteratively(input_array, epsilon, max_points):
    selected_idx = np.arange(0, len(input_array))
    while True:
        current_idx = simplify_coords_vw_idx(input_array, epsilon)
        selected_idx = selected_idx[current_idx]

        if not max_points or len(selected_idx) <= max_points:
            break

        input_array = input_array[current_idx]
        epsilon *= 2.0
    return selected_idx


print(Timer("iteratively(a, 20.48, max_points)", globals=locals()).repeat(3, number=1000))
# [4.509853900002781, 4.511391199979698, 4.359233400027733]
print(Timer("iteratively(a, 10.24, max_points)", globals=locals()).repeat(3, number=1000))
# [4.469542200007709, 4.420092500018654, 4.369932599976892]
print(Timer("iteratively(a, 5.12, max_points)", globals=locals()).repeat(3, number=1000))
# [4.414737000013702, 4.3983672999893315, 4.3968801999872085]
print(Timer("iteratively(a, 2.56, max_points)", globals=locals()).repeat(3, number=1000))
# [4.21051350000198, 4.002777400019113, 3.9561154000111856]
print(Timer("iteratively(a, 1.28, max_points)", globals=locals()).repeat(3, number=1000))
# [3.8792780000076164, 3.972585600015009, 4.081758899992565]
print(Timer("iteratively(a, 0.64, max_points)", globals=locals()).repeat(3, number=1000))
# [3.955509699997492, 3.804218100005528, 3.783603399991989]
print(Timer("iteratively(a, 0.32, max_points)", globals=locals()).repeat(3, number=1000))
# [3.7186095000070054, 4.0281457999954, 3.7554183999891393]
print(Timer("iteratively(a, 0.16, max_points)", globals=locals()).repeat(3, number=1000))
# [3.755888199986657, 3.7749703000008594, 3.7538082000100985]
print(Timer("iteratively(a, 0.08, max_points)", globals=locals()).repeat(3, number=1000))
# [3.7066072000015993, 3.6726426999957766, 3.708958800008986]
print(Timer("iteratively(a, 0.04, max_points)", globals=locals()).repeat(3, number=1000))
# [3.9454101999872364, 3.809387200017227, 3.8627665000094566]

print(len(out))
print(len(iteratively(a, 0.04, max_points)))
print()

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants