Excessive memory usage with prequantization enabled #213

Open
Zaczero opened this issue Nov 30, 2023 · 5 comments

Comments

@Zaczero

Zaczero commented Nov 30, 2023

I am primarily posting this issue for future people facing a similar problem.

In my case, when the prequantize option is enabled (which is the default setting), the toposimplify method consumes 25GB of memory. However, when I disable the prequantize option, memory usage peaks at just 5GB. I utilize shapely for simplification.


Reproduction steps

  1. Download both parts of the archive:
    countries1.zip
    countries2.zip

  2. Combine the archives:

cat countries1.zip countries2.zip > countries.zip

  3. Unzip it.

  4. Execute the following Python code snippet:

import json
import topojson as tp
from shapely.geometry import shape

with open('countries.geojson', 'rb') as f:
    features = json.load(f)['features']
countries_geoms = [shape(feature['geometry']) for feature in features]
topo = tp.Topology(countries_geoms)
topo.toposimplify(0.00001, inplace=True)

  5. Monitor memory usage.

  6. To resolve the issue, replace the topo assignment with:

topo = tp.Topology(countries_geoms, prequantize=False)
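
For the "monitor memory usage" step, one possible approach is sketched below using only the standard library. The helper name `peak_python_memory_mib` and the stand-in workload are illustrative assumptions, not part of topojson. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory allocated natively (for example inside GEOS) would not show up; an OS-level monitor is still needed for the full process footprint.

```python
import tracemalloc


def peak_python_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak MiB of Python-tracked allocations)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / 2**20


def build_floats(n):
    """Stand-in workload (hypothetical); replace with the call under test."""
    return [float(i) for i in range(n)]


result, peak = peak_python_memory_mib(build_floats, 200_000)
print(f"peak Python-tracked memory: {peak:.1f} MiB")
```

With the data from the steps above, the same helper could wrap the two variants, for example `peak_python_memory_mib(tp.Topology, countries_geoms)` versus `peak_python_memory_mib(tp.Topology, countries_geoms, prequantize=False)`, to compare their peaks.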

By the way, should prequantization be enabled by default? I personally find it odd that the library performs certain calculations by default, even if they don't apply to my use case and don't provide any benefit. I can only understand such default behavior if it benefits everyone. Otherwise, this should be an opt-in operation (the same as simplification is opt-in).

@mattijn
Owner

mattijn commented Dec 2, 2023

Thank you for raising the issue, and it is great to see that you find this package useful for your needs!
Until now, speed has been the main bottleneck, but if we can reduce the memory footprint, that would be great too.
It's worth profiling the code to find the main culprit that is causing the memory to blow up.
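
As a starting point for that kind of profiling, here is a minimal sketch based on `tracemalloc` snapshots. The helper `top_allocations` and the `workload` function are hypothetical names chosen for illustration; in practice the workload would be the topology computation. The snapshot only reports memory that is still alive when it is taken, so to catch short-lived temporaries a snapshot would have to be taken inside the step being investigated.

```python
import tracemalloc


def top_allocations(func, limit=5):
    """Run func and return (result, largest live allocations grouped by source line)."""
    tracemalloc.start()
    try:
        result = func()  # keep a reference so the allocations are still alive
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # statistics() returns entries sorted from largest to smallest.
    return result, snapshot.statistics("lineno")[:limit]


def workload():
    """Stand-in workload (hypothetical): roughly 1 MB of small byte strings."""
    return [bytes(1000) for _ in range(1000)]


result, stats = top_allocations(workload)
for stat in stats:
    print(stat)
```

Each printed line names a file and line number together with the total size allocated there, which helps narrow down which step holds on to the most memory.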

@Zaczero
Author

Zaczero commented Dec 2, 2023

🙂! If you are interested, I use this package to run https://github.com/Zaczero/osm-countries-geojson. It finally resolved the issue with overlaps/gaps produced during the simplification process. And now it's perfect!

@mattijn
Owner

mattijn commented Dec 2, 2023

Thanks for showing your package! May I ask how the directed graph of networkx is being utilised for your use-case? That seems interesting!

I was looking at your referenced geojson and noticed at least two things that you might want to check.

  • it seems there is a (part of a) country missing near Morocco:
image
  • something odd is going on in the south of the Netherlands:
image

Again, thanks for reaching out!

@Zaczero
Author

Zaczero commented Dec 2, 2023

  1. This is simply the nature of OSM data. In regions of conflict, it's common to encounter such situations. Sometimes, you might even come across two countries at the same time:
    2023-12-03_00-45-33

  2. This appears to be a bug with GitHub's GeoJSON visualizer. They seem to apply their own simplification for rendering. Here's how this location appears on OSM:
    image
    And this is how it looks when rendered locally (which is acceptable for such a high level of simplification):
    image

I understand that the documentation for the countries generator is lacking. Essentially, the directed graph is used to efficiently reconstruct country polygons from split, randomly ordered line segments. OSM data does not store countries as predefined shapes but rather as collections of lines. Compared to an undirected graph, the directed graph improves performance by reducing the number of paths that the simple-cycles search has to traverse. Each node represents an intersection (a line endpoint), and each edge represents a line segment.
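
To illustrate the idea of nodes as endpoints and edges as directed segments, here is a deliberately simplified sketch in plain Python. It is not the osm-countries-geojson implementation and does not use networkx: instead of a general simple-cycles search, it assumes the segments are already oriented consistently and that every endpoint starts exactly one segment, so a ring can be closed by following a successor lookup. The function name `assemble_rings` is hypothetical.

```python
def assemble_rings(segments):
    """Stitch directed segments (lists of (x, y) points) into closed rings.

    Assumption: each endpoint is the start of exactly one segment, so the
    graph has a single outgoing edge per node.
    """
    # Node (start point) -> edge (the segment leaving that node).
    by_start = {tuple(seg[0]): seg for seg in segments}
    rings = []
    while by_start:
        start, seg = by_start.popitem()
        ring = list(seg)
        # Follow outgoing edges until we return to the starting node.
        while tuple(ring[-1]) != start:
            end = tuple(ring[-1])
            if end not in by_start:
                raise ValueError(f"open line: no segment starts at {end}")
            ring.extend(by_start.pop(end)[1:])  # skip the duplicated joint
        rings.append(ring)
    return rings


# A unit square split into three segments, given in shuffled order.
segments = [
    [(1, 0), (1, 1), (0, 1)],
    [(0, 0), (1, 0)],
    [(0, 1), (0, 0)],
]
rings = assemble_rings(segments)
print(rings)
```

Real boundary data has nodes where several segments meet, which is where a proper cycle search on a directed graph becomes necessary.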

image

@mattijn
Owner

mattijn commented Dec 3, 2023

Interesting! Halfway through the computation of a topology, the line segments are also split, and their order is not always clear. In the hashmap step I use a _hash_order() to determine the order. Maybe I could have used a directed graph there as well.
Regarding 1), I can understand a single place being claimed by multiple countries, but I didn't expect a place not to be claimed by any country.
Regarding 2), the OSM location seems to be OK; the border is a bit messy there. Maybe it's a glitch when zooming out.
