Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_gdf after topology detection #111

Closed
YassineAbdelouadoud opened this issue Sep 14, 2020 · 4 comments
Closed

to_gdf after topology detection #111

YassineAbdelouadoud opened this issue Sep 14, 2020 · 4 comments

Comments

@YassineAbdelouadoud
Copy link
Contributor

YassineAbdelouadoud commented Sep 14, 2020

Hi @mattijn ! Thank you for your work on topojson, it is much appreciated 👍
I would like to raise the following issue : after computing a topology in which one of the polygons shares arcs with more than one other polygon, the geodataframe that is produced contains a duplicated coordinate for the polygon that is connected to several others. Below the reproducing code :

import numpy as np
from shapely.geometry import Polygon
import geopandas as gpd

poly_0 = Polygon([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.]])
poly_1 = Polygon([[0.0, 1.0], [1.0, 1.0], [1.0, 2.0], [0.0, 2.0], [0.0, 1.]])
poly_2 = Polygon([[1.0, 0.0], [2.0, 0.0], [2.0, -1.0], [1.0, -1.0], [1.0, 0.]])

gdf = gpd.GeoDataFrame({
    "name": ["abc", "def", "ghi"],
    "geometry": [
        poly_0,
        poly_1,
        poly_2
    ]
})
topojs=tj.Topology(gdf, prequantize=False, topology=True)
np.array(topojs.to_gdf().geometry[0].exterior.coords)

returns :
array([[0., 1.], [0., 0.], [1., 0.], [2., 0.], [2., 0.], [2., 1.], [1., 1.], [0., 1.]])

This is not an issue visually, but when using the result for further calculation, it can create problems.

@mattijn
Copy link
Owner

mattijn commented Sep 14, 2020

Thanks for trying this package! And thanks for raising the issue! It took me a long lime to realise that the [2., 0.], [2., 0.] are duplicates.. And this indeed should not happen.

A quick scan show that topojs resolves into:

Topology(
{'arcs': [[[1.0, 1.0], [2.0, 1.0], [2.0, 0.0]],
          [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [0.0, 2.0], [1.0, 2.0], [1.0, 1.0]],
          [[1.0, 1.0], [0.0, 1.0]],
          [[1.0, 0.0], [2.0, 0.0]],
          [[2.0, 0.0], [2.0, -1.0], [1.0, -1.0], [1.0, 0.0]]],
 'bbox': (0.0, -1.0, 2.0, 2.0),
 'coordinates': [],
 'objects': {'data': {'geometries': [{'arcs': [[-4, 0, -5, 1]],
                                      'properties': {'name': 'abc'},
                                      'type': 'Polygon'},
                                     {'arcs': [[2, 3]],
                                      'properties': {'name': 'def'},
                                      'type': 'Polygon'},
                                     {'arcs': [[4, 5]],
                                      'properties': {'name': 'ghi'},
                                      'type': 'Polygon'}],
                      'type': 'GeometryCollection'}},
 'type': 'Topology'}
)

Where this behaviour only occurs on geometry[0] which has its arcs referenced as [[-4, 0, -5, 1]]. So this is:

  • arc 3 reversed
  • arc 0
  • arc 4 reversed
  • arc 1

I cannot see why it behaves wrongly here were the other geometries behave normally. For setting up a specific test I suspect the focus should be on this part: https://github.com/mattijn/topojson/blob/master/topojson/utils.py#L155:L160.

Thanks again for raising!

EDIT:
Example test to add in test_topology.py from where debugging can start once I'm near a developing machine:

def test_topology_geojson_duplicates():
    p0 = wkt.loads('POLYGON ((0 0, 0 1, 1 1, 2 1, 2 0, 1 0, 0 0))')
    p1 = wkt.loads('POLYGON ((0 1, 0 2, 1 2, 1 1, 0 1))')
    p2 = wkt.loads('POLYGON ((1 0, 2 0, 2 -1, 1 -1, 1 0))')
    data = gpd.GeoDataFrame({"name": ["abc", "def", "ghi"], "geometry": [p0, p1, p2]})
    topo = tp.Topology(data, prequantize=False)
    p0_wkt = topo.to_gdf().geometry[0].wkt

    assert p0_wkt == 'POLYGON ((0 1, 0 0, 1 0, 2 0, 2 1, 1 1, 0 1))'

@YassineAbdelouadoud
Copy link
Contributor Author

Thank you for prompt response ! Indeed I should have pointed out the duplicate. Sorry for that !

Thanks also for pointing out where the problem comes from.

After some debugging, here is why this fails with this particular geometry :
You do [i > 0 :] at line 157 to remove the first coordinate of an arc if it is not the first one. The problem comes when an arc is reversed while an other arc has a greater number of points. In this situation, because you reverse the arc first, you only remove the nans used as padding in the arcs' numpy array. Also, removing the element before inverting the order is not a solution because you need to remove the first element in the final order.

I ended up writing the following (I didn't manage to keep it a one liner) to replace line 155 to 161:

coord_list = []
for i, arc in enumerate(arcs):
    arc_coords = tp_arcs[arc if arc >= 0 else ~arc][:: arc >= 0 or -1]
    arc_coords = arc_coords[~np.isnan(arc_coords).any(axis=1)]
    coord_list.append(arc_coords[i > 0 :])
coords = np.concatenate(coord_list)

What do you think ?

@mattijn
Copy link
Owner

mattijn commented Sep 15, 2020

Yes, thanks for looking into it. What you propose is a good solution. If you can turn it into a Pull Request, that would be great!

@mattijn
Copy link
Owner

mattijn commented Sep 16, 2020

Fixed by #112

@mattijn mattijn closed this as completed Sep 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants