[BUG] data duplication on save (maybe render?) #725

Closed
knaaptime opened this issue Jan 3, 2025 · 4 comments · Fixed by #726
Labels
bug Something isn't working

Comments

@knaaptime

Context

If I create several independent Maps and write each one out to a file, each successive file contains all of the data from the prior maps until the kernel is restarted.

Resulting behaviour, error message or logs

Every map after the first contains the data from all prior maps.

Environment

macos 15
geopandas : 1.0.1
geodatasets: 2024.8.0
lonboard : 0.10.3

Steps to reproduce the bug

Reproducible example. This writes 5 test files; the first is 3.3 MB and the last is 16.7 MB.

import geodatasets
import geopandas as gpd
from lonboard import Map, PolygonLayer, viz

gdf = gpd.read_file(geodatasets.get_path("geoda.milwaukee1"))
gdf = gdf[["HH_INC", "geometry"]]

for i in range(5):
    Map([PolygonLayer.from_geopandas(gdf)]).to_html(f'test{i}.html')

Maybe this is anywidget related? I'm pretty sure this happens each time a map is rendered, not necessarily each time it's written to file. If you recreate a bunch of maps in the same notebook (as in re-running the cells, not generating a billion maps), it will blow up the RAM and eventually crash the kernel.

knaaptime added the bug label on Jan 3, 2025
@mgax

mgax commented Jan 6, 2025

Even creating PolygonLayer instances has an effect. Running this code as-is generates a 3.3M file, and with the loop uncommented, the file is 5M:

import geodatasets
import geopandas as gpd

from lonboard import Map, PolygonLayer

gdf = gpd.read_file(geodatasets.get_path("geoda.milwaukee1"))
gdf = gdf[["HH_INC", "geometry"]]

## uncomment to get larger file:
# for i in range(5):
#     PolygonLayer.from_geopandas(gdf)

Map([PolygonLayer.from_geopandas(gdf)]).to_html("test.html")

@mgax

mgax commented Jan 6, 2025

So what I think is causing the issue is that Map.to_html() calls embed_minimal_html from ipywidgets, which calls embed_data. But embed_data is called with no state, so it calls Widget.get_manager_state, which returns the state of all widgets ever created.
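To make the mechanism concrete, here is a toy model of that fallback. It does not use ipywidgets or lonboard: `ToyWidget` and `embed_state` are hypothetical stand-ins, and the only thing borrowed from the description above is the shape of the logic (a process-wide registry, plus an embed step that serializes the whole registry when no state is passed). The thread does not show what #726 changes, so the "scoped" call is only a sketch of how passing an explicit state would avoid the growth.

```python
# Toy model only: ToyWidget and embed_state are hypothetical stand-ins,
# not ipywidgets or lonboard APIs.


class ToyWidget:
    # Class-level registry shared by every instance, analogous to a
    # process-wide registry of all widgets created so far.
    _instances = {}
    _next_id = 0

    def __init__(self, payload):
        self.model_id = f"w{ToyWidget._next_id}"
        ToyWidget._next_id += 1
        self.payload = payload
        ToyWidget._instances[self.model_id] = self

    @staticmethod
    def get_manager_state(widgets=None):
        # With widgets=None, fall back to every widget ever created.
        if widgets is None:
            widgets = ToyWidget._instances.values()
        return {w.model_id: w.payload for w in widgets}


def embed_state(view, state=None):
    # No explicit state means "serialize everything in the registry".
    if state is None:
        state = ToyWidget.get_manager_state()
    return state


sizes_leaky = []   # number of widgets embedded when no state is passed
sizes_scoped = []  # number of widgets embedded when state is restricted
for i in range(3):
    w = ToyWidget(payload=b"x" * 10)
    sizes_leaky.append(len(embed_state(w)))
    scoped = ToyWidget.get_manager_state(widgets=[w])
    sizes_scoped.append(len(embed_state(w, state=scoped)))

print(sizes_leaky)   # [1, 2, 3]
print(sizes_scoped)  # [1, 1, 1]
```

The growing first list matches the symptom in the report (each file larger than the last), while the second stays flat because only the widget being exported is serialized.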

@kylebarron
Member

Thanks for tracking that down! Would you be able to test out #726?

@mgax

mgax commented Jan 6, 2025

Would you be able to test out #726?

@kylebarron I've run the same code on your branch and the file size stays consistent at 3MB 👍

If a program generates a lot of these files in a loop, it's still going to leak memory, but that seems like a known ipywidgets issue (jupyter-widgets/ipywidgets#1345). The ticket suggests a workaround – explicitly closing down widgets when we're done with them – but that's tedious and error-prone.
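One way to make that workaround less error-prone might be a small context manager that closes everything on exit. This is only a sketch: `closing_widgets` is a hypothetical helper, `FakeWidget` is a stand-in so the snippet runs without ipywidgets, and whether calling `close()` on lonboard maps and layers actually releases the memory is an assumption taken from the linked ticket's suggestion, not something verified here.

```python
from contextlib import contextmanager


@contextmanager
def closing_widgets(*widgets):
    """Yield the widgets, then call close() on each, even if the body raises."""
    try:
        yield widgets
    finally:
        for w in widgets:
            w.close()


class FakeWidget:
    """Stand-in for a real widget; only tracks whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


layer, fake_map = FakeWidget(), FakeWidget()
with closing_widgets(layer, fake_map):
    pass  # in real code: build the map and write it to HTML here

print(layer.closed, fake_map.closed)  # True True
```

In a loop that writes many files, each iteration would create its layer and map, wrap the export in the `with` block, and let the helper close both afterwards.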
