New OpenStreetMap Carto release, v4.25.0 #264
Comments
Just wanted to double-check whether something is wrong with this release. I've seen a number of reports from people complaining about rendering issues and gray tiles. https://munin.openstreetmap.org/openstreetmap.org/rhaegal.openstreetmap.org/mod_tile_fresh.html shows a fairly large amount of "old tile, attempted render". The other bits of the rendering infrastructure seem fairly busy as well: https://munin.openstreetmap.org/mod_tile-week.html This release has a few new places where ST_PointOnSurface is being used. Does it cause performance issues through increased CPU load?
No idea - what do the graphs show?
I mean obviously it's normal that tiles need rerendering after a change.
In absolute numbers those figures are probably not that meaningful. I was trying to compare the current ones to how the systems behaved during the last update, and it seems that things like latency or the already mentioned "old tile, attempted render" are unusually high. https://munin.openstreetmap.org/openstreetmap.org/odin.openstreetmap.org/mod_tile_latency.html
Well it's been so long since the last release that it's hard to compare. That's just disk I/O time you're looking at, so I don't see how that would be impacted?
odin has been maxed out on CPU since this new Carto release was deployed: https://munin.openstreetmap.org/openstreetmap.org/odin.openstreetmap.org/cpu.html I think this box has never been this busy during the last year.
odin is running
We will be moving to the latest PostgreSQL and PostGIS when we do the reload, which I believe is now expected with the next carto release? Should we roll back in the meantime?
I think a rollback is the safest and quickest option. Based on the assumption that ST_PointOnSurface() performance issues are the cause of this (which is plausible - see gravitystorm/openstreetmap-carto#4009), this is not a bug that can be easily and fully fixed with a minor release. It requires either rethinking the strategic decision to move to ST_PointOnSurface() for polygon labels/icons or a PostGIS update solving the issue. What seems a bit weird is that this release caused trouble while the previous ones did not, because the biggest uses of ST_PointOnSurface() were already in the previous release. As @pnorman says in #211 (comment), the next step would be for us in OSM-Carto to decide either to make a new major release for you to do a system upgrade with, or to roll back the move to ST_PointOnSurface() in a way that makes it suitable for the current infrastructure.
I don’t think it’s a good idea to operate a service where we want fast response times at 95% CPU utilization. Although the throughput may be comparable, dropped tiles and heavily delayed tiles impact the user experience.
Maxed-out CPUs are totally normal when a new style is deployed. Indeed, on the slower machines like rhaegal it's normal most days.
Another quick update, collecting some user feedback from different channels: now that the new style has been in production for about 13 days, users both on the forum and now also on Telegram channels keep complaining about poor performance, gray tiles, and timeouts. One user remarked that the bigger tile servers seem to have managed to reduce the backlog in the meantime. However, 4 of the smaller ones still seem to struggle quite a bit and keep dropping tiles. So no matter what the Munin throughput stats say, user feedback seems to indicate that performance has degraded quite a bit.
Well maybe, but we have no idea how much of that is the caches and how much is the render servers - many of the caches are fairly overloaded and cause those kinds of effects anyway.
Many of the users are longtime OSM users, and I assume they have quite a bit of experience of what to expect in terms of response times and gray tiles. Some of them even acknowledged that switching styles has caused some issues in the past, but reportedly it's never been this bad in recent times and the system recovered much quicker in the past. The situation on the tile caches probably hasn't changed much in the last three weeks.
We are still fighting significant issues with the squid 4 migration - just this morning I found two caches that are in a degraded state and have been fixing them, and there are likely others.
Ah, that's good to know. It's hard to reason about this if all you get from users is more or less "it doesn't work, it's slow, there are timeouts". I'm wondering whether it might be worthwhile to add some diagnostics code to osm.org to get a breakdown of tile performance per cache/rendering server. We have all the relevant data in some HTTP X fields, but no way to have users report those.
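A minimal sketch of what such a per-server breakdown could look like, assuming each user report has already been reduced to a `(server, latency_ms, ok)` tuple. The function name and the tuple layout are hypothetical and not part of osm.org or any existing tool; how the server name would be read from the response headers is deliberately left open.

```python
from collections import defaultdict
from statistics import median


def summarize_by_server(samples):
    """Group tile-fetch samples by server and summarize them.

    `samples` is an iterable of (server, latency_ms, ok) tuples.
    This layout is an assumption made for illustration only.
    """
    latencies = defaultdict(list)
    failures = defaultdict(int)
    for server, latency_ms, ok in samples:
        latencies[server].append(latency_ms)
        if not ok:
            failures[server] += 1
    # One summary row per cache or render server seen in the samples.
    return {
        server: {
            "requests": len(values),
            "median_ms": median(values),
            "failures": failures[server],
        }
        for server, values in latencies.items()
    }
```

With a summary like this, slow or failing responses could be attributed to a specific cache or render server rather than to "the map" as a whole.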
A new version of OpenStreetMap Carto, v4.25.0, has been released.
I believe there are no major changes required for deployment.