
Investigate Geoserver store cache management when tables are deleted / overwritten #8

Open
giohappy opened this issue Jul 13, 2022 · 2 comments


@giohappy
Contributor

When a resource and its related tables are deleted, GeoServer's store cache must be updated to keep track of the change in the underlying DB.
At the moment everything goes fine because (apparently) when a new import is performed reusing a previous table, the importer asks the store cache for that table.

Now GeoNode itself manages the tables' lifecycle, so it should inform GeoServer's store once a table is deleted or changed.

Unfortunately, the GS REST API only allows wiping either the whole store cache or nothing (with an empty PUT request). This is not a viable solution because it would be extremely inefficient with many layers.
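For reference, that all-or-nothing call is a single empty request against the global reset endpoint. A minimal sketch with the requests library (base URL and credentials are placeholders):

    import requests

    # Placeholders: adjust to your GeoServer instance and credentials.
    GEOSERVER_URL = "http://localhost:8080/geoserver"
    AUTH = ("admin", "geoserver")

    # An empty PUT (or POST) to /rest/reset drops ALL store, raster and
    # schema caches; there is no way to target a single store with it.
    resp = requests.put(f"{GEOSERVER_URL}/rest/reset", auth=AUTH)
    resp.raise_for_status()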

@aaime says that exposing an API for selective wiping of the store cache shouldn't be a big deal. But before opening an issue for that, we must run some tests to verify how GeoNode behaves now (from what I remember it doesn't do anything directly with the store) and confirm that we don't have any solution in place yet.

@mattiagiupponi I would ask you to get the required insights.

@etj please take control of this requirement.

@mattiagiupponi
Contributor

@giohappy I did some investigation on the master branch into how GeoNode behaves and, contrary to what we thought, it is GeoNode itself that deletes the cache of all stores on each upload/replace.

The catalog used inside GeoNode (a kind of SDK used to communicate with GeoServer) exposes this method:

    def reset(self):
        # POST an empty request to GeoServer's global /reset endpoint,
        # which drops all store, raster and schema caches server-side.
        url = f"{self.service_url}/reset"
        resp = self.http_request(url, method='post')
        # Also clear gsconfig's own client-side object cache.
        self._cache.clear()
        return resp

which clears the cache for all available stores, as described in the GeoServer documentation:
https://docs.geoserver.org/stable/en/user/rest/api/reset.html

Resets all store, raster, and schema caches. This operation is used to force GeoServer to drop all caches and store connections and reconnect to each of them the next time they are needed by a request
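
For context, GeoNode reaches that reset() through gsconfig's Catalog client. A minimal standalone sketch (URL and credentials are placeholders):

    from geoserver.catalog import Catalog

    # gsconfig's Catalog wraps the GeoServer REST API; reset() posts to
    # {service_url}/reset and also clears the client-side object cache.
    gs_catalog = Catalog("http://localhost:8080/geoserver/rest", "admin", "geoserver")
    gs_catalog.reset()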

Inside GeoNode, we call it when the concrete_manager performs an upload/replace:

    gs_catalog.reset()
    # Let's now try the new ingestion
    import_session = gs_uploader.start_import(
        import_id=upload_session.id,
        name=_name,
        target_store=_target_store
    )

So we are already deleting the whole cache before starting a new upload/replace.

We can reimplement this in the new importer but, as we discussed, it may not be the best option from a performance POV.
Having the possibility to decide which store's cache to clear would be a better option.
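
For illustration, a selective reset could look like the sketch below. The per-store endpoint is hypothetical: it does not exist in the REST API today, and is exactly what would need to be implemented:

    import requests

    GEOSERVER_URL = "http://localhost:8080/geoserver"
    AUTH = ("admin", "geoserver")

    def reset_single_store(workspace, store):
        # Hypothetical endpoint: a granular variant of /rest/reset that
        # would clear only the cache of one datastore. Not available yet.
        url = f"{GEOSERVER_URL}/rest/workspaces/{workspace}/datastores/{store}/reset"
        resp = requests.put(url, auth=AUTH)
        resp.raise_for_status()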

@giohappy
Contributor Author

"it may be not the best option for a performance" to say the least!
I've created geosolutions-it/geoserver#339 to estimate the implementation of granular store cache removal.
