-
Notifications
You must be signed in to change notification settings - Fork 14.3k
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIP-27] Proposal for Paranoid Deletes #8639
Comments
Issue-Label Bot is automatically applying the label Links: app homepage, dashboard and code for this bot. |
@john-bodley are you thinking that the cascade deletes will also be soft/paranoid? In general I would be very in favor of moving to soft-deletes. |
@willbarrett I think it depends on the entity type. For charts, dashboards, and datasources I think these should be soft-deletes, however cascade deletes are valid for removing columns/metrics once/if the datasource/table is hard deleted. I think there's merit in enumerating which tables should support soft and cascading deletes. |
+1 for this proposal, we are thinking of implementing this @ dropbox as well either through APIs & some heuristics. It would be great to have 1st class support for this in Superset |
+1 for this proposal
will either of you be interested in collaborating on this feature? @john-bodley @bkyryliuk? |
Replying here @junlincc (although i'm not John or Bogdan 😛 ): I think the thought is to have the time between soft delete and hard delete be configured by the Superset admin. Note that the discussion here is more from the admin perspective than the user perspective. From the user's viewpoint, once something is soft deleted, they won't be able to find it in the Superset application (or maybe will only be able to find it in a trash can page). But if it's right after the deletion occurs, they could talk to the admin or self serve recover it from the trash can. But once the time between soft and hard delete expires, the entity is gone for good |
+1 to @etr2460 point time to live should be configurable in the superset config or per database and yeah it should be set by admins As for implementation, it is unlikely that we would be able to commit to it in the next couple quarters. |
thanks both! @etr2460 and @bkyryliuk
I'm not very familiar with the role permission in Superset tbh. By reading the documentation, seems like we should change the logic of can_delete, and add something like can_restore? Model & Action: models are entities like Dashboard, Slice, or User. Each model has a fixed set of permissions, like can_edit, can_show, can_delete, can_list, can_add, and so on. For example, you can allow a user to delete dashboards by adding can_delete on Dashboard entity to a role and granting this user that role. Please educate me or lmk if I missed anything 😅 |
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
[SIP] Proposal for Paranoid Deletes
Motivation
At Airbnb we have a vast number of entities housed within Superset. Our deployment has tens of thousands of charts (both manually and procedurally generated), thousands of dashboards, and tens of thousands of registered datasources and tables (both physical and virtual).
In a recent analysis of a specific Druid NoSQL (native) cluster, from a sample of ~ 5k charts only 34% of charts rendered, i.e., returned a 200 status code from the
/supserset/slice_json
route.The following chart shows the renderability of charts as a function of last saved, which shows that a chart's viability often decays over time due to creep in the datasource metadata and the saved chart parameters.
Ideally we would like to have a mechanism to clean up obsolete resources (charts, dashboards, or datasources) in a somewhat paranoid manner, i.e., using soft deletes. This should help keep our deployment at a manageable size and improve the perceived reliability and quality (from a usability standpoint) of Superset assets.
Proposed Change
The proposed solution was originally mentioned by @etr2460 but I thought it was worthwhile formalizing this as a SIP. This borrows an idea from Ruby where we first soft delete records my marking them as deleted (with an associated timestamp) before performing a hard delete (deleting the record n-days later). Users could be prompted that their charts were being deleted and they can take corrective action to undelete it if they see fit.
There's actually a Python package sqla-paranoid which brings transparent soft deletes to SQLAlchemy which we could use or replicate. The TL;DR is this would add a
deleted_at
(ordeleted_on
for consistency) column which would track soft deleted records. Records which are soft deleted wouldn't show up in the CRUD views by default unless the filter was enabled (not unlike how SQL Lab Views are ignored by default in thetablemodelview
).Records could be marked using a hook, trigger, or cron as deletable based on various criterion using cascading context:
Charts
† Note this could leverage a cron (or similar) and be based on customizable rules, i.e., x of the last n-days.
Dashboards
Tables/Datasources
New or Changed Public Interfaces
We would need to updated the data model and leverage
sql-paranoid
(or similar) for enabling the soft-deletes. We would also need to update the FAB views to handle filtering/exclusion of soft deleted records. Finally we would need to implement triggers or similar to i) soft delete records, and ii) hard delete records.New dependencies
The only new dependency would be
sqla-paranoid
(no public license) if we decided not to write this ourself. Note the package only contains several hundred lines of codes.Migration Plan and Compatibility
We would need to update the schema to include the
deleted_at
(ordeleted_on
) column for certain tables. Note I think we only need this for charts, dashboards, and datasources (the cascade deletes should handle the cleanup of columns and metrics).Rejected Alternatives
None.
to: @etr2460 @mistercrunch @villebro @willbarrett
cc: @vylc
The text was updated successfully, but these errors were encountered: