Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Improving handling for tag relationship when deleting assets #29117

Merged
merged 1 commit into from
Jun 10, 2024

Conversation

Vitor-Avila
Copy link
Contributor

SUMMARY

The relationship between the tagged_objects table and the asset table (for example, dashboard) is a combination of the object_id and object_type. While this relationship works properly when displaying tags/etc, when deleting an asset SQLAlchemy was trying to run DELETE FROM tagged_object WHERE tagged_object.tag_id = ? AND tagged_object.object_id = ? which would delete more rows then it should (it's missing another filter for object_type), causing the operation to fail with an error:

sqlalchemy.orm.exc.StaleDataError: DELETE statement on table 'tagged_object' expected to delete 3 row(s); Only 10 were matched.

This PR adds the passive_deletes=True argument to the relationship, so that the cascade delete is handled by the DB (based on the constraints).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

No UI changes.

TESTING INSTRUCTIONS

  1. Make sure you have a dashboard and a chart with the same ID.
  2. Assign a tag for both assets. Also add owners to both assets.
  3. Validate that you're able to delete one of the assets.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@dosubot dosubot bot added dashboard Namespace | Anything related to the Dashboard sqllab Namespace | Anything related to the SQL Lab labels Jun 7, 2024
@Vitor-Avila
Copy link
Contributor Author

I'm checking if I can add some tests around this flow to prevent a regression

Copy link

codecov bot commented Jun 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.65%. Comparing base (76d897e) to head (9ad1ead).
Report is 1094 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #29117      +/-   ##
==========================================
+ Coverage   60.48%   64.65%   +4.16%     
==========================================
  Files        1931      518    -1413     
  Lines       76236    37492   -38744     
  Branches     8568        0    -8568     
==========================================
- Hits        46114    24239   -21875     
+ Misses      28017    13253   -14764     
+ Partials     2105        0    -2105     
Flag Coverage Δ
hive 48.92% <ø> (-0.24%) ⬇️
javascript ?
presto 53.53% <ø> (-0.27%) ⬇️
python 64.65% <ø> (+1.16%) ⬆️
unit 59.04% <ø> (+1.42%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Vitor-Avila
Copy link
Contributor Author

Adding tests here would be complex, since it requires having assets (dashboards and charts) with the same ID and forcing the ID during the asset creation is atypical and could impact other tests.

I wanted to be extra sure the passive_deletes argument would handle this correctly, specially since the FK set on TaggedObject.object_id (which is currently mapped to multiple tables) is only applied in the SQLAlchemy level (couldn't find any migration to implement this relationship in the DB level, and AFAIK the DB doesn't support a single column being a FK to multiple tables):

object_id = Column(
    Integer,
    ForeignKey("dashboards.id"),
    ForeignKey("slices.id"),
    ForeignKey("saved_query.id"),
)

I did some tests running Superset with a PSQL DB with query logging enabled, and the deletion queries look correct (they're including a filter for the object_type):

2024-06-10 21:25:13 UTC:189.6.252.197(13164):preset@superset_local:[3611]:LOG: statement: DELETE FROM dashboard_slices WHERE dashboard_slices.dashboard_id = 1 AND dashboard_slices.slice_id = 1
2024-06-10 21:25:13 UTC:189.6.252.197(13164):preset@superset_local:[3611]:LOG: duration: 0.211 ms
2024-06-10 21:25:13 UTC:189.6.252.197(13164):preset@superset_local:[3611]:LOG: statement: DELETE FROM dashboards WHERE dashboards.id = 1
2024-06-10 21:25:13 UTC:189.6.252.197(13164):preset@superset_local:[3611]:LOG: duration: 0.958 ms
2024-06-10 21:25:13 UTC:189.6.252.197(13164):preset@superset_local:[3611]:LOG: statement: DELETE FROM tagged_object WHERE tagged_object.object_type = 'dashboard' AND tagged_object.object_id = 1
2024-06-10 21:40:46 UTC:189.6.252.197(13148):preset@superset_local:[4186]:LOG: statement: DELETE FROM saved_query WHERE saved_query.id = 1
2024-06-10 21:40:46 UTC:189.6.252.197(13148):preset@superset_local:[4186]:LOG: duration: 0.334 ms
2024-06-10 21:40:46 UTC:189.6.252.197(13148):preset@superset_local:[4186]:LOG: statement: DELETE FROM tagged_object WHERE tagged_object.object_type = 'query' AND tagged_object.object_id = 1
2024-06-10 21:41:07 UTC:189.6.252.197(13150):preset@superset_local:[4188]:LOG: statement: DELETE FROM slices WHERE slices.id = 1
2024-06-10 21:41:07 UTC:189.6.252.197(13150):preset@superset_local:[4188]:LOG: duration: 0.709 ms
2024-06-10 21:41:07 UTC:189.6.252.197(13150):preset@superset_local:[4188]:LOG: statement: DELETE FROM tagged_object WHERE tagged_object.object_type = 'chart' AND tagged_object.object_id = 1

We could also assess a bigger alternative which would be changing the tagged_object table to have one column per FK (chart_id, dashboard_id, query_id) which should make the relationship simpler and work with the SQLAlchemy default cascade behavior (but would require a DB migration).

@mistercrunch
Copy link
Member

Given that the DELETE statements look good, and the complexity of the alternative solutions, I think this is solid for now. Let's merge to fix the issue at hand and we can reconsider other approaches in the future.

@mistercrunch mistercrunch merged commit dd67772 into apache:master Jun 10, 2024
39 of 40 checks passed
@john-bodley
Copy link
Member

@Vitor-Avila I'm not sure this is the right approach, i.e., historically errors of the form,

sqlalchemy.orm.exc.StaleDataError: DELETE statement on table 'tagged_object' expected to delete 3 row(s); Only 10 were matched.

typically suggest there's something borked within our ORM.

@mistercrunch
Copy link
Member

It seemed like the right longer term solution might be to have one bridge table for each tag relationship (?)

@Vitor-Avila
Copy link
Contributor Author

@john-bodley 100% agreed! This is currently happening because the statement to delete entries from the tagged_object table does not include a filter for the object_type column, so SQLAlchemy was expecting to delete x rows, but the operation would actually delete more than that.

I worked on this as an attempt to have a fast fix in place (the issue is pretty simple to reproduce, make sure you have a dashboard and a chart with the same ID, assign tags/owners to both and try to delete one).

Since it seems this solution didn't fully solve the issue just yet, I just submitted an alternative approach that I believe is better: #29229

eschutho pushed a commit that referenced this pull request Jun 23, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 4.1.0 labels Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels dashboard Namespace | Anything related to the Dashboard size/S sqllab Namespace | Anything related to the SQL Lab 🚢 4.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants