-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: new dataset/table/column models #17543
Conversation
0d8ed9c
to
20a5c90
Compare
❗ Please consider rebasing your branch to avoid db migration conflicts. |
d4fa1e0
to
9b3a015
Compare
Codecov Report
@@ Coverage Diff @@
## master #17543 +/- ##
==========================================
+ Coverage 66.21% 66.33% +0.12%
==========================================
Files 1633 1638 +5
Lines 63210 63454 +244
Branches 6409 6409
==========================================
+ Hits 41852 42092 +240
- Misses 19698 19702 +4
Partials 1660 1660
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
a0e65cb
to
4ace8da
Compare
❗ Please consider rebasing your branch to avoid db migration conflicts. |
superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @betodealmeida for making this change—and adding a slew of unit tests to boot. I think your dual write approach is the right approach and allows us (the community) to iron out any kinks in the system before making the switch.
name = sa.Column(sa.Text) | ||
type = sa.Column(sa.Text) | ||
|
||
# Columns are defined by expressions. For tables, these are the actual columns names, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the one thing I find somewhat atypical, i.e., that a physical table/view column would require require and expression which needs to match the name. Why not just make expression
nullable in that case and thus the column would only represent actual SQL expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hoping that having expression
even for physical columns and tables will make things easier because of the consistency. One thing the new model does is automatically quote table/column names that are also identifiers; as an example, we'd have:
column = Column(
name="select",
expression="`select`", # or "select", depending on the DB
...
)
So to select that column we just need to use its expression
. Otherwise, we'd have to check if expression
is null, and then potentially encode the name in order to select that column.
❗ Please consider rebasing your branch to avoid db migration conflicts. |
1 similar comment
❗ Please consider rebasing your branch to avoid db migration conflicts. |
3f234fb
to
b1e9f18
Compare
a7ff90e
to
e0c616f
Compare
e0c616f
to
553aa89
Compare
7b92605
to
2fcdeef
Compare
142d625
to
92b76ef
Compare
This is impressive work @betodealmeida ! 🚀 Talk about swapping out the airplane's engines mid flight! |
# not exist in the migrations. The reason it does not physically exist is MySQL, | ||
# PostgreSQL, etc. have a different interpretation of uniqueness when it comes to NULL | ||
# which is problematic given the catalog and schema are optional. | ||
__table_args__ = (UniqueConstraint("database_id", "catalog", "schema", "name"),) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betodealmeida It's also not possible to apply this constraint in MySQL because TEXT cannot be used in index keys without specifying a length: https://stackoverflow.com/questions/1827063/mysql-error-key-specification-without-a-key-length
SUMMARY
This PR implements new models for SIP-68 (#14909):
Column
Table
Dataset
The associated tables are created with a migration, and the existing datasets (
SqlaTable
,SqlMetric
,TableColumn
) are migrated to the new models:Table
instance and aDataset
instance.Dataset
instance.Column
instances (in the new model it represents physical columns, derived columns, and metrics).The models are kept up-to-date via hooks added to
SqlaTable
,SqlMetric
, andTableColumn
. Every time these models are created, deleted, or updated, the new models are updated correspondingly. The sync is unidirectional, from the old models to the new ones.The next step is to modify the backend to read from the new models. Initially the API will remain unmodified, but hopefully in the future we might be able to clean it up and modernize it (in some places we haven't migrated to the v1 API yet).
After that, the backend will be modified to write to the new models. Once the backend is reading and writing from the new models we can get rid of the old models (including the Druid models).
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
No visual changes.
TESTING INSTRUCTIONS
Reviewers can run
superset load-examples
and see the new models being populated:The PR has unit tests checking that the sync works on creation, deletion, and updates.
ADDITIONAL INFORMATION