Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new dataset/table/column models #17543

Merged
merged 22 commits into from
Feb 24, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions superset/columns/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
99 changes: 99 additions & 0 deletions superset/columns/models.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""
Column model.

This model was introduced in SIP-68 (https://github.com/apache/superset/issues/14909),
and represents a "column" in a table or dataset. In addition to a column, new models for
tables, metrics, and datasets were also introduced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the connectors, don't we normally put the model files in the superset/models/* folders?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to have this aligned with #9077. superset/models/ came before that (I think). But I have no strong preference here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar @john-bodley any thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like @willbarrett has models at the top level in that Sip, but I don't have much more context than that.


These models are not fully implemented, and shouldn't be used yet.
"""

import sqlalchemy as sa
from flask_appbuilder import Model

from superset.models.helpers import (
AuditMixinNullable,
ExtraJSONMixin,
ImportExportMixin,
)


class Column(
Model, AuditMixinNullable, ExtraJSONMixin, ImportExportMixin,
):
"""
A "column".

The definition of column here is overloaded: it can represent a physical column in a
database relation (table or view); a computed/derived column on a dataset; or an
aggregation expression representing a metric.
"""

__tablename__ = "sl_columns"
betodealmeida marked this conversation as resolved.
Show resolved Hide resolved

id = sa.Column(sa.Integer, primary_key=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a blocker, but more of a question.. when do we want to start using uuids instead of auto-incrementing integers for public-facing ids?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would need a SIP and a major release for that, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a new model, I don't see any reason why we couldn't add it in here since it wouldn't break anything, but yes I agree on a SIP regarding an overall change toward that methodology. If that's the blocker, then that makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point. But the idea is that at least initially the APIs would be unchanged, we'd just reroute the backend to read from the new models instead of the old ones.


# We use ``sa.Text`` for these attributes because (1) in modern databases the
# performance is the same as ``VARCHAR``[1] and (2) because some table names can be
# **really** long (eg, Google Sheets URLs).
#
# [1] https://www.postgresql.org/docs/9.1/datatype-character.html
name = sa.Column(sa.Text)
type = sa.Column(sa.Text)

# Columns are defined by expressions. For tables, these are the actual columns names,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one thing I find somewhat atypical, i.e., that a physical table/view column would require require and expression which needs to match the name. Why not just make expression nullable in that case and thus the column would only represent actual SQL expressions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping that having expression even for physical columns and tables will make things easier because of the consistency. One thing the new model does is automatically quote table/column names that are also identifiers; as an example, we'd have:

column = Column(
    name="select",
    expression="`select`",  # or "select", depending on the DB
    ...
)

So to select that column we just need to use its expression. Otherwise, we'd have to check if expression is null, and then potentially encode the name in order to select that column.

# and should match the ``name`` attribute. For datasets, these can be any valid SQL
# expression. If the SQL expression is an aggregation the column is a metric,
# otherwise it's a computed column.
expression = sa.Column(sa.Text)

# Does the expression point directly to a physical column?
is_physical = sa.Column(sa.Boolean, default=True)

# Additional metadata describing the column.
description = sa.Column(sa.Text)
warning_text = sa.Column(sa.Text)
unit = sa.Column(sa.Text)

# Is this a time column? Useful for plotting time series.
is_temporal = sa.Column(sa.Boolean, default=False)

# Is this a spatial column? This could be leveraged in the future for spatial
# visualizations.
is_spatial = sa.Column(sa.Boolean, default=False)

# Is this column a partition? Useful for scheduling queries and previewing the latest
# data.
is_partition = sa.Column(sa.Boolean, default=False)

# Is this column an aggregation (metric)?
is_aggregation = sa.Column(sa.Boolean, default=False)

# Assuming the column is an aggregation, is it additive? Useful for determining which
# aggregations can be done on the metric. Eg, ``COUNT(DISTINCT user_id)`` is not
# additive, so it shouldn't be used in a ``SUM``.
is_additive = sa.Column(sa.Boolean, default=False)

# Is an increase desired? Useful for displaying the results of A/B tests, or setting
# up alerts. Eg, this is true for "revenue", but false for "latency".
is_increase_desired = sa.Column(sa.Boolean, default=True)

# Column is managed externally and should be read-only inside Superset
is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
external_url = sa.Column(sa.Text, nullable=True)
40 changes: 40 additions & 0 deletions superset/columns/schemas.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""
Schema for the column model.

This model was introduced in SIP-68 (https://github.com/apache/superset/issues/14909),
and represents a "column" in a table or dataset. In addition to a column, new models for
tables, metrics, and datasets were also introduced.

These models are not fully implemented, and shouldn't be used yet.
"""

from marshmallow_sqlalchemy import SQLAlchemyAutoSchema

from superset.columns.models import Column


class ColumnSchema(SQLAlchemyAutoSchema):
betodealmeida marked this conversation as resolved.
Show resolved Hide resolved
"""
Schema for the ``Column`` model.
"""

class Meta: # pylint: disable=too-few-public-methods
model = Column
load_instance = True
include_relationships = True
Loading