Skip to content

Commit

Permalink
fix: Aggregating indices now allow multiple field names in the key co…
Browse files Browse the repository at this point in the history
…lumn (#51)

* Updated build plan on external tables to they default to drop table.

* Added exceptions for missing or misspelled fields in external tables.

* As name is a required field for columns in external tables, removed some logic to error out with missing column names.

* Updated changelog and version number.

* Small edit to error messages for missing fields on external tables.

* Changed default behavior on external tables: Firebolt external tables only drop and rebuild if explicity told to using a variable on dbt run-operation stage_external_sources.

* Added allowable prefixes to PR title description.

* Updated firebolt.dbtspec file.

* Formatted integration_tests.yml to match that of code-check.yml. Added integration tests to pull_request.yml.

* Now, actually added integration-tests to pull_request.yml. Fixed directory path issue in integration-tests.yml.

* Some reformatting for legibility. Added docstrings.

* Reordered index table names, tiny edits in error msgs, renamed type field to index_type for clarity.

* Now passing secrets from pull_requests.yml to integration_tests.yml.

* Commenting out integration tests from PR workflow.

* Fixed misspelled variable FIREBOLT_USERNAME.

* Edited pull_request.yml to use this branch for testing. Will rever later.

* Bumped version number to 1.0.2.

* Reverted pull_request.yml so that future integration tests will frun from main branch.

* Bumped version number and added a little text to the changelog.

* Cleaned up some missing index_type variable names. Bumped required python-sdk version to 0.5.2.

* Cleanup on switch to index_type. Edited types for key and join columns, and added upper() to ensure match in errors in FireboltIndexConfig.

* Moved todos out of comments and into dedicated Confluence doc. And yes, I know that's not actually relevant to the job at hand.

* Added logic to deal with having a list of keys in aggregate indexes, and for correctly naming the indexes if there is a list of keys.

* Edited changelog to remove a minor version change, as three PRs will likely be merged at once.

* Changed version number to track with changelog.

* Added breaking change field to changelog.

* Removed two extra log statements.

* Tried to add errors for instances of spaces in field names in join and agg indexes, but parsing is done in an external library, so I ended up just making minor changes in the error messages.

* Rewrote validation conditionals for external table fields for better code style.

* Added allowable prefixes to PR title description.

* Updated firebolt.dbtspec file.

* Formatted integration_tests.yml to match that of code-check.yml. Added integration tests to pull_request.yml.

* Now, actually added integration-tests to pull_request.yml. Fixed directory path issue in integration-tests.yml.

* Now passing secrets from pull_requests.yml to integration_tests.yml.

* Fixed misspelled variable FIREBOLT_USERNAME.

* Commenting out integration tests from PR workflow.

* Edited changelog to remove a minor version change, as three PRs will likely be merged at once.

* Pulled into a rebased branched. Fixed auto-merge.

* Some reformatting for legibility. Added docstrings.

* Reordered index table names, tiny edits in error msgs, renamed type field to index_type for clarity.

* Edited version number and changelog.

* Cleaned up some missing index_type variable names. Bumped required python-sdk version to 0.5.2.

* Cleanup on switch to index_type. Edited types for key and join columns, and added upper() to ensure match in errors in FireboltIndexConfig.

* Moved todos out of comments and into dedicated Confluence doc. And yes, I know that's not actually relevant to the job at hand.

* Added logic to deal with having a list of keys in aggregate indexes, and for correctly naming the indexes if there is a list of keys.

* Added breaking change field to changelog.

* Removed two extra log statements.

* Tried to add errors for instances of spaces in field names in join and agg indexes, but parsing is done in an external library, so I ended up just making minor changes in the error messages.

* Updated pull request template to request integration tests are run, since we're no longer going to make them a PR action.

* Cleaned up error conditionals in impl.py FireboltIndexConfig.parse().

* Removed a coupe of backticks that our security checks were complaining about.

* Now removed apostrophes. Sigh.
  • Loading branch information
ericf-firebolt authored Mar 11, 2022
1 parent 96b7eda commit 6b203b9
Show file tree
Hide file tree
Showing 10 changed files with 101 additions and 73 deletions.
20 changes: 17 additions & 3 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,20 @@
<!---
The PR title should indicate what has changed and should be in the active voice,
e.g. "Added new macro to enable x" or "Fixed run twice bug".
This makes auto-generated release notes more valuable, as
they use PR titles, not descriptions.
This makes auto-generated release notes more valuable, as they use PR titles,
not descriptions. Note that PR titles must be prefixed with specific _lowercased_
prefixes followed by a colon. Allowable prefixes are:
• feat: A new feature
• fix: A bug fix
• docs: Documentation-only changes
• style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
• refactor: A code change that neither fixes a bug nor adds a feature
• perf: A code change that improves performance
• test: Adding missing tests or correcting existing tests
• build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
• ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
• chore: Other changes that don't modify src or test files
• revert: Reverts a previous commit
-->

<!---
Expand All @@ -24,5 +36,7 @@ Resolves #
- [ ] This PR includes tests, or tests are not required/relevant for this PR.
- [ ] I have updated `CHANGELOG.md` and added information about my change.
- [ ] If this PR requires a new PyPI release I have bumped the version number.
- [ ] I have pulled/merged from the main branch if there are merge conflicts.
- [ ] I have verified that this PR contains only code changes relevant to this PR.
- [ ] If further integration tests are now passing I've edited firebolt.dbtspec to account for this.
- [ ] I have pulled/merged from the main branch if there are merge conflicts.
- [ ] After pulling/merging main I have run pytest on the included or updated firebolt.dbtspec.
4 changes: 3 additions & 1 deletion .github/workflows/integration-tests.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
name: Run integration tests

on:
workflow_dispatch:
workflow_call:
Expand All @@ -7,6 +8,7 @@ on:
required: true
FIREBOLT_PASSWORD:
required: true

jobs:
tests:
runs-on: ubuntu-latest
Expand Down Expand Up @@ -42,5 +44,5 @@ jobs:
API_ENDPOINT: "api.dev.firebolt.io"
ACCOUNT_NAME: "firebolt"
run: |
pytest -o log_cli=true -o log_cli_level=INFO test/integration/firebolt.dbtspec
pytest -o log_cli=true -o log_cli_level=INFO tests/integration/firebolt.dbtspec
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,14 @@

### Under the hood

- dbt-firebolt now supports aggregating indexes with multiple-column keys.
- Bug fix to check for `data_type` and `regex` fields when necessary on external tables.
- Changed default behavior on external table insert to `DROP IF EXISTS`.

### Breaking changes

- Models with aggregating and join indexes now require `index_type` field in index config blocks rather than `type`.

## v.1.0.1

### Under the hood
Expand Down
2 changes: 0 additions & 2 deletions dbt/adapters/firebolt/connections.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,6 @@ def exception_handler(self, sql: str):
self.release()
raise RuntimeException(str(e))

# TODO: Decide how much metadata we want to return.
@classmethod
def get_response(cls, cursor) -> AdapterResponse:
"""
Expand All @@ -104,7 +103,6 @@ def get_response(cls, cursor) -> AdapterResponse:
the rows_affected, which I suspect isn't working properly.
"""
return AdapterResponse(
# TODO: get an actual status message and "code" from the cursor
_message='OK',
# code=code,
rows_affected=cursor.rowcount,
Expand Down
63 changes: 32 additions & 31 deletions dbt/adapters/firebolt/impl.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,22 +19,28 @@

@dataclass
class FireboltIndexConfig(dbtClassMixin):
type: str
join_column: Optional[str] = None
key_column: Optional[str] = None
index_type: str
join_column: Optional[Union[str, List[str]]] = None
key_column: Optional[Union[str, List[str]]] = None
dimension_column: Optional[Union[str, List[str]]] = None
aggregation: Optional[Union[str, List[str]]] = None

def render(self, relation):
def render_name(self, relation):
"""
Name the index according to the following format, joined by `_`:
relation name, key/join column, index type, timestamp (unix & UTC)
example index name: my_model_customer_id_join_1633504263.
Name an index according to the following format, joined by `_`:
index type, relation name, key/join column, timestamp (unix & UTC)
example index name: join_my_model_customer_id_1633504263.
"""
now_unix = time.mktime(datetime.utcnow().timetuple())
spine_col = self.key_column if self.key_column else self.join_column
inputs = [relation.identifier, spine_col, str(self.type), str(int(now_unix))]
string = '__'.join(inputs)[0:254]
spine_col = '_'.join(self.key_column if self.key_column else self.join_column)
inputs = [
str(self.index_type),
'idx',
relation.identifier,
spine_col,
str(int(now_unix)),
]
string = '__'.join(inputs)[:254]
return string

@classmethod
Expand All @@ -48,40 +54,35 @@ def parse(cls, raw_index) -> Optional['FireboltIndexConfig']:
return None
try:
cls.validate(raw_index)

index_config = cls.from_dict(raw_index)

if index_config.type.upper() not in ['JOIN', 'AGGREGATING']:
if index_config.index_type.upper() not in ['JOIN', 'AGGREGATING']:
dbt.exceptions.raise_compiler_error(
f'Invalid index type:\n'
f' Got: {index_config.type}\n'
f' type should be either: "join" or "aggregating"'
'Invalid index type:\n'
f' Got: {index_config.index_type}.\n'
' Type should be either: "join" or "aggregating."'
)
elif index_config.type == 'join' and not (
if index_config.index_type.upper() == 'JOIN' and not (
index_config.join_column and index_config.dimension_column
):

dbt.exceptions.raise_compiler_error(
f'Invalid join index definition:\n'
f' Got: {index_config}\n'
f' join_column and dimension_column must be specified '
' for join indexes'
'Invalid join index definition:\n'
f' Got: {index_config}.\n'
' join_column and dimension_column must be specified '
'for join indexes.'
)
elif index_config.type == 'aggregating' and not (
if index_config.index_type.upper() == 'AGGREGATING' and not (
index_config.key_column and index_config.aggregation
):

dbt.exceptions.raise_compiler_error(
f'Invalid aggregating index definition:\n'
f' Got: {index_config}\n'
f' key_column and aggregation must be specified'
' for join indexes'
'Invalid aggregating index definition:\n'
f' Got: {index_config}.\n'
' key_column and aggregation must be specified '
'for aggregating indexes.'
)
else:
return index_config
return index_config
except ValidationError as exc:
msg = dbt.exceptions.validator_error_message(exc)
dbt.exceptions.raise_compiler_error(f'Could not parse index config: {msg}')
dbt.exceptions.raise_compiler_error(f'Could not parse index config: {msg}.')


@dataclass
Expand Down
50 changes: 26 additions & 24 deletions dbt/include/firebolt/macros/adapters.sql
Original file line number Diff line number Diff line change
Expand Up @@ -46,11 +46,6 @@
{%- endmacro %}


--TODO
---add column to table
-- ALTER TABLE {{ relation }}
-- ADD COLUMN modify column {{ adapter.quote(column_name) }}
--- {{ new_column_type }} {{ on_cluster_clause(label="on cluster") }}
{% macro firebolt__alter_column_type(relation, column_name, new_column_type) -%}
{# stub #}
{% call statement('alter_column_type') %}
Expand All @@ -73,14 +68,19 @@
create_statement,
spine_col,
other_col) -%}
{{ create_statement }} "{{ index_name }}" ON {{ relation }} (
{{ spine_col }},
{% if other_col is iterable and other_col is not string -%}
{{ other_col | join(', ') }}
{%- else -%}
{{ other_col }}
{%- endif -%}
);
{# Write SQL for generating a join or aggregating index. #}
{{ create_statement }} "{{ index_name }}" ON {{ relation }} (
{% if spine_col is iterable and spine_col is not string -%}
{{ spine_col | join(', ') }},
{% else -%}
{{ spine_col }},
{% endif -%}
{% if other_col is iterable and other_col is not string -%}
{{ other_col | join(', ') }}
{%- else -%}
{{ other_col }}
{%- endif -%}
);
{%- endmacro %}


Expand All @@ -92,18 +92,22 @@


{% macro firebolt__get_create_index_sql(relation, index_dict) -%}
{# Parse index inputs and send parsed input to make_create_index_sql #}
{%- set index_config = adapter.parse_index(index_dict) -%}
{%- set index_name = index_config.render(relation) -%}
{%- set index_type = index_config.type | upper -%}

{%- set index_name = index_config.render_name(relation) -%}
{%- set index_type = index_config.index_type | upper -%}
{%- if index_type == "JOIN" -%}
{{ make_create_index_sql(
relation, index_name, "CREATE JOIN INDEX",
index_config.join_column, index_config.dimension_column) }}
{{ make_create_index_sql(relation,
index_name,
"CREATE JOIN INDEX",
index_config.join_column,
index_config.dimension_column) }}
{%- elif index_type == "AGGREGATING" -%}
{{ make_create_index_sql(
relation, index_name, "CREATE AND GENERATE AGGREGATING INDEX",
index_config.key_column, index_config.aggregation) }}
{{ make_create_index_sql(relation,
index_name,
"CREATE AND GENERATE AGGREGATING INDEX",
index_config.key_column,
index_config.aggregation) }}
{%- endif -%}
{%- endmacro %}

Expand All @@ -124,8 +128,6 @@


{% macro firebolt__list_relations_without_caching(schema_relation) %}
{# TODO: schema_relation is ??
What does "without caching" mean in this context? #}
{% call statement('list_tables_without_caching', fetch_result=True) -%}
SELECT '{{ schema_relation.database }}' AS "database",
table_name AS "name",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,9 @@
schema = source_node.schema,
identifier = source_node.identifier
) %}
{# var(variable, Boolean) defaults to Boolean value if `variable` isn't set.
Firebolt doesn't do refresh because we don't need to—external tables will always
{# var(variable, Boolean) defaults to Boolean value if variable isnt set.
Firebolt doesnt do refresh because we dont need to—external tables will always

pull most recent data from S3, so the default action below is to skip. #}
{% if old_relation is none or var('ext_full_refresh', false) %}
{% set build_plan = build_plan + [dropif(source_node),
Expand Down
2 changes: 0 additions & 2 deletions dbt/include/firebolt/macros/materializations/seed.sql
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
{# TODO: write down what all of this is doing #}

{% macro firebolt__get_binding_char() %}
{# Override the wildcard character in prepared SQL statements. #}
{{ return('?') }}
Expand Down
2 changes: 1 addition & 1 deletion setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ project_urls =
[options]
packages = find_namespace:
install_requires =
firebolt-sdk>=0.4
firebolt-sdk>=0.5.2
python_requires = >=3.7
include_package_data = True
package_dir =
Expand Down
21 changes: 14 additions & 7 deletions tests/integration/firebolt.dbtspec
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,19 @@ target:
threads: 1

sequences:
# These are passing.
test_dbt_empty: empty
# test_dbt_base: base
# test_dbt_ephemeral: ephemeral
# test_dbt_incremental: incremental
# test_dbt_snapshot_strategy_timestamp: snapshot_strategy_timestamp
# test_dbt_snapshot_strategy_check_cols: snapshot_strategy_check_cols
test_dbt_schema_test: schema_test
test_dbt_data_test: data_test
# test_dbt_schema_test: schema_test
# test_dbt_ephemeral_data_tests: data_test_ephemeral_models


# These aren't passing yet.
# test_dbt_base: base
# test_dbt_ephemeral: ephemeral
# test_dbt_ephemeral_data_tests: data_test_ephemeral_models
# test_dbt_incremental: incremental

# These are only needed for saving state in the DB, so maybe not
# necessary for v1.
# test_dbt_snapshot_strategy_timestamp: snapshot_strategy_timestamp
# test_dbt_snapshot_strategy_check_cols: snapshot_strategy_check_cols

0 comments on commit 6b203b9

Please sign in to comment.