fix: Aggregating indices now allow multiple field names in the key co…

…lumn (#51) * Updated build plan on external tables to they default to drop table. * Added exceptions for missing or misspelled fields in external tables. * As name is a required field for columns in external tables, removed some logic to error out with missing column names. * Updated changelog and version number. * Small edit to error messages for missing fields on external tables. * Changed default behavior on external tables: Firebolt external tables only drop and rebuild if explicity told to using a variable on dbt run-operation stage_external_sources. * Added allowable prefixes to PR title description. * Updated firebolt.dbtspec file. * Formatted integration_tests.yml to match that of code-check.yml. Added integration tests to pull_request.yml. * Now, actually added integration-tests to pull_request.yml. Fixed directory path issue in integration-tests.yml. * Some reformatting for legibility. Added docstrings. * Reordered index table names, tiny edits in error msgs, renamed type field to index_type for clarity. * Now passing secrets from pull_requests.yml to integration_tests.yml. * Commenting out integration tests from PR workflow. * Fixed misspelled variable FIREBOLT_USERNAME. * Edited pull_request.yml to use this branch for testing. Will rever later. * Bumped version number to 1.0.2. * Reverted pull_request.yml so that future integration tests will frun from main branch. * Bumped version number and added a little text to the changelog. * Cleaned up some missing index_type variable names. Bumped required python-sdk version to 0.5.2. * Cleanup on switch to index_type. Edited types for key and join columns, and added upper() to ensure match in errors in FireboltIndexConfig. * Moved todos out of comments and into dedicated Confluence doc. And yes, I know that's not actually relevant to the job at hand. * Added logic to deal with having a list of keys in aggregate indexes, and for correctly naming the indexes if there is a list of keys. * Edited changelog to remove a minor version change, as three PRs will likely be merged at once. * Changed version number to track with changelog. * Added breaking change field to changelog. * Removed two extra log statements. * Tried to add errors for instances of spaces in field names in join and agg indexes, but parsing is done in an external library, so I ended up just making minor changes in the error messages. * Rewrote validation conditionals for external table fields for better code style. * Added allowable prefixes to PR title description. * Updated firebolt.dbtspec file. * Formatted integration_tests.yml to match that of code-check.yml. Added integration tests to pull_request.yml. * Now, actually added integration-tests to pull_request.yml. Fixed directory path issue in integration-tests.yml. * Now passing secrets from pull_requests.yml to integration_tests.yml. * Fixed misspelled variable FIREBOLT_USERNAME. * Commenting out integration tests from PR workflow. * Edited changelog to remove a minor version change, as three PRs will likely be merged at once. * Pulled into a rebased branched. Fixed auto-merge. * Some reformatting for legibility. Added docstrings. * Reordered index table names, tiny edits in error msgs, renamed type field to index_type for clarity. * Edited version number and changelog. * Cleaned up some missing index_type variable names. Bumped required python-sdk version to 0.5.2. * Cleanup on switch to index_type. Edited types for key and join columns, and added upper() to ensure match in errors in FireboltIndexConfig. * Moved todos out of comments and into dedicated Confluence doc. And yes, I know that's not actually relevant to the job at hand. * Added logic to deal with having a list of keys in aggregate indexes, and for correctly naming the indexes if there is a list of keys. * Added breaking change field to changelog. * Removed two extra log statements. * Tried to add errors for instances of spaces in field names in join and agg indexes, but parsing is done in an external library, so I ended up just making minor changes in the error messages. * Updated pull request template to request integration tests are run, since we're no longer going to make them a PR action. * Cleaned up error conditionals in impl.py FireboltIndexConfig.parse(). * Removed a coupe of backticks that our security checks were complaining about. * Now removed apostrophes. Sigh.
firebolt-db · Mar 11, 2022 · 6b203b9 · 6b203b9
1 parent 96b7eda
commit 6b203b9
Show file tree

Hide file tree

Showing 10 changed files with 101 additions and 73 deletions.
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -1,8 +1,20 @@
 <!---
   The PR title should indicate what has changed and should be in the active voice,
   e.g. "Added new macro to enable x" or "Fixed run twice bug".
-  This makes auto-generated release notes more valuable, as
-  they use PR titles, not descriptions.
+  This makes auto-generated release notes more valuable, as they use PR titles, 
+  not descriptions. Note that PR titles must be prefixed with specific _lowercased_
+  prefixes followed by a colon. Allowable prefixes are:
+  • feat: A new feature
+  • fix: A bug fix
+  • docs: Documentation-only changes
+  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
+  • refactor: A code change that neither fixes a bug nor adds a feature
+  • perf: A code change that improves performance
+  • test: Adding missing tests or correcting existing tests
+  • build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
+  • ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
+  • chore: Other changes that don't modify src or test files
+  • revert: Reverts a previous commit
 -->
 
 <!---
@@ -24,5 +36,7 @@ Resolves #
 - [ ] This PR includes tests, or tests are not required/relevant for this PR.
 - [ ] I have updated `CHANGELOG.md` and added information about my change.
 - [ ] If this PR requires a new PyPI release I have bumped the version number.
-- [ ] I have pulled/merged from the main branch if there are merge conflicts.
 - [ ] I have verified that this PR contains only code changes relevant to this PR.
+- [ ] If further integration tests are now passing I've edited firebolt.dbtspec to account for this.
+- [ ] I have pulled/merged from the main branch if there are merge conflicts.
+- [ ] After pulling/merging main I have run pytest on the included or updated firebolt.dbtspec.
diff --git a/.github/workflows/integration-tests.yml b/.github/workflows/integration-tests.yml
@@ -1,4 +1,5 @@
 name: Run integration tests
+
 on:
   workflow_dispatch:
   workflow_call:
@@ -7,6 +8,7 @@ on:
         required: true
       FIREBOLT_PASSWORD:
         required: true
+
 jobs:
   tests:
     runs-on: ubuntu-latest
@@ -42,5 +44,5 @@ jobs:
           API_ENDPOINT: "api.dev.firebolt.io"
           ACCOUNT_NAME: "firebolt"
         run: |
-          pytest -o log_cli=true -o log_cli_level=INFO test/integration/firebolt.dbtspec
+          pytest -o log_cli=true -o log_cli_level=INFO tests/integration/firebolt.dbtspec
         
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,9 +4,14 @@
 
 ### Under the hood
 
+- dbt-firebolt now supports aggregating indexes with multiple-column keys. 
 - Bug fix to check for `data_type` and `regex` fields when necessary on external tables.
 - Changed default behavior on external table insert to `DROP IF EXISTS`.
 
+### Breaking changes
+
+- Models with aggregating and join indexes now require `index_type` field in index config blocks rather than `type`.
+
 ## v.1.0.1
 
 ### Under the hood

diff --git a/dbt/adapters/firebolt/connections.py b/dbt/adapters/firebolt/connections.py
@@ -92,7 +92,6 @@ def exception_handler(self, sql: str):
             self.release()
             raise RuntimeException(str(e))
 
-    # TODO: Decide how much metadata we want to return.
     @classmethod
     def get_response(cls, cursor) -> AdapterResponse:
         """
@@ -104,7 +103,6 @@ def get_response(cls, cursor) -> AdapterResponse:
         the rows_affected, which I suspect isn't working properly.
         """
         return AdapterResponse(
-            # TODO: get an actual status message and "code" from the cursor
             _message='OK',
             # code=code,
             rows_affected=cursor.rowcount,

diff --git a/dbt/adapters/firebolt/impl.py b/dbt/adapters/firebolt/impl.py
@@ -19,22 +19,28 @@
 
 @dataclass
 class FireboltIndexConfig(dbtClassMixin):
-    type: str
-    join_column: Optional[str] = None
-    key_column: Optional[str] = None
+    index_type: str
+    join_column: Optional[Union[str, List[str]]] = None
+    key_column: Optional[Union[str, List[str]]] = None
     dimension_column: Optional[Union[str, List[str]]] = None
     aggregation: Optional[Union[str, List[str]]] = None
 
-    def render(self, relation):
+    def render_name(self, relation):
         """
-        Name the index according to the following format, joined by `_`:
-        relation name, key/join column, index type, timestamp (unix & UTC)
-        example index name: my_model_customer_id_join_1633504263.
+        Name an index according to the following format, joined by `_`:
+        index type, relation name, key/join column, timestamp (unix & UTC)
+        example index name: join_my_model_customer_id_1633504263.
         """
         now_unix = time.mktime(datetime.utcnow().timetuple())
-        spine_col = self.key_column if self.key_column else self.join_column
-        inputs = [relation.identifier, spine_col, str(self.type), str(int(now_unix))]
-        string = '__'.join(inputs)[0:254]
+        spine_col = '_'.join(self.key_column if self.key_column else self.join_column)
+        inputs = [
+            str(self.index_type),
+            'idx',
+            relation.identifier,
+            spine_col,
+            str(int(now_unix)),
+        ]
+        string = '__'.join(inputs)[:254]
         return string
 
     @classmethod
@@ -48,40 +54,35 @@ def parse(cls, raw_index) -> Optional['FireboltIndexConfig']:
             return None
         try:
             cls.validate(raw_index)
-
             index_config = cls.from_dict(raw_index)
-
-            if index_config.type.upper() not in ['JOIN', 'AGGREGATING']:
+            if index_config.index_type.upper() not in ['JOIN', 'AGGREGATING']:
                 dbt.exceptions.raise_compiler_error(
-                    f'Invalid index type:\n'
-                    f'  Got: {index_config.type}\n'
-                    f'  type should be either: "join" or "aggregating"'
+                    'Invalid index type:\n'
+                    f'  Got: {index_config.index_type}.\n'
+                    '  Type should be either: "join" or "aggregating."'
                 )
-            elif index_config.type == 'join' and not (
+            if index_config.index_type.upper() == 'JOIN' and not (
                 index_config.join_column and index_config.dimension_column
             ):
-
                 dbt.exceptions.raise_compiler_error(
-                    f'Invalid join index definition:\n'
-                    f'  Got: {index_config}\n'
-                    f'  join_column and dimension_column must be specified '
-                    '  for join indexes'
+                    'Invalid join index definition:\n'
+                    f'  Got: {index_config}.\n'
+                    '  join_column and dimension_column must be specified '
+                    'for join indexes.'
                 )
-            elif index_config.type == 'aggregating' and not (
+            if index_config.index_type.upper() == 'AGGREGATING' and not (
                 index_config.key_column and index_config.aggregation
             ):
-
                 dbt.exceptions.raise_compiler_error(
-                    f'Invalid aggregating index definition:\n'
-                    f'  Got: {index_config}\n'
-                    f'  key_column and aggregation must be specified'
-                    '  for join indexes'
+                    'Invalid aggregating index definition:\n'
+                    f'  Got: {index_config}.\n'
+                    '  key_column and aggregation must be specified '
+                    'for aggregating indexes.'
                 )
-            else:
-                return index_config
+            return index_config
         except ValidationError as exc:
             msg = dbt.exceptions.validator_error_message(exc)
-            dbt.exceptions.raise_compiler_error(f'Could not parse index config: {msg}')
+            dbt.exceptions.raise_compiler_error(f'Could not parse index config: {msg}.')
 
 
 @dataclass

diff --git a/dbt/include/firebolt/macros/adapters.sql b/dbt/include/firebolt/macros/adapters.sql
@@ -46,11 +46,6 @@
 {%- endmacro %}
 
 
---TODO
----add column to table
---  ALTER TABLE {{ relation }}
---  ADD COLUMN modify column {{ adapter.quote(column_name) }}
----    {{ new_column_type }} {{ on_cluster_clause(label="on cluster") }}
 {% macro firebolt__alter_column_type(relation, column_name, new_column_type) -%}
     {# stub #}
     {% call statement('alter_column_type') %}
@@ -73,14 +68,19 @@
                                create_statement,
                                spine_col,
                                other_col) -%}
-  {{ create_statement }} "{{ index_name }}" ON {{ relation }} (
-      {{ spine_col }},
-      {% if other_col is iterable and other_col is not string -%}
-          {{ other_col | join(', ') }}
-      {%- else -%}
-          {{ other_col }}
-      {%- endif -%}
-      );
+    {# Write SQL for generating a join or aggregating index. #}
+    {{ create_statement }} "{{ index_name }}" ON {{ relation }} (
+        {% if spine_col is iterable and spine_col is not string -%}
+            {{ spine_col | join(', ') }},
+        {% else -%}
+            {{ spine_col }},
+        {% endif -%}
+        {% if other_col is iterable and other_col is not string -%}
+            {{ other_col | join(', ') }}
+        {%- else -%}
+            {{ other_col }}
+        {%- endif -%}
+    );
 {%- endmacro %}
 
 
@@ -92,18 +92,22 @@
 
 
 {% macro firebolt__get_create_index_sql(relation, index_dict) -%}
+    {# Parse index inputs and send parsed input to make_create_index_sql #}
     {%- set index_config = adapter.parse_index(index_dict) -%}
-    {%- set index_name = index_config.render(relation) -%}
-    {%- set index_type = index_config.type | upper -%}
-
+    {%- set index_name = index_config.render_name(relation) -%}
+    {%- set index_type = index_config.index_type | upper -%}
     {%- if index_type == "JOIN" -%}
-        {{ make_create_index_sql(
-          relation, index_name, "CREATE JOIN INDEX",
-          index_config.join_column, index_config.dimension_column) }}
+        {{ make_create_index_sql(relation,
+                                 index_name,
+                                 "CREATE JOIN INDEX",
+                                 index_config.join_column,
+                                 index_config.dimension_column) }}
     {%- elif index_type == "AGGREGATING" -%}
-        {{ make_create_index_sql(
-          relation, index_name, "CREATE AND GENERATE AGGREGATING INDEX",
-          index_config.key_column, index_config.aggregation) }}
+        {{ make_create_index_sql(relation,
+                                 index_name,
+                                 "CREATE AND GENERATE AGGREGATING INDEX",
+                                 index_config.key_column,
+                                 index_config.aggregation) }}
     {%- endif -%}
 {%- endmacro %}
 
@@ -124,8 +128,6 @@
 
 
 {% macro firebolt__list_relations_without_caching(schema_relation) %}
-    {# TODO: schema_relation is ??
-             What does "without caching" mean in this context? #}
     {% call statement('list_tables_without_caching', fetch_result=True) -%}
         SELECT '{{ schema_relation.database }}' AS "database",
                table_name AS "name",

diff --git a/dbt/include/firebolt/macros/dbt_external_tables/get_external_build_plan.sql b/dbt/include/firebolt/macros/dbt_external_tables/get_external_build_plan.sql
@@ -5,8 +5,9 @@
         schema = source_node.schema,
         identifier = source_node.identifier
     ) %}
-    {# var(variable, Boolean) defaults to Boolean value if `variable` isn't set.
-       Firebolt doesn't do refresh because we don't need to—external tables will always
+    {# var(variable, Boolean) defaults to Boolean value if variable isnt set.
+       Firebolt doesnt do refresh because we dont need to—external tables will always
+
        pull most recent data from S3, so the default action below is to skip. #}
     {% if old_relation is none or var('ext_full_refresh', false) %}
         {% set build_plan = build_plan + [dropif(source_node),

diff --git a/dbt/include/firebolt/macros/materializations/seed.sql b/dbt/include/firebolt/macros/materializations/seed.sql
@@ -1,5 +1,3 @@
-{# TODO: write down what all of this is doing #}
-
 {% macro firebolt__get_binding_char() %}
   {# Override the wildcard character in prepared SQL statements. #}
   {{ return('?') }}

diff --git a/setup.cfg b/setup.cfg
@@ -23,7 +23,7 @@ project_urls =
 [options]
 packages = find_namespace:
 install_requires =
-    firebolt-sdk>=0.4
+    firebolt-sdk>=0.5.2
 python_requires = >=3.7
 include_package_data = True
 package_dir =

diff --git a/tests/integration/firebolt.dbtspec b/tests/integration/firebolt.dbtspec
@@ -11,12 +11,19 @@ target:
   threads: 1
 
 sequences:
+# These are passing.
   test_dbt_empty: empty
-#  test_dbt_base: base
-#  test_dbt_ephemeral: ephemeral
-#  test_dbt_incremental: incremental
-#  test_dbt_snapshot_strategy_timestamp: snapshot_strategy_timestamp
-#  test_dbt_snapshot_strategy_check_cols: snapshot_strategy_check_cols
+  test_dbt_schema_test: schema_test
   test_dbt_data_test: data_test
-#  test_dbt_schema_test: schema_test
-#  test_dbt_ephemeral_data_tests: data_test_ephemeral_models
+
+
+# These aren't passing yet.
+ # test_dbt_base: base
+ # test_dbt_ephemeral: ephemeral
+ # test_dbt_ephemeral_data_tests: data_test_ephemeral_models
+ # test_dbt_incremental: incremental
+
+# These are only needed for saving state in the DB, so maybe not
+# necessary for v1.
+  # test_dbt_snapshot_strategy_timestamp: snapshot_strategy_timestamp
+  # test_dbt_snapshot_strategy_check_cols: snapshot_strategy_check_cols