-
Notifications
You must be signed in to change notification settings - Fork 30
Updates catalog queries to use strict equality #40
Updates catalog queries to use strict equality #40
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @clausherther!!
I'd request two changes:
- make the comparisons case-insensitive
- remove the
presto_ilike
macro, to cut down surface area we're no longer using
{% if relation.schema %} | ||
and {{ presto_ilike('table_schema', relation.schema) }} | ||
and table_schema = '{{ relation.schema }}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll still want these comparisons to be case-insensitive:
and table_schema = '{{ relation.schema }}' | |
and lower(table_schema) = lower('{{ relation.schema }}') |
Lower or upper, whichever you prefer, and the same for the ones below. I did find an interesting note in the docs:
The
lower()
andupper()
functions do not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish and Azeri.
I don't think that's something we should have to worry about right here & right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, funny about that case insensitivity: on our (rather large) cluster, wrapping table_schema
in upper
also fails. I tried submitting a fix for catalog.sql
as well to make dbt docs generate
work and ran into that. I have an internal support thread going but don't think there's a good fix there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, doing something like
where upper(table_schema) = '{{ relation.schema | upper }}'
which compiles to
where upper(table_schema) = 'CLAUSH_DEV'
fails with
PrestoExternalError(type=EXTERNAL, name=HIVE_METASTORE_ERROR, message="Internal error processing get_all_tables", query_id=20210504_212530_12029_bz8fk)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What! Oy... I'm guessing that's why we initially went the regex route. This still feels like a net improvement, we just may want to document a recommendation against mixed-case catalog/schema names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, at least on our cluster it seems that table_schema
is stored in lower-case letters even if you created the schema upper case.
For example:
create schema "CLAUSH_TEST";
create table "CLAUSH_TEST".test as (select 1 as col1)
select *
from "iceberg".INFORMATION_SCHEMA.tables
where
table_schema != 'information_schema'
and
table_schema = 'claush_test'
returns 1 row:
table_catalog table_schema table_name table_type
iceberg claush_test test BASE TABLE
So, how do you feel something like:
where table_schema = '{{ relation.schema | lower }}'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use lower case on right side of comparison and updated README accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if, in fact, Presto identifiers are never case-sensitive... (prestodb/presto#2863, trinodb/trino#17)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Although the comparison operator certainly is case-sensitive. So,
select * from iceberg.INFORMATION_SCHEMA.tables where table_schema = 'claush'
works, while
select * from iceberg.INFORMATION_SCHEMA.tables where table_schema = 'CLAUSH'
does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clausherther Sorry I lost track of this one! Thanks for the contribution, this LGTM.
FYI @iknox-fa I'm gong to merge into master
now, so that we can include in 0.20.0rc1
👏 yay! |
Fixes #34 (hopefully)
This PR updates the following macros in
adapter.sql
to use strict equality instead ofilike
and make filter parameters lower case on right-hand side of filter:Also, updates
get_catalog
incatalog.sql
to:table_schema
predicate up for performance reasonsPresto catalogs seem to store values lower case (at least on hive). Adding a function like
upper
to the left side of a catalog filter causes some catalogs to not be able to push filter predicates down to the catalog provider, causing resource problems.To avoid that, we assume values are stored in lower case and force incoming schema parameters to lower case as well.