-
Notifications
You must be signed in to change notification settings - Fork 30
Remove regex from metadata queries #34
Comments
thanks ~ about 4000 - 5000 metadata table in hive and about 100 database information when i need to query :) |
I'm pretty sure the culprit here is Several months ago, we made the decision to switch dbt-snowflake from performing ilike queries of Presto supports both As in dbt-snowflake, it will require a bit of extra handling to first figure out the right quoting and casing of available schemas. |
I modified the adapters.sql and catalog.sql files。 adapters.sql : use '=' replace 'ilike' It now takes about 2 minutes to execute。so , thansk a lot . For now, it's settled 。 thank you vary much .and finally ,snapshot materialization not implemented for presto。It's such a pity @@/(ㄒoㄒ)/~~ |
@WangSleep commented on Thu Nov 19 2020
When i excute "dbt docs generate" on CLI .it dosen't work .
it's error message :
PrestoQueryError(type=INTERNAL_ERROR, name=OPTIMIZER_TIMEOUT, message="The optimizer exhausted the time limit of 180000 ms", query_id=20201110_082631_11221_7vjyv)
i can excute "dbt deug" , "dbt test" ,"dbt run" on CLI . but "dbt docs generate" alone can't
Can you solve my problem?
thanks a lot ~~~😊
@WangSleep commented on Thu Nov 19 2020
I found out when I performed "dbt docs generate" , it will excute this sql
select table_catalog as database, table_name as name, table_schema as schema, case when table_type = 'BASE TABLE' then 'table' when table_type = 'VIEW' then 'view' else table_type end as table_type from "hive".INFORMATION_SCHEMA.tables where regexp_like(table_schema, '(?i)\ADatebase\Z')
Too many resources are required, and execution time is too long to cause a timeout 。
Could you tell me what can be solved? thanks a lot
@jtcohen6 commented on Thu Nov 19 2020
Hey @WangSleep, which database / adapter plugin are you using? It sounds like the catalog query—which dbt uses to grab metadata, populate
catalog.json
, and thereby feed information into the docs site—may be inefficient as currently implemented on this adapter.@WangSleep commented on Thu Nov 19 2020
yeh .
We query by configuring presto-python-client, which I know is for docs, but python_presto query "hive". INFORMATION_SCHEMA is slow. This causes a timeout. Later I found out that I used lore-zeppelin, which was based on jdbc, and I could query. It takes about 5 minutes.
thanks a lot ~
@jtcohen6 commented on Fri Nov 20 2020
I'm still a bit confused. Are you using dbt-presto? What's the output when you run
dbt --version
?@WangSleep commented on Sun Nov 22 2020
hi , bro !
yes .
I'm using it . dbt --version : 0.18.1
@jtcohen6 commented on Mon Nov 23 2020
We haven't released dbt-presto v0.18.1 yet (which is on us), and
presto__get_catalog
as implemented in that plugin looks different from the query snippet you pasted above. (In fact, your snippet doesn't look like any of theget_catalog
queries familiar to me.)So... I'm still confused as to which dbt plugin you're using :)
@WangSleep commented on Mon Nov 23 2020
hi . @jtcohen6
FYI
This is the version I installed :)
@jtcohen6 commented on Tue Nov 24 2020
Ok, this makes much more sense now. Appreciate the clarification!
The snippet you included comes from
presto__list_relations_without_caching
. I'm not sure why it only times out ondbt docs generate
, since that's a query that should run at the start of all invocations.I'm going to transfer this issue to the dbt-presto repo, where we can discuss if there is a more performant way to write that metadata introspection. Out of curiosity, how many objects are in your
"hive"
catalog?The text was updated successfully, but these errors were encountered: