Allow CustomQuery to take an optional dict of Query objects #814

jc-harrison · 2019-05-18T12:48:06Z

There have been discussions recently regarding the desire for users who prefer to write their own SQL queries to be able to make more use of FlowMachine's caching.

It seems to me that we could go some way towards supporting this use case by making it possible to pass query objects as sub-queries of a CustomQuery.

We could do this by adding a new, optional argument to CustomQuery (e.g. dependencies) which accepts a dict of Query objects, which could be matched to placeholders in the SQL string.

Something like:

CustomQuery(
    sql="""
        SELECT ...
        FROM {another_custom_query}
        INNER JOIN {daily_location}
        ON ...
    """,
    dependencies={
        "another_custom_query": CustomQuery(...),
        "daily_location": daily_location(...)
    },
    columns=...,
)

The dependencies could then be included in query.dependencies(), and query._make_query would return something along the lines of

self.sql.format({name: query.get_query() for name, query in dependencies.items()})

which would then make use of the cached dependencies where they exist.

It woul be good to hear from FlowMachine users / potential users as to whether this is something that would be useful in practice.

The text was updated successfully, but these errors were encountered:

greenape · 2019-05-20T10:02:29Z

A legit reason to use **kwargs? 😲

I like this idea but I think it would break the hashing of custom query, so we'd need a different solution there.

greenape · 2021-11-09T15:37:33Z

Expanding on the above - custom query hashing is dependent on being able to normalise the input SQL, which we do by parsing it against the PostgreSQL parser. Obviously that's going to fail if we give it a template string instead of valid SQL, but waiting on the string to be generated is a blocking op, which we don't want to do in the __init__ - at that point, you might as well just generate the sql and pass it in to the existing class.

Maybe one could fill the gaps for hashing purposes with valid-but-meaningless SQL? e.g. sub in (select 1) as another_custom_query?

jc-harrison · 2021-11-09T15:55:06Z

We could perhaps fill the gaps with select * from {another_custom_query.fully_qualified_table_name} for hashing (i.e. for hashing purposes, assume all sub-queries are already cached)?

jc-harrison added enhancement New feature or request FlowMachine Issues related to FlowMachine labels May 18, 2019

greenape mentioned this issue Nov 16, 2021

Query.query_id can iterate over results of dependencies #4505

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow CustomQuery to take an optional dict of Query objects #814

Allow CustomQuery to take an optional dict of Query objects #814

jc-harrison commented May 18, 2019

greenape commented May 20, 2019

greenape commented Nov 9, 2021

jc-harrison commented Nov 9, 2021

Allow CustomQuery to take an optional dict of Query objects #814

Allow CustomQuery to take an optional dict of Query objects #814

Comments

jc-harrison commented May 18, 2019

greenape commented May 20, 2019

greenape commented Nov 9, 2021

jc-harrison commented Nov 9, 2021