Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow CustomQuery to take an optional dict of Query objects #814

Open
jc-harrison opened this issue May 18, 2019 · 3 comments
Open

Allow CustomQuery to take an optional dict of Query objects #814

jc-harrison opened this issue May 18, 2019 · 3 comments
Labels
enhancement New feature or request FlowMachine Issues related to FlowMachine

Comments

@jc-harrison
Copy link
Member

There have been discussions recently regarding the desire for users who prefer to write their own SQL queries to be able to make more use of FlowMachine's caching.

It seems to me that we could go some way towards supporting this use case by making it possible to pass query objects as sub-queries of a CustomQuery.

We could do this by adding a new, optional argument to CustomQuery (e.g. dependencies) which accepts a dict of Query objects, which could be matched to placeholders in the SQL string.

Something like:

CustomQuery(
    sql="""
        SELECT ...
        FROM {another_custom_query}
        INNER JOIN {daily_location}
        ON ...
    """,
    dependencies={
        "another_custom_query": CustomQuery(...),
        "daily_location": daily_location(...)
    },
    columns=...,
)

The dependencies could then be included in query.dependencies(), and query._make_query would return something along the lines of

self.sql.format({name: query.get_query() for name, query in dependencies.items()})

which would then make use of the cached dependencies where they exist.

It woul be good to hear from FlowMachine users / potential users as to whether this is something that would be useful in practice.

@jc-harrison jc-harrison added enhancement New feature or request FlowMachine Issues related to FlowMachine labels May 18, 2019
@greenape
Copy link
Member

A legit reason to use **kwargs? 😲

I like this idea but I think it would break the hashing of custom query, so we'd need a different solution there.

@greenape
Copy link
Member

greenape commented Nov 9, 2021

Expanding on the above - custom query hashing is dependent on being able to normalise the input SQL, which we do by parsing it against the PostgreSQL parser. Obviously that's going to fail if we give it a template string instead of valid SQL, but waiting on the string to be generated is a blocking op, which we don't want to do in the __init__ - at that point, you might as well just generate the sql and pass it in to the existing class.

Maybe one could fill the gaps for hashing purposes with valid-but-meaningless SQL? e.g. sub in (select 1) as another_custom_query?

@jc-harrison
Copy link
Member Author

We could perhaps fill the gaps with select * from {another_custom_query.fully_qualified_table_name} for hashing (i.e. for hashing purposes, assume all sub-queries are already cached)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request FlowMachine Issues related to FlowMachine
Projects
None yet
Development

No branches or pull requests

2 participants