Allow Union to take a list of queries #4565

Closed
jc-harrison opened this issue Nov 19, 2021 · 0 comments · Fixed by #4593
Labels
FlowMachine Issues related to FlowMachine refactoring

jc-harrison commented Nov 19, 2021

Is your feature request related to a problem? Please describe.
Currently, Union produces a union of two queries. Taking a union of more than two queries can be achieved by repeatedly applying Union (i.e. a.union(b).union(c).union(d)), but this results in a deeply nested structure (both within the SQL query and in terms of the Python Query objects), which may cause performance issues and could also result in excessive duplication of results in cache (e.g. a.union(b).union(c).union(d).store(store_dependencies=True) would produce cache tables for a, b, aUb, c, aUbUc, d and aUbUcUd).
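To make the cache duplication concrete, here is a minimal toy sketch. `ToyQuery` and its `dependencies` method are hypothetical stand-ins written for illustration, not FlowMachine's actual classes. It only models how each chained `union` call adds another intermediate node that would get its own cache table.

```python
class ToyQuery:
    """Hypothetical stand-in for a query object (not the FlowMachine API)."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

    def union(self, other):
        # Each call wraps the previous result, so chaining nests deeper.
        return ToyQuery(f"({self.name} U {other.name})", parents=(self, other))

    def dependencies(self):
        """Names of every query that would be stored, dependencies first."""
        seen = []
        for parent in self.parents:
            for name in parent.dependencies():
                if name not in seen:
                    seen.append(name)
        seen.append(self.name)
        return seen


a, b, c, d = (ToyQuery(n) for n in "abcd")
chained = a.union(b).union(c).union(d)
tables = chained.dependencies()
print(len(tables))  # 7: four inputs plus three nested unions
```

Under these assumptions, a single flat union of the same four inputs would need only five tables (the four inputs and the final result).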

Describe the solution you'd like
It would be useful if Union could take an arbitrary number of queries as arguments (e.g. Union(a, b, c, d, all=True) would union all four queries in one shot). This could be achieved either through arbitrary *args (which would remain compatible with the current implementation provided top and bottom are always provided as positional args and all is always explicitly provided as a kwarg) or a single list argument.
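A rough sketch of the `*args` variant, assuming for simplicity that each input is a plain SQL string. `UnionSketch` and its `sql` method are hypothetical names for illustration only; the real Union operates on Query objects and may differ considerably.

```python
class UnionSketch:
    """Hypothetical variadic union over SQL strings (illustration only)."""

    def __init__(self, *queries, all=True):
        # `all` is keyword-only here because it follows *queries, which is
        # what keeps two-positional-argument calls working unchanged.
        if len(queries) < 2:
            raise ValueError("A union needs at least two queries.")
        self.queries = queries
        self.all = all

    def sql(self):
        # One flat statement rather than nested subqueries.
        joiner = " UNION ALL " if self.all else " UNION "
        return joiner.join(f"({q})" for q in self.queries)


flat_sql = UnionSketch("SELECT 1", "SELECT 2", "SELECT 3", all=True).sql()
print(flat_sql)
```

A single list argument would work similarly, with `queries` passed as one sequence instead of being unpacked.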

Describe alternatives you've considered
We could make Union smarter, so that Union(Union(a, b), c) would flatten the inputs into a single non-nested sequence of unions. However, this would mean pre-cached unions couldn't be used as components of larger unions, which may be something we need sometimes (e.g. if a union is already computed but we later want to extend it with rows from another query, or if we have a very large number of sub-queries and the only feasible way to union them all is to do so in stages).
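The drawback of automatic flattening can be seen in a small sketch. `Leaf`, `NestedUnion` and `flatten` are hypothetical names used only for illustration. Once the inputs are flattened, the intermediate union no longer appears among them, so a cached copy of it could not be reused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """Hypothetical non-union query."""
    name: str


@dataclass(frozen=True)
class NestedUnion:
    """Hypothetical two-input union, mirroring the top/bottom arguments."""
    top: object
    bottom: object


def flatten(query):
    """Recursively expand nested unions into a flat list of leaf queries."""
    if isinstance(query, NestedUnion):
        return flatten(query.top) + flatten(query.bottom)
    return [query]


ab = NestedUnion(Leaf("a"), Leaf("b"))  # imagine this is already cached
abc = NestedUnion(ab, Leaf("c"))
flat = flatten(abc)
print([q.name for q in flat])  # ['a', 'b', 'c']
print(ab in flat)              # False: the cached union is bypassed
```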
