-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
windowing functions: partial:false - option to exclude incomplete window #3618
Comments
Related discussion in duckdb can be found here: duckdb/duckdb#8340 |
Hi, this is a good observation and something I noticed as well when answering your other question. It's not too difficult to add this in the PRQL query but I agree with you that it might be nice for PRQL to add this as an option as you suggested. In the meantime you could work around it with the following: let data = [
{id=1, value=2},
{id=2, value=3},
{id=3, value=4},
{id=6, value=2},
{id=5, value=3},
{id=4, value=4},
]
from data
window rolling:3 (
sort id
derive {rn = row_number this, sma3 = average value}
filter rn>=3
select {data.*, sma3}
) |
Your example won't exactly give what was requested in the question because it filter out rows having partial window. Question asked to just have SMA10 value NULL but still keep those rows. |
Ah, I misunderstood your requirement. The following query should produce the result you want: let data = [
{id=1, value=2},
{id=2, value=3},
{id=3, value=4},
{id=6, value=2},
{id=5, value=3},
{id=4, value=4},
]
from data
window rolling:3 (
sort id
derive {rn = row_number this,
sma3 = case {rn<3 => null, true => average value}
}
select {data.*, sma3}
) Please note: I'm not the best person to ask about the compiler. @aljazerzen or @max-sixty , can you answer the following?
|
This would be super useful. But I'm not sure how it fits with current window semantics. To implement it, we'd probably have to hack something up during resolution. Not the easiest. |
Just a link to a book that compares this: |
I think the request is for something similar to pandas' To develop this, the first thing to do would be to write out the SQL that we'd want to generate. I think this can probably be done with another window function which counts the number of values that are non-null, which we then return (FWIW I remember writing something a bit like this here, though that's possibly more complicated than this case) |
These are really great!! |
Very cool! I also used an SMA like that in my presentations, as well as an EWMA which you can find in this Google Colab if you want to include that as well? |
yes, min_period in pandas
while in SQL it is always 1. The request is about adding an option for pandas default behavior. The written SQL is in the first post, no need to count nulls there, just case row_number(). |
Yes great, thanks. |
Hello,
I would like to file a FR for additional argument to windowing functions interface that could control computation for an incomplete (partial) window.
My observation is that SQL databases are generally not giving any flexibility in that matter, and always compute and return partial window. On the other hand analytical tools are generally not only giving such flexibility but most of them defaults to not returning partial window results. What behavior is desires depends on the use case. From my personal uses cases, it is definitely the latter one that was useful.
My perspective is that if an analysts asks in his/her query for SMA10, then he/she wants SMA10, not an SMA9, SMA8, etc. Therefore whenever SMA10 is not possible to be computed (partial window) then NULL/missing values is expected.
Asking for changing the default in prql would be insane, but an option would be highly desired, and it actually could be another advantage of PRQL over SQL.
So lets look at the example
This SQL query will compute SMA10, but also SMA9...SMA1 whenever 10 observations will not be available in the window.
So my FR is about extra arg, lets call it
partial
and the SQL generated with
partial=false
would need to have a subquery andcase when
value added for prql is potentially big, just one attribute
partial:false
and much less typing than standard SQL way.The text was updated successfully, but these errors were encountered: