Add stable Expr.top_k
#16596
Labels
A-ops
Area: operations
accepted
Ready for implementation
enhancement
New feature or an improvement of an existing feature
Description
In #10054, there was a request for a way to answer the query:
One possible API could have been `df.top_k(k=k, by='b', group_by='d')`, or, as the OP suggested, `df.group_by('d').top_k(k=k, by='b')`.

The response was that an existing expression is enough, and that's what's currently suggested in the `top_k` docs.

However, as there are no ordering guarantees, if there are ties in the `by` column, the risk is that this solution produces a result with rows which never appeared in the original dataframe: #10054 (comment)

This was discussed in #15238, and the suggestion is now to introduce a stable `Expr.top_k`. This would solve the original issue.
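To make the tie-breaking concern concrete, here is a minimal pure-Python sketch of the semantics a stable per-group top-k would provide. It is not Polars code and not the proposed implementation; the helper name `stable_top_k_by_group` and the sample data are hypothetical. Because whole rows are ranked together and ties keep their input order, every returned row is guaranteed to exist in the original data.

```python
from collections import defaultdict


def stable_top_k_by_group(rows, k, by, group_by):
    """Return the k rows with the largest `by` value in each group.

    Hypothetical helper for illustration only. Ties in `by` are broken
    by original row order, so the result is deterministic.
    """
    # Bucket whole rows by group key; dicts preserve first-seen group order.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)

    result = []
    for members in groups.values():
        # sorted() is stable, including with reverse=True: rows with equal
        # `by` values keep their relative input order.
        ranked = sorted(members, key=lambda r: r[by], reverse=True)
        result.extend(ranked[:k])
    return result


rows = [
    {"a": 1, "b": 5, "d": "x"},
    {"a": 2, "b": 5, "d": "x"},  # ties with the row above on `b`
    {"a": 3, "b": 4, "d": "x"},
    {"a": 4, "b": 7, "d": "y"},
]

top = stable_top_k_by_group(rows, k=1, by="b", group_by="d")
```

If each column were instead selected independently, with ties resolved differently per column, the `a` value of one tied row could end up paired with values from another, which is the failure mode described above.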