Introduce _with APIs #289

Closed
5 tasks done
josevalim opened this issue Jul 6, 2022 · 1 comment

josevalim commented Jul 6, 2022

The goal is to introduce filter_with, summarize_with, mutate_with, and arrange_with.

Attack plan

  • Support filter_with with row-based series operations
  • Support summarize_with with aggregation-based series operations
  • Support mutate_with with row, group, and aggregation-based series operations
  • Support arrange_with
  • Decide on #224 (Split filter/2 in two functions)

This will unblock us to fully tackle #223, #227, and #245.
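
As a rough illustration of the callback style that the filter_with and mutate_with items aim for, here is a minimal sketch. It assumes an Explorer release in which filter_with/2 and mutate_with/2 have already shipped in the shape proposed here; the version constraint, column names, and data are made up for the example, and it is not the implementation tracked by this issue.

```elixir
# Sketch only: assumes an Explorer release (~> 0.5 here) that already
# provides filter_with/2 and mutate_with/2.
Mix.install([{:explorer, "~> 0.5"}])

alias Explorer.DataFrame, as: DF
alias Explorer.Series

# Made-up data for illustration.
df = DF.new(a: [1, 2, 3], b: [10, 20, 30])

# filter_with: the callback receives a data frame and returns a boolean
# series built from row-based operations; rows where it is true are kept.
filtered = DF.filter_with(df, fn ldf -> Series.greater(ldf["a"], 1) end)

# mutate_with: the callback returns a keyword list mapping new column
# names to the series that define them.
mutated = DF.mutate_with(df, fn ldf -> [c: Series.add(ldf["a"], ldf["b"])] end)

IO.inspect(Series.to_list(filtered["a"]))
IO.inspect(Series.to_list(mutated["c"]))
```

In both calls the callback only describes the operation in terms of series, which is what lets a macro-based filter/mutate be layered on top of the same functions.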

Complications

arrange/distinct introduce one particular issue. We have added the _with suffix to disambiguate the macro API from the non-macro API. This was easy because the non-macro API for mutate/summarize/filter is function-based. However, arrange/distinct already have a non-macro API that is not function-based, for example:

arrange(df, desc: "my_field")

But we also want to support this:

arrange(df, desc: my_field)

We have three choices:

  • Keep arrange(df, desc: "my_field") and arrange(df, desc: my_field) under the same function/arity. This may be doable, but it may also raise ambiguities. For example, should we allow arrange(df, desc: my_field, asc: "another-field")?

  • Move the non-macro API to arrange_with, which will support keywords or functions, such as arrange_with(df, desc: "my_field")

  • Remove the arrange(df, desc: "my_field") version. People can either use arrange(df, desc: my_field) or arrange_with(df, fn df -> [desc: df["my_field"]] end)
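
The ambiguity in the first choice comes from a macro having to tell the two forms apart by inspecting each keyword value at compile time. A minimal sketch of that inspection in plain Elixir follows. ArrangeSketch and classify/1 are hypothetical names invented for this example; they are not part of Explorer and do not reflect how it resolves the question.

```elixir
defmodule ArrangeSketch do
  # Hypothetical helper, not Explorer's API. It takes the quoted keyword
  # list that a macro such as arrange/2 would receive and labels each value.
  def classify(quoted_directions) do
    for {direction, expr} <- quoted_directions do
      kind =
        case expr do
          # A string literal is quoted as itself: arrange(df, desc: "my_field")
          name when is_binary(name) ->
            :column_name

          # A bare variable is quoted as {name, meta, context}:
          # arrange(df, desc: my_field)
          {name, _meta, context} when is_atom(name) and is_atom(context) ->
            :bare_variable

          # Anything else (function calls, operators, ...) is an expression.
          _other ->
            :expression
        end

      {direction, kind}
    end
  end
end

# The mixed call from the first choice, as a macro would see it.
quoted = quote(do: [desc: my_field, asc: "another-field"])
IO.inspect(ArrangeSketch.classify(quoted))
```

Telling the forms apart is mechanical; the open question is what a call that mixes both kinds of value should mean.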

EDIT: distinct has further complications, because the columns are passed as options and we will have to revisit that.

josevalim added this to the v0.3 milestone Jul 6, 2022
philss added a commit that referenced this issue Jul 7, 2022
This adds the basic functionality for the `filter_with/2` function
discussed in #223 and addressed by #289

This change adds two new opaque backends - one for data frame and one for series.
They work in conjunction to create lazy series and accumulate operations on them
before sending to the backend.

philss commented Jul 21, 2022

@josevalim I'm going to start summarize_with operations. I believe filter_with can be considered done. WDYT?
cc/ @cigrainger
