Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplifying the standard library with multi-arity functions #418

Closed
ghost opened this issue Jun 16, 2014 · 11 comments
Closed

Simplifying the standard library with multi-arity functions #418

ghost opened this issue Jun 16, 2014 · 11 comments

Comments

@ghost
Copy link

ghost commented Jun 16, 2014

My suggestion is simple: functions that are identical save for an optional parameter (such as all functions that end with _by) could drop that prefix and become multi-arity functions, that is, functions that accept several definitions with a varying number of parameters. For example, this could be the manual entry for min / max / min_by / max_by:

Find the minimum or maximum element of the input array. Passing
a parameter allows you to specify a particular field or
property to examine, e.g. min(.foo) finds the object
with the smallest foo field.

For legacy reasons, min_by(.foo) and max_by(.foo) exist,
with identical behaviour to min(.foo) and max(.foo). These
are considered deprecated and will be dropped in the next major
version.

I've started working on it. Is everyone okay with this? @nicowilliams

@nicowilliams
Copy link
Contributor

I kinda like the "_by" suffix, but I like this too. We should find all opportunities for simplification using multi-arities. For example: join.

@ghost
Copy link
Author

ghost commented Jun 16, 2014

These are the potential changes I've found so far:

  • recurse takes over recurse_down and .., the former is deprecated.
  • join and split get reasonable default arguments.
  • min and max take over min_by and max_by, which are deprecated.
  • unique takes over unique_by, which is deprecated.
  • sort takes over sort_by, which is deprecated.
  • group replaces group_by, which is deprecated, for consistency.
  • paths gains an optional argument, so that leaf_paths is the same as paths(values). The former is deprecated. [1]
  • range allows a one parameter invocation which assumes the range to start at zero.

Paraphrasing whoever played the Napster guy, "Drop the _by. It's cleaner."

[1] I'm not entirely sure if I understand how paths works, but it makes sense in my head. Correct me if this is impossible or not worth the effort.*

@nicowilliams
Copy link
Contributor

Re: leaf_paths... that's not quite right. leaf_paths is defined thusly:

def leaf_paths: . as $dot|paths|select(. as $p|$dot|getpath($p)|type|. != "array" and . != "object");

so that paths an optional parameter that could be any of scalars and friends would have to be defined like so:

def paths(node_filter): . as $dot|paths|select(. as $p|$dot|getpath($p)|node_filter);

which works:

$ jq -c 'def paths(node_filter): . as $dot|paths|select(. as $p|$dot|getpath($p)|node_filter); paths(select(type == "string"))'
[[["a", {"b":[11,[{"c":"d"}]]}]]]
[0,0,0]
[0,0,1,"b",1,0,"c"]
$

And that's very nice indeed.

@nicowilliams
Copy link
Contributor

range could also gain a third optional argument (step by).

With that, +1 to all of your list.

@ghost
Copy link
Author

ghost commented Jun 17, 2014

Also, add could take an optional argument as starting point for its internal reduce, as you said over at #280.

@nicowilliams
Copy link
Contributor

@slapresta good point!

@ghost
Copy link
Author

ghost commented Jun 17, 2014

Actually, what we want for leaf_paths is paths(scalars), not paths(values), right?

@nicowilliams
Copy link
Contributor

Yes, we should want paths(filter) to filter paths by the value found at that path. Or maybe we should call that paths_to(filter)? I think the latter is clearer.

Here's one way to define leaf_paths in terms of scalars:

def leaf_paths: . as $dot|paths|. as $p|$dot|getpath($p)|scalars|$p;

@ghost
Copy link
Author

ghost commented Jun 17, 2014

I think paths, without prefix, makes more sense than paths_to, especially if its argument is supposed to be optional.

Anyway, there's a starting point over at #426; perhaps we should keep working over there. I used the definition of paths(node_filter) given by your previous commit and based leaf_paths on that. I did everything on the bullet point list except for range's third argument (which is not defined as a function but as... something else; i guess it requires an understanding of jq's internals I currently lack), add's default value and join and split's default arguments (I'm not sure what these should be or, to be honest, what my idea was when I proposed them)

@ghost ghost closed this as completed Jun 17, 2014
@nicowilliams
Copy link
Contributor

join and split should use "," as the default, IMO.

Regarding add, IMO the best thing to do is to leave it alone. Addition of all all types to null is supported, so it works out alright.

@pkoppstein
Copy link
Contributor

@nicowilliams wrote:

join and split should use "," as the default, IMO.

In awk, perl, ruby, javascript and no doubt other jq peers, split accepts regular expressions. Since jq's support for regexps is on the horizon, there is something to be said for holding off establishing a default for jq's split/1, mainly because it should almost surely be a regular expression.

For example, if we go with ", " as join's default, as Nico suggests, then it might make sense to choose " , *" or perhaps "\s,\s_" as the default for split. Or, following awk, python, ruby, etc, split's default would be "\s+". (_)

For your reference, here is an extract from ruby's documentation about split:

If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
...
If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.

Footnote
(*) Python is a little unusual in having both str.split and re.split, but the default for python's str.split is effectively "\s+" according to the python documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants