Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update-assignment //= always evaluates right-hand-side #2410

Open
wader opened this issue Feb 7, 2022 · 12 comments
Open

Update-assignment //= always evaluates right-hand-side #2410

wader opened this issue Feb 7, 2022 · 12 comments

Comments

@wader
Copy link
Member

wader commented Feb 7, 2022

Describe the bug

RHS of a update-assignment //= expression is always evaluated even if LHS is null or false. As jq has some non-pure functions like input and debug it might cause unexpected behaviour. For example I think it should be possible to use debug so see if RHS gets evaluated or not.

To Reproduce

$ jq -n '{a:1} | .a //= (debug | 2)'
["DEBUG:",{"a":1}]
{
  "a": 1
}
$ jq -n '{b:1} | .a //= (debug | 2)'
["DEBUG:",{"b":1}]
{
  "b": 1,
  "a": 2
}

Expected behavior

That RHS is only evaluated if LHS is null or false.

Environment:

macOS 12.1

$ jq --version
jq-1.6-139-ga9ce724

Was discovered when discussing #2407

@itchyny
Copy link
Contributor

itchyny commented Feb 8, 2022

I think the following definition will fix the problem

def _update_alt(ps; f): . as $x | ps |= (. // ($x | f));
 $ jq -n '{a:0,b:null} | .a //= input | .b //= input | .c //= input' <<< '1 2 3'
{
  "a": 0,
  "b": 2,
  "c": 3
}
 $ jq -n '{a:0,b:null} | _update_alt(.a; input) | _update_alt(.b; input) | _update_alt(.c; input)' <<< '1 2 3'
{
  "a": 0,
  "b": 1,
  "c": 2
}

But it breaks some cases.

 $ jq -nc '.a //= empty'
 $ jq -nc '_update_alt(.a; empty)'
null
 $ jq -nc '.a //= range(3)'
{"a":0}
{"a":1}
{"a":2}
 $ jq -nc '_update_alt(.a; range(3))'
{"a":0}

@wader
Copy link
Member Author

wader commented Feb 8, 2022

Trying to wrap my head around this. The breaking seems to happen because a |= b (_modify(a; b)?) at the moment always outputs one value (the first) even if b outputs zero or more than one value?
As i understand the code https://github.com/stedolan/jq/blob/master/src/parser.y#L282 this currently is different for //= because a //= b is rewritten to something like b as $tmp | _modify(a; . // $tmp) which explains why b is always evaluated.

This feels a bit unintuitive:

$ jq -nc '.a |= (1,2,3)'
{"a":1}
$ jq -nc '.a //= (1,2,3)'
{"a":1}
{"a":2}
{"a":3}
$ jq -nc '.a |= empty'
null
$ jq -nc '.a //= empty'

@itchyny
Copy link
Contributor

itchyny commented Feb 9, 2022

The behavior differs between |= and = too and is mentioned in the manual (See https://stedolan.github.io/jq/manual/#Plainassignment:=).

@wader
Copy link
Member Author

wader commented Feb 9, 2022

Aha yeah i noticed the code for _modify is quite special also
https://github.com/stedolan/jq/blob/master/src/builtin.jq#L14 wonder why they ended up different.

@nicowilliams
Copy link
Contributor

@wader let me help you wrap your mind around this:

Exp "//=" Exp {
  $$ = gen_definedor_assign($1, $3);
} |
static block gen_definedor_assign(block object, block val) {
  block tmp = gen_op_var_fresh(STOREV, "tmp");
  return BLOCK(gen_op_simple(DUP),
  /*here -->*/ val, tmp,
               gen_call("_modify", BLOCK(gen_lambda(object),
                                         gen_lambda(gen_definedor(gen_noop(),
                                                                  gen_op_bound(LOADV, tmp))))));
}

This is the reason that the RHS of //= gets the . of the LHS as its value, the reason that it's evaluated every time, and also the reason that the assignment is done once per-value output by the RHS.

Compare to |= which is coded like this:

Exp "|=" Exp {
  $$ = gen_call("_modify", BLOCK(gen_lambda($1), gen_lambda($3)));
} |

Ok, let's translate all of this to English:

  • First |=: gen_call("_modify", BLOCK(gen_lambda($1), gen_lambda($3))); means: "generate a call to _modify with the lhs ($1) as the first argument and the rhs ($3) as the second argument (note that jq function arguments are lambdas, thus the gen_lambda()s).

  • Now gen_definedor_assign() and gen_update() (which are very similar):

    • the DUP is memory management -- ignore for this analysis
    • val is the RHS, and we will invoke it immediately
    • store the val output(s) (RHS) in tmp (a gensym'ed $binding)
    • call _modify (the heart of modify-assign operators) with the input to the LHS as the first argument and a second argument that amounts to . // $tmp where $tmp is the gensym'ed binding mentione above

The difference between //= and other op= assignments is that // is block-coded in gen_definedor() while the ops are builtins like _plus. // could have been jq-coded, but it's not.

@nicowilliams
Copy link
Contributor

Aha yeah i noticed the code for _modify is quite special also
https://github.com/stedolan/jq/blob/master/src/builtin.jq#L14 wonder why they ended up different.

//=, op=, and |= all use _modify. The difference lies in src/parser.y -- it generates very different bytecode for //= and op= compared to |=.

@nicowilliams
Copy link
Contributor

I should really write this up in the wiki...

@nicowilliams
Copy link
Contributor

@nicowilliams
Copy link
Contributor

@wader
Copy link
Member Author

wader commented Jul 12, 2023

@nicowilliams thanks a lot for the writeup! will surely try wrap my head around it once i'm back home

@wader
Copy link
Member Author

wader commented Jul 26, 2023

Think i understand a bit more now. Ill try explain it back to you again :) each path expression outputted by LHS gets the same RHS value, and that value is evaluated before evaluating LHS? but so would it be possible to "wait" with evaluating RHS until we know there is at least one LHS output somehow?

@nicowilliams
Copy link
Contributor

Think i understand a bit more now. Ill try explain it back to you again :) each path expression outputted by LHS gets the same RHS value, and that value is evaluated before evaluating LHS? but so would it be possible to "wait" with evaluating RHS until we know there is at least one LHS output somehow?

Yes, it should be possible. If the RHS has side-effects, that would be desirable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants