Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed ${} references in predicates not being expanded using current()… #536

Merged
merged 8 commits into from
May 18, 2021
Merged

Conversation

gushil
Copy link
Contributor

@gushil gushil commented Apr 26, 2021

… if the expression contains whitespace before the reference

Closes #531

Why is this the best possible solution? Were any other approaches considered?

Fixing previous working solution.

What are the regression risks?

None.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No.

Before submitting this PR, please make sure you have:

  • [x ] included test cases for core behavior and edge cases in tests_v1
  • run nosetests and verified all tests pass
  • run black pyxform to format code
  • verified that any code or assets from external sources are properly credited in comments

… if the expression contains whitespace before the reference
Copy link
Contributor

@MartijnR MartijnR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not break it, though I tried quite hard. :)

| | calculate | pos1 | | position(..) |
| | begin repeat | rep5 | | |
| | calculate | pos5 | | position(..) |
| | calculate | item5 | | join(' ', instance('item')/root/item)[index = ${pos5} and ${pos1} = 1]/label) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but just FYI, this test shows how hard it would be (have been) to implement #525. Since join() expects a node-set attribute but ${pos5} in this case has to be relative nonetheless since it used as a position/index....

@gushil
Copy link
Contributor Author

gushil commented May 4, 2021

I could not break it, though I tried quite hard. :)

Thanks!

@gushil
Copy link
Contributor Author

gushil commented May 10, 2021

Hi @MartijnR should I make changes or we just wait for @lognaturel to chime in here?

Thanks

@pbowen-oc
Copy link

@lognaturel - Will you be able to review this in the next couple days? We are working on a release to fix this and other issues and looking to get a tagged version that we can deploy for testing as soon as possible.

@lognaturel
Copy link
Contributor

Will try to review and release by Fri.

@gushil please review #535. Could you also please look into why there’s a CI failure here? Have some versions changed? We want to make sure it’s green before merge.

@gushil
Copy link
Contributor Author

gushil commented May 13, 2021

@lognaturel

It looks like the failure is caused by pip version 21.1 (used by the build)

Based on pypa/pip#9954 , by using pip version 21.1.1 (released on May 1st, a couple of days after the build happened) the warnings will be silenced.

Should we try to trigger new build? (I don't know how to do that yet)

@codecov-commenter
Copy link

codecov-commenter commented May 13, 2021

Codecov Report

Merging #536 (91e30e2) into master (ae7777c) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #536      +/-   ##
==========================================
+ Coverage   83.98%   84.00%   +0.01%     
==========================================
  Files          25       25              
  Lines        3728     3732       +4     
  Branches      871      872       +1     
==========================================
+ Hits         3131     3135       +4     
  Misses        452      452              
  Partials      145      145              
Impacted Files Coverage Δ
pyxform/survey.py 92.73% <100.00%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae7777c...91e30e2. Read the comment docs.

@lognaturel
Copy link
Contributor

As far as I know, only @ukanga has access to the Appveyor account. When I need to re-run, I force push a change to the latest commit message. I did this here and did not change the contents of the commit.

What I'm understanding here is that #524 introduced the regular expression instance\([^)]+\S+ to identify instance path expressions. This explicitly used whitespace to attempt to identify the end of a path expression starting with instance but I didn't really notice that or consider implications in review. All tests introduced made sure there was always a space after the instance path expression and not within it. This is the kind of assumption that's really valuable to describe in a PR description to make sure it's explicitly considered.

@pbowen-oc's example from the issue is particular because it has multiple predicates. I think that's important to introduce some testing around. I would have been tempted to identify the end of an expression we care about by starting with instance( and ending with the first ] but the possibility of multiple predicates mean we can't do that. There should be a test that keeps us from trying that in the future.

What this PR does is remove any attempt to identify the end of an instance path expression. This kind of thing would be valuable to highlight in the "why is this the best solution" part of the PR template. This change means that any ${} expression that occurs after instance() will get a current() prefix whether or not it's in a predicate. See for example instance('item')/root/item[itemindex = ${item-counter}]/label + ${item-counter}. The first ${item-counter} gets a current() prefix which is good and expected. The second ${item-counter} does too. This is not desirable and I don't love it but it happens that the current() has no effect in that case so it 's not the worst thing in the world. I can't think of a case where the superfluous current() would cause problems. @MartijnR, what do you think? I don't believe we can accurately identify XPath path expressions without a parser that has knowledge of operators, function structure, opening closing brackets/braces, etc. This may be the best we can do. If we go with it, I think this case should be documented with a comment that says something like "The second reference to item-count should not get a current() prefix but we currently have no way of reliably detecting the end of an XPath path expression so any reference after instance() gets a current() prefix whether or not it's in the path expression."

We don't have to do anything about this now, but I wanted to come back to something from the initial issue that I think we didn't get quite right. At #490 (comment), @MartijnR, you said we should only be using current() for XPath path expressions that start with instance(. I'm not really sure that's the ideal. A common thing to do is to have two parallel repeats. For example, one that builds a roster and another that asks questions about each roster member. You might want to do something like /data/roster[position() = ${pos}]/name in the second repeat. In that case you do need the current() prefix. In fact, I can't currently think of a case where it would be undesirable. You mentioned populating choices from a repeat but that's not done manually so the path expression for that would not go through this code.

Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Get @MartijnR's evaluation of the extra current() prefixes now generated
  • Add tests for XPath path expressions with multiple predicates
  • Add test for other ${} expressions after instance() that get a current() prefix with an explanation for why that's happening.

@lognaturel
Copy link
Contributor

Hmm. Now that I think about it with a clear head, wouldn't a regex that captures all strings in between [ and ] and uses the current() prefix for those have the outcome that we want? This depends on whether or not I missed something regarding predicates on nodesets in the forms.

@pbowen-oc @gushil I think this PR certainly makes a valuable improvement. If you can do the two second bullets I mentioned, I could merge and do a release. Then once we get an opportunity to hear from @MartijnR we can evaluate whether the concept I mentioned above might be promising and help capture more useful cases.

@pbowen-oc
Copy link

Thanks, @lognaturel.

@gushil is working on adding tests for the cases you referenced

@gushil gushil requested a review from lognaturel May 17, 2021 15:15
@MartijnR
Copy link
Contributor

wouldn't a regex that captures all strings in between [ and ] and uses the current() prefix for those have the outcome that we want?

That sounds great. I agree it's better not to create unnecessary current() prefixes, and I'm embarrassed I didn't find instance('item')/root/item[itemindex = ${item-counter}]/label + ${item-counter}, because that's exactly the thing I was trying to find..... 👍

You might want to do something like /data/roster[position() = ${pos}]/name in the second repeat. In that case you do need the current() prefix. In fact, I can't currently think of a case where it would be undesirable. You mentioned populating choices from a repeat but that's not done manually so the path expression for that would not go through this code.

Hmmm, yes, I have a vague recollection that you mentioned this about my misunderstanding of current() before... I think I may have regressed and that you're right about current() being appropriate for all cases where a ${ref} results in a relative path inside a predicate (so in pyxform world).

Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the new tests really help add confidence!

Am I understanding correctly that this new approach keeps the instance() check and then adds code to only do the current replacement inside a pair of [] after instance()? So it doesn't address my parallel repeat case (that's ok) but it does avoid putting too many current()s. If I understood that correctly, I'm comfortable going with this for now.

I can come up with some twisted cases with nested predicate expressions that might break but I don't think these need to be accounted for.

I believe there's now one extra loop that doesn't ever iterate more than once and I've flagged that below.

| | calculate | pos1 | | position(..) |
| | begin repeat | rep5 | | |
| | calculate | pos5 | | position(..) |
| | calculate | item5 | | (instance('item')/root/item)[index = ${pos5} and ${pos1} = 1]/label |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this use of parentheses valid in XPath? Mostly curious about what prompted this test!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my bad. That comes from previous (non committed locally created) test. Will be fixed in the next commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to test reference path inside a predicate (bracketed) that includes reference in a repeat and reference outside a repeat. Is it acceptable test? Do you have any other test suggestion to achieve that?

Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the expression looks fine to me, I was just unclear on the significance of the parens! Agreed having a ref in the repeat and one outside is a good idea. 👍

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lognaturel - I use patterns like this when counting nodes and then iterating over them in a repeating group.

So starting with something like:
count(instance('item')/root/group/item[name = 'test'])

I'd keep the parentheses the same and add an ordinal predicate for my repeating group position counter:
(instance('item')/root/group/item[name = 'test'])[position() = ${pos}]/value

My best recollection is that this was only needed when the data didn't repeat solely at the item node itself but rather at nodes further up the hierarchy (like group in this example).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the parens do make sense in that case. There's no need for them in (instance('item')/root/item)[index = ${pos5} and ${pos1} = 1]/label but I guess they're ok around any nodeset expression.

@@ -926,11 +928,20 @@ def _in_secondary_instance_predicate():
# It is possible to have multiple instance in an expression
instance_match_iter = instance_regex.finditer(matchobj.string)
for instance_match in instance_match_iter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no longer makes any difference since the first match will greedily capture the whole rest of the string, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Will be fixed in the next commit.

@gushil gushil requested a review from lognaturel May 18, 2021 04:44
@lognaturel lognaturel merged commit 3184e9d into XLSForm:master May 18, 2021
@lognaturel
Copy link
Contributor

Will aim to release mid-morning tomorrow (Pacific).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants