-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ParseXS improvements #20496
ParseXS improvements #20496
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I build this branch up through this commit 0b3c720, I start to get failures in dist/ExtUtils-ParseXS/t/002-more.t;
:
$ cd t;./perl harness -v ../dist/ExtUtils-ParseXS/t/002-more.t; cd -
ok 1 - require ExtUtils::ParseXS;
Warning: Conflicting duplicate alias 'return_1' changes definition from '0' to 'RET_1' in XSMore.xs, line 194
ok 2 - Create an output file
ok 3 - ExtUtils::CBuilder::compile() returned true value
ok 4 - Make sure XSMore.o exists
ok 5 - ExtUtils::CBuilder::link() returned true value
ok 6 - Make sure XSMore.so exists
ok 7 - No error message recorded, as expected
ok 8 - ExtUtils::ParseXS::errors()
ok 9 - the BOOT keyword
ok 10 - the INCLUDE keyword
ok 11 - the PROTOTYPES keyword
ok 12 - the PROTOTYPE keyword
ok 13 - the ATTRS keyword
ok 14 - ATTRS with prototype
not ok 15 - the CASE keyword (1)
# Failed test 'the CASE keyword (1)'
# at t/002-more.t line 78.
# got: '*XSMore::return_1'
# expected: '1'
ok 16 - the CASE keyword (2)
ok 17 - ALIAS with prototype (1)
When building this branch on FreeBSD with
|
On Tue, 8 Nov 2022, 22:06 James E Keenan, ***@***.***> wrote:
***@***.**** commented on this pull request.
When I build this branch up through this commit, I start to get failures
in dist/ExtUtils-ParseXS/t/002-more.t;:
$ cd t;./perl harness -v ../dist/ExtUtils-ParseXS/t/002-more.t; cd -
ok 1 - require ExtUtils::ParseXS;
Warning: Conflicting duplicate alias 'return_1' changes definition from '0' to 'RET_1' in XSMore.xs, line 194
ok 2 - Create an output file
ok 3 - ExtUtils::CBuilder::compile() returned true value
ok 4 - Make sure XSMore.o exists
ok 5 - ExtUtils::CBuilder::link() returned true value
ok 6 - Make sure XSMore.so exists
ok 7 - No error message recorded, as expected
ok 8 - ExtUtils::ParseXS::errors()
ok 9 - the BOOT keyword
ok 10 - the INCLUDE keyword
ok 11 - the PROTOTYPES keyword
ok 12 - the PROTOTYPE keyword
ok 13 - the ATTRS keyword
ok 14 - ATTRS with prototype
not ok 15 - the CASE keyword (1)
# Failed test 'the CASE keyword (1)'
# at t/002-more.t line 78.
# got: '*XSMore::return_1'
# expected: '1'
ok 16 - the CASE keyword (2)
ok 17 - ALIAS with prototype (1)
Strange. I will dig. Thanks for the report.
Yves
… |
aa1b534
to
b9af180
Compare
FWIW, I pushed this as a smoke-me branch to see what kind of reports we get with it: smoke-me/yves_parsexs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the commit message for 589b5f1, what does it mean for output to be 'deterministic'? What would be the benefit of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 3478106 introduces new functionality: "symbolic alias". [Will need review by someone more familiar with actual XS.]
Commit message indicates commit does several different things at once. Could this be confusing or difficult to maintain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide an example of where (i) in current blead (or 5.36) we get warnings that you consider to be inadequate (or fail to get warnings that would be helpful) and (ii) after merging this branch we will get better warnings?
Can you provide a simple illustration of how WarnHint() will improve warnings emitted by EU::PXS?
Hi @jkeenan, thanks for the feedback.
the code builds an array of code for output using a raw keys statement which produce a non-deterministic ordering of their keys. Its much like this:
thus the output is not deterministic, and from run to run it might change. Sorting the keys means the data is output in a consistent order every time. Which means that with no input change the output should be identical. As far as I could tell that wasn't true for the code involved. To be honest the change didn't break any tests, so it is possible something is sorting this data elsewhere and also providing deterministic output, but I couldn't be sure. Adding the
Ah, I see how you come to that conclusion. What it means is something like "in the process of adding this functionality I necessarily took care of some other edge case issues relating to aliases that were affected by my change, while at the same time correcting their logic to be consistent.". I can change the commit message if you want, it seemed clear to me.
Before we start here is a brief explanation of what the ALIAS mechanism is for and does at a high level. It is common in XS to have a number of functions which require standard boiler plate around a smaller set of "internal code". This is common practice, for instance all the logic to validate arguments might be identical for all the methods in a package, similarly code to generate constants from header values might do the same thing. So XS provides a mechanism where it will use one C function for multiple Perl functions. When the C function is called an integer value (usually called `ix` IIRC) is set to signal to the C code which Perl function was called (or rather which `ix` value is associated with the function name that was called). Thus there is a level of indirection involved on the mapping. Consider the following XS sub header:
This would create one C/XS function, and three perl function entries for This means collisions are possible, both at the name level (a name might be listed twice) or the value level (a value might be listed twice). The ParseXS code has historically tried to warn when either happened (modulo some bugs in its implementation). So for instance the old code will warn you if you have something like this:
it will tell you it is "ignoring" the second "foo". However this is not entirely correct (and in fact is only true because the values are the same), if you feed it:
it will again tell you that is ignoring the second foo, but it doesn't actually /ignore/ the second foo, it overwrites the value of foo so it ends up as 2 (and along the way gets confused about what values have been used). Since my objective with the symbolic aliases is to allow an engineer to specify "this alias is deliberately a dupe of that other alias (or sub) and thus should not produce a warning", I necessarily had to change the logic for warnings and while I was doing so I noticed the issues with the existing warnings. ParseXS also has warnings about multiple names having the same value. So for instance
you will get a warning about do1 and do2 having the same value, which was actually my core objective, as I have legit reasons to create a dupe and I don't want warnings about them, and there is no way to turn the warning off. So I introduced syntax like this:
which would be the same as this:
except it would not throw warnings. I chose Interestingly there is another long standing bug in the dupe value warning code in that
historically does not warn, duping the default value of the base function (0) only warns on the second alias. From the point of view of the internals of ParseXS I can understand how this bug came about, but it is inconsistent with the behavior of other alias values, so I fixed it to warn similarly to how it would handle any other value collision. Anyway, adding this new form of alias declaration resulting in new edge cases being added to the warning detection for some of this, for instance consider this:
do2 should not warn that it is a dupe of 1, but do3 should. While working on these cases I noticed the other issues. Since the all involve the same underlying logic I dealt with them all at once. As far as I intended and understand nothing that used to warn will stop warning, and in only one condition, where a developer is using the default value of 0 for an alias (possibly without knowing) are any new warnings added. On the other hand the warnings, if they were changed at all, should be more informative.
Well part of the intent of this PR is to provide a way to silence unwanted warnings. If I want to alias "foo" and "bar" to the same value internally (maybe for backcompat or something like that) I should be able to do so in a way that xsubpp does not warn, especially when those warnings will be visible to end users who cant do anything about them. None of the core code intentionally does this type of aliasing (probably to avoid warnings), so there are no examples in /core/, and since I added new syntax to the XS module grammar when one does want to do this none of our existing core code would exercise this. However you can see it in the tests. Similarly for the hint message.
You can see pretty much all the changes in one view with the tests in t/001_basic.t and t/XSAlias.t, This is the XS:
and the test output:
which expands out to
the result of using WarnHint is visible on the first two lines. The parenthesized message about using a symbolic reference to avoid the warning is the "Hint". The fix for the incorrect claim about "ignoring" a definition is visible on the third line. Previously the message would not show the value nor show it changing. The fix for the "ignoring" message overall to show all aliases and to show the value is visible on the first and fourth lines, previously we would not show aliases nor would we show the value, and the fix to now show a warning when the default value of 0 is duplicated is visible on the last line. I chose to make the symbolic alias hint "show once" so as not to clutter up the warning list. The point is to call the devs attention to the new feature so they can change their code not to warn if they are intending to create a dupe, IMO we don't need to do so every case there is an alias collision. If I change all the
then we get this on an older perl:
You can see the outright error on line 3 (the value is not being ignored, the old value is being overwritten), and if you compare closely you will notice the warning about xunx and do is missing. The other point is that even though I meant to alias Hopefully that explains things sufficiently. :-) |
Smoke testing results are satisfactory. I was able to do a limited amount of testing of CPAN distros with XS code against a perl built from this branch; results also satisfactory. At this point what we could use is some review of the changes by someone who (unlike me) has actually written XS code. |
Make sure our output is deterministic.
This patch makes it possible to omit some of the whitespace around preprocessor directives. It teaches fetch_para() to understand that a #else or #endif directive that does not end a #if that was seen in the current "paragraph" should not be parsed as part of that paragraph. This means that a conditional block that defines the same sub under different define conditions need not have extra whitespace after each sub definition.
Sometimes you *want* to create multiple names for the same functionality, but doing so with the ALIAS functionality requires awkward workarounds. This adds a new "symbolic alias" that does not warn on dupes as creating a dupe is its whole point. For a symbolic alias the value is the name of an existing alias. This also cleans up some of the warnings related to aliases so we distinguish between when a duplicate is truly ignored or where it overrides a previous value. And deal with a few other edge cases properly.
Also normalize warnings. It used to be if you created an alias of the root function (0) no warning would be produced. Now we will produce a warning, but we also allow symbolic references to defuse the warning.
b9af180
to
9e79a95
Compare
The upcoming C++23 and C23 standards add #elifdef, #elifndef.
Use warn instead of print STDERR, and provide a way to make errors trigger a die instead of an exit(1). Currently the module code is written as though the only way it will be used is via the xsubpp script, so the library does annoying things like calling exit() instead of die() to signal an exception. It also uses print STDERR instead of warn, which means the test code can't just use a $SIG{__WARN__} hook to see its warnings, and instead has to include PrimitiveCapture in the t directory. These two things combine annoyingly in our test code such that when you break the module you can see tests exiting early, but with no useful diagnostics as to why. This patch reworks this to use "warn" instead of print STDERR, and to provide a way to enable the use of "die" instead of exit. Thus making debugging failing tests far easier.
The version in the pod has been long wrong. We are on 3.48 now, it was 3.13_01.
9e79a95
to
206dab6
Compare
In general, I like what I see here, but I have two principal concerns. First: On CPAN there are over 2600 distributions that contain XS code. That's less than 10 percent of the total, but I suspect that two things are simultaneously true: (i) they're more idiosyncratic than CPAN as a whole; and (ii) some of them are vastly more important than 99% of what's on CPAN. Many distros with XS have been broken and unmaintained for years, so getting a clean baseline for the impact of this p.r. is not easy. If we release this as part of 5.37.6 on November 20, the heavy-duty CPANtesters will start testing "all" of CPAN, but it takes time for them to look at FAILs and determine whether the failures are both new and attributable to the code that has just been merged into blead. We have to be prepared for the possibility that flaws in this code which would warrant a revert might not appear for two week or more. When all is said and done this p.r. is only adding a very modest amount of new functionality. If we get a lot of BBC reports we should not burden CPAN authors with the need to do work-arounds to accommodate that functionality. Second: ExtUtils-ParseXS is a dual-release, blead-first distribution. That means that at some point after this p.r. gets merged, someone will have to do a new CPAN release (the most recent was done by @xsawyerx in January of this year) and, as part of preparation for that release, will have to test this against older versions of perl. Based on CPANtesters results for the latest version, I'd say that we would want EU:PXS to PASS going back to perl-5.8.9. Has there been any assessment of how well this code will work with those older versions? I may have missed something, but I don't think this concern has been discussed so far in this ticket. |
@jkeenan I take your point. But the more popular cpan module are usually tested early in the process right? So I think we would get pretty decent feedback pretty fast. Also, i think its worth noting here, nothing I have done (except one warning) changes the behavior of anything that isn't already throwing an exception. For instance the '#else' patch actually makes things that used to throw an error not throw errors. So it is unlikely (albeit possible!) that it will break anything. I think if it does we will see signal very quickly. If it does cause BBC then I would say we would revert it on first sign of trouble. As for back-compat we test that it compiles with older perls as far as I understand it. I had a breakage because i used |
I don't actually know the order in which people like @andk, @eserte, @bingos test CPAN distros after a monthly dev release. I do know that the process of confirming that some CPANtesters failure is a BBC (and, beyond that, bisecting to a particular commit to blead) is a process that cannot be completely automated as it requires human judgment, hence takes time. But, assuming @tonycoz is okay with your more recent commits, I'd say this is good to go. Thanks. |
The github workflow tests all of the modules in dist/ against 5.8, 5.10, 5.24 and the most recent non-development release, 5.36 right now. This doesn't mean it's tested against all of the XS modules on CPAN. |
Ill merge it and lets find out. |
On Wed, Nov 16, 2022 at 01:09:15AM -0800, Yves Orton wrote:
Ill merge it and lets find out.
I'm seeing lots of new warnings on STDERR since the merge. Is this
expected? Is it the XS distributions that are wrong and need fixing?
And if so have it been reported upstream?
Warning: Aliases 'Digest::SHA::sha1' and 'sha1' have identical values of 0 - the base function in SHA.xs, line 118
(If this is deliberate use a symbolic alias instead.)
Warning: Aliases 'Digest::SHA::hmac_sha1' and 'hmac_sha1' have identical values of 0 - the base function in SHA.xs, line 174
Warning: Aliases 'Digest::SHA::hashsize' and 'hashsize' have identical values of 0 - the base function in SHA.xs, line 235
Warning: Aliases 'Digest::SHA::digest' and 'digest' have identical values of 0 - the base function in SHA.xs, line 272
Warning: Aliases 'B::OP::next' and 'next' have identical values of 0 - the base function in B.xs, line 816
(If this is deliberate use a symbolic alias instead.)
Warning: Aliases 'min' and 'min' have identical values of 0 - the base function in ListUtil.xs, line 274
(If this is deliberate use a symbolic alias instead.)
Warning: Aliases 'sum' and 'sum' have identical values of 0 - the base function in ListUtil.xs, line 326
Warning: Aliases 'reduce' and 'reduce' have identical values of 0 - the base function in ListUtil.xs, line 545
Warning: Aliases 'none' and 'any' have identical values of 0 - the base function in ListUtil.xs, line 702
Warning: Aliases 'B::PADNAMELIST::MAX' and 'PadlistMAX' have identical values of 0 - the base function in B.xs, line 2167
Warning: Aliases 'head' and 'head' have identical values of 0 - the base function in ListUtil.xs, line 769
Warning: Aliases 'uniqint' and 'uniq' have identical values of 0 - the base function in ListUtil.xs, line 1325
Warning: Aliases 'bytes2str' and 'decode' have identical values of 0 - the base function in Encode.xs, line 982
(If this is deliberate use a symbolic alias instead.)
Warning: Aliases 'str2bytes' and 'encode' have identical values of 0 - the base function in Encode.xs, line 1004
Warning (mostly harmless): No library found for -lposix
Warning (mostly harmless): No library found for -lcposix
Warning: Aliases 'isNFC_NO' and 'isComp_Ex' have identical values of 0 - the base function in Normalize.xs, line 814
(If this is deliberate use a symbolic alias instead.)
…--
Wesley Crusher gets beaten up by his classmates for being a smarmy git,
and consequently has a go at making some friends of his own age for a
change.
-- Things That Never Happen in "Star Trek" #18
|
On Wed, 16 Nov 2022 at 10:26, iabyn ***@***.***> wrote:
On Wed, Nov 16, 2022 at 01:09:15AM -0800, Yves Orton wrote:
> Ill merge it and lets find out.
I'm seeing lots of new warnings on STDERR since the merge. Is this
expected?
In the sense that I thought we might see a few of these yes.
Is it the XS distributions that are wrong and need fixing?
In the most prescriptive technical sense yes. In the practical sense it is
a bit more debatable. We have always warned when two aliases were created
to the same value. However a bug meant that this wasn't applied to the
value 0. I fixed that bug, on the assumption that this would likely be
rare. I have to admit I did not notice the warnings from our own dists, or
I would have disabled the new warning.
The simplest solution is I can push a PR to disable this new warning. But
if we disable it for 0 why do we have it at all? Why should 0 be special
but the others no? Eg, if you do this:
void
foo (arg)
SV * arg
ALIAS: bar = 0
there is no warning that foo and bar are aliased to the same value. But if
you do the equivalent
void
foo (arg)
SV * arg
ALIAS: foo = 0 bar = 0
there will be, and if you override the base functions ix value like this:
void
foo (arg)
SV * arg
ALIAS: foo = 1 bar = 1
there also will be a warning about bar.
BTW the above is why i considered the original behavior of allowing one
alias to 0 without warning that it creates dupes to be a bug. The code did
not consider the base function to be "an alias" unless it
was explicitly listed.
And if so have it been reported upstream?
No I had not noticed this and was hoping that we would get little to no BBC
reports from it (it is a warning). My bad. I will prepare a PR to disable
this warning while we decided what to do. I had expected this to be only a
very rare case, to realize there are so many in core itself suggests that
is very wrong and this warning will be unhelpful. I might argue we simply
should warn about this at all then, i found the warnings sufficiently
unhelpful that I added support for new syntax to avoid them.
Note this merge was done about 20 minutes ago. This has not been in the
wild long.
Warning: Aliases 'Digest::SHA::sha1' and 'sha1' have identical values of 0
- the base function in SHA.xs, line 118
(If this is deliberate use a symbolic alias instead.)
Warning: Aliases 'Digest::SHA::hmac_sha1' and 'hmac_sha1' have identical
values of 0 - the base function in SHA.xs, line 174
Warning: Aliases 'Digest::SHA::hashsize' and 'hashsize' have identical
values of 0 - the base function in SHA.xs, line 235
Warning: Aliases 'Digest::SHA::digest' and 'digest' have identical values
of 0 - the base function in SHA.xs, line 272
Warning: Aliases 'B::OP::next' and 'next' have identical values of 0 - the
base function in B.xs, line 816
(If this is deliberate use a symbolic alias instead.)
Warning: Aliases 'min' and 'min' have identical values of 0 - the base
function in ListUtil.xs, line 274
(If this is deliberate use a symbolic alias instead.)
Warning: Aliases 'sum' and 'sum' have identical values of 0 - the base
function in ListUtil.xs, line 326
Warning: Aliases 'reduce' and 'reduce' have identical values of 0 - the
base function in ListUtil.xs, line 545
Warning: Aliases 'none' and 'any' have identical values of 0 - the base
function in ListUtil.xs, line 702
Warning: Aliases 'B::PADNAMELIST::MAX' and 'PadlistMAX' have identical
values of 0 - the base function in B.xs, line 2167
Warning: Aliases 'head' and 'head' have identical values of 0 - the base
function in ListUtil.xs, line 769
Warning: Aliases 'uniqint' and 'uniq' have identical values of 0 - the
base function in ListUtil.xs, line 1325
Warning: Aliases 'bytes2str' and 'decode' have identical values of 0 - the
base function in Encode.xs, line 982
(If this is deliberate use a symbolic alias instead.)
Warning: Aliases 'str2bytes' and 'encode' have identical values of 0 - the
base function in Encode.xs, line 1004
Warning (mostly harmless): No library found for -lposix
Warning (mostly harmless): No library found for -lcposix
Warning: Aliases 'isNFC_NO' and 'isComp_Ex' have identical values of 0 -
the base function in Normalize.xs, line 814
(If this is deliberate use a symbolic alias instead.)
IMO what this says is that people want to duplicate aliases a lot more than
people think, and warning when they do is unhelpful.
I think maybe the best option would be to introduce a DEV_MODE type var to
ExtUtils::ParseXS so if people want extended warnings they can get them,
but that normally we would not warn about duplicate aliases.
cheers,
Yves
…--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
On Wed, 16 Nov 2022 at 10:44, demerphq ***@***.***> wrote:
IMO what this says is that people want to duplicate aliases a lot more than people think, and warning when they do is unhelpful.
Or that it means that there is a lot of buggy code that we found with
this change. Its hard to say for sure, but occams razor suggests that
most of it is intentional.
Yves
…--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
@iabyn I have created #20517 to address this once we decide what we should do. We can also revert the patch that "fixed" this warning, but IMO that would be going backwards. This patch needs more tests before merging I guess, but if we want to silence this immediately we apply this and add the tests afterwards. |
This includes a number of ParseXS changes. A version bump, tests that perlxs.pod has the correct version, improvements to parsing cpp processor directives so XS files will be less whitespace sensitive, and enhanced support for aliases.