Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ParseXS improvements #20496

Merged
merged 7 commits into from
Nov 16, 2022
Merged

ParseXS improvements #20496

merged 7 commits into from
Nov 16, 2022

Conversation

demerphq
Copy link
Collaborator

@demerphq demerphq commented Nov 8, 2022

This includes a number of ParseXS changes. A version bump, tests that perlxs.pod has the correct version, improvements to parsing cpp processor directives so XS files will be less whitespace sensitive, and enhanced support for aliases.

Copy link
Contributor

@jkeenan jkeenan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I build this branch up through this commit 0b3c720, I start to get failures in dist/ExtUtils-ParseXS/t/002-more.t;:

$ cd t;./perl harness -v ../dist/ExtUtils-ParseXS/t/002-more.t; cd -

ok 1 - require ExtUtils::ParseXS;
Warning: Conflicting duplicate alias 'return_1' changes definition from '0' to 'RET_1' in XSMore.xs, line 194
ok 2 - Create an output file
ok 3 - ExtUtils::CBuilder::compile() returned true value
ok 4 - Make sure XSMore.o exists
ok 5 - ExtUtils::CBuilder::link() returned true value
ok 6 - Make sure XSMore.so exists
ok 7 - No error message recorded, as expected
ok 8 - ExtUtils::ParseXS::errors()
ok 9 - the BOOT keyword
ok 10 - the INCLUDE keyword
ok 11 - the PROTOTYPES keyword
ok 12 - the PROTOTYPE keyword
ok 13 - the ATTRS keyword
ok 14 - ATTRS with prototype
not ok 15 - the CASE keyword (1)

#   Failed test 'the CASE keyword (1)'
#   at t/002-more.t line 78.
#          got: '*XSMore::return_1'
#     expected: '1'
ok 16 - the CASE keyword (2)
ok 17 - ALIAS with prototype (1)

@jkeenan
Copy link
Contributor

jkeenan commented Nov 8, 2022

When building this branch on FreeBSD with clang10 as the compiler, make is failing here:

Writing Makefile for DynaLoader
"../../miniperl" "-I../../lib" DynaLoader_pm.PL DynaLoader.pm
rm -f DynaLoader.xs
cp dl_dlopen.xs DynaLoader.xs
"../../miniperl" "-I../../lib" "../../lib/ExtUtils/xsubpp" -noprototypes -typemap '/usr/home/jkeenan/gitwork/perl/ext/DynaLoader/../../lib/ExtUtils/typemap'  DynaLoader.xs > DynaLoader.xsc
Error: 'endif' with no matching 'if' in DynaLoader.xs, line 193
*** Error code 1

Stop.
make[1]: stopped in /usr/home/jkeenan/gitwork/perl/ext/DynaLoader
"../../miniperl" "-I../../lib" "../../lib/ExtUtils/xsubpp" -noprototypes -typemap '/usr/home/jkeenan/gitwork/perl/ext/DynaLoader/../../lib/ExtUtils/typemap'  DynaLoader.xs > DynaLoader.xsc
Error: 'endif' with no matching 'if' in DynaLoader.xs, line 193
*** Error code 1

Stop.
make[1]: stopped in /usr/home/jkeenan/gitwork/perl/ext/DynaLoader
Unsuccessful make(ext/DynaLoader): code=256 at make_ext.pl line 584.
*** Error code 25

Stop.
make: stopped in /usr/home/jkeenan/gitwork/perl

@demerphq
Copy link
Collaborator Author

demerphq commented Nov 9, 2022 via email

@demerphq demerphq force-pushed the yves/parsexs branch 2 times, most recently from aa1b534 to b9af180 Compare November 9, 2022 17:33
@demerphq demerphq requested a review from jkeenan November 11, 2022 10:51
@demerphq
Copy link
Collaborator Author

FWIW, I pushed this as a smoke-me branch to see what kind of reports we get with it:

smoke-me/yves_parsexs

Copy link
Contributor

@jkeenan jkeenan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the commit message for 589b5f1, what does it mean for output to be 'deterministic'? What would be the benefit of this?

Copy link
Contributor

@jkeenan jkeenan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 3478106 introduces new functionality: "symbolic alias". [Will need review by someone more familiar with actual XS.]
Commit message indicates commit does several different things at once. Could this be confusing or difficult to maintain?

Copy link
Contributor

@jkeenan jkeenan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example of where (i) in current blead (or 5.36) we get warnings that you consider to be inadequate (or fail to get warnings that would be helpful) and (ii) after merging this branch we will get better warnings?

Can you provide a simple illustration of how WarnHint() will improve warnings emitted by EU::PXS?

@demerphq
Copy link
Collaborator Author

Hi @jkeenan, thanks for the feedback.

what does it mean for output to be 'deterministic'?

the code builds an array of code for output using a raw keys statement which produce a non-deterministic ordering of their keys.

Its much like this:

$ perl -le'%h=("a".."z"); print keys %h'
eysmiagkcwquo
$ perl -le'%h=("a".."z"); print keys %h'
qiekmuwgayosc
$ perl -le'%h=("a".."z"); print keys %h'
yiksqwucemgao

thus the output is not deterministic, and from run to run it might change. Sorting the keys means the data is output in a consistent order every time. Which means that with no input change the output should be identical. As far as I could tell that wasn't true for the code involved. To be honest the change didn't break any tests, so it is possible something is sorting this data elsewhere and also providing deterministic output, but I couldn't be sure. Adding the sort at worst is a no-op change, and at best ensures deterministic output.

Commit message indicates commit does several different things at once

Ah, I see how you come to that conclusion. What it means is something like "in the process of adding this functionality I necessarily took care of some other edge case issues relating to aliases that were affected by my change, while at the same time correcting their logic to be consistent.". I can change the commit message if you want, it seemed clear to me.

Before we start here is a brief explanation of what the ALIAS mechanism is for and does at a high level. It is common in XS to have a number of functions which require standard boiler plate around a smaller set of "internal code". This is common practice, for instance all the logic to validate arguments might be identical for all the methods in a package, similarly code to generate constants from header values might do the same thing. So XS provides a mechanism where it will use one C function for multiple Perl functions. When the C function is called an integer value (usually called `ix` IIRC) is set to signal to the C code which Perl function was called (or rather which `ix` value is associated with the function name that was called). Thus there is a level of indirection involved on the mapping. Consider the following XS sub header:
void
do(dbh)
   SV *dbh
ALIAS:
    do_fast = 1
    do_slow = 2

This would create one C/XS function, and three perl function entries for do(), do_fast() and do_slow(), which when called will call the same underlying C/XS function with the ix value being 0, 1, or 2 respectively. The C/XS code can then switch on ix and perform the desired operation appropriately.

This means collisions are possible, both at the name level (a name might be listed twice) or the value level (a value might be listed twice). The ParseXS code has historically tried to warn when either happened (modulo some bugs in its implementation). So for instance the old code will warn you if you have something like this:

ALIAS:
   foo = 1
   foo = 1

it will tell you it is "ignoring" the second "foo". However this is not entirely correct (and in fact is only true because the values are the same), if you feed it:

ALIAS:
   foo = 1
   foo = 2

it will again tell you that is ignoring the second foo, but it doesn't actually /ignore/ the second foo, it overwrites the value of foo so it ends up as 2 (and along the way gets confused about what values have been used). Since my objective with the symbolic aliases is to allow an engineer to specify "this alias is deliberately a dupe of that other alias (or sub) and thus should not produce a warning", I necessarily had to change the logic for warnings and while I was doing so I noticed the issues with the existing warnings.

ParseXS also has warnings about multiple names having the same value. So for instance

void
do(dbh)
   SV *dbh
ALIAS:
    do1 = 1
    do2 = 1

you will get a warning about do1 and do2 having the same value, which was actually my core objective, as I have legit reasons to create a dupe and I don't want warnings about them, and there is no way to turn the warning off. So I introduced syntax like this:

void
do(dbh)
   SV *dbh
ALIAS:
    do1 => do
    do2 => do

which would be the same as this:

void
do(dbh)
   SV *dbh
ALIAS:
    do1 = 0
    do2 = 0

except it would not throw warnings. I chose => because the grammar requires a separator that does not match [\w:]. I chose to do 'symbolic aliasing' because I think that when people do this kind of thing intentionally they want to specify "these names should have the same index", where-as if I had simply added a new separator to signal intentionality but kept the values still "numeric" then it would increase the maint burden if the ix for the group were to change.

Interestingly there is another long standing bug in the dupe value warning code in that

void
do(dbh)
   SV *dbh
ALIAS:
    do1 = 0

historically does not warn, duping the default value of the base function (0) only warns on the second alias. From the point of view of the internals of ParseXS I can understand how this bug came about, but it is inconsistent with the behavior of other alias values, so I fixed it to warn similarly to how it would handle any other value collision.

Anyway, adding this new form of alias declaration resulting in new edge cases being added to the warning detection for some of this, for instance consider this:

void
do(dbh)
   SV *dbh
ALIAS:
    do1 = 1
    do2 => do1
    do3 = 1 

do2 should not warn that it is a dupe of 1, but do3 should. While working on these cases I noticed the other issues. Since the all involve the same underlying logic I dealt with them all at once. As far as I intended and understand nothing that used to warn will stop warning, and in only one condition, where a developer is using the default value of 0 for an alias (possibly without knowing) are any new warnings added. On the other hand the warnings, if they were changed at all, should be more informative.

Can you provide an example of where ... we will get better warnings?
how WarnHint() will improve warnings emitted by EU::PXS?

Well part of the intent of this PR is to provide a way to silence unwanted warnings. If I want to alias "foo" and "bar" to the same value internally (maybe for backcompat or something like that) I should be able to do so in a way that xsubpp does not warn, especially when those warnings will be visible to end users who cant do anything about them. None of the core code intentionally does this type of aliasing (probably to avoid warnings), so there are no examples in /core/, and since I added new syntax to the XS module grammar when one does want to do this none of our existing core code would exercise this. However you can see it in the tests. Similarly for the hint message.

You can see pretty much all the changes in one view with the tests in t/001_basic.t and t/XSAlias.t, This is the XS:
void
do(dbh)
   SV *dbh
ALIAS:
    dox = 1
    lox => dox
    pox = 1
    pox = 2
    docks = 1
    dachs => lox
    xunx = 0
    xukes => do

and the test output:

  is $stderr,
    "Warning: Aliases 'pox' and 'dox', 'lox' have"
    . " identical values of 1 in XSAlias.xs, line 9\n"
    . "    (If this is deliberate use a symbolic alias instead.)\n"
    . "Warning: Conflicting duplicate alias 'pox' changes"
    . " definition from '1' to '2' in XSAlias.xs, line 10\n"
    . "Warning: Aliases 'docks' and 'dox', 'lox' have"
    . " identical values of 1 in XSAlias.xs, line 11\n"
    . "Warning: Aliases 'xunx' and 'do' have identical values"
    . " of 0 - the base function in XSAlias.xs, line 13\n",
    "Saw expected warnings from XSAlias.xs";

which expands out to

Warning: Aliases pox and dox, lox have identical values of 1 in XSAlias.xs, line 9
    (If this is deliberate use a symbolic alias instead.)
Warning: Conflicting duplicate alias pox changes definition from 1 to 2 in XSAlias.xs, line 10
Warning: Aliases docks and dox, lox have identical values of 1 in XSAlias.xs, line 11
Warning: Aliases xunx and do have identical values of 0 - the base function in XSAlias.xs, line 13

the result of using WarnHint is visible on the first two lines. The parenthesized message about using a symbolic reference to avoid the warning is the "Hint".

The fix for the incorrect claim about "ignoring" a definition is visible on the third line. Previously the message would not show the value nor show it changing.

The fix for the "ignoring" message overall to show all aliases and to show the value is visible on the first and fourth lines, previously we would not show aliases nor would we show the value, and the fix to now show a warning when the default value of 0 is duplicated is visible on the last line. I chose to make the symbolic alias hint "show once" so as not to clutter up the warning list. The point is to call the devs attention to the new feature so they can change their code not to warn if they are intending to create a dupe, IMO we don't need to do so every case there is an alias collision.

If I change all the => style alias entries to the equivalent '=' entry

void
do(dbh)
   SV *dbh
ALIAS:
    dox = 1
    lox = 1
    pox = 1
    pox = 2
    docks = 1
    dachs = 1
    xunx = 0
    xukes = 0

then we get this on an older perl:

Warning: Aliases 'lox' and 'dox' have identical values in XSAlias.xs, line 8
Warning: Aliases 'pox' and 'lox' have identical values in XSAlias.xs, line 9
Warning: Ignoring duplicate alias 'pox' in XSAlias.xs, line 10
Warning: Aliases 'docks' and 'pox' have identical values in XSAlias.xs, line 11
Warning: Aliases 'dachs' and 'docks' have identical values in XSAlias.xs, line 12
Warning: Aliases 'xukes' and 'xunx' have identical values in XSAlias.xs, line 14

You can see the outright error on line 3 (the value is not being ignored, the old value is being overwritten), and if you compare closely you will notice the warning about xunx and do is missing. The other point is that even though I meant to alias lox and dox I got a warning I don't want!

Hopefully that explains things sufficiently. :-)

@jkeenan
Copy link
Contributor

jkeenan commented Nov 12, 2022

Smoke testing results are satisfactory. I was able to do a limited amount of testing of CPAN distros with XS code against a perl built from this branch; results also satisfactory.

At this point what we could use is some review of the changes by someone who (unlike me) has actually written XS code.

Make sure our output is deterministic.
This patch makes it possible to omit some of the whitespace around
preprocessor directives. It teaches fetch_para() to understand
that a #else or #endif directive that does not end a #if that
was seen in the current "paragraph" should not be parsed as part
of that paragraph. This means that a conditional block that defines
the same sub under different define conditions need not have extra
whitespace after each sub definition.
Sometimes you *want* to create multiple names for the same
functionality, but doing so with the ALIAS functionality requires
awkward workarounds. This adds a new "symbolic alias" that does
not warn on dupes as creating a dupe is its whole point. For a
symbolic alias the value is the name of an existing alias.

This also cleans up some of the warnings related to aliases so
we distinguish between when a duplicate is truly ignored or
where it overrides a previous value. And deal with a few other
edge cases properly.
Also normalize warnings. It used to be if you created an alias
of the root function (0) no warning would be produced. Now
we will produce a warning, but we also allow symbolic references
to defuse the warning.
The upcoming C++23 and C23 standards add #elifdef, #elifndef.
Use warn instead of print STDERR, and provide a way to make
errors trigger a die instead of an exit(1).

Currently the module code is written as though the only way
it will be used is via the xsubpp script, so the library does
annoying things like calling exit() instead of die() to signal
an exception. It also uses print STDERR instead of warn,
which means the test code can't just use a $SIG{__WARN__} hook
to see its warnings, and instead has to include PrimitiveCapture
in the t directory. These two things combine annoyingly in our
test code such that when you break the module you can see tests
exiting early, but with no useful diagnostics as to why.

This patch reworks this to use "warn" instead of print STDERR,
and to provide a way to enable the use of "die" instead of exit.
Thus making debugging failing tests far easier.
The version in the pod has been long wrong. We are on 3.48 now, it was 3.13_01.
@demerphq
Copy link
Collaborator Author

@tonycoz @jkeenan what do you think? Mergable yet?

@jkeenan
Copy link
Contributor

jkeenan commented Nov 15, 2022

@tonycoz @jkeenan what do you think? Mergable yet?

In general, I like what I see here, but I have two principal concerns.

First: On CPAN there are over 2600 distributions that contain XS code. That's less than 10 percent of the total, but I suspect that two things are simultaneously true: (i) they're more idiosyncratic than CPAN as a whole; and (ii) some of them are vastly more important than 99% of what's on CPAN. Many distros with XS have been broken and unmaintained for years, so getting a clean baseline for the impact of this p.r. is not easy.

If we release this as part of 5.37.6 on November 20, the heavy-duty CPANtesters will start testing "all" of CPAN, but it takes time for them to look at FAILs and determine whether the failures are both new and attributable to the code that has just been merged into blead. We have to be prepared for the possibility that flaws in this code which would warrant a revert might not appear for two week or more. When all is said and done this p.r. is only adding a very modest amount of new functionality. If we get a lot of BBC reports we should not burden CPAN authors with the need to do work-arounds to accommodate that functionality.

Second: ExtUtils-ParseXS is a dual-release, blead-first distribution. That means that at some point after this p.r. gets merged, someone will have to do a new CPAN release (the most recent was done by @xsawyerx in January of this year) and, as part of preparation for that release, will have to test this against older versions of perl. Based on CPANtesters results for the latest version, I'd say that we would want EU:PXS to PASS going back to perl-5.8.9. Has there been any assessment of how well this code will work with those older versions? I may have missed something, but I don't think this concern has been discussed so far in this ticket.

@demerphq
Copy link
Collaborator Author

demerphq commented Nov 15, 2022

@jkeenan I take your point. But the more popular cpan module are usually tested early in the process right? So I think we would get pretty decent feedback pretty fast.

Also, i think its worth noting here, nothing I have done (except one warning) changes the behavior of anything that isn't already throwing an exception. For instance the '#else' patch actually makes things that used to throw an error not throw errors. So it is unlikely (albeit possible!) that it will break anything. I think if it does we will see signal very quickly. If it does cause BBC then I would say we would revert it on first sign of trouble.

As for back-compat we test that it compiles with older perls as far as I understand it. I had a breakage because i used s///r which i fixed already. So I think it should be good. Note https://github.com/Perl/perl5/actions/runs/3464003421/jobs/5785197085#step:5:575 test that it builds and passes test on 5.8

@jkeenan
Copy link
Contributor

jkeenan commented Nov 15, 2022

@jkeenan I take your point. But the more popular cpan module are usually tested early in the process right? So I think we would get pretty decent feedback pretty fast.

I don't actually know the order in which people like @andk, @eserte, @bingos test CPAN distros after a monthly dev release. I do know that the process of confirming that some CPANtesters failure is a BBC (and, beyond that, bisecting to a particular commit to blead) is a process that cannot be completely automated as it requires human judgment, hence takes time.

But, assuming @tonycoz is okay with your more recent commits, I'd say this is good to go. Thanks.

@tonycoz
Copy link
Contributor

tonycoz commented Nov 15, 2022

Has there been any assessment of how well this code will work with those older versions?

The github workflow tests all of the modules in dist/ against 5.8, 5.10, 5.24 and the most recent non-development release, 5.36 right now.

This doesn't mean it's tested against all of the XS modules on CPAN.

@demerphq
Copy link
Collaborator Author

Ill merge it and lets find out.

@demerphq demerphq merged commit eae8123 into blead Nov 16, 2022
@iabyn
Copy link
Contributor

iabyn commented Nov 16, 2022 via email

@demerphq
Copy link
Collaborator Author

demerphq commented Nov 16, 2022 via email

@demerphq
Copy link
Collaborator Author

demerphq commented Nov 16, 2022 via email

@demerphq demerphq deleted the yves/parsexs branch November 16, 2022 10:11
@demerphq
Copy link
Collaborator Author

@iabyn I have created #20517 to address this once we decide what we should do. We can also revert the patch that "fixed" this warning, but IMO that would be going backwards. This patch needs more tests before merging I guess, but if we want to silence this immediately we apply this and add the tests afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants