Davem/xs refactor5 #22647

Merged
merged 55 commits into from
Oct 18, 2024

iabyn (Contributor) commented Oct 8, 2024

This branch heavily refactors the XSUB signature and INPUT parsing parts of ExtUtils::ParseXS. Over about 50 commits, it adds a new file:

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm

and adds two classes to it:

ExtUtils::ParseXS::Node::Param
ExtUtils::ParseXS::Node::Sig

It then changes the parsing and processing so that, rather than building up a number of hashes indexed by var name to store the information about a parameter, it stores all the information about a particular parameter in a Node::Param object, then adds all those objects to an array in a Node::Sig object.

Then the Node::Param->as_code() method will emit the C code associated with the declaration and initialisation of one parameter.

This can be viewed as the first steps along the road to making ExtUtils::ParseXS build an AST and emit code as a separate step. It's not nearly there yet - most of the module still emits code as it goes along, saving only the minimum state needed. And the C var declarations are still emitted mostly after each INPUT line is processed rather than at the end of all signature/INPUT processing, but at least an AST for just the sig and parameter data is available for later in the processing.

Overall it makes the code much cleaner and reduces the amount of special-casing, which was often distributed over many parts of the module.

There is little visible change in functionality, although there are more error messages now, for things that would have formerly just silently emitted bad C which would likely fail to compile.

Lots of new tests have been added.

  • This set of changes does not require a perldelta entry.

tonycoz (Contributor) commented Oct 8, 2024

It looks like the changes aren't compatible with 5.8.

I don't know which versions of perl we're meant to be supporting with dual-life modules.

Grinnz (Contributor) commented Oct 8, 2024

Per https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/master/lyon-amendment.md 5.8 no longer needs to be supported by the toolchain, though as always it's nice to balance the hardship of maintenance with the river effect of raising the minimum.

iabyn (Contributor, Author) commented Oct 8, 2024

I've added an extra commit which fixes 5.8.9 backcompat. The module is intended to be backcompat to 5.8.3.

# needs us. So only 'use fields' on systems where Hash::Util has already
# been built.
if (eval 'require Hash::Util; 1;') {
    require 'fields.pm';

Contributor

Why not require fields; ?

Contributor Author

Because I was cargo-culting it from similar code I added a few weeks ago in ParseXS.pm. That original code had 'fields.pm' probably as a by-product of my several attempts to get fields to work or be bypassed in a miniperl-ish environment. They are essentially the same thing: "require X" is converted at compile time into "require 'X.pm'", so it doesn't really matter. But I suppose aesthetically "require fields" is better.

tonycoz (Contributor) left a comment

That's a lot of commits.

iabyn added 25 commits October 18, 2024 11:12
Rename $argsref to $param for consistency with param_check() etc.

No functional changes.
One code comment referred to something that has since been changed:
update it.

Also add a comment about how INPUT lines are split by a regex.
This sub looks for an initialiser string, which starts with a /[=+;]/.
This commit extracts out that first character into a separate variable,
$init_op, which will make further refactoring easier.

Should be no functional changes.
Flag more explicitly that a var is declared in an INPUT line but not in
the XSUB's signature (rather than just relying on $var_num not being
defined).

Should be no functional changes.
Add the file lib/ExtUtils/ParseXS/Node.pm, and in it, define a base
class and one derived class:

    ExtUtils::ParseXS::Node
    ExtUtils::ParseXS::Node::Param

Then use the Node::Param class within ParseXS.pm to store details
about a particular XSUB's params. Currently this just involves
upgrading a few hash refs into proper Node::Param objects. There
are no methods yet to operate on such objects: the current methods in
class ExtUtils::ParseXS expect a hash as an arg, and that's what they
get passed - they don't care that the hash is now blessed.  Soon, those
methods will instead be made methods of the Node::Param class.

This commit is the very first baby step into making the XS parser
generate an Abstract Syntax Tree (AST). Currently the parser generates
C code on the fly, maintaining just the minimum state necessary for that
task. I intend over time to (hopefully) gradually store more state in
Node::Foo objects, and rather than throwing the objects away after their
immediate use, to combine them into bigger and bigger subtrees, until
eventually there is a tree which represents an entire XS file.
Rename these methods to check() and as_code(), and make them be methods
of the new ExtUtils::ParseXS::Node::Param class, instead of
ExtUtils::ParseXS.

Should be no functional changes, unless someone has been calling these
private methods directly.

The diff looks complex, but it's mostly just swapping round which arg
is the Param object and which is the ParseXS object; and then doing a
general s/\$self/\$xps/g and s/\$param/\$self/g within the methods.

The next commit will move the two methods into Node.pm.
The previous commit made those two functions be methods of
ExtUtils::ParseXS::Node::Param; now move those two functions into
Node.pm where they belong.

No functional changes; just a literal cut+paste, except for no longer
needing the full package name in the sub declaration, so

    sub ExtUtils::ParseXS::Node::Param::check {
becomes
    sub check {

etc.
The previous commit moved those subs from ParseXS.pm (where indentation
is 2 columns) to the new Node.pm file (where the indentation is 4
columns, because I created that file and so I get to choose). This
commit re-indents those subs to match the 4-indent.

Whitespace and line re-wrapping changes only.
Rename a couple of fields in ExtUtils::ParseXS::Node::Param for
clarity and/or consistency:

    num  => arg_num
    ansi => is_ansi

Should be no functional changes.
This is the first of a few commits which will refactor the code which
splits and processes the parameters in an XSUB's signature, such as

    '(IN char* s, int n = 0)'

The code currently has three separate initial splitting/processing
loops. This commit combines two of those into a common loop. The next
commit will combine the third one.

The biggest loop uses a complex regex to split the signature on commas
into separate parameters, but where something like

    (char c = ',', int n = 0)

correctly ignores the quoted comma. The other two loops are essentially
fallbacks: one where the regex doesn't work, and the other in the
presence of the -noargtypes switch.
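
As a rough illustration of the quote-aware split described above, the sketch below splits a signature on commas while leaving commas inside string or character literals alone. The helper name split_params and its regex are assumptions made for this example, not the regex ParseXS uses, and the sketch makes no attempt to handle nested brackets.

```perl
use strict;
use warnings;

# Hypothetical helper, not the code in ParseXS.pm: split a signature
# string into parameter texts, treating commas inside "..." or '...'
# literals as ordinary characters.
sub split_params {
    my ($sig) = @_;
    my @params;

    # One parameter is a run of string literals, char literals, or any
    # characters other than a comma or quote, ended by a comma or by
    # the end of the string.
    while ($sig =~ /\G((?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[^,"'])+)(?:,|\z)/gc) {
        my $text = $1;
        $text =~ s/^\s+//;
        $text =~ s/\s+\z//;
        push @params, $text;
    }

    # If the regex could not consume the whole string, report failure so
    # that a caller can fall back to a plain split on commas.
    my $pos = defined pos($sig) ? pos($sig) : 0;
    return $pos == length($sig) ? \@params : undef;
}

my $params = split_params(q{char c = ',', int n = 0});
print join(" | ", @$params), "\n";
```

Returning undef on a partial match mirrors the "cannot parse argument list" fallback branch in the outline that follows.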

The code was structured along the lines of:

    my @Args;

    if (config_allow_argtypes) { # ANSI-style Sig
        if (can split using regex) {
            @Args = /regex/g;
            for (@Args) {
                # big loop to do lots of processing of all the extra
                # syntactical features allowed with a full ANSI-style
                # signature, i.e. IN/OUT, type, and length(foo),
                # leaving the elements of @_ with all the extra stuff
                # stripped.
            }
        }
        else {
           @Args = split /,/, ...;
           Warn "Warning: cannot parse argument list";
        }
    }
    else { # K&R-style Sig
        @Args = split /,/, ...;
        if (config_allow_inout) {
            # process IN/OUT etc
        }
    }

    for (@Args) {
        # do common K&R syntax processing of each parameter
    }

This commit merges the first two branches, so the code in outline now
looks like:

    my @Args;

    if (config_allow_argtypes) {
        if (can split using regex) {
            @Args = /regex/g;
        }
        else {
           @Args = split /,/, ...;
           Warn "Warning: cannot parse argument list";
        }

        for (@Args) {
            # do lots of processing of all the extra syntactical
            # features allowed with a full ANSI-style signature,
            # i.e. IN/OUT, type, length(foo)
        }
    }
    ....

In theory this is a change in behaviour, since the "can't split with
regex" branch now does full ANSI-style processing of the parameters,
albeit ones which possibly haven't been split correctly.

But since this branch is supposed never to be taken, and would only
ever be taken if there was a design flaw in the regex which meant
it couldn't smartly split on commas, I don't think this really matters.
The previous commit reduced the three separate XSUB signature-splitting
and initial processing branches into two. This commit reduces it to a
single common block of code. It also heavily refactors the initial
parameter processing code, and adds several new checks.

In outline: before this commit, the sig splitting and processing code
looked like:

    my @Args;

    if (config_allow_argtypes) { # ANSI-style Sig
        if (can split using regex) {
            @Args = /regex/g;
            for (@Args) {
                # big loop to do lots of processing of all the extra
                # syntactical features allowed with a full ANSI-style
                # signature, i.e. IN/OUT, type, and length(foo),
                # leaving the elements of @_ with all the extra stuff
                # stripped.
            }
        }
        else {
           @Args = split /,/, ...;
           Warn "Warning: cannot parse argument list";
        }
    }
    else { # K&R-style Sig
        @Args = split /,/, ...;
        if (config_allow_inout) {
            # process IN/OUT etc
        }
    }

    for (@Args) {
        # do common K&R syntax processing of each parameter
    }

Following this commit, it now looks like:

    my @Args;

    if (can split using regex) {
        @Args = /regex/g;
    }
    else {
       @Args = split /,/, ...;
       Warn "Warning: cannot parse argument list";
    }

    for (@Args) {
        # big loop to do lots of processing of all the extra
        # syntactical features allowed with a full ANSI-style
        # signature, i.e. IN/OUT, type, and length(foo),
        # leaving the elements of @_ with all the extra stuff
        # stripped.

        $self->blurt("type not allowed")
            if defined $type && !$self->{config_allow_argtypes};
        # ... and similar error checks ...
    }

    for (@Args) {
        # do common K&R syntax processing of each parameter
    }

So under -noargtypes, instead of parsing a parameter by using a separate
block of code which doesn't know about the new syntactical features like
types etc, always parse using code which understands all the extra
stuff, but which then errors out with a specific error message if it
finds something disallowed under -noargtypes. This is better than
previously, where the forbidden thing was simply not understood by the
parsing code, and might trigger a confusing generic error, or be
silently accepted and cause malformed C code to be emitted.

The new errors are:

    "parameter type not allowed under -noargtypes"
    "length() pseudo-parameter not allowed under -noargtypes"
    "parameter IN/OUT modifier not allowed under -noinout"
    "Unparseable XSUB parameter: '$_'"
    "Default value not allowed on length() parameter '$name'"

That last one was already an error, but the text has been changed from:

      "Default value on length() argument: '$_'"

and it now does a $self->blurt() rather than a plain 'die', so the
correct line number is reported, and parsing can continue, looking for
further errors.
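
The "parse everything, then reject what is disallowed" approach might look roughly like the sketch below. It is only an illustration: parse_param, the config keys allow_argtypes and allow_inout, and the simplified regex are assumptions for this example, and errors are collected in an array rather than reported via blurt().

```perl
use strict;
use warnings;

# Hypothetical sketch: parse one parameter text with a regex that
# understands every feature, then reject features that are switched
# off. The config key names are made up for this example.
sub parse_param {
    my ($text, $config) = @_;
    my @errors;

    my ($in_out, $type, $name) = $text =~ /
        ^ \s*
        (?: (IN_OUTLIST|IN_OUT|OUTLIST|OUT|IN) \b \s* )?  # optional modifier
        (.*?) \s*                                         # optional type
        \b (\w+) \s* $                                    # parameter name
    /x;

    return (undef, ["Unparseable XSUB parameter: '$text'"])
        unless defined $name;

    push @errors, "parameter type not allowed under -noargtypes"
        if length($type) && !$config->{allow_argtypes};
    push @errors, "parameter IN/OUT modifier not allowed under -noinout"
        if defined $in_out && !$config->{allow_inout};

    return ({ var => $name, type => $type, in_out => $in_out }, \@errors);
}

my ($param, $errors) =
    parse_param('IN char* s', { allow_argtypes => 0, allow_inout => 1 });
print "$_\n" for @$errors;
```

Because the parse itself always succeeds on well-formed input, each disallowed feature gets its own specific message instead of a generic parse failure.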

The next few commits will be moving all the code within that
"do common K&R syntax processing" block into the first 'for (@Args)'
loop, eventually removing the second loop altogether.
Rather than iteratively appending to a comma-separated string of params
on the fly, push them into an array; then join and quote-escape them
only when used.  Makes the code simpler. Also rename the variable to
@report_params since it holds parameter names, not arguments.

Should be no functional changes.
XSUBs are allowed to have a trailing ellipsis in the signature to
disable arg count checking:

    int
    foo(a, b, ...)

The current code which parses this is very dumb and broken. It doesn't
complain about ellipses occurring in positions other than the last, and
more bizarrely it actually deletes any embedded ellipsis. So for example
the default expression in

    foo(int a, char *b = "stuff ...", int c = 0)

gets modified from "stuff ..." into "stuff ".

It also emits broken code for the arguments of a wrapped function; i.e.

    int
    foo(a, b, ...)

embeds a call to the C function 'foo' which looks like:

    foo(a, b,)

Although auto-call with ellipsis doesn't necessarily make much sense,
this commit changes it to be 'foo(a, b)' which at least compiles.

This commit adds some tests too.
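
A minimal sketch of the stricter ellipsis handling, assuming the signature has already been split into parameter texts. The helper name and the error wording are made up for this example; the point is that only a parameter consisting solely of '...' is treated as an ellipsis, so one embedded in a default expression is left alone.

```perl
use strict;
use warnings;

# Hypothetical sketch: accept '...' only as the final parameter and
# drop it from the list, so it never reaches the autocall arguments.
sub strip_trailing_ellipsis {
    my (@param_texts) = @_;
    my @errors;

    my $seen_ellipsis = 0;
    if (@param_texts && $param_texts[-1] eq '...') {
        pop @param_texts;
        $seen_ellipsis = 1;
    }

    # Any whole-parameter '...' still present is in a non-final position.
    # (The message text here is invented for the example.)
    push @errors, "ellipsis is only allowed as the last parameter"
        if grep { $_ eq '...' } @param_texts;

    return (\@param_texts, $seen_ellipsis, \@errors);
}

my ($texts, $ellipsis) = strip_trailing_ellipsis('a', 'b', '...');
print "foo(", join(', ', @$texts), ")\n";
```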
There are two 'for (@Args)' parsing loops, and the processing of the
'= default_expr' part of a parameter is currently split between the two
loops. This commit moves everything into the first loop: this is one of
a series of commits aimed at the complete elimination of the second loop.

It also stores the default value expression in its original form, and
now defers any quote-escaping to the code output stage.

In theory, no functional changes.
There are currently two 'for (@Args)' parsing loops to parse an XSUB's
signature. Move the special-case code for handling a C++ XSUB's initial
THIS/CLASS parameter from before the second loop to before the first
loop.

This is one of a series of commits aimed at the complete elimination of
the second loop.

This commit also fixes a bug introduced a couple of commits ago that
caused the initial "THIS" or "CLASS" arg in this emitted code's error
message to be skipped:

    if (items < N) croak("usage: foo(...)")

This commit also adds some tests for THIS/CLASS in usage message and
autocall parameters.
There are currently two 'for (@Args)' parsing loops to parse an XSUB's
signature. Move the code from the second to the first loop which
determines the arg number (if any) associated with each parameter.

This is one of a series of commits aimed at the complete elimination of
the second loop.
Add a ExtUtils::ParseXS::Node::Sig class.

This is a subclass of Node which is intended to hold all the
info parsed and extracted from an XSUB's signature plus any INPUT lines.
It mainly consists of an array of Node::Param objects, which hold info
about each parsed parameter, plus a hash which maps param names to
those param objects.

This commit does the basic work; subsequent commits will handle some
individual cases which are more complex.

The basic arrangement used to be like this:

The XSUB's signature (such as "a, OUT b, int c, d = 999") was split and
parsed, and the info about each parameter was stored in a collection of
hashes; for example:

    $self->{xsub_map_argname_to_in_out}{b} = 'OUT';
    $self->{xsub_map_argname_to_default}{d} = '999';
    etc.

In addition, a Node::Param object was created for each parameter which
has the *type* specified (such as 'c' above), and pushed onto the
@ANSI_params array.

Then, each INPUT line was parsed, stored as a temporary Node::Param
object, then the as_code() method was called on that object, which
emitted the C declaration and initialisation code for that parameter.
as_code() would use the various {xsub_map_argname_to_foo} hashes
to lookup information needed for the code generation (such as the
default expression to emit).

Finally, as_code() was called on each param in @ANSI_params, to emit
declarations for params listed in the signature but not in an INPUT
line.

What this commit does is to replace all the various

    $self->{xsub_map_argname_to_foo}

fields with a single

    $self->{xsub_sig}

Node::Sig object; then it creates a Node::Param object for each
parameter in the signature and stores them in the Sig object.  This
means that the info about each param is now stored in a single
Node::Param object, rather than being stored across several hashes.

A couple of more tricksy xsub_map_argname_to_foo fields have been left
as-is for now, to be handled by individual commits to follow shortly.

The INPUT processing is left largely unchanged for now: it still, for
each INPUT line, creates a Node::Param object, calls as_code() on it,
and then immediately frees it. The difference being that as_code() now
looks up the info it needs from under $self->{xsub_sig} rather than from
lots of hashes.

The intention is that eventually those temporary INPUT objects won't be
discarded, but will instead be merged into the list of Node::Params in
the Node::Sig object. Code will then be emitted by iterating over the
params in the Sig object once all sig/INPUT parsing is complete. This
will be a significant step towards changing from an "emit code as we go
along" model into a "parse everything into an AST then walk the tree,
emitting code" model.
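
To make the new arrangement concrete, here is a much-reduced sketch of a Sig object holding an ordered list of Param objects plus a name lookup. The Sketch::* package names, the add_param method and the 'params' key are assumptions for this example; the field names var, arg_num, default, in_out, type and is_ansi follow the ones mentioned in these commit messages.

```perl
use strict;
use warnings;

# Hypothetical, much-reduced stand-ins for Node::Param and Node::Sig.
package Sketch::Param;

sub new {
    my ($class, %fields) = @_;
    my $self = { %fields };
    return bless $self, $class;
}

package Sketch::Sig;

sub new {
    my ($class) = @_;
    my $self = { params => [], names => {} };
    return bless $self, $class;
}

# Append a parameter to the ordered list and index it by name.
sub add_param {
    my ($self, $param) = @_;
    push @{ $self->{params} }, $param;
    $self->{names}{ $param->{var} } = $param;
    return $param;
}

package main;

# The signature "a, OUT b, int c, d = 999" from the text above.
my $sig = Sketch::Sig->new;
$sig->add_param(Sketch::Param->new(var => 'a', arg_num => 1));
$sig->add_param(Sketch::Param->new(var => 'b', arg_num => 2, in_out => 'OUT'));
$sig->add_param(Sketch::Param->new(var => 'c', arg_num => 3, type => 'int', is_ansi => 1));
$sig->add_param(Sketch::Param->new(var => 'd', arg_num => 4, default => '999'));

# Everything known about one parameter now lives in one object.
print "b is $sig->{names}{b}{in_out}; d defaults to $sig->{names}{d}{default}\n";
```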
Move the setting of

    $self->{xsub_seen_THIS_in_INPUT}
    $self->{xsub_seen_RETVAL_in_INPUT}

back to INPUT_handler from the check() method. I moved them earlier
when splitting out INPUT_handler() into separate parsing and checking
functions, but really they should have stayed in INPUT_handler().
It's unlikely to make any functional difference, but logically they
are flagging having literally seen a 'THIS' or 'RETVAL' in an INPUT
line, and so belong there.
Make the handling of an implied THIS or CLASS first parameter slightly
more regular, by pushing a Node::Param object at the start of the list
of param objects in the Node::Sig object.

The code for this synthetic parameter is still emitted as a special-case:
really it could be emitted at the same time as the ANSI-like parameters,
but this commit keeps the emitted C declarations in the same order for
now. It shouldn't make any difference, but you never know....

This commit breaks a few edge-case behaviours with C++ methods. The
next five commits add tests and fixes.
The xsub_class object field is a string, not a boolean.
There are already some basic tests for C++ XSUBs (i.e. XSUBs which
include a class in the function name). Expand the tests to cover what
should be all the permutations of special cases, e.g. new/DESTROY,
static method, etc.
An XSUB can have a class as part of its function name, to indicate that
it is a C++ method. It can also have a 'static' prefix on the return
type to indicate that it is a class method. For example:

    static int
    X::Y::foo(...)

However, 'static' without 'X::Y::' makes no sense, and generates invalid
C code for an autocall. It's been this way since 5.000 AFAICT. It's
also recently started emitting an 'uninit var' warning.

This commit detects the "static but no class" condition, emits a new
warning, and turns off the flag which indicated that 'static' was
present.

The new warning is:

    "Ignoring 'static' type modifier:"
  . " only valid with an XSUB name which includes a class"
If an XSUB's name includes a class, it is treated as a C++ method
and an implicit initial 'THIS' or 'CLASS' parameter is added.

If the XSUB also includes an explicit THIS/CLASS variable, Bad Things
happen. There is in fact code in ParseXS.pm which detects 'THIS' in an
INPUT line and suppresses the emitting of a duplicate C var declaration;
but the generated C code is still wrong. For example, in

    X::Y::f(int THIS)

the C code will expect 2 passed args, for both the implicit and explicit
THIS, and the usage() message will be:

    croak_xs_usage(cv, "THIS, THIS")

As it happens, the changes from 3 commits ago to THIS/CLASS handling
turned such duplicate declarations into a compile-time error, e.g.

    Error: duplicate definition of argument 'THIS' ignored in foo.xs, line N

This commit adds tests for those changes, and also removes the
now-redundant code from ParseXS which skips emitting a duplicate var
definition, as the dup will already have been reported as an error by
then.
C++ XSUB methods which indicate that the 'THIS' arg should be updated,
such as

    My::Class::foo(int i)
        OUTPUT:
            THIS

stopped emitting the update code as of 5 commits ago. This commit
restores the behaviour, and adds tests.
iabyn added 25 commits October 18, 2024 11:12
This commit changes how the argument string is generated which is output
as part of an autocall to a C function.

In particular:

- it deletes the ExtUtils::ParseXS::Utilities::C_func_signature()
  method;

- it deletes the test file t/110-assign_func_args.t which exercised that
  method;

- it adds a method called ExtUtils::ParseXS::Node::Sig::C_func_signature()
  which serves a similar purpose as the removed method, but in a
  different and more robust way;

- it adds several tests to t/001-basic.t which provide similar coverage
  to the deleted tests.

The former way of generating the arg string worked by: during XSUB
signature processing, a list of candidate arg names was accumulated, then
joined and stored as $self->{xsub_C_auto_function_signature} by
C_func_signature(), which handled OUT params and skipped any fake
THIS/CLASS arg. Then, if a C_ARGS keyword was encountered, that
would overwrite the xsub_C_auto_function_signature string. Then when
INPUT lines were processed, any var declared as '&foo' would cause
something like this to be done:

    $self->{xsub_C_auto_function_signature} =~ s/\bfoo\b/&foo/.

This was not robust and could modify the C_ARGS overridden string.

The new method instead scans the parameter list in the Node::Sig object
and builds the argument list from that *at the time it is needed*. Any
C_ARGS value is saved and used as an override.

In principle there are no functional changes, apart from bug fixes and
the deleting of a "not for public use" method.
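
The "build it when needed" idea can be sketched as below. This is not the real C_func_signature(): the function name c_func_args and the fields is_synthetic and is_addr are assumptions for this example. Passing OUT-style parameters by address follows the perlxs description of such parameters being used via pointers.

```perl
use strict;
use warnings;

# Hypothetical sketch of building the autocall argument string on
# demand from the parameter list. The is_synthetic and is_addr field
# names are assumptions for this example.
sub c_func_args {
    my ($params, $c_args_override) = @_;

    # A C_ARGS value, if present, is used verbatim and never rewritten.
    return $c_args_override if defined $c_args_override;

    my @args;
    for my $param (@$params) {
        next if $param->{is_synthetic};    # implicit THIS/CLASS
        my $name = $param->{var};
        my $by_addr = $param->{is_addr}    # declared as '&foo' in INPUT
            || (defined $param->{in_out} && $param->{in_out} =~ /OUT/);
        push @args, $by_addr ? "&$name" : $name;
    }
    return join(', ', @args);
}

my @params = (
    { var => 'THIS', is_synthetic => 1 },
    { var => 'a' },
    { var => 'b', in_out => 'OUT' },
);
print "foo(", c_func_args(\@params), ")\n";
print "foo(", c_func_args(\@params, 'b, a'), ")\n";
```

Since nothing is stored and later patched with a substitution, a C_ARGS override cannot be modified by accident.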
Add tests that all the IN/OUT/IN_OUTLIST etc permutations generate
plausible code
Now that *all* XSUB parameters are stored as a list of Node::Param
objects - not just ANSI ones - there's no longer a need to keep a list
of the ANSI ones; instead, just 'grep $_->{is_ansi} all the params'
to get the needed ones.

Should be no functional changes.
Rather than pushing param names into an array if the XSUB signature
declares them as 'OUTLIST' or 'IN_OUTLIST', just grep for such params
within the list of Node::Param objects as needed.

Should be no functional changes.
On the convenience test function test_many(), make the arg which trims
away all generated output except the matching function(s) a bit more
flexible, so that in the next commit, it can be used to extract a boot
function instead of a regular function.
This commit:

- eliminates the $self->{xsub_map_arg_idx_to_proto} fields, and
  in keeping with similar recent commits, instead generates the XSUB's
  proto string when needed by scanning the list of params in the
  Node::Sig object.

- adds a Node::Sig method, proto_string(), to do that.

- Adds lots of tests of XSUB prototype strings.

- Adds a 'proto' field to Node::Param objects to store the prototype
  char(s) for that parameter if it's been overridden by the typemap (the
  override is actually broken, and always has been; see the next
  commit for the fix).

- Finishes off removing the second signature args processing loop,
  which the last several commits have been aiming for.

- Moves arg count and min arg count state into the Node::Sig object.

- Eliminates the no-longer-needed %only_C_inlist var.

This commit may well change the generation of XSUB prototype strings in
edge cases - if so, then hopefully for the better.
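
As an illustration of generating the prototype by scanning the params, here is a sketch under simplifying assumptions: each parameter bound to an arg contributes '$' unless it has an overriding 'proto' field, and a ';' separates mandatory from optional args (as in ordinary Perl prototypes). The exact rules applied by the real proto_string(), for example around ellipsis, are not given here, so this function and its arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the real Node::Sig::proto_string().
sub proto_string {
    my ($params, $min_args) = @_;

    my @chars;
    for my $param (@$params) {
        # Skip params not bound to an arg, e.g. length(foo).
        next unless defined $param->{arg_num};
        push @chars, defined $param->{proto} ? $param->{proto} : '$';
    }

    # Mark where the optional args start.
    splice @chars, $min_args, 0, ';' if $min_args < @chars;
    return join '', @chars;
}

# foo(a, b = 999): one mandatory arg, one optional.
my @params = (
    { var => 'a', arg_num => 1 },
    { var => 'b', arg_num => 2, default => '999' },
);
print proto_string(\@params, 1), "\n";
```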
There is supported syntax in typemaps to specify a sub prototype value
for a parameter, to override the default of '$'. This is done using an
optional third field in the TYPEMAP section of a typemap file.  For
example:

    Foo*  T_FOO \[@%]

However, due to a thinko in the typemap parser code (using 'proto' rather
than 'prototype' as one of the args to a ExtUtils::Typemaps::Type->new()
method call), this overridden value has always been silently ignored.
And there were no tests for it.

This commit fixes that and adds tests.

In the usual case where there is no override, the original intent of the
Typemap parser code was to add '$' as the proto value. Due to the bug,
this was never added, and ExtUtils::Typemaps::Type objects generally had
no proto field. This commit diverges from the original intent and keeps
the proto field undefined rather than setting it to '$', because there
are several tests which check that a regenerated typemap looks like it
expects, and they were all suddenly failing due to a sudden '$'
appearing on the regenerated typemap line.

In addition, in the XS parser, the overridden value as returned by the
typemap parser, was being emitted as-is into the generated C code within
a double-quoted string. For example, a three-parameter XSUB with an
overridden first parameter type, as in the example above, would generate
C code along the lines of:

    newXSproto_portable(..., "\[@%]$$")

and the backslash would get removed by the C compiler. So this commit
also causes backslashes to be escaped before emitting the C code:

    newXSproto_portable(..., "\\[@%]$$")
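
The escaping step amounts to doubling each backslash before the string is placed inside a C string literal. A minimal sketch, with a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: make a prototype string safe to place inside a
# double-quoted C string literal by doubling each backslash.
sub c_escape_proto {
    my ($proto) = @_;
    (my $escaped = $proto) =~ s/\\/\\\\/g;
    return $escaped;
}

# Single quotes keep the backslash and the '$' characters literal.
my $proto = '\[@%]$$';
print 'newXSproto_portable(..., "', c_escape_proto($proto), '")', "\n";
```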
Slightly widen the scope of a {...} block which encloses most of the
arg-splitting/processing code so that the block now encompasses *all*
the code, then re-indent the 'for (@Args)' block within it, which was
one indent too much due to earlier code refactoring.

Apart from the slight scope tweak, whitespace-only.
After heavy refactoring, make the code comments concerning parsing the
XSUB signature and its parameters reflect the new reality.
Eliminate the xsub_map_argname_to_type field of ExtUtils::ParseXS
objects. Instead, now that they exist, get the type value from the
relevant Node::Param object within the Node::Sig object.
The 'soft_type' field was added by me about 13 commits ago to get round
issues where a THIS/CLASS param's type is being defined both implicitly
(by having the XSUB function name include a class) and explicitly in an
INPUT line.

Further refactoring since then has made the field unnecessary, and so
this commit removes it and just uses the 'type' field of Node::Param
always.

Also add some more tests for dup THIS/CLASS. There were already tests
for a dup in an INPUT line, but not in the signature.
More refactoring: remove the xsub_map_varname_to_seen_in_INPUT field
from the ExtUtils::ParseXS class and instead add an in_input field to
the ExtUtils::ParseXS::Node::Param class.
The series of approx 45 refactoring commits in this branch have added
E::P::Node::Param and E::P::Node::Sig objects, to allow an XSUB's
signature to be stored as a list of Param objects in a Sig object. So in
a straightforward XSUB like:

    int
    foo(a, b = 999)
        int a
        int b

a pair of Param objects is created and stored in a Sig object to
represent the two items in the signature, e.g.

    { var => 'a', arg_num => 1 },
    { var => 'b', arg_num => 2, default => 999 },

But up until now, when the lines in the implicit INPUT section were
subsequently parsed, a *second* pair of temporary Param objects were
created using both data obtained from the Sig objects and from the INPUT
line. The as_code() method was then called on each temp Param object,
and the temp object was then thrown away.

This commit changes it so that instead, any extra info obtained from an
INPUT line is added to the original Param object in the Sig object,
(e.g. the 'type' value) and the as_code() method is called on the
*original* object. No temporary Param objects are created.

This commit is effectively what the previous chain of commits have been
preparing for. Before that, all info obtained from the signature was
stored in a series of hash refs stored in the ExtUtils::ParseXS object,
such as

    $self->{xsub_map_argname_to_default}{b} = 999;

The new way is cleaner, conceptually simpler, gathers related code and
data into one location, and has fewer special cases that need handling.

The main thing it doesn't do yet is defer code generation. It still
emits a C var declaration directly after each INPUT line is read and
parsed. Eventually I would like for the whole signature/INPUT sequence
to be read in and parsed and *then* emit all the code in one go (and in the
long term, to read in the *whole* XS file before emitting any code).

One intended side effect of this commit is that it now detects duplicate
'alien' parameters again - this was broken during the earlier
refactoring. An alien param is one declared in INPUT but not in the
signature:

    int
    foo(a)
        int a   # normal parameter
        int b   # alien parameter
        int b   # duplicate alien parameter

A test has been added for this.

A test has also been added for looking up the prototype char associated
with a type when the type can change, as with a synthetic parameter like
THIS.
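
A sketch of the merge-into-the-original-object idea, including how duplicate (and duplicate alien) declarations fall out of it. The function name, the plain-hash Sig layout, the is_alien field and the error wording are assumptions for this example; in_input matches the field named earlier in this branch.

```perl
use strict;
use warnings;

# Hypothetical sketch: fold the info from one INPUT line into the Param
# already held by the Sig, creating an 'alien' Param if the name was
# not in the signature. Returns an error string, or undef on success.
sub apply_input_line {
    my ($sig, $type, $name) = @_;

    my $param = $sig->{names}{$name};
    if (!$param) {
        # Declared in INPUT but not in the signature.
        $param = { var => $name, is_alien => 1 };
        push @{ $sig->{params} }, $param;
        $sig->{names}{$name} = $param;
    }
    elsif ($param->{in_input}) {
        return "duplicate definition of parameter '$name'";
    }

    # No temporary object: the original Param gains the extra info.
    $param->{type}     = $type;
    $param->{in_input} = 1;
    return undef;
}

# foo(a) with INPUT lines "int a", "int b", "int b".
my $a_param = { var => 'a', arg_num => 1 };
my $sig = { params => [$a_param], names => { a => $a_param } };

for my $line (['int', 'a'], ['int', 'b'], ['int', 'b']) {
    my $err = apply_input_line($sig, @$line);
    print defined $err ? "Error: $err\n" : "ok: $line->[1]\n";
}
```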
Simplify a complex series of nested if/elses and update code comments.

Should be no functional changes.
Currently, the pseudo-parameter 'length(foo)' is handled by creating
a Node::Param object with a 'var' field value of 'XSauto_length_of_foo',
while that object is then added to the $sig->{names} object using
a key of 'foo'.

This commit removes that inconsistency and makes both use 'length(foo)'.
This shouldn't (in theory) change what C code is emitted for the
'STRLEN XSauto_length_of_foo = ...' declaration etc, but might (in
principle) change (for better or worse) the detection of errors for
things like duplicate var declarations if someone includes
'XSauto_length_of_foo' in an INPUT line.

I'm not worrying about it too much as the way such C code is generated
will be changed soon.
Rename this long lex var to just '$name' now that the code is simpler
and there is less need to distinguish between various name variants.
Currently there is code in both the signature parsing code and in the
INPUT parsing code which sets no_init if the IN/OUT/etc parameter
modifier matches /^OUT/.

This commit makes it check only during signature parsing, which
simplifies the code slightly.

Should be no functional changes.
Sort the field declaration lines for E::P::Node::Param into a more
logical order.
Move the 200 or so lines of code which splits and parses an XSUB's
signature out from process_file() and into its own method.

This commit changes as little as possible: apart from adding a method
declaration and call, the main body of the code has just been cut and pasted
as-is, apart from reducing the indentation.

The next few commits will change and move the code so that it becomes
a method of ExtUtils::ParseXS::Node::Sig rather than of ExtUtils::ParseXS.
The previous commit moved the signature parsing code into its own
method.
This commit makes the method be in the ExtUtils::ParseXS::Node::Sig
class. This involves changing the name to just 'parse', and swapping
around all the mentions of $self and $sig.

The next commit will move the method into Node.pm.

(By doing this in 3 commits, code changes don't get hidden in the noise).
The previous two commits have moved the signature parsing code into its
own sub and made it a method of ExtUtils::ParseXS::Node::Sig.

This commit moves the body of the method into Node.pm.  There are no
changes to the method's code apart from no longer needing to fully
qualify the method name in the declaration.

This commit also moves the our ($C_group_rex, $C_arg) var definitions
from ParseXS.pm to Node.pm, as they're only used to hold regexes used to
split the parameters in the signature. They've also been changed from
'our' to 'my'.
This method splits the XSUB signature string into individual parameter
strings and then parses them. Rename the var from '@Args' to
'@param_texts' since it contains parameters rather than args. The _text
suffix is to distinguish them from $param objects.

This is one of the last steps in my quiet mission to rename variables
etc from *arg* to *param* where they hold info about params rather than args.
This is particularly significant as some parameters (like 'length(foo)')
don't get bound to args, so there isn't a 1:1 correspondence.

Also, shorten the names of a couple of lex vars now that they're
only in a small method with limited scope:

    $args_count          => $nargs
    $optional_args_count => $opt_args

Should be no functional changes.
Several error messages complain about things like "duplicate argument"
when they mean "duplicate parameter".

So update these messages to be more accurate.
A couple of fixes to make it build+test under 5.8.9.

The grep expression

    exists $_->{in_out} && $_->{in_out} =~ /OUT$/

was wrong - it should have been defined rather than exists, as is done
elsewhere. I'm not sure why an 'uninit var' warning only appeared on 5.8.9
but not blead - perhaps some minor autovivification difference?
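
The difference between the two tests is easy to reproduce in isolation when a field is present but undefined. The data below is made up for the demonstration; it does not explain why only 5.8.9 warned.

```perl
use strict;
use warnings;

# Made-up parameter data: 'a' has the key present but undefined.
my @params = (
    { var => 'a', in_out => undef },
    { var => 'b', in_out => 'OUT' },
    { var => 'c' },
);

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 'exists' is true for 'a', so the match runs against undef and warns.
my @buggy = grep { exists $_->{in_out} && $_->{in_out} =~ /OUT$/ } @params;
my $after_exists = @warnings;

# 'defined' short-circuits before the match for 'a': no new warning.
my @fixed = grep { defined $_->{in_out} && $_->{in_out} =~ /OUT$/ } @params;
my $after_defined = @warnings;

printf "warnings after exists: %d, after defined: %d\n",
    $after_exists, $after_defined;
```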

It was also warning

    $ExtUtils::ParseXS::DIE_ON_ERROR only used once
    $ExtUtils::ParseXS::AUTHOR_WARNINGS only used once

in t/001-basic.t because that test file only loads ExtUtils::ParseXS at
runtime. Again, I'm not sure why it didn't warn on blead too. But I made
the var initialisations more robust against 'once' warnings anyway by
using 'our'.
    require 'fields.pm'

can be written more simply as

    require fields

They compile to the same optree.
@iabyn iabyn force-pushed the davem/xs_refactor5 branch from d52927a to 0a291b5 Compare October 18, 2024 10:46
@iabyn iabyn merged commit 9621dfa into blead Oct 18, 2024
67 checks passed
iabyn (Contributor, Author) commented Oct 18, 2024 via email

@iabyn iabyn deleted the davem/xs_refactor5 branch October 18, 2024 11:58
3 participants