Davem/xs refactor5 #22647

iabyn · 2024-10-08T19:09:43Z

This branch heavily refactors the XSUB signature and INPUT parsing parts of ExtUtils::ParseXS. Over about 50 commits, it adds a new file:

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm

and adds two classes to it:

ExtUtils::ParseXS::Node::Param
ExtUtils::ParseXS::Node::Sig

It then changes the parsing and processing so that, instead of building up a number of hashes indexed by var name to store the information about each parameter, it stores all the information about a particular parameter in a single Node::Param object, then adds all those objects to an array in a Node::Sig object.

The Node::Param->as_code() method then emits the C code for the declaration and initialisation of one parameter.
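To make the shape of this concrete, here is a small C model of the Sig/Param split (C because that is what ParseXS emits; the real classes are Perl objects in Node.pm). The struct names, fields, sig_find() and the simplified declaration format are illustrative assumptions, not the module's API: one Param record carries everything known about a parameter, a Sig holds them in signature order, and a lookup by name replaces the old one-hash-per-attribute layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative C model of Node::Sig / Node::Param; field names only
   loosely mirror the Perl ones (var, type, arg_num, default). */
typedef struct {
    const char *var;           /* parameter name, e.g. "b" */
    const char *type;          /* C type, e.g. "int" */
    int         arg_num;       /* 1-based Perl arg position, 0 if none */
    const char *default_expr;  /* default value text, or NULL */
} Param;

typedef struct {
    Param params[16];          /* every param, in signature order */
    int   nparams;
} Sig;

/* Find a param by name. One record holds all of a param's details;
   a linear scan stands in for the real name-to-param hash. */
static Param *sig_find(Sig *sig, const char *name) {
    for (int i = 0; i < sig->nparams; i++)
        if (strcmp(sig->params[i].var, name) == 0)
            return &sig->params[i];
    return NULL;
}

/* Emit a deliberately simplified declaration for one param; the real
   as_code() also emits typemap-driven initialisation code. */
static int param_as_code(const Param *p, char *buf, size_t len) {
    return snprintf(buf, len, "%s\t%s;", p->type, p->var);
}
```

For a signature like (a, b = 999), the default for b is then read from the same record as its type and arg number, rather than from a separate hash keyed by 'b'.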

This can be viewed as the first step along the road to making ExtUtils::ParseXS build an AST and emit code as a separate step. It's not nearly there yet: most of the module still emits code as it goes along, saving only the minimum state needed. And the C var declarations are still mostly emitted after each INPUT line is processed, rather than at the end of all signature/INPUT processing; but at least an AST for the signature and parameter data is now available later in the processing.

Overall it makes the code much cleaner and reduces the amount of special-casing, which was often distributed over many parts of the module.

There is little visible change in functionality, although there are now more error messages, for things that formerly just silently emitted bad C code, which would likely have failed to compile.

Lots of new tests have been added.

This set of changes does not require a perldelta entry.

tonycoz · 2024-10-08T22:51:51Z

It looks like the changes aren't compatible with 5.8.

I don't know which versions of perl we're meant to be supporting with dual-life modules.

Grinnz · 2024-10-08T23:03:24Z

Per https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/master/lyon-amendment.md 5.8 no longer needs to be supported by the toolchain, though as always it's nice to balance the hardship of maintenance with the river effect of raising the minimum.

iabyn · 2024-10-08T23:20:49Z

I've added an extra commit which fixes 5.8.9 backcompat. The module is intended to be backcompat to 5.8.3.

tonycoz · 2024-10-14T04:18:02Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm

+    # needs us. So only 'use fields' on systems where Hash::Util has already
+    # been built.
+    if (eval 'require Hash::Util; 1;') {
+        require 'fields.pm';


Why not require fields; ?

Because I was cargo-culting it from similar code I added a few weeks ago in ParseXS.pm. That original code had 'fields.pm' probably as a by-product of my several attempts to get fields to work or be bypassed in a miniperl-ish environment. They are essentially the same thing: "require X" is converted at compile time into "require 'X.pm'", so it doesn't really matter. But I suppose aesthetically "require fields" is better.

tonycoz

That's a lot of commits.

Rename $argsref to $param for consistency with param_check() etc. No functional changes.

One code comment referred to something that has since been changed: update it. Also add a comment about how INPUT lines are split by a regex.

This sub looks for an initialiser string, which starts with a /[=+;]/. This commit extracts out that first character into a separate variable, $init_op, which will make further refactoring easier. Should be no functional changes.

Flag more explicitly that a var is declared in an INPUT line but not in the XSUB's signature (rather than just relying on $var_num not being defined). Should be no functional changes.

Add the file lib/ExtUtils/ParseXS/Node.pm, and in it define a base class and one derived class, ExtUtils::ParseXS::Node and ExtUtils::ParseXS::Node::Param; then use the Node::Param class within ParseXS.pm to store details about a particular XSUB's params.

Currently this just involves upgrading a few hash refs into proper Node::Param objects. There are no methods yet to operate on such objects: the current methods in class ExtUtils::ParseXS expect a hash as an arg, and that's what they get passed; they don't care that the hash is now blessed. Soon, those methods will instead be made methods of the Node::Param class.

This commit is the very first baby step towards making the XS parser generate an Abstract Syntax Tree (AST). Currently the parser generates C code on the fly, maintaining just the minimum state necessary for that task. I intend over time to (hopefully) gradually store more state in Node::Foo objects and, rather than throwing the objects away after their immediate use, to combine them into bigger and bigger subtrees, until eventually there is a tree which represents an entire XS file.

Rename these methods to check() and as_code(), and make them methods of the new ExtUtils::ParseXS::Node::Param class instead of ExtUtils::ParseXS. Should be no functional changes, unless someone has been calling these private methods directly. The diff looks complex, but it's mostly just swapping round which arg is the Param object and which is the ParseXS object, and then doing a general s/\$self/\$xps/g and s/\$param/\$self/g within the methods. The next commit will move the two methods into Node.pm.

The previous commit made those two functions be methods of ExtUtils::ParseXS::Node::Param; now move those two functions into Node.pm where they belong. No functional changes; just a literal cut+paste, except for no longer needing the full package name in the sub declaration, so sub ExtUtils::ParseXS::Node::Param::check { becomes sub check { etc.

The previous commit moved those subs from ParseXS.pm (where indentation is 2 columns) to the new Node.pm file (where the indentation is 4 columns, because I created that file and so I get to choose). This commit re-indents those subs to match the 4-indent. Whitespace and line re-wrapping changes only.

Rename a couple of fields in ExtUtils::ParseXS::Node::Param for clarity and/or consistency:

    num  => arg_num
    ansi => is_ansi

Should be no functional changes.

@Args

This is the first of a few commits which will refactor the code which splits and processes the parameters in an XSUB's signature, such as '(IN char* s, int n = 0)'.

The code currently has three separate initial splitting/processing loops. This commit combines two of those into a common loop; the next commit will combine the third one.

The biggest loop uses a complex regex to split the signature on commas into separate parameters, correctly ignoring the quoted comma in something like (char c = ',', int n = 0). The other two loops are essentially fallbacks: one for when the regex doesn't work, and the other for when the -noargtypes switch is present.

The code was structured along the lines of:

    my @Args;
    if (config_allow_argtypes) {
        # ANSI-style Sig
        if (can split using regex) {
            @Args = /regex/g;
            for (@Args) {
                # big loop to do lots of processing of all the extra
                # syntactical features allowed with a full ANSI-style
                # signature, i.e. IN/OUT, type, and length(foo), leaving
                # the elements of @Args with all the extra stuff stripped.
            }
        }
        else {
            @Args = split /,/, ...;
            Warn "Warning: cannot parse argument list";
        }
    }
    else {
        # K&R-style Sig
        @Args = split /,/, ...;
        if (config_allow_inout) {
            # process IN/OUT etc
        }
    }
    for (@Args) {
        # do common K&R syntax processing of each parameter
    }

This commit merges the first two branches, so the code in outline now looks like:

    my @Args;
    if (config_allow_argtypes) {
        if (can split using regex) {
            @Args = /regex/g;
        }
        else {
            @Args = split /,/, ...;
            Warn "Warning: cannot parse argument list";
        }
        for (@Args) {
            # do lots of processing of all the extra syntactical
            # features allowed with a full ANSI-style signature,
            # i.e. IN/OUT, type, length(foo)
        }
    }
    ....

In theory this is a change in behaviour, since the "can't split with regex" branch now does full ANSI-style processing of the parameters, albeit ones which possibly haven't been split correctly. But since this branch is supposed never to be taken, and would only ever be taken if there were a design flaw in the regex which meant it couldn't smartly split on commas, I don't think this really matters.
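The idea behind the regex-based split can be sketched in C as a small scanner which treats a comma as a separator only when it is outside quotes and parentheses. This is an assumption about the intended behaviour, not a port of ParseXS's actual $C_arg / $C_group_rex regexes; split_params() and PIECE_MAX are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define PIECE_MAX 64   /* hypothetical max length of one param text */

/* Copy [start, end) into dst with surrounding whitespace trimmed. */
static void trim_copy(char *dst, const char *start, const char *end) {
    while (start < end && isspace((unsigned char)*start)) start++;
    while (end > start && isspace((unsigned char)end[-1])) end--;
    size_t n = (size_t)(end - start);
    if (n >= PIECE_MAX) n = PIECE_MAX - 1;
    memcpy(dst, start, n);
    dst[n] = '\0';
}

/* Split a signature body on top-level commas only: commas inside
   '...' or "..." literals, or inside (...) groups, are kept.
   Stores at most max pieces; returns the piece count, or -1 on
   unbalanced quotes/parens. An empty body yields one empty piece. */
static int split_params(const char *sig, char out[][PIECE_MAX], int max) {
    int depth = 0, count = 0;
    char quote = 0;
    const char *start = sig;
    for (const char *p = sig; ; p++) {
        char c = *p;
        if (c == '\0') {
            if (quote || depth) return -1;
            if (count < max) trim_copy(out[count], start, p);
            return count + 1;
        }
        if (quote) {
            if (c == '\\' && p[1]) p++;       /* skip escaped char */
            else if (c == quote) quote = 0;   /* literal ends */
        } else if (c == '\'' || c == '"') {
            quote = c;                        /* literal starts */
        } else if (c == '(') {
            depth++;
        } else if (c == ')') {
            if (--depth < 0) return -1;
        } else if (c == ',' && depth == 0) {  /* real separator */
            if (count < max) trim_copy(out[count], start, p);
            count++;
            start = p + 1;
        }
    }
}
```

With this, (char c = ',', int n = 0) splits into the two texts "char c = ','" and "int n = 0", and a plain split /,/ fallback is only needed if the scanner reports an error.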

@Args

The previous commit reduced the three separate XSUB signature-splitting and initial processing branches to two. This commit reduces them to a single common block of code. It also heavily refactors the initial parameter processing code, and adds several new checks.

In outline: before this commit, the sig splitting and processing code looked like:

    my @Args;
    if (config_allow_argtypes) {
        # ANSI-style Sig
        if (can split using regex) {
            @Args = /regex/g;
            for (@Args) {
                # big loop to do lots of processing of all the extra
                # syntactical features allowed with a full ANSI-style
                # signature, i.e. IN/OUT, type, and length(foo), leaving
                # the elements of @Args with all the extra stuff stripped.
            }
        }
        else {
            @Args = split /,/, ...;
            Warn "Warning: cannot parse argument list";
        }
    }
    else {
        # K&R-style Sig
        @Args = split /,/, ...;
        if (config_allow_inout) {
            # process IN/OUT etc
        }
    }
    for (@Args) {
        # do common K&R syntax processing of each parameter
    }

Following this commit, it now looks like:

    my @Args;
    if (can split using regex) {
        @Args = /regex/g;
    }
    else {
        @Args = split /,/, ...;
        Warn "Warning: cannot parse argument list";
    }
    for (@Args) {
        # big loop to do lots of processing of all the extra
        # syntactical features allowed with a full ANSI-style
        # signature, i.e. IN/OUT, type, and length(foo), leaving
        # the elements of @Args with all the extra stuff stripped.
        $self->blurt("type not allowed")
            if defined $type && !$self->{config_allow_argtypes};
        # ... and similar error checks ...
    }
    for (@Args) {
        # do common K&R syntax processing of each parameter
    }

So under -noargtypes, instead of parsing a parameter using a separate block of code which doesn't know about the newer syntactical features like types, parameters are always parsed by code which understands all the extra syntax, but which then errors out with a specific error message if it finds something disallowed under -noargtypes.

This is better than before, where the forbidden thing was simply not understood by the parsing code, and might trigger a confusing generic error, or be silently accepted and cause malformed C code to be emitted. The new errors are:

    "parameter type not allowed under -noargtypes"
    "length() pseudo-parameter not allowed under -noargtypes"
    "parameter IN/OUT modifier not allowed under -noinout"
    "Unparseable XSUB parameter: '$_'"
    "Default value not allowed on length() parameter '$name'"

That last one was already an error, but the text has been changed from:

    "Default value on length() argument: '$_'"

and it now does a $self->blurt() rather than a plain 'die', so the correct line number is reported, and parsing can continue, looking for further errors.

The next few commits will move all the code within that "do common K&R syntax processing" block into the first 'for (@Args)' loop, eventually removing the second loop altogether.
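The "always recognise, then reject what is disallowed" approach can be illustrated for the IN/OUT modifier case with a hedged C sketch: the modifier is always stripped and identified, and under -noinout the caller gets an error flag rather than a mis-parse. strip_modifier() and its table are hypothetical; only the modifier names and the error text come from the commit message.

```c
#include <assert.h>
#include <string.h>

/* Known modifiers, longest first so that "IN_OUT" is not
   mistaken for "IN" followed by junk. */
static const char *const modifiers[] = {
    "IN_OUTLIST", "IN_OUT", "OUTLIST", "OUT", "IN", NULL
};

/* Strip an optional leading modifier word from one param's text.
   Returns the modifier found (or NULL) and sets *rest past it.
   Sets *err if a modifier is present but allow_inout is false,
   i.e. where "parameter IN/OUT modifier not allowed under -noinout"
   would be reported. */
static const char *strip_modifier(const char *text, int allow_inout,
                                  const char **rest, int *err) {
    *err = 0;
    *rest = text;
    for (int i = 0; modifiers[i]; i++) {
        size_t n = strlen(modifiers[i]);
        /* must be a whole word followed by whitespace */
        if (strncmp(text, modifiers[i], n) == 0
            && (text[n] == ' ' || text[n] == '\t')) {
            if (!allow_inout) *err = 1;
            *rest = text + n + 1;
            while (**rest == ' ' || **rest == '\t') (*rest)++;
            return modifiers[i];
        }
    }
    return NULL;   /* no modifier: text is returned unchanged */
}
```

Note that a type such as "INT" is not mistaken for the "IN" modifier, because the word-boundary check requires whitespace after the modifier.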

Rather than iteratively appending to a comma-separated string of params on the fly, push them into an array; then join and quote-escape them only when used. Makes the code simpler. Also rename the variable to @report_params since it holds parameter names, not arguments. Should be no functional changes.

XSUBs are allowed to have a trailing ellipsis in the signature to disable arg count checking:

    int foo(a, b, ...)

The current code which parses this is very dumb and broken. It doesn't complain about ellipses occurring in positions other than the last and, more bizarrely, it actually deletes any embedded ellipsis. So for example the default expression in

    foo(int a, char *b = "stuff ...", int c = 0)

gets modified from "stuff ..." into "stuff ". It also emits broken code for the arguments of a wrapped function; i.e. int foo(a, b, ...) embeds a call to the C function 'foo' which looks like:

    foo(a, b,)

Although autocall with an ellipsis doesn't necessarily make much sense, this commit changes it to be 'foo(a, b)', which at least compiles. This commit adds some tests too.
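One way to get 'foo(a, b)' rather than 'foo(a, b,)' is to treat "..." as a parameter in its own right after splitting, rather than deleting the text "..." from the signature string, which is what mangled the default value above. The C sketch below assumes the params are already split; autocall_args() is a hypothetical helper, not ParseXS code.

```c
#include <assert.h>
#include <string.h>

/* Build the argument list for an autocall of the wrapped C function
   from already-split param names. A "..." entry is accepted only in
   the last position and contributes nothing to the call, so
   foo(a, b, ...) autocalls foo(a, b). Text inside other params,
   such as "stuff ..." in a default value, is never touched.
   Returns 0 on success, -1 if "..." is not last. */
static int autocall_args(const char *const *params, int n,
                         char *buf, size_t len) {
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (strcmp(params[i], "...") == 0) {
            if (i != n - 1) return -1;   /* ellipsis must be last */
            break;
        }
        if (buf[0] != '\0') strncat(buf, ", ", len - strlen(buf) - 1);
        strncat(buf, params[i], len - strlen(buf) - 1);
    }
    return 0;
}
```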

@Args

There are two 'for (@Args)' parsing loops, and the processing of the '= default_expr' part of a parameter is currently split between the two loops. This commit moves everything into the first loop: this is one of a series of commits aimed at the complete elimination of the second loop. It also stores the default value expression in its original form, and now defers any quote-escaping to the code output stage. In theory, no functional changes.

@Args

There are currently two 'for (@Args)' parsing loops to parse an XSUB's signature. Move the special-case code for handling a C++ XSUB's initial THIS/CLASS parameter from before the second loop to before the first loop. This is one of a series of commits aimed at the complete elimination of the second loop.

This commit also fixes a bug, introduced a couple of commits ago, that caused the initial "THIS" or "CLASS" arg to be skipped in the error message of this emitted code:

    if (items < N) croak("usage: foo(...)")

This commit also adds some tests for THIS/CLASS in usage messages and autocall parameters.

@Args

There are currently two 'for (@Args)' parsing loops to parse an XSUB's signature. Move the code from the second to the first loop which determines the arg number (if any) associated with each parameter. This is one of a series of commits aimed at the complete elimination of the second loop.

Add an ExtUtils::ParseXS::Node::Sig class. This is a subclass of Node which is intended to hold all the info parsed and extracted from an XSUB's signature plus any INPUT lines. It mainly consists of an array of Node::Param objects, which hold info about each parsed parameter, plus a hash which maps param names to those param objects.

This commit does the basic work; subsequent commits will handle some individual cases which are more complex.

The basic arrangement used to be like this. The XSUB's signature (such as "a, OUT b, int c, d = 999") was split and parsed, and the info about each parameter was stored in a collection of hashes; for example:

    $self->{xsub_map_argname_to_in_out}{b}  = 'OUT';
    $self->{xsub_map_argname_to_default}{d} = '999';
    etc.

In addition, a Node::Param object was created for each parameter which has its *type* specified (such as 'c' above), and pushed onto the @ANSI_params array.

Then, each INPUT line was parsed and stored as a temporary Node::Param object, and the as_code() method was called on that object, which emitted the C declaration and initialisation code for that parameter. as_code() would use the various {xsub_map_argname_to_foo} hashes to look up information needed for the code generation (such as the default expression to emit).

Finally, as_code() was called on each param in @ANSI_params, to emit declarations for params listed in the signature but not in an INPUT line.

What this commit does is to replace all the various $self->{xsub_map_argname_to_foo} fields with a single $self->{xsub_sig} Node::Sig object; it then creates a Node::Param object for each parameter in the signature and stores them in the Sig object. This means that the info about each param is now stored in a single Node::Param object, rather than being spread across several hashes. A couple of the more tricksy xsub_map_argname_to_foo fields have been left as-is for now, to be handled by individual commits to follow shortly.

The INPUT processing is left largely unchanged for now: for each INPUT line it still creates a Node::Param object, calls as_code() on it, and then immediately frees it. The difference is that as_code() now looks up the info it needs from under $self->{xsub_sig} rather than from lots of hashes.

The intention is that eventually those temporary INPUT objects won't be discarded, but will instead be merged into the list of Node::Params in the Node::Sig object. Code will then be emitted by iterating over the params in the Sig object once all sig/INPUT parsing is complete. This will be a significant step towards changing from an "emit code as we go along" model into a "parse everything into an AST then walk the tree, emitting code" model.

Move the setting of $self->{xsub_seen_THIS_in_INPUT} and $self->{xsub_seen_RETVAL_in_INPUT} back to INPUT_handler() from the check() method. I moved them earlier when splitting out INPUT_handler() into separate parsing and checking functions, but really they should have stayed in INPUT_handler(). It's unlikely to make any functional difference, but logically they flag having literally seen a 'THIS' or 'RETVAL' in an INPUT line, and so belong there.

Make the handling of an implied THIS or CLASS first parameter slightly more regular, by pushing a Node::Param object at the start of the list of param objects in the Node::Sig object. The code for this synthetic parameter is still emitted as a special-case: really it could be emitted at the same time as the ANSI-like parameters, but this commit keeps the emitted C declarations in the same order for now. It shouldn't make any difference, but you never know.... This commit breaks a few edge-case behaviours with C++ methods. The next five commits add tests and fixes.

The xsub_class object field is a string, not a boolean.

There are already some basic tests for C++ XSUBs (i.e. XSUBs which include a class in the function name). Expand the tests to cover what should be all the permutations of special cases, e.g. new/DESTROY, static method, etc.

An XSUB can have a class as part of its function name, to indicate that it is a C++ method. It can also have a 'static' prefix on the return type to indicate that it is a class method. For example:

    static int X::Y::foo(...)

However, 'static' without 'X::Y::' makes no sense, and generates invalid C code for an autocall. It's been this way since 5.000 AFAICT. It has also recently started emitting an 'uninit var' warning.

This commit detects the "static but no class" condition, emits a new warning, and turns off the flag which indicated that 'static' was present. The new warning is:

    "Ignoring 'static' type modifier: only valid with an XSUB name which includes a class"
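The new check is small; a hedged C sketch of its logic follows. It assumes that "includes a class" can be approximated by the XSUB name containing "::"; effective_static() is a hypothetical name, and the real code lives in the Perl parser.

```c
#include <assert.h>
#include <string.h>

/* Return the effective 'static' flag for an XSUB. If 'static' was
   given but the name has no "Class::" part (assumed here to mean it
   contains no "::"), set *warn so the caller can emit the "Ignoring
   'static' type modifier" warning, and drop the flag. */
static int effective_static(int is_static, const char *xsub_name,
                            int *warn) {
    *warn = 0;
    if (is_static && strstr(xsub_name, "::") == NULL) {
        *warn = 1;
        return 0;
    }
    return is_static;
}
```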

If an XSUB's name includes a class, it is treated as a C++ method and an implicit initial 'THIS' or 'CLASS' parameter is added. If the XSUB also includes an explicit THIS/CLASS variable, Bad Things happen. There is in fact code in ParseXS.pm which detects 'THIS' in an INPUT line and suppresses the emitting of a duplicate C var declaration; but the generated C code is still wrong. For example, for

    X::Y::f(int THIS)

the C code will expect 2 passed args, for both the implicit and explicit THIS, and the usage() message will be:

    croak_xs_usage(cv, "THIS, THIS")

As it happens, the changes to THIS/CLASS handling from 3 commits ago turned such duplicate declarations into a compile-time error, e.g.

    Error: duplicate definition of argument 'THIS' ignored in foo.xs, line N

This commit adds tests for those changes, and also removes the now-redundant code from ParseXS which skips emitting a duplicate var definition, as the dup will already have been reported as an error by then.
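With the synthetic THIS/CLASS stored as an ordinary first param, the usage string and the duplicate check can both come from one walk of the param list. The C sketch below shows that idea under stated assumptions: usage_string() is hypothetical, and it only checks for a clash with the synthetic param, not duplicates in general.

```c
#include <assert.h>
#include <string.h>

/* Build the usage string passed to croak_xs_usage() for an XSUB,
   where synthetic is "THIS", "CLASS", or NULL for a non-method.
   An explicit param with the same name as the synthetic one is
   rejected instead of producing "THIS, THIS".
   Returns 0, or -1 on such a duplicate. */
static int usage_string(const char *synthetic,
                        const char *const *params, int n,
                        char *buf, size_t len) {
    buf[0] = '\0';
    if (synthetic) strncat(buf, synthetic, len - 1);
    for (int i = 0; i < n; i++) {
        if (synthetic && strcmp(params[i], synthetic) == 0)
            return -1;                   /* duplicate definition */
        if (buf[0] != '\0') strncat(buf, ", ", len - strlen(buf) - 1);
        strncat(buf, params[i], len - strlen(buf) - 1);
    }
    return 0;
}
```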

C++ XSUB methods which indicate that the 'THIS' arg should be updated, such as

    My::Class::foo(int i)
        OUTPUT:
            THIS

stopped emitting the update code as of 5 commits ago. This commit restores the behaviour, and adds tests.

This commit changes how the argument string that is output as part of an autocall to a C function is generated. In particular, it:

- deletes the ExtUtils::ParseXS::Utilities::C_func_signature() method;
- deletes the test file t/110-assign_func_args.t which exercised that method;
- adds a method called ExtUtils::ParseXS::Node::Sig::C_func_signature() which serves a similar purpose to the removed method, but in a different and more robust way;
- adds several tests to t/001-basic.t which provide similar coverage to the deleted tests.

The former way of generating the arg string worked as follows. During XSUB signature processing, a list of candidate arg names was accumulated, then joined and stored as $self->{xsub_C_auto_function_signature} by C_func_signature(), which handled OUT params and skipped any fake THIS/CLASS arg. Then, if a C_ARGS keyword was encountered, that would overwrite the xsub_C_auto_function_signature string. Then, when INPUT lines were processed, any var declared as '&foo' would cause something like this to be done:

    $self->{xsub_C_auto_function_signature} =~ s/\bfoo\b/&foo/;

This was not robust and could modify the C_ARGS overridden string.

The new method instead scans the parameter list in the Node::Sig object and builds the argument list from that *at the time it is needed*. Any C_ARGS value is saved and used as an override.

In principle there are no functional changes, apart from bug fixes and the deletion of a "not for public use" method.
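The "build it when needed" approach can be sketched in C as follows. Assumptions, labelled as such: synthetic THIS/CLASS params are skipped, a param flagged as passed by address (an OUT-style modifier, or '&foo' in an INPUT line) is emitted as "&name", and a C_ARGS string, if present, is used verbatim. CParam and c_func_args() are hypothetical names, not the Node::Sig API.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *name;
    int is_synthetic;   /* implicit THIS/CLASS: not passed */
    int by_address;     /* emit as &name */
} CParam;

/* Build the autocall arg list from the params at the time it is
   needed. Because nothing is patched after the fact, a C_ARGS
   override can never be mangled by a later '&foo' substitution. */
static void c_func_args(const CParam *ps, int n, const char *c_args,
                        char *buf, size_t len) {
    buf[0] = '\0';
    if (c_args) {                        /* C_ARGS: used verbatim */
        strncat(buf, c_args, len - 1);
        return;
    }
    for (int i = 0; i < n; i++) {
        if (ps[i].is_synthetic) continue;
        if (buf[0] != '\0') strncat(buf, ", ", len - strlen(buf) - 1);
        if (ps[i].by_address) strncat(buf, "&", len - strlen(buf) - 1);
        strncat(buf, ps[i].name, len - strlen(buf) - 1);
    }
}
```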

Add tests that all the IN/OUT/IN_OUTLIST etc permutations generate plausible code.

Now that *all* XSUB parameters are stored as a list of Node::Param objects - not just ANSI ones - there's no longer a need to keep a list of the ANSI ones; instead, just 'grep $_->{is_ansi} all the params' to get the needed ones. Should be no functional changes.

Rather than pushing param names into an array if the XSUB signature declares them as 'OUTLIST' or 'IN_OUTLIST', just grep for such params within the list of Node::Param objects as needed. Should be no functional changes.

In the convenience test function test_many(), make the arg which trims away all generated output except the matching function(s) a bit more flexible, so that in the next commit it can be used to extract a boot function instead of a regular function.

This commit:

- eliminates the $self->{xsub_map_arg_idx_to_proto} field and, in keeping with similar recent commits, instead generates the XSUB's proto string when needed by scanning the list of params in the Node::Sig object;
- adds a Node::Sig method, proto_string(), to do that;
- adds lots of tests of XSUB prototype strings;
- adds a 'proto' field to Node::Param objects to store the prototype char(s) for that parameter if it's been overridden by the typemap (the override is actually broken, and always has been; see the next commit for the fix);
- finishes off removing the second signature args processing loop, which the last several commits have been aiming for;
- moves arg count and min arg count state into the Node::Sig object;
- eliminates the no-longer-needed %only_C_inlist var.

This commit may well change the generation of XSUB prototype strings in edge cases; if so, then hopefully for the better.
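Generating the prototype on demand amounts to one walk over the params. The C sketch below uses assumed rules, not necessarily ParseXS's exact ones: each param bound to a Perl arg contributes its proto (default "$", or a typemap override), a ';' precedes the first optional (defaulted) param, and a trailing ellipsis adds '@', preceded by ';' if none was emitted yet. Pseudo-params such as length(foo) are not bound to an arg and contribute nothing. PParam and proto_string() here are hypothetical C stand-ins.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *proto;  /* typemap override, or NULL for "$" */
    int has_arg;        /* bound to a Perl arg? */
    int optional;       /* has a default value? */
} PParam;

/* Build a prototype string from the param list (assumed rules). */
static void proto_string(const PParam *ps, int n, int ellipsis,
                         char *buf, size_t len) {
    int seen_semicolon = 0;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (!ps[i].has_arg) continue;        /* e.g. length(foo) */
        if (ps[i].optional && !seen_semicolon) {
            strncat(buf, ";", len - strlen(buf) - 1);
            seen_semicolon = 1;
        }
        strncat(buf, ps[i].proto ? ps[i].proto : "$",
                len - strlen(buf) - 1);
    }
    if (ellipsis)
        strncat(buf, seen_semicolon ? "@" : ";@", len - strlen(buf) - 1);
}
```

Because the string is derived from the Param objects each time, a per-param typemap override only has to be stored in one place.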

There is supported syntax in typemaps to specify a sub prototype value for a parameter, to override the default of '$'. This is done using an optional third field in the TYPEMAP section of a typemap file. For example:

    Foo*    T_FOO    \[@%]

However, due to a thinko in the typemap parser code (using 'proto' rather than 'prototype' as one of the args to an ExtUtils::Typemaps::Type->new() method call), this overridden value has always been silently ignored. And there were no tests for it. This commit fixes that and adds tests.

In the usual case where there is no override, the original intent of the typemap parser code was to add '$' as the proto value. Due to the bug, this was never added, and ExtUtils::Typemaps::Type objects generally had no proto field. This commit diverges from the original intent and keeps the proto field undefined rather than setting it to '$', because several tests check that a regenerated typemap looks as expected, and they all suddenly failed when a '$' appeared on the regenerated typemap lines.

In addition, in the XS parser, the overridden value as returned by the typemap parser was being emitted as-is into the generated C code within a double-quoted string. For example, a three-parameter XSUB with an overridden first parameter type, as in the example above, would generate C code along the lines of:

    newXSproto_portable(..., "\[@%]$$")

and the backslash would get removed by the C compiler. So this commit also causes backslashes to be escaped before emitting the C code:

    newXSproto_portable(..., "\\[@%]$$")
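The escaping step can be shown as a small C helper that doubles each backslash before the prototype is placed inside a C string literal, so the compiler reproduces the original characters. escape_backslashes() is a hypothetical name; like the commit, it only handles backslashes, and a general-purpose C-literal escaper would also need to handle '"'.

```c
#include <assert.h>
#include <string.h>

/* Copy src into buf with every backslash doubled, so that
   \[@%]$$ becomes \\[@%]$$ when written inside "...".
   Returns 0, or -1 if buf (of size len) is too small. */
static int escape_backslashes(const char *src, char *buf, size_t len) {
    size_t j = 0;
    if (len == 0) return -1;
    for (; *src; src++) {
        size_t need = (*src == '\\') ? 2 : 1;
        if (j + need >= len) return -1;   /* keep room for the NUL */
        if (*src == '\\') buf[j++] = '\\';
        buf[j++] = *src;
    }
    buf[j] = '\0';
    return 0;
}
```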

@Args

Slightly widen the scope of a {...} block which encloses most of the arg-splitting/processing code so that the block now encompasses *all* the code, then re-indent the 'for (@Args)' block within it, which was one indent too deep due to earlier code refactoring. Apart from the slight scope tweak, whitespace-only.

After heavy refactoring, make the code comments concerning the parsing of the XSUB signature and its parameters reflect the new reality.

Eliminate the xsub_map_argname_to_type field of ExtUtils::ParseXS objects. Instead, now that they exist, get the type value from the relevant Node::Param object within the Node::Sig object.

The 'soft_type' field was added by me about 13 commits ago to get round issues where a THIS/CLASS param's type is defined both implicitly (by having the XSUB function name include a class) and explicitly in an INPUT line. Further refactoring since then has made the field unnecessary, so this commit removes it and always just uses the 'type' field of Node::Param. Also add some more tests for dup THIS/CLASS. There were already tests for a dup in an INPUT line, but not in the signature.

More refactoring: remove the xsub_map_varname_to_seen_in_INPUT field from the ExtUtils::ParseXS class and instead add an in_input field to the ExtUtils::ParseXS::Node::Param class.

The series of approx 45 refactoring commits in this branch has added E::P::Node::Param and E::P::Node::Sig objects, to allow an XSUB's signature to be stored as a list of Param objects in a Sig object. So in a straightforward XSUB like:

    int
    foo(a, b = 999)
        int a
        int b

a pair of Param objects is created and stored in a Sig object to represent the two items in the signature, e.g.

    { var => 'a', arg_num => 1 },
    { var => 'b', arg_num => 2, default => 999 },

But up until now, when the lines in the implicit INPUT section were subsequently parsed, a *second* pair of temporary Param objects was created using data obtained both from the Sig object and from the INPUT line. The as_code() method was then called on each temp Param object, and the temp object was then thrown away.

This commit changes it so that instead, any extra info obtained from an INPUT line (e.g. the 'type' value) is added to the original Param object in the Sig object, and the as_code() method is called on the *original* object. No temporary Param objects are created.

This commit is effectively what the previous chain of commits has been preparing for. Before that, all info obtained from the signature was stored in a series of hash refs stored in the ExtUtils::ParseXS object, such as

    $self->{xsub_map_argname_to_default}{b} = 999;

The new way is cleaner, conceptually simpler, gathers related code and data into one location, and has fewer special cases that need handling.

The main thing it doesn't do yet is defer code generation. It still emits a C var declaration directly after each INPUT line is read and parsed. Eventually I would like for the whole signature/INPUT sequence to be read in and parsed and *then* to emit all the code in one go (and in the long term, to read in the *whole* XS file before emitting any code).

One intended side effect of this commit is that it now detects duplicate 'alien' parameters again; this was broken during the earlier refactoring. An alien param is one declared in INPUT but not in the signature:

    int
    foo(a)
        int a    # normal parameter
        int b    # alien parameter
        int b    # duplicate alien parameter

A test has been added for this. A test has also been added for looking up the prototype char associated with a type when the type can change, as with a synthetic parameter like THIS.
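The merge-into-the-original approach, and why it makes duplicate alien params detectable again, can be sketched in C. This is a hypothetical model: IParam, ISig and input_line() are illustrative names, and a single in_input flag per param stands in for whatever state the Perl code keeps. Because an alien param is appended to the same list as the signature params, a second INPUT line for it hits the same "already seen" check.

```c
#include <assert.h>
#include <string.h>

#define MAX_PARAMS 16

typedef struct {
    char name[32];
    char type[32];
    int  arg_num;    /* 0 for alien params */
    int  in_input;   /* already seen in an INPUT line? */
} IParam;

typedef struct {
    IParam params[MAX_PARAMS];
    int nparams;
} ISig;

/* Handle one INPUT line "type name" by updating the param already
   created from the signature; no temporary param is built. A name
   not in the signature is appended as an alien param.
   Returns 0, -1 on a duplicate, -2 if the table is full. */
static int input_line(ISig *sig, const char *type, const char *name) {
    IParam *p = NULL;
    for (int i = 0; i < sig->nparams; i++)
        if (strcmp(sig->params[i].name, name) == 0)
            p = &sig->params[i];
    if (p == NULL) {                         /* alien param */
        if (sig->nparams == MAX_PARAMS) return -2;
        p = &sig->params[sig->nparams++];
        memset(p, 0, sizeof *p);
        strncpy(p->name, name, sizeof p->name - 1);
    }
    if (p->in_input) return -1;              /* duplicate definition */
    p->in_input = 1;
    strncpy(p->type, type, sizeof p->type - 1);
    return 0;
}
```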

Simplify a complex series of nested if/elses and update code comments. Should be no functional changes.

Currently, the pseudo-parameter 'length(foo)' is handled by creating a Node::Param object with a 'var' field value of 'XSauto_length_of_foo', while that object is then added to the $sig->{names} hash using a key of 'foo'. This commit removes that inconsistency and makes both use 'length(foo)'. In theory this shouldn't change what C code is emitted for the 'STRLEN XSauto_length_of_foo = ...' declaration etc., but it might in principle change (for better or worse) the detection of errors for things like duplicate var declarations if someone includes 'XSauto_length_of_foo' in an INPUT line. I'm not worrying about it too much, as the way such C code is generated will be changed soon.
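With 'length(foo)' as the single key, the C variable name becomes something derived at emit time. A hedged C sketch of that derivation follows; length_var_name() is hypothetical, and only the XSauto_length_of_ prefix comes from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map the pseudo-param text "length(NAME)" to the C variable name
   "XSauto_length_of_NAME" used in the STRLEN declaration.
   Returns 0, or -1 if text is not of that form or buf is too small. */
static int length_var_name(const char *text, char *buf, size_t len) {
    const char *prefix = "length(";
    size_t plen = strlen(prefix), tlen = strlen(text);
    if (tlen <= plen + 1 || strncmp(text, prefix, plen) != 0
        || text[tlen - 1] != ')')
        return -1;
    int n = snprintf(buf, len, "XSauto_length_of_%.*s",
                     (int)(tlen - plen - 1), text + plen);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```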

Rename this long lex var to just '$name' now that the code is simpler and there is less need to distinguish between various name variants.

Currently there is code in both the signature parsing code and in the INPUT parsing code which sets no_init if the IN/OUT/etc parameter modifier matches /^OUT/. This commit makes it check only during signature parsing, which simplifies the code slightly. Should be no functional changes.

Sort the field declaration lines for E::P::Node::Param into a more logical order.

Move the 200 or so lines of code which splits and parses an XSUB's signature out from process_file() and into its own method. This commit changes as little as possible: apart from adding a method declaration and call, the main body of the code has just been cut and pasted as-is, apart from reducing the indentation. The next few commits will change and move the code so that it becomes a method of ExtUtils::ParseXS::Node::Sig rather than of ExtUtils::ParseXS.

The previous commit moved the signature parsing code into its own method. This commit makes the method be in the ExtUtils::ParseXS::Node::Sig class. This involves changing the name to just 'parse', and swapping around all the mentions of $self and $sig. The next commit will move the method into Node.pm. (By doing this in 3 commits, code changes don't get hidden in the noise.)

The previous two commits have moved the signature parsing code into its own sub and made it a method of ExtUtils::ParseXS::Node::Sig. This commit moves the body of the method into Node.pm. There are no changes to the method's code apart from no longer needing to fully qualify the method name in the declaration. This commit also moves the our ($C_group_rex, $C_arg) var definitions from ParseXS.pm to Node.pm, as they're only used to hold regexes used to split the parameters in the signature. They've also been changed from 'our' to 'my'.

@Args

This method splits the XSUB signature string into individual parameter strings and then parses them. Rename the var from '@Args' to '@param_texts' since it contains parameters rather than args. The _text suffix distinguishes them from $param objects. This is one of the last steps in my quiet mission to rename variables etc. from *arg* to *param* where they hold info about params rather than args. This is particularly significant as some parameters (like 'length(foo)') don't get bound to args, so there isn't a 1:1 correspondence.

Also, shorten the names of a couple of lex vars now that they're only in a small method with limited scope:

    $args_count          => $nargs
    $optional_args_count => $opt_args

Should be no functional changes.

Several error messages complain about things like "duplicate argument" when they mean "duplicate parameter". So update these messages to be more accurate.

A couple of fixes to make it build and test under 5.8.9.

The grep expression

    exists $_->{in_out} && $_->{in_out} =~ /OUT$/

was wrong: it should have used defined rather than exists, as is done elsewhere. I'm not sure why an 'uninit var' warning only appeared on 5.8.9 but not on blead; perhaps some minor autovivification difference?

It was also warning

    $ExtUtils::ParseXS::DIE_ON_ERROR only used once
    $ExtUtils::ParseXS::AUTHOR_WARNINGS only used once

in t/001-basic.t, because that test file only loads ExtUtils::ParseXS at runtime. Again, I'm not sure why it didn't warn on blead too. But I made the var initialisations more robust against 'once' warnings anyway by using 'our'.

require 'fields.pm' can be written more simply as

    require fields

They compile to the same optree.
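As a rough illustration, the sketch below uses both spellings in one script. With a bareword, require converts the module name into the file name 'fields.pm' itself, so both forms refer to the same %INC entry and the second require does not reload the file. This only shows that the two name the same file; it does not inspect the optree.

```perl
use strict;
use warnings;

# Bareword form: Perl maps the module name to the file 'fields.pm'
require fields;

# String form names the file directly; fields.pm is already recorded
# in %INC, so this does not load it a second time.
require 'fields.pm';

# Both forms share a single %INC entry
print "fields.pm loaded from: $INC{'fields.pm'}\n";
```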

iabyn · 2024-10-18T11:57:18Z

On Wed, Oct 16, 2024 at 04:48:46PM -0700, Tony Cook wrote:
> That's a lot of commits.

Yeah, sorry! Thanks for the review.


-- This is a great day for France! -- Nixon at Charles De Gaulle's funeral

tonycoz reviewed Oct 14, 2024


tonycoz approved these changes Oct 16, 2024


iabyn added 25 commits October 18, 2024 11:12

ParseXS: bump version 3.54 => 3.55

69864d9

ParseXS: param_check(): refactor: rename lex var

ffa2c82

Rename $argsref to $param for consistency with param_check() etc. No functional changes.

ParseXS: INPUT_handler(): update code comments

b7ad367

One code comment referred to something that has since been changed: update it. Also add a comment about how INPUT lines are split by a regex.

ParseXS: INPUT_handler(): refactor: add $init_op

2df9fb3

This sub looks for an initialiser string, which starts with a /[=+;]/. This commit extracts out that first character into a separate variable, $init_op, which will make further refactoring easier. Should be no functional changes.
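For readers unfamiliar with the three initialiser forms, here is a simplified Perl sketch of splitting an INPUT line into declaration, $init_op and initialiser text. The regex and the split_input_line() helper are hypothetical simplifications, not the INPUT_handler() code; the per-operator meanings paraphrase perlxs ("Initializing Function Parameters"). Because the declaration part cannot contain any of [=+;], the first such character is taken as the operator.

```perl
use strict;
use warnings;

# Hypothetical helper: split an INPUT line into (decl, init_op, init).
# init_op and init are undef when the line has no initialiser.
sub split_input_line {
    my ($line) = @_;
    $line =~ /^\s*([^=+;]*?)\s*(?:([=+;])\s*(.*?))?\s*$/
        or return;
    return ($1, $2, $3);
}

# Meaning of each operator, paraphrased from perlxs
my %meaning = (
    '=' => 'replaces the typemap initialisation in the declaration',
    ';' => 'runs after all INPUT declarations; typemap initialisation skipped',
    '+' => 'runs after all INPUT declarations; typemap initialisation kept',
);

for my $line ('char *s', 'int n = SvIV($arg) * 2', 'int m ; m = 0', 'int k + k *= 2') {
    my ($decl, $op, $init) = split_input_line($line);
    printf "%-24s decl=[%s] op=[%s] init=[%s] %s\n",
        $line, $decl, $op // '', $init // '',
        defined $op ? $meaning{$op} : '(typemap initialisation only)';
}
```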

ParseXS: INPUT_handler(): refactor: add $is_alien

2b42d0c

Flag more explicitly that a var is declared in an INPUT line but not in the XSUB's signature (rather than just relying on $var_num not being defined). Should be no functional changes.
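A minimal sketch of the idea, assuming a per-name lookup of signature positions. The %var_num hash, the param records and the variable names here are hypothetical; the point is only that an explicit $is_alien boolean states the intent that "!defined $var_num" merely implies.

```perl
use strict;
use warnings;

# Hypothetical params parsed from an XSUB signature foo(a, b)
my @sig_params = ({ name => 'a' }, { name => 'b' });

# Map each signature param name to its 1-based position
my %var_num;
my $pos = 0;
$var_num{ $_->{name} } = ++$pos for @sig_params;

# Vars declared on INPUT lines; 'tmp' does not appear in the signature
my @alien;
for my $name (qw(a b tmp)) {
    # Explicit flag rather than re-testing !defined $var_num{$name} at each use
    my $is_alien = !defined $var_num{$name};
    push @alien, $name if $is_alien;
}

print "INPUT vars not in signature: @alien\n";
```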

ParseXS: refactor: rename two Node::Param fields

24bd513

Rename a couple of fields in ExtUtils::ParseXS::Node::Param for clarity and/or consistency:

    num  => arg_num
    ansi => is_ansi

Should be no functional changes.

ParseXS: refactor: fix code comment

bc76237

The xsub_class object field is a string, not a boolean.

ParseXS: expand tests for C++ methods

8219ce6

There are already some basic tests for C++ XSUBs (i.e. XSUBs which include a class in the function name). Expand the tests to cover what should be all the permutations of special cases, e.g. new/DESTROY, static method, etc.

ParseXS: fix OUTPUT THIS

919d86b

C++ XSUB methods which indicate that the 'THIS' arg should be updated, such as

    My::Class::foo(int i)
        OUTPUT:
            THIS

stopped emitting the update code as of 5 commits ago. This commit restores the behaviour and adds tests.

iabyn added 25 commits October 18, 2024 11:12

ParseXS: add more tests for OUTLIST etc

48b29c9

Add tests checking that all the IN/OUT/IN_OUTLIST etc. permutations generate plausible code.

ParseXS: refactor: remove @OUTLIST_vars

3ed0dcc

Rather than pushing param names into an array if the XSUB signature declares them as 'OUTLIST' or 'IN_OUTLIST', just grep for such params within the list of Node::Param objects as needed. Should be no functional changes.
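A minimal sketch of deriving the OUTLIST params on demand rather than maintaining a parallel array, assuming each param object carries an in_out field (as in the grep expression quoted in the 5.8.9 fix). Plain hashes stand in for Node::Param objects here, and the name field is hypothetical.

```perl
use strict;
use warnings;

# Plain-hash stand-ins for Node::Param objects; only the fields this
# example needs are filled in.
my @params = (
    { name => 'a', in_out => undef        },
    { name => 'b', in_out => 'OUTLIST'    },
    { name => 'c', in_out => 'IN_OUTLIST' },
    { name => 'd', in_out => 'OUT'        },
);

# Derive the OUTLIST/IN_OUTLIST params when needed instead of pushing
# their names onto a separate @OUTLIST_vars array during parsing.
my @outlist_params = grep {
    defined $_->{in_out} && $_->{in_out} =~ /^(?:IN_)?OUTLIST$/
} @params;

print "OUTLIST params: ", join(',', map { $_->{name} } @outlist_params), "\n";
```

Keeping the param objects as the single source of truth means there is no second list that can drift out of sync with them.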

ParseXS: tidy up sig-processing code comments

f09d72c

After heavy refactoring, make the code comments concerning parsing of the XSUB signature and its parameters reflect the new reality.

ParseXS: refactor: rm {xsub_map_argname_to_type}

9c24c4e

Eliminate the xsub_map_argname_to_type field of ExtUtils::ParseXS objects. Instead, now that they exist, get the type value from the relevant Node::Param object within the Node::Sig object.

ParseXS: rm xsub_map_varname_to_seen_in_INPUT

ac6d839

More refactoring: remove the xsub_map_varname_to_seen_in_INPUT field from the ExtUtils::ParseXS class and instead add an in_input field to the ExtUtils::ParseXS::Node::Param class.

ParseXS: refactor: INPUT_handler() init parsing

05c2d6d

Simplify a complex series of nested if/elses and update code comments. Should be no functional changes.

ParseXS: refactor: rename $name_or_lenname

ffc0f1a

Rename this long lex var to just '$name' now that the code is simpler and there is less need to distinguish between various name variants.

ParseXS: refactor: sort Node::Param fields

effafc6

Sort the field declaration lines for E::P::Node::Param into a more logical order.

ParseXS: change error msgs argument => parameter

f2d49d0

Several error messages complain about things like "duplicate argument" when they mean "duplicate parameter". So update these messages to be more accurate.

ParseXS: refactor: simplify require fields

0a291b5

require 'fields.pm' can be written more simply as

    require fields

They compile to the same optree.

iabyn force-pushed the davem/xs_refactor5 branch from d52927a to 0a291b5 on October 18, 2024 10:46

iabyn merged commit 9621dfa into blead Oct 18, 2024
67 checks passed

iabyn deleted the davem/xs_refactor5 branch October 18, 2024 11:58

haarg mentioned this pull request Oct 21, 2024

BBC: Blead Breaks JSON::DWIW #22685

Closed
