Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another problem with datahandle #2458

Open
nrnhines opened this issue Aug 10, 2023 · 13 comments
Open

Another problem with datahandle #2458

nrnhines opened this issue Aug 10, 2023 · 13 comments
Labels

Comments

@nrnhines
Copy link
Member

This may be related to #2457 but I think it is different. Works for 8.2.2 (and earlier) but not master
To reproduce:

git clone [email protected]:nrnhines/dcmdt.git
cd dcmdt
nrnivmodl
python3 -i dcdt.py

fails with

$ python3 -i dcdt.py
NEURON: CVode[0]::active: generic_data_handle{cont=capacitance cm row=0/2 type=double*}::literal_value<void*> cannot be called on a handle [that was] in modern mode
 in dcdt.ses near line 69
 }
  ^
        CVode[0].active(1)
      NumericalMethodPanel[0].iscvode()
    NumericalMethodPanel[0].restore(301, 1)
  xopen("dcdt.ses")
and others
NEURON: hoc_Load_file dcdt.ses
 in dcdt.ses near line 0
 ^
        load_file("dcdt.ses")
Traceback (most recent call last):
  File "/home/hines/models/dcmdt/dcdt.py", line 30, in <module>
    h.load_file("dcdt.ses")
RuntimeError: hocobj_call error: hoc_execerror: hoc_Load_file dcdt.ses
@nrnhines nrnhines added the bug label Aug 10, 2023
@nrnhines
Copy link
Member Author

nrnhines commented Aug 11, 2023

The problem (no solution yet) turns out to be explicitly handling a pointer (call it POINTER c} in a VERBATIM block via

  VERBATIM
  if (_p_c) {
  }
  ENDVERBATIM

I.e. a typical legacy block to determine if the pointer is valid.
The mod file translator uses

#define _p_c _ppvar[0].literal_value<void*>()

When that is evaluated and _p_c points to a voltage, then the error is

RuntimeError: hocobj_call error: generic_data_handle{Node::field::Voltage row=1/3 type=double*}::literal_value<void*> cannot be called on a handle [that was] in modern mode

A more or less minimal example of this is

$ cat temp.mod
NEURON {
  SUFFIX temp
  THREADSAFE
  POINTER c
}
 
ASSIGNED {
  c
}
 
INITIAL {
  VERBATIM
  if (!_p_c) {
    printf("POINTER c not set\n");
  }
  ENDVERBATIM
}

and

$ cat temp.py
from neuron import h

s = h.Section()
s.cm = 1
s.insert("temp")
s(0.5).temp._ref_c = s(0.5)._ref_v

h.finitialize(-65)

execute with

nrnivmodl
python3 temp.py

If c is set to point to h._ref_t (i.e. s(0.5).temp._ref_c = h._ref_t, the error message changes to

RuntimeError: hocobj_call error: generic_data_handle{raw=0x7fa33377f3c0 type=double*} does not hold a literal value of type void*

If c is never set from python, then one does get the output

POINTER C not set

A user written pointer test is still useful, since

INITIAL {
  printf("POINTER c points to %g\n", c)
}

gives a segfault if the pointer is not set.

@1uc
Copy link
Collaborator

1uc commented Aug 16, 2023

While reading the generated code I spot he following:

#define c	*_ppvar[0].get<double*>()
#define _p_c _ppvar[0].literal_value<void*>()

This looks surprising, because _ppvar[0].get<double*>() returns a double*, which is then dereferenced. Hence, (I'm again guessing a little) semantically in MOD c "point" to the value of something else; but c is the value (much like a C++ reference); and _p_c is the address of the thing that c "points to".

My first guess is that this is a code generation bug and it should have generated:

#define c	*_ppvar[0].get<double*>()
#define _p_c  _ppvar[0].get<double*>()

Further down there's code that suggests that we're using the SoA containers for mechanisms, i.e. I would guess that we'd just have a column with double * entries (one for each node where this mechanism is active). If it's an entry in a soa container, then this generic_data_handle doesn't contain a literal value, rather it behaves like a type erased (modern) data_handle, or to translate to the closest C equivalent _ppvar[0] is a void * and _ppvar[0].get<double*>() is just (double*) _ppvar[0]. The lines that make me believe it's an entry in a soa container are:

   _nrn_mechanism_register_data_fields(_mechtype,
                                       _nrn_mechanism_field<double>{"c1"} /* 0 */,
                                       ...
                                       _nrn_mechanism_field<double>{"_g"} /* 8 */,
                                       _nrn_mechanism_field<double*>{"c", "pointer"} /* 0 */);

I've locally edited the file x86_64/dcdt.cpp after it was generated; and reran nrnivmodl go generate a new special (I checked that x86_64/dcdt.cpp wasn't overwritten). When I ran

python -i dcdt.py

it didn't create a warning and didn't crash either. @nrnhines do you think my guesses are reasonable? If yes, I'll try to find out how to fix the code generation.

@ramcdougal
Copy link
Member

ramcdougal commented Aug 16, 2023

I think you're correct about the semantics of _p because through NEURON 8.2.2, you'd get e.g.

#define on_init *_ppvar[2]._pval
#define _p_on_init _ppvar[2]._pval

where they're literally the same except one is dereferenced. I don't know why you'd ever have a void* stored in a data handle, I thought part of the purpose of the changes was to get rid of the use of void*.

In my code when I ran into this the other day, I simply changed my VERBATIM block's mention of _p_on_init to &on_init, which I honestly like better anyways and works with both 8.22 and 9.0a.

(I'd argue for getting rid of the _p variables entirely except that would probably break legacy VERBATIM blocks... although maybe they're already broken.)

@1uc
Copy link
Collaborator

1uc commented Aug 16, 2023

a void* stored in a data handle

There's a subtlety here. The first is that data_handle and generic_data_handle are different objects. What we have here is a generic_data_handle. The important difference is that one of them is a template class:

template <typename T>
struct data_handle {

and the other isn't:

*/
struct generic_data_handle {
private:

This is what I mean when I say that generic_data_handle is "type-erased". It doesn't know (at least not at compile time) the type of what it's a "handle" of. This is why we need to help it remember by typing:

generic_data_handle c_gdh = ...;
double c = *c_gdh.get<double*>();

You can peak inside and see the void * that enables the type-erasure here:

Yes, that's a pointer to a soa<T> container, and not a row within that soa container. It it were just the address of a specific row in the soa container, then we'd be very close to the C analog of (double*) (m_container).

Edit: The above is only true if generic_data_handle doesn't store a "literal value". Which is the additional complexity of generic_data_handle: it acts much like a variant. However, I don't think we're dealing with a "literal value" here (and the program is failing because it claims we're not dealing with one). What we have is just the equivalent of a type-erased (modern) data_handle, because we're referring to a row in some soa container.

@nrnhines
Copy link
Member Author

do you think my guesses are reasonable?

Yes.

But I think the SoA implementation of POINTER (like the previous implementations from inception of the POINTER keyword onward) isn't providing permutation invariance, reallocation invariance, and invalidation error messaging (except for itself! Not what it is pointing at.) that we now expect from data handles (at least for data handles to RANGE variables). Sounds like what is wanted is "POINTER is a data handle to a data handle".
NMODL syntax, e.g.,

POINTER pre
...
    i = g*(vpre - v)

more or less was orginially designed to point to a range variable. In this case the user restricts them to point to a voltage at another location. Though it has an obvious use also for HOC scalar, or ultimately, any double, e.g. A Vector element.

Sadly, in VERBATIM blocks, POINTER was at hand to serve for pointer to Vector or Random123, or, indeed, any HOC Object*
or C struct. And that is why void* was used in the old c translated mod files (where the user would cast at liberty in the VERBATIM code). And COREPOINTER got introduced to transfer struct data (again with user written VERBATIM block code) to and from CoreNEURON.

Going back to "POINTER is a data handle to a data handle". That is in tension with the desire for optimum performance.
Perhaps after an explicit low performance implementation, it could (if worthwhile) be improved to have less indirection while frozen and get updated if any data handle data is moved, invalidated, permuted.

Actually, give my current level of understanding of data handle, it is not clear to me that POINTER cannot be considered as
"POINTER is a data handle" and many instances of the mechanism mean it is just an extra std:vector associated with the mechanism.

@ramcdougal
Copy link
Member

Sadly, in VERBATIM blocks, POINTER was at hand to serve for pointer to Vector or Random123, or, indeed, any HOC Object* or C struct

Or in my example, a Python callback... i.e. it need not be data per se, we can misuse it to pass around functions and just do the casting in a VERBATIM block. In the future, there should probably be a dedicated POINTER type for this, but that's beyond the scope of this issue.

@1uc
Copy link
Collaborator

1uc commented Aug 17, 2023

I'm a lot more confused now, because it seems like POINTER hasn't been migrated to the soa container for mechanisms. Three reasons for this statement:

@nrnhines
Copy link
Member Author

I looked at the translated dcdt.mod that began this issue and see that the last arg of

_nrn_mechanism_register_data_fields(_mechtype,

most of which are for RANGE variables, is

_nrn_mechanism_field<double*>{"c", "pointer"} /* 0 */);

So I surmise that all of the special types that used to be in the dparam as well as the standard RANGE doubles that used to be in param, are now organized in one place. But from the <legacy?> threadargs definitions, the non-doubles are still separated into the Datum* _ppval.
As an aside, I notice that the hh.mod registration of ion variables follows the similar pattern. E.g.

_nrn_mechanism_field<double*>{"_ion_ek", "k_ion"} /* 3 */,

@nrnhines
Copy link
Member Author

It seems to me now that the typical usage of POINTER, I.e, a double* to a RANGE variable, is presently fully functional except for an error message if invalid. *_ppvar[0].get<double*>() is certainly a huge advantage over the pre SoA POINTER in being invariant under moving and reallocating what it points to. This issue could be closed with merely a version of get that raises an error if it is invalid. E.g. *_ppvar[0].get_if_valid<double*>().

Then the explicit check I added in the mod file would not be needed.

Later on, for higher performance, the check could be avoided by a check for validity during initialize.

I believe that one of the ways CoreNEURON has higher performance is that all model data is in a single array and thus a pointer to any range variable outside of the mechanism instance is merely an integer. In particular this includes Node* variables (such as voltage) and ionic variables that are used by the mechanism instance. For the future, it seems this could be effectively introduced into NEURON but I first have to look into how this is currently handled for voltage.

1uc added a commit that referenced this issue Aug 18, 2023
This commit adds an example (by means of unit-tests) that demonstrates
the behaviour we see #2458. It involves understanding how raw pointers
and pointers to values inside a `soa` are function. It further
demonstrates the difference of `generic_data_handle::get` and
`generic_data_handle::literal_value`.
@1uc
Copy link
Collaborator

1uc commented Aug 18, 2023

(Messages are crossing; but unfortunately I think the situation might be slightly more complicated. Hence I'm still posting despite the issue being "resolved".)

I think I understand why the generated code uses literal_value. Unfortunately, it's important to understand the hybrid nature of generic_data_handle. The simple version is that it can either store a raw pointer; or a data_handle. If it stores a raw pointer, the raw pointer is simply stored in the generic_data_handle. If it represents a type-erased data_handle then it stores enough information to create a genuine data_handle<T> on the fly.

When creating a generic_data_handle from a pointer, it'll perform a search to see if this address at that time refers to any element in any (known) soa container. If that's the case, then the generic_data_handle will create a modern data_handle and store all required information to recreate that data handle when needed. This includes storing the "stable identifier" of the element that the pointer points to.

Example, if a mechanism has a c pointer and in HOC we set c to point to the voltage of the first segment, then when performing:

_ppval[0] = &v_first_segment;

one will find that the address &v_first_segment lives inside a soa container. It'll obtain a (non-owning) stable identifier of v_first_segment, and store it along with some additional information in _ppval[0]. If at a later point we ask for c, then it'll create the data handle and virtue of the stable identifier, find the correct value of c (even if the soa has been shuffled or reallocated).

So far it behaves exactly like we initially guessed. Interestingly, under these conditions the POINTER is stable under permutations. I've added a little test that "proves" as much, see #2466.

However, there's a twist. Namely code that assigns to _p_c in a verbatim block. We'll use the address of printf just because I hope this might be a reasonably proxy for what @ramcdougal does with python callbacks:

VERBATIM
   _p_c = &printf; 
ENDVERBATIM

VERBATIM
  _p_c("foo%d", 42);
ENDVERBATIM

If we let _p_c == _ppval[0].get<double*>() then

_p_c  = &printf;

expands to

_ppval[0].get<double*>() = &printf;

Since get returns by value, i.e. it's signature is double * generic_data_handle::get<double*>(); we'd be assigning &printf to an rvalue. This fails to compile.

This is where literal_value comes in. From the documentation:

To match the flexibility of the old Datum, this had to be augmented to allow small literal values (int, double,void*, …) to be stored inside it and accessed via either get<T>() (if only the value is required) or literal_value<T>() if, even worse, the address of the wrapped value needs to be taken.

One imporant difference is that literal_value returns a non-const reference to the value that's stored in the generic_data_handel as a 'literal value'. However, that address is meaningless if the generic_data_handle represents a modern data_handle. Essentially, if _ppval[0] refers to a raw pointer and we let _p_c = _ppval[0].literal_value<void*>() then

_ppval[0].literal_value<void*>() = &printf;

correctly stores the address of printf inside _ppval[0]. However, then encounter the issue first reported here.

1uc added a commit that referenced this issue Aug 18, 2023
This commit adds an example (by means of unit-tests) that demonstrates
the behaviour we see #2458. It involves understanding how raw pointers
and pointers to values inside a `soa` are function. It further
demonstrates the difference of `generic_data_handle::get` and
`generic_data_handle::literal_value`.
@olupton
Copy link
Collaborator

olupton commented Aug 21, 2023

I have not fully parsed if there are any outstanding questions here, but I thought I would chime in with a few thoughts/comments/observations:

POINTER variables are used in two distinct ways. With reference to the generated code:

#define c	*_ppvar[0].get<double*>()
#define _p_c _ppvar[0].literal_value<void*>()
  • they can be bound from HOC/Python to point to part of the SoA model data (e.g. a Node voltage). In this case a permutation-stable handle is stored, and c can be used to access the pointed-to value. Attempting to use &c is formally undefined behaviour if the stored pointer is null, I believe (see https://stackoverflow.com/questions/51691273/is-null-well-defined-in-c), so using this to test validity is questionable. Taking + storing a non-null address that has been obtained in this way is very likely to be a bug, see for example Use SOA format for mechanism data #2027 (comment) and https://nrn.readthedocs.io/en/latest/dev/data-structures.html#storing-addresses-of-range-variables). I would have to check if it is possible to store a raw pointer this way -- I suspect that it is, and that one can (from Python) bind a POINTER variable to an element of a HOC/Python vector.
  • they can be assigned raw pointer values in VERBATIM blocks. In this case _p_c = ... is used. Unfortunately the pattern &_p_c also appears, which is the origin of the use of literal_value (which returns a reference) instead of get. An example is
    pv = (void**)(&_p_ptr);
    , but I believe this is also widespread in ModelDB. This will need to be seriously considered when the storage behind POINTER variables is transposed to SoA format, to ensure that &_p_c values are not stored for a long enough that they can be invalidated by reallocation or permutation.

As has I think been noted, if (as in the first case) the POINTER variable is referring to something (like a voltage) in the model in a permutation-stable way then c is the pointed-to voltage value, and &c is a non-null pointer (double*), but that pointer is generally calculated on the fly -- it is not stored anywhere in the model data. So, for this case then _p_c and &c could be made to yield the same pointer value, but &_p_c would be invalid -- thus breaking the other use-case.

It's unfortunate that POINTER has this overloaded meaning. In #2027 I concluded that, while the two cases are both common, mixing usage of c and _p_c for the same POINTER variable c in the same MOD file did not happen. I believe this is true in the models covered by the test suite and nrn-modeldb-ci, but it seems from the above that counterexamples exist. In my view, the second use-case (via _p_c) should be handled differently, essentially as a RANGE variable of non-double type -- loosely void*, but ideally this could be done in a more type-safe manner.

...
  RANGE my_ptr [MyStruct*]
...
  my_ptr = new MyStruct;
...
  *my_ptr; // MyStruct&

In any case, literal_value is a transition measure that should be consigned to the history bin as soon as possible. I wrote about this in https://nrn.readthedocs.io/en/latest/dev/data-structures.html#eliminate-the-need-for-literal-value.

At the risk of adding to the debt, I don't think there would be any technical barrier to adding _p_valid_c, yielding a bool saying whether c points to a valid part of the SoA data, or if it wraps a non-null raw pointer (thus covering both of the cases above).

@nrnhines
Copy link
Member Author

nrnhines commented Aug 22, 2023

mixing usage of c and _p_c for the same POINTER variable c in the same MOD file did not happen.

That kind of mixing would definitely be a bug. My use of if (_p_c) { was just a naive attempt to generate a recoverable error if I fail to set the data handle from the interpreter instead of getting a segfault.

It's unfortunate that POINTER has this overloaded meaning.

That is certainly true. The nmodl translator correctly disambiguates, I believe, just by noting if c is used in an expression or assignment statement. If so it must be a datahandle to a double. The only way it can be used to point to an Object is within VERBATIM blocks where the _p_c syntax is used to set or get a raw pointer.

It seems that #define c *_ppvar[0].get<double*>() requires that either get is changed to raise an error if the data handle is invalid, or that the translator is changed to use a method name that does raise an error (and pay the cost of the validity check).

Nevertheless, I do like the idea of new keywords/syntax that avoids the overloading.

@olupton
Copy link
Collaborator

olupton commented Aug 23, 2023

The nmodl translator correctly disambiguates, I believe, just by noting if c is used in an expression or assignment statement. If so it must be a datahandle to a double.

OK, I had assumed that this sort of introspection was beyond nocmodl's capabilities, and that it would need the modern NMODL translator.

I would be inclined to modify the code generation to take advantage of this knowledge, and to generate, effectively, a void*-typed RANGE variable if _p_X is being used.

It seems that #define c _ppvar[0].get<double>() requires that either get is changed to raise an error if the data handle is invalid, or that the translator is changed to use a method name that does raise an error (and pay the cost of the validity check).

In the short term, I suppose that this is the simplest way forward.

(apologies, I hit the wrong button and posted prematurely)

@olupton olupton closed this as completed Aug 23, 2023
@olupton olupton reopened this Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants