docs: pybind11 demo project should have NumPy own the data #3261

jpivarski · 2024-09-30T18:40:10Z

The original demo does link the memory release to the moment the py::capsule goes out of scope in Python, but there are issues with passing ownership of arrays between Python and C++.

The (nontrivial) back-and-forth API for LayoutBuilder snapshots was explicitly intended to allow Python to fully own the data, so I'm updating the example to make it do that.
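For readers who want to see what "Python owns the data" means on the NumPy side, here is a minimal sketch. It is not code from the demo: the ctypes array is only a hypothetical stand-in for memory allocated outside NumPy (as the C++ side did with malloc), and the variable names are made up for illustration.

```python
import ctypes

import numpy as np

# Allocated by NumPy itself: NumPy owns the memory and frees it with the array.
owned = np.empty(24, dtype=np.uint8)

# Hypothetical stand-in for memory allocated elsewhere (e.g. by C++ code).
foreign = (ctypes.c_uint8 * 24)()

# NumPy only borrows this memory; its lifetime is tied to `foreign`.
borrowed = np.frombuffer(foreign, dtype=np.uint8)

print("owned.flags.owndata    =", owned.flags.owndata)
print("borrowed.flags.owndata =", borrowed.flags.owndata)
```

When NumPy allocates the buffer, no capsule or custom deleter is needed, which is the simplification this PR is after.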

@HavryliukAY, I'm approaching this as I would always approach a problem like this, doing it in small steps that can be individually tested. I don't ordinarily make a git-commit for each step, but I'll be doing that in this PR to show what this looks like. Between each commit, I do

pip uninstall demo
pip install .
python -c 'import demo; print(repr(demo.create_demo_array()))'

to see what I'm doing. In the first commit, "step 0", it returns None.

jpivarski · 2024-09-30T18:45:51Z

I did a quick test with

auto np = py::module::import("not_a_real_module_name");

without checking for errors, and pybind11 was nice enough to raise a ModuleNotFoundError for me—no seg-faults! So, based on this, I can py::module::import NumPy and Awkward Array without checking to see if they exist, since the standard error-handling is all I need.
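As a rough pure-Python analogue of that test (an illustration only, not the pybind11 code in the demo), importing a missing module raises an ordinary exception that can be caught or left to propagate. The helper name try_import is hypothetical.

```python
import importlib


def try_import(name):
    """Return the imported module, or the ModuleNotFoundError that was raised."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        return err


result = try_import("not_a_real_module_name")
print(type(result).__name__, "-", result)
```

In pybind11, the same failure surfaces in C++ as py::error_already_set, and if it is not caught there, it is handed back to the Python caller as the original exception.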

jpivarski · 2024-09-30T18:54:24Z

nbytes = 24
nbytes = 32
nbytes = 24
None

jpivarski · 2024-09-30T18:56:51Z

[177  65 227   8  71  87   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0]
[248 248 148   8  71  87   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
[177  65 227   8  71  87   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0]

(The exact values in the allocated arrays will depend on what random memory is located at that address.)
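The Python side of this step can be sketched as follows. The buffer names and sizes are copied from the output in this thread. The names names_nbytes and py_container mirror the commit messages, but the dict literal here is an assumption standing in for whatever the LayoutBuilder reports.

```python
import numpy as np

# Assumed stand-in for the name -> size mapping reported by the builder.
names_nbytes = {"node1-data": 24, "node2-offsets": 32, "node3-data": 24}

# One uninitialized uint8 array per buffer; NumPy owns each allocation.
py_container = {
    name: np.empty(nbytes, dtype=np.uint8)
    for name, nbytes in names_nbytes.items()
}

for name, array in py_container.items():
    # The contents are whatever happened to be in memory, so only sizes are shown.
    print(name, "nbytes =", array.nbytes)
```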

jpivarski · 2024-09-30T19:06:02Z

pointer = 94383525626576 raw data = 1 1 1
pointer = 94383523939584 raw data = 1 1 1
pointer = 94383525626576 raw data = 1 1 1

The pointer positions are run-dependent, but since I temporarily changed np.empty to np.ones, we should expect all of the bytes to be equal to 1. This test didn't seg-fault because I saw in a previous step that all of these buffers would be allocated with at least 3 bytes...
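The same check can be reproduced from Python alone, as a sketch: take the array's address as an integer and read the first three bytes back through that raw address. This uses ctypes in place of the C++ side, so it illustrates the idea rather than the demo's actual code.

```python
import ctypes

import numpy as np

array = np.ones(24, dtype=np.uint8)

# Integer address of the first byte of the array's buffer (run-dependent).
pointer = array.ctypes.data

# Reinterpret the first 3 bytes at that address; safe because nbytes >= 3.
raw = (ctypes.c_uint8 * 3).from_address(pointer)
first_three = list(raw)

print("pointer =", pointer, "raw data =", *first_three)
```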

jpivarski · 2024-09-30T19:16:28Z

{'node1-data': array([ 86, 159, 205, 171, 248,  90,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0], dtype=uint8), 'node2-offsets': array([ 94, 194, 220, 171, 248,  90,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0], dtype=uint8), 'node3-data': array([221,  65, 208, 175,   5,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,  32,   0,   0,   0,   0,   0,   0,   0], dtype=uint8)}

jpivarski · 2024-09-30T19:20:47Z

{'node1-data': array([154, 153, 153, 153, 153, 153, 241,  63, 154, 153, 153, 153, 153,
       153,   1,  64, 102, 102, 102, 102, 102, 102,  10,  64], dtype=uint8), 'node2-offsets': array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
       0, 0, 6, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), 'node3-data': array([1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0,
       0, 0], dtype=uint8)}

These values are not random data:

>>> from numpy import array, uint8
>>> container = {'node1-data': array([154, 153, 153, 153, 153, 153, 241,  63, 154, 153, 153, 153, 153,
...        153,   1,  64, 102, 102, 102, 102, 102, 102,  10,  64], dtype=uint8), 'node2-offsets': array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
...        0, 0, 6, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), 'node3-data': array([1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0,
...        0, 0], dtype=uint8)}
>>> container["node1-data"].view("f8")
array([1.1, 2.2, 3.3])
>>> container["node2-offsets"].view("i8")
array([0, 1, 3, 6])
>>> container["node3-data"].view("i4")
array([1, 1, 2, 1, 2, 3], dtype=int32)

It's the data we want!
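The mechanism behind this step is that a write through the raw pointer lands in memory that NumPy already owns, so the array sees it without any copy. A minimal sketch, assuming ctypes.memmove as a stand-in for the C++ code that fills the buffers:

```python
import ctypes

import numpy as np

# Buffer allocated (and owned) by NumPy, as in the previous steps.
buffer = np.zeros(24, dtype=np.uint8)

# Stand-in for the C++ side: three doubles written through the raw address.
values = (ctypes.c_double * 3)(1.1, 2.2, 3.3)
ctypes.memmove(buffer.ctypes.data, values, ctypes.sizeof(values))

# The NumPy array now holds those bytes; reinterpret them as float64.
decoded = buffer.view("f8")
print(decoded)
```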

jpivarski · 2024-09-30T19:23:56Z

The above commit is the last step, which builds the array from ak.from_buffers and returns it, instead of returning None.

python -c 'import demo; demo.create_demo_array().show(type=True)'

prints

type: 3 * {
    one: float64,
    two: var * int32
}
[{one: 1.1, two: [1]},
 {one: 2.2, two: [1, 2]},
 {one: 3.3, two: [1, 2, 3]}]

jpivarski · 2024-09-30T19:31:41Z

TODO: something similar should be done for the Cython example.

awkward/header-only/examples/cython/demo_impl.cpp

Lines 40 to 44 in e946646

    
    // Allocate memory
    std::map<std::string, void *> buffers = {};
    for (auto it: names_nbytes) {
        buffers[it.first] = malloc(it.second);
    }

It should be even easier in Cython, since you can call Python directly.

step 0: remove all existing code and return None

1e9c29c

step 1: make sure we can iterate over names_nbytes

edfb65e

step 2: make sure we can create a NumPy array through pybind11

9ec5c2a

step 3: make sure we can see the raw data in the array

bf0b3dd

step 4: make sure we can fill the dict and the std::map

b27af56

step 5: filling the cpp_container fills the py_container

fb4d817

done: we are now returning the array built by ak.from_buffers

be04b55

jpivarski enabled auto-merge (squash) September 30, 2024 19:25

jpivarski mentioned this pull request Sep 30, 2024

Cython header-only demo project should have NumPy own the data #3262

Open

jpivarski merged commit c4268e0 into main Sep 30, 2024
44 checks passed

jpivarski deleted the jpivarski/demo-should-make-python-allocate-arrays branch September 30, 2024 19:38
