Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: pybind11 demo project should have NumPy own the data #3261

Merged
merged 7 commits into from
Sep 30, 2024

Conversation

jpivarski
Copy link
Member

The original demo does link the memory-release to when the py::capsule does out of scope in Python, but there are issues with crossing ownership of arrays between Python and C++.

The (nontrivial) back-and-forth API for LayoutBuilder snapshots was explicitly intended to allow Python to fully own the data, so I'm updating the example to make it do that.

@HavryliukAY, I'm approaching this as I would always approach a problem like this, doing it in small steps that can be individually tested. I don't ordinarily make a git-commit for each step, but I'll be doing that in this PR to show what this looks like. Between each commit, I do

pip uninstall demo
pip install .
python -c 'import demo; print(repr(demo.create_demo_array()))'

to see what I'm doing. In the first commit, "step 0", it returns None.

@jpivarski
Copy link
Member Author

I did a quick test with

auto np = py::module::import("not_a_real_module_name");

without checking for errors, and pybind11 was nice enough to raise a ModuleNotFoundError for me—no seg-faults! So, based on this, I can py::module::import NumPy and Awkward Array without checking to see if they exist, since the standard error-handling is all I need.

@jpivarski
Copy link
Member Author

nbytes = 24
nbytes = 32
nbytes = 24
None

@jpivarski
Copy link
Member Author

[177  65 227   8  71  87   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0]
[248 248 148   8  71  87   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
[177  65 227   8  71  87   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0]

(The exact values in the allocated arrays will depend on what random memory is located at that address.)

@jpivarski
Copy link
Member Author

pointer = 94383525626576 raw data = 1 1 1
pointer = 94383523939584 raw data = 1 1 1
pointer = 94383525626576 raw data = 1 1 1

The pointer positions are run-dependent, but since I temporarily changed np.empty to np.ones, we should expect all of the bytes to be equal to 1. This test didn't seg-fault because I saw in a previous step that all of these buffers would be allocated with at least 3 bytes...

@jpivarski
Copy link
Member Author

{'node1-data': array([ 86, 159, 205, 171, 248,  90,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0], dtype=uint8), 'node2-offsets': array([ 94, 194, 220, 171, 248,  90,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0], dtype=uint8), 'node3-data': array([221,  65, 208, 175,   5,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,  32,   0,   0,   0,   0,   0,   0,   0], dtype=uint8)}

@jpivarski
Copy link
Member Author

{'node1-data': array([154, 153, 153, 153, 153, 153, 241,  63, 154, 153, 153, 153, 153,
       153,   1,  64, 102, 102, 102, 102, 102, 102,  10,  64], dtype=uint8), 'node2-offsets': array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
       0, 0, 6, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), 'node3-data': array([1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0,
       0, 0], dtype=uint8)}

These values are not random data:

>>> from numpy import array, uint8
>>> container = {'node1-data': array([154, 153, 153, 153, 153, 153, 241,  63, 154, 153, 153, 153, 153,
...        153,   1,  64, 102, 102, 102, 102, 102, 102,  10,  64], dtype=uint8), 'node2-offsets': array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
...        0, 0, 6, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), 'node3-data': array([1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0,
...        0, 0], dtype=uint8)}
>>> container["node1-data"].view("f8")
array([1.1, 2.2, 3.3])
>>> container["node2-offsets"].view("i8")
array([0, 1, 3, 6])
>>> container["node3-data"].view("i4")
array([1, 1, 2, 1, 2, 3], dtype=int32)

It's the data we want!

@jpivarski
Copy link
Member Author

The above commit is the last step, which builds the array from ak.from_buffers and returns it, instead of returning None.

python -c 'import demo; demo.create_demo_array().show(type=True)'

prints

type: 3 * {
    one: float64,
    two: var * int32
}
[{one: 1.1, two: [1]},
 {one: 2.2, two: [1, 2]},
 {one: 3.3, two: [1, 2, 3]}]

@jpivarski jpivarski enabled auto-merge (squash) September 30, 2024 19:25
@jpivarski
Copy link
Member Author

TODO: something similar should be done for the Cython example.

// Allocate memory
std::map<std::string, void *> buffers = {};
for (auto it: names_nbytes) {
buffers[it.first] = malloc(it.second);
}

It should be even easier in Cython, since you can call Python directly.

@jpivarski jpivarski merged commit c4268e0 into main Sep 30, 2024
44 checks passed
@jpivarski jpivarski deleted the jpivarski/demo-should-make-python-allocate-arrays branch September 30, 2024 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant