-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: pybind11 demo project should have NumPy own the data #3261
docs: pybind11 demo project should have NumPy own the data #3261
Conversation
I did a quick test with auto np = py::module::import("not_a_real_module_name"); without checking for errors, and pybind11 was nice enough to raise a ModuleNotFoundError for me—no seg-faults! So, based on this, I can |
nbytes = 24
nbytes = 32
nbytes = 24
None |
[177 65 227 8 71 87 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[248 248 148 8 71 87 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[177 65 227 8 71 87 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0] (The exact values in the allocated arrays will depend on what random memory is located at that address.) |
pointer = 94383525626576 raw data = 1 1 1
pointer = 94383523939584 raw data = 1 1 1
pointer = 94383525626576 raw data = 1 1 1 The pointer positions are run-dependent, but since I temporarily changed |
{'node1-data': array([ 86, 159, 205, 171, 248, 90, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), 'node2-offsets': array([ 94, 194, 220, 171, 248, 90, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0], dtype=uint8), 'node3-data': array([221, 65, 208, 175, 5, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)} |
{'node1-data': array([154, 153, 153, 153, 153, 153, 241, 63, 154, 153, 153, 153, 153,
153, 1, 64, 102, 102, 102, 102, 102, 102, 10, 64], dtype=uint8), 'node2-offsets': array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
0, 0, 6, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), 'node3-data': array([1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0,
0, 0], dtype=uint8)} These values are not random data: >>> from numpy import array, uint8
>>> container = {'node1-data': array([154, 153, 153, 153, 153, 153, 241, 63, 154, 153, 153, 153, 153,
... 153, 1, 64, 102, 102, 102, 102, 102, 102, 10, 64], dtype=uint8), 'node2-offsets': array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
... 0, 0, 6, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), 'node3-data': array([1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0,
... 0, 0], dtype=uint8)}
>>> container["node1-data"].view("f8")
array([1.1, 2.2, 3.3])
>>> container["node2-offsets"].view("i8")
array([0, 1, 3, 6])
>>> container["node3-data"].view("i4")
array([1, 1, 2, 1, 2, 3], dtype=int32) It's the data we want! |
The above commit is the last step, which builds the array from python -c 'import demo; demo.create_demo_array().show(type=True)' prints type: 3 * {
one: float64,
two: var * int32
}
[{one: 1.1, two: [1]},
{one: 2.2, two: [1, 2]},
{one: 3.3, two: [1, 2, 3]}] |
TODO: something similar should be done for the Cython example. awkward/header-only/examples/cython/demo_impl.cpp Lines 40 to 44 in e946646
It should be even easier in Cython, since you can call Python directly. |
The original demo does link the memory-release to when the
py::capsule
does out of scope in Python, but there are issues with crossing ownership of arrays between Python and C++.The (nontrivial) back-and-forth API for LayoutBuilder snapshots was explicitly intended to allow Python to fully own the data, so I'm updating the example to make it do that.
@HavryliukAY, I'm approaching this as I would always approach a problem like this, doing it in small steps that can be individually tested. I don't ordinarily make a git-commit for each step, but I'll be doing that in this PR to show what this looks like. Between each commit, I do
to see what I'm doing. In the first commit, "step 0", it returns
None
.