deephaven.learn fails when converting sufficiently large data #5403

Comments
The key error here is:

This is being thrown by

Our problem is specifically happening at https://github.com/deephaven/deephaven-core/blob/main/py/server/deephaven/learn/gather.py#L83. We should add a test that validates that the inputs to the method look correct. If there is a DH bug, it is likely in

It could also be a jpy bug in the array conversion.
I did some experiments that show the following:

This may be the relevant NumPy code: https://github.com/numpy/numpy/blob/main/numpy/_core/src/multiarray/ctors.c#L3756
New experiment. I think this may be related to int32 indexing limits in some bit of code. The int32 max value is 2,147,483,647. To test this theory, consider an 8-byte type like `double`, then a 4-byte type like `int` or `float`.
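As a back-of-the-envelope check (my own arithmetic, not from the issue), the byte counts for the repro size sit right at the int32 boundary:

```python
# Does the total byte count of the table's data fit in a signed 32-bit int?
INT32_MAX = 2_147_483_647

n_rows, n_cols = 89_478_485, 3  # sizes from the repro script below

size_8byte = n_rows * n_cols * 8  # e.g. double, as used in the repro
size_4byte = n_rows * n_cols * 4  # e.g. int or float

print(size_8byte)  # 2147483640 -- only 7 bytes below INT32_MAX
print(size_4byte)  # 1073741820 -- comfortably below INT32_MAX
```

The 8-byte case lands within 8 bytes of the limit, so any code path that computes a byte offset or length in a 32-bit int would be operating at the very edge of overflow at exactly this row count.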
Here is where the error is raised: https://github.com/numpy/numpy/blob/main/numpy/_core/src/multiarray/ctors.c#L3819
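For reference, the same ValueError can be reproduced directly from plain NumPy by passing `np.frombuffer` an offset past the end of the buffer (a minimal illustration of the check at that line, not the deephaven code path):

```python
import numpy as np

buf = bytearray(8)  # an 8-byte buffer, so view.len (ts) is 8
try:
    # offset (16) is greater than the buffer length (8),
    # so NumPy raises from the PyErr_Format call quoted below
    np.frombuffer(buf, dtype=np.float64, offset=16)
except ValueError as e:
    print(e)  # offset must be non-negative and no greater than buffer length (8)
```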
Reading through the NumPy C implementation: https://github.com/numpy/numpy/blob/main/numpy/_core/src/multiarray/ctors.c#L3756

Key variables are:

The key point of failure is:

```c
PyErr_Format(PyExc_ValueError,
        "offset must be non-negative and no greater than buffer "\
        "length (%" NPY_INTP_FMT ")", (npy_intp)ts);
```

If the information on types is correct, then `ts = view.len`. The view is generated using:

```c
if (PyObject_GetBuffer(buf, &view, PyBUF_WRITABLE|PyBUF_SIMPLE) < 0) {
    writeable = 0;
    PyErr_Clear();
    if (PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE) < 0) {
        Py_DECREF(buf);
        Py_DECREF(type);
        return NULL;
    }
}
```

Together, this makes me wonder if
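As a sanity check on that reading (a sketch using plain Python objects, not the jpy-backed buffer): the C-level `view.len` filled in by `PyObject_GetBuffer` is the same value Python exposes as `memoryview.nbytes`, and it is the `ts` the offset is validated against:

```python
import numpy as np

data = bytearray(24)  # a 24-byte writable buffer
mv = memoryview(data)
print(mv.nbytes)      # 24 -- the value C code sees as view.len

# frombuffer succeeds because offset (0) <= buffer length (24)
arr = np.frombuffer(data, dtype=np.float64)
print(arr.shape)      # (3,)
```

If a jpy-provided buffer reported a truncated or negative `len` for data near the 2 GiB mark, this same check would fail in exactly the way observed, which fits with the eventual fix landing in jpy rather than in NumPy or deephaven.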
Testing the 4 cases above with:

```python
from deephaven.learn import gather, learn, Input, Output
from deephaven import empty_table
import numpy as np

n_cols = 3
n_rows = 89478485

et = empty_table(n_rows).update([f"X{idx} = randomDouble(0.0, 10.0)" for idx in range(3)])

def model(features):
    return np.max(features)

def t_to_np(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)

def np_to_t(data, idx):
    return data

t = learn(
    table=et,
    model_func=model,
    inputs=[Input([f"X{idx}" for idx in range(3)], t_to_np)],
    outputs=[Output("Y", np_to_t, "double")],
    batch_size=n_rows
)
```
Fixed by jpy-consortium/jpy#145
Description
When trying to convert sufficiently large data to a NumPy array using `deephaven.learn`, an error gets raised complaining about a negative offset and buffer length.

Steps to reproduce
Expected results
The query to run successfully.
Actual results
The following error w/ stack trace:
Additional details and attachments

Converting `et` to a NumPy array works when using `deephaven.numpy.to_numpy`.

Versions