deephaven.learn fails when converting sufficiently large data #5403

Closed
jjbrosnan opened this issue Apr 24, 2024 · 10 comments
Labels: bug, user-reported

@jjbrosnan (Contributor)

Description

When converting sufficiently large data to a NumPy array using deephaven.learn, a ValueError is raised complaining that the offset exceeds a (negative) buffer length.

Steps to reproduce

from deephaven.learn import gather, learn, Input, Output
from deephaven import empty_table
import numpy as np

n_rows = 32_000_000

# 12 double columns of random data: 32M rows x 12 cols x 8 bytes = 3,072,000,000 bytes
et = empty_table(n_rows).update([f"X{idx} = randomDouble(0.0, 10.0)" for idx in range(1, 13)])

# Model function: reduce the gathered 2D array to its maximum
def model(features):
    return np.max(features)

# Gather function: convert table rows/columns to a 2D NumPy array of doubles
def t_to_np(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)

# Scatter function: map model output back to table rows
def np_to_t(data, idx):
    return data

t = learn(
    table=et,
    model_func=model,
    inputs=[Input([f"X{idx}" for idx in range(1, 13)], t_to_np)],
    outputs=[Output("Y", np_to_t, "double")],
    batch_size=n_rows  # a single batch covering every row
)

Expected results

The query to run successfully.

Actual results

The following error with stack trace:

r-Scheduler-Serial-1 | .c.ConsoleServiceGrpcImpl | Error running script: java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to complete the learn function. : ValueError: offset must be non-negative and no greater than buffer length (-1222967296)
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/table.py", line 764, in update
    return Table(j_table=self.j_table.update(*formulas))
RuntimeError: io.deephaven.engine.exceptions.TableInitializationException: Error while initializing Update([Y]): an exception occurred while performing the initial select or update
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$29(QueryTable.java:1476)
	at io.deephaven.engine.table.impl.perf.QueryPerformanceRecorder.withNugget(QueryPerformanceRecorder.java:369)
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$30(QueryTable.java:1413)
	at io.deephaven.engine.table.impl.QueryTable.memoizeResult(QueryTable.java:3490)
	at io.deephaven.engine.table.impl.QueryTable.selectOrUpdate(QueryTable.java:1412)
	at io.deephaven.engine.table.impl.QueryTable.update(QueryTable.java:1390)
	at io.deephaven.engine.table.impl.QueryTable.update(QueryTable.java:95)
	at io.deephaven.api.TableOperationsDefaults.update(TableOperationsDefaults.java:94)
	at org.jpy.PyLib.executeCode(Native Method)
	at org.jpy.PyObject.executeCode(PyObject.java:138)
	at io.deephaven.engine.util.PythonEvaluatorJpy.evalScript(PythonEvaluatorJpy.java:73)
	at io.deephaven.integrations.python.PythonDeephavenSession.lambda$evaluate$1(PythonDeephavenSession.java:205)
	at io.deephaven.util.locks.FunctionalLock.doLockedInterruptibly(FunctionalLock.java:51)
	at io.deephaven.integrations.python.PythonDeephavenSession.evaluate(PythonDeephavenSession.java:205)
	at io.deephaven.engine.util.AbstractScriptSession.lambda$evaluateScript$0(AbstractScriptSession.java:148)
	at io.deephaven.engine.context.ExecutionContext.lambda$apply$0(ExecutionContext.java:196)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:207)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:195)
	at io.deephaven.engine.util.AbstractScriptSession.evaluateScript(AbstractScriptSession.java:148)
	at io.deephaven.engine.util.DelegatingScriptSession.evaluateScript(DelegatingScriptSession.java:72)
	at io.deephaven.engine.util.ScriptSession.evaluateScript(ScriptSession.java:75)
	at io.deephaven.server.console.ConsoleServiceGrpcImpl.lambda$executeCommand$4(ConsoleServiceGrpcImpl.java:191)
	at io.deephaven.server.session.SessionState$ExportBuilder.lambda$submit$3(SessionState.java:1519)
	at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:992)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)
	at java.base/java.lang.Thread.run(Thread.java:1583)
caused by io.deephaven.engine.table.impl.select.FormulaEvaluationException: In formula: Y = doublePyCast((PyObject) (__scatterer.scatter(0, __FutureOffset)))
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.applyFormulaPerItem(Formula.java:166)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.lambda$fillChunkHelper$4(Formula.java:155)
	at io.deephaven.engine.rowset.RowSequence.lambda$forAllRowKeys$0(RowSequence.java:175)
	at io.deephaven.engine.rowset.impl.singlerange.SingleRangeMixin.forEachRowKey(SingleRangeMixin.java:17)
	at io.deephaven.engine.rowset.RowSequence.forAllRowKeys(RowSequence.java:174)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.fillChunkHelper(Formula.java:153)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.fillChunk(Formula.java:130)
	at io.deephaven.engine.table.impl.sources.ViewColumnSource.fillChunk(ViewColumnSource.java:219)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doApplyUpdate(SelectColumnLayer.java:412)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.lambda$doSerialApplyUpdate$2(SelectColumnLayer.java:264)
	at io.deephaven.engine.util.systemicmarking.SystemicObjectTracker.executeSystemically(SystemicObjectTracker.java:56)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doSerialApplyUpdate(SelectColumnLayer.java:263)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.lambda$onAllRequiredColumnsCompleted$1(SelectColumnLayer.java:212)
	at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.lambda$submit$0(ImmediateJobScheduler.java:37)
	at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.submit(ImmediateJobScheduler.java:51)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.onAllRequiredColumnsCompleted(SelectColumnLayer.java:210)
	at io.deephaven.engine.table.impl.select.analyzers.SelectAndViewAnalyzer$SelectLayerCompletionHandler.onLayerCompleted(SelectAndViewAnalyzer.java:627)
	at io.deephaven.engine.table.impl.select.analyzers.BaseLayer.applyUpdate(BaseLayer.java:76)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.applyUpdate(SelectColumnLayer.java:151)
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$29(QueryTable.java:1463)
	... 29 more
caused by java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to convert rows: {0-31999999} and cols: [Lio.deephaven.engine.table.ColumnSource;@4372ccd to a 2D NumPy array : ValueError: offset must be non-negative and no greater than buffer length (-1222967296)
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 83, in table_to_numpy_2d
    tensor = np.frombuffer(buffer, dtype=np_type)
ValueError: offset must be non-negative and no greater than buffer length (-1222967296)

Line: 92
Namespace: table_to_numpy_2d
File: /opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py
Traceback (most recent call last):
  File "<string>", line 13, in t_to_np
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 92, in table_to_numpy_2d

	at org.jpy.PyLib.callAndReturnObject(Native Method)
	at org.jpy.PyObject.call(PyObject.java:449)
	at io.deephaven.integrations.python.PythonFunctionCaller.apply(PythonFunctionCaller.java:32)
	at io.deephaven.integrations.python.PythonFunctionCaller.apply(PythonFunctionCaller.java:15)
	at io.deephaven.integrations.learn.Future.gather(Future.java:81)
	at io.deephaven.integrations.learn.Future.get(Future.java:59)
	at io.deephaven.integrations.learn.Scatterer.scatter(Scatterer.java:39)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.applyFormulaPerItem(Formula.java:164)
	... 48 more


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/__init__.py", line 148, in learn
    return (table
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/table.py", line 766, in update
    raise DHError(e, "table update operation failed.") from e
deephaven.dherror.DHError: table update operation failed. : ValueError: offset must be non-negative and no greater than buffer length (-1222967296)
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/table.py", line 764, in update
    return Table(j_table=self.j_table.update(*formulas))
RuntimeError: io.deephaven.engine.exceptions.TableInitializationException: Error while initializing Update([Y]): an exception occurred while performing the initial select or update
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$29(QueryTable.java:1476)
	at io.deephaven.engine.table.impl.perf.QueryPerformanceRecorder.withNugget(QueryPerformanceRecorder.java:369)
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$30(QueryTable.java:1413)
	at io.deephaven.engine.table.impl.QueryTable.memoizeResult(QueryTable.java:3490)
	at io.deephaven.engine.table.impl.QueryTable.selectOrUpdate(QueryTable.java:1412)
	at io.deephaven.engine.table.impl.QueryTable.update(QueryTable.java:1390)
	at io.deephaven.engine.table.impl.QueryTable.update(QueryTable.java:95)
	at io.deephaven.api.TableOperationsDefaults.update(TableOperationsDefaults.java:94)
	at org.jpy.PyLib.executeCode(Native Method)
	at org.jpy.PyObject.executeCode(PyObject.java:138)
	at io.deephaven.engine.util.PythonEvaluatorJpy.evalScript(PythonEvaluatorJpy.java:73)
	at io.deephaven.integrations.python.PythonDeephavenSession.lambda$evaluate$1(PythonDeephavenSession.java:205)
	at io.deephaven.util.locks.FunctionalLock.doLockedInterruptibly(FunctionalLock.java:51)
	at io.deephaven.integrations.python.PythonDeephavenSession.evaluate(PythonDeephavenSession.java:205)
	at io.deephaven.engine.util.AbstractScriptSession.lambda$evaluateScript$0(AbstractScriptSession.java:148)
	at io.deephaven.engine.context.ExecutionContext.lambda$apply$0(ExecutionContext.java:196)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:207)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:195)
	at io.deephaven.engine.util.AbstractScriptSession.evaluateScript(AbstractScriptSession.java:148)
	at io.deephaven.engine.util.DelegatingScriptSession.evaluateScript(DelegatingScriptSession.java:72)
	at io.deephaven.engine.util.ScriptSession.evaluateScript(ScriptSession.java:75)
	at io.deephaven.server.console.ConsoleServiceGrpcImpl.lambda$executeCommand$4(ConsoleServiceGrpcImpl.java:191)
	at io.deephaven.server.session.SessionState$ExportBuilder.lambda$submit$3(SessionState.java:1519)
	at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:992)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)
	at java.base/java.lang.Thread.run(Thread.java:1583)
caused by io.deephaven.engine.table.impl.select.FormulaEvaluationException: In formula: Y = doublePyCast((PyObject) (__scatterer.scatter(0, __FutureOffset)))
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.applyFormulaPerItem(Formula.java:166)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.lambda$fillChunkHelper$4(Formula.java:155)
	at io.deephaven.engine.rowset.RowSequence.lambda$forAllRowKeys$0(RowSequence.java:175)
	at io.deephaven.engine.rowset.impl.singlerange.SingleRangeMixin.forEachRowKey(SingleRangeMixin.java:17)
	at io.deephaven.engine.rowset.RowSequence.forAllRowKeys(RowSequence.java:174)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.fillChunkHelper(Formula.java:153)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.fillChunk(Formula.java:130)
	at io.deephaven.engine.table.impl.sources.ViewColumnSource.fillChunk(ViewColumnSource.java:219)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doApplyUpdate(SelectColumnLayer.java:412)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.lambda$doSerialApplyUpdate$2(SelectColumnLayer.java:264)
	at io.deephaven.engine.util.systemicmarking.SystemicObjectTracker.executeSystemically(SystemicObjectTracker.java:56)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doSerialApplyUpdate(SelectColumnLayer.java:263)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.lambda$onAllRequiredColumnsCompleted$1(SelectColumnLayer.java:212)
	at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.lambda$submit$0(ImmediateJobScheduler.java:37)
	at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.submit(ImmediateJobScheduler.java:51)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.onAllRequiredColumnsCompleted(SelectColumnLayer.java:210)
	at io.deephaven.engine.table.impl.select.analyzers.SelectAndViewAnalyzer$SelectLayerCompletionHandler.onLayerCompleted(SelectAndViewAnalyzer.java:627)
	at io.deephaven.engine.table.impl.select.analyzers.BaseLayer.applyUpdate(BaseLayer.java:76)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.applyUpdate(SelectColumnLayer.java:151)
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$29(QueryTable.java:1463)
	... 29 more
caused by java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to convert rows: {0-31999999} and cols: [Lio.deephaven.engine.table.ColumnSource;@4372ccd to a 2D NumPy array : ValueError: offset must be non-negative and no greater than buffer length (-1222967296)
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 83, in table_to_numpy_2d
    tensor = np.frombuffer(buffer, dtype=np_type)
ValueError: offset must be non-negative and no greater than buffer length (-1222967296)

Line: 92
Namespace: table_to_numpy_2d
File: /opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py
Traceback (most recent call last):
  File "<string>", line 13, in t_to_np
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 92, in table_to_numpy_2d

	at org.jpy.PyLib.callAndReturnObject(Native Method)
	at org.jpy.PyObject.call(PyObject.java:449)
	at io.deephaven.integrations.python.PythonFunctionCaller.apply(PythonFunctionCaller.java:32)
	at io.deephaven.integrations.python.PythonFunctionCaller.apply(PythonFunctionCaller.java:15)
	at io.deephaven.integrations.learn.Future.gather(Future.java:81)
	at io.deephaven.integrations.learn.Future.get(Future.java:59)
	at io.deephaven.integrations.learn.Scatterer.scatter(Scatterer.java:39)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.applyFormulaPerItem(Formula.java:164)
	... 48 more



Line: 171
Namespace: learn
File: /opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/__init__.py
Traceback (most recent call last):
  File "<string>", line 18, in <module>
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/__init__.py", line 171, in learn

	at org.jpy.PyLib.executeCode(Native Method)
	at org.jpy.PyObject.executeCode(PyObject.java:138)
	at io.deephaven.engine.util.PythonEvaluatorJpy.evalScript(PythonEvaluatorJpy.java:73)
	at io.deephaven.integrations.python.PythonDeephavenSession.lambda$evaluate$1(PythonDeephavenSession.java:205)
	at io.deephaven.util.locks.FunctionalLock.doLockedInterruptibly(FunctionalLock.java:51)
	at io.deephaven.integrations.python.PythonDeephavenSession.evaluate(PythonDeephavenSession.java:205)
	at io.deephaven.engine.util.AbstractScriptSession.lambda$evaluateScript$0(AbstractScriptSession.java:148)
	at io.deephaven.engine.context.ExecutionContext.lambda$apply$0(ExecutionContext.java:196)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:207)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:195)
	at io.deephaven.engine.util.AbstractScriptSession.evaluateScript(AbstractScriptSession.java:148)
	at io.deephaven.engine.util.DelegatingScriptSession.evaluateScript(DelegatingScriptSession.java:72)
	at io.deephaven.engine.util.ScriptSession.evaluateScript(ScriptSession.java:75)
	at io.deephaven.server.console.ConsoleServiceGrpcImpl.lambda$executeCommand$4(ConsoleServiceGrpcImpl.java:191)
	at io.deephaven.server.session.SessionState$ExportBuilder.lambda$submit$3(SessionState.java:1519)
	at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:992)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Additional details and attachments

Converting et to a NumPy array works when using deephaven.numpy.to_numpy.
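
For reference, a minimal sketch of that workaround, assuming the repro's et table from above is in scope (to_numpy returns a 2D array when all requested columns share a type):

from deephaven.numpy import to_numpy

# The workaround noted above: convert the same table without going through
# deephaven.learn's gather path.
arr = to_numpy(et)           # one column per table column
print(arr.shape, arr.dtype)  # expected: (32000000, 12) float64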

Versions

  • Deephaven: 0.33.3
  • OS: OS X
  • Browser: Chrome
  • Docker: 20.10.13
jjbrosnan added the bug, triage, user-reported, and devrel-watch labels on Apr 24, 2024
@chipkent (Member)

The key error here is:

Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 83, in table_to_numpy_2d
    tensor = np.frombuffer(buffer, dtype=np_type)
ValueError: offset must be non-negative and no greater than buffer length (-1222967296)

This is being thrown by np.frombuffer. Other reports of the same error confirm the source:
facebookresearch/metaseq#186
microsoft/ProphetNet#9
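
For comparison, the same ValueError is easy to trigger directly by passing an offset past the end of a small buffer; what is unusual in this issue is that the reported buffer length itself is negative:

import numpy as np

# Deliberately request an offset beyond a 3-byte buffer:
np.frombuffer(b"abc", dtype=np.uint8, offset=5)
# ValueError: offset must be non-negative and no greater than buffer length (3)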

@chipkent (Member)

Our problem is specifically happening at https://github.com/deephaven/deephaven-core/blob/main/py/server/deephaven/learn/gather.py#L83. We should add a test that validates the inputs to that call and confirms they look correct.

If there is a DH bug, it is likely in io.deephaven.integrations.learn.gather.NumPy.tensorBuffer2D<x>.

@chipkent (Member)

It could also be a jpy bug in the array conversion.

@alexpeters1208 (Contributor)

I ran some experiments that show the following (a parameterized sketch follows the list):

  1. When n_rows is cut in half to 16_000_000, learn succeeds.
  2. When np_type = np.intc (int) and the random data function is changed appropriately, learn succeeds.
  3. When np_type = np.intc (int), n_rows is doubled to 64_000_000, and the random data function is changed appropriately, learn fails.
  4. When np_type = np.single (float) and the random data function is changed appropriately, learn succeeds.
  5. When np_type = np.single (float), n_rows is doubled to 64_000_000, and the random data function is changed appropriately, learn fails.
  6. When np_type = np.int_ (long) and the random data function is changed appropriately, learn fails.
  7. When np_type = np.int_ (long), n_rows is cut in half to 16_000_000 and the random data function is changed appropriately, learn succeeds.
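
A hedged sketch of how those cases can be parameterized (the helper run_learn and the dtype-to-generator pairings, e.g. Deephaven's built-in randomInt for the int case, are illustrative assumptions, not the exact experiment code):

import numpy as np
from deephaven import empty_table
from deephaven.learn import gather, learn, Input, Output

def run_learn(n_rows, np_type, rand_expr):
    # Build 12 columns of random data with the given generator expression,
    # then run the same gather/model/scatter pipeline as the repro above.
    col_names = [f"X{idx}" for idx in range(1, 13)]
    et = empty_table(n_rows).update([f"{c} = {rand_expr}" for c in col_names])

    def t_to_np(rows, cols):
        return gather.table_to_numpy_2d(rows, cols, np_type=np_type)

    return learn(
        table=et,
        model_func=lambda features: np.max(features),
        inputs=[Input(col_names, t_to_np)],
        outputs=[Output("Y", lambda data, idx: data, "double")],
        batch_size=n_rows,
    )

run_learn(16_000_000, np.double, "randomDouble(0.0, 10.0)")  # case 1: succeeds
run_learn(64_000_000, np.intc, "randomInt(0, 10)")           # case 3: fails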

@chipkent (Member)

New experiment: I think this may be related to int32 indexing limits somewhere in the conversion code path. The int32 max value is 2,147,483,647; a quick arithmetic check follows the cases below.

To test this theory, consider an 8-byte type like long or double.

  • Does the error occur with floor(2,147,483,647/3/8)=89478485 rows and 3 cols?
  • Does the error occur with ceil(2,147,483,647/3/8)=89478486 rows and 3 cols?

Consider a 4-byte type like int or float.

  • Does the error occur with floor(2,147,483,647/3/4)=178956970 rows and 3 cols?
  • Does the error occur with ceil(2,147,483,647/3/4)=178956971 rows and 3 cols?
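
A quick arithmetic check of this theory in plain Python, using the thresholds above and the original repro's dimensions:

INT32_MAX = 2_147_483_647

# 8-byte elements (double/long), 3 columns:
print(INT32_MAX // (3 * 8))      # 89478485 rows -> total bytes fit in int32
print(INT32_MAX // (3 * 8) + 1)  # 89478486 rows -> total bytes overflow int32

# 4-byte elements (int/float), 3 columns:
print(INT32_MAX // (3 * 4))      # 178956970
print(INT32_MAX // (3 * 4) + 1)  # 178956971

# Original repro: 32,000,000 rows x 12 double columns = 3,072,000,000 bytes.
# Reinterpreted as a signed 32-bit value, that is exactly the "buffer length"
# reported in the error:
print(32_000_000 * 12 * 8 - 2**32)  # -1222967296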

@chipkent (Member)

Reading through the NumPy C implementation: https://github.com/numpy/numpy/blob/main/numpy/_core/src/multiarray/ctors.c#L3756

Key variables are:

  • Py_ssize_t ts
  • npy_intp offset

Py_ssize_t is defined by PEP 353 to be:

A new type Py_ssize_t is introduced, which has the same size as the compiler’s size_t type, but is signed. It will be a typedef for ssize_t where available.

size_t is an unsigned 64-bit integer on typical 64-bit platforms (see https://www.geeksforgeeks.org/size_t-data-type-c-language/). ssize_t is the signed counterpart of size_t: it has the same width but can also represent a negative value for errors.

npy_intp is also defined as Py_ssize_t in https://github.com/numpy/numpy/blob/main/numpy/_core/include/numpy/npy_common.h#L201.

The key point of failure is:

        PyErr_Format(PyExc_ValueError,
                     "offset must be non-negative and no greater than buffer "\
                     "length (%" NPY_INTP_FMT ")", (npy_intp)ts);

If the information on types is correct, the (npy_intp) cast does nothing, and ts contains a bad value.
ts is assigned from a buffer view via:

    ts = view.len;

The view is generated using:

    if (PyObject_GetBuffer(buf, &view, PyBUF_WRITABLE|PyBUF_SIMPLE) < 0) {
        writeable = 0;
        PyErr_Clear();
        if (PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE) < 0) {
            Py_DECREF(buf);
            Py_DECREF(type);
            return NULL;
        }
    }

Together, this makes me wonder whether jpy has a bug in its buffer-protocol implementation (what PyObject_GetBuffer calls into) for large arrays. @jmao-denver
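
One hedged way to probe that from the Python side (illustrative only; buffer stands in for the jpy-provided object that gather.py passes to np.frombuffer): requesting a memoryview exercises the same PyObject_GetBuffer machinery, so the length it reports can be inspected before NumPy is involved.

# Illustrative diagnostic, assuming `buffer` is the jpy-backed buffer object:
mv = memoryview(buffer)
print(mv.nbytes)  # a negative or wrapped value here would implicate jpy, not NumPy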

@chipkent (Member)

Testing the 4 cases above with:

from deephaven.learn import gather, learn, Input, Output
from deephaven import empty_table
import numpy as np

n_cols = 3
n_rows = 89478485  # varied per case below

# n_cols double columns of random data
et = empty_table(n_rows).update([f"X{idx} = randomDouble(0.0, 10.0)" for idx in range(n_cols)])

def model(features):
    return np.max(features)

def t_to_np(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)

def np_to_t(data, idx):
    return data

t = learn(
    table=et,
    model_func=model,
    inputs=[Input([f"X{idx}" for idx in range(n_cols)], t_to_np)],
    outputs=[Output("Y", np_to_t, "double")],
    batch_size=n_rows
)

To test this theory, consider an 8-byte type like long or double.

  • Does the error occur with floor(2,147,483,647/3/8)=89478485 rows and 3 cols? WORKS
  • Does the error occur with ceil(2,147,483,647/3/8)=89478486 rows and 3 cols? FAILS

Consider a 4-byte type like int or float:

  • Does the error occur with floor(2,147,483,647/3/4)=178956970 rows and 3 cols? WORKS
  • Does the error occur with ceil(2,147,483,647/3/4)=178956971 rows and 3 cols? FAILS

@jmao-denver (Contributor)

Fixed by jpy-consortium/jpy#145.

chipkent removed the devrel-watch label on May 20, 2024