deephaven.learn fails when converting sufficiently large data #5403

Closed
jjbrosnan opened this issue Apr 24, 2024 · 10 comments
Labels: bug, user-reported

@jjbrosnan (Contributor)

Description

When converting sufficiently large data to a NumPy array using deephaven.learn, a ValueError is raised complaining that the offset exceeds a (negative) buffer length.

Steps to reproduce

from deephaven.learn import gather, learn, Input, Output
from deephaven import empty_table
import numpy as np

n_rows = 32_000_000

# 12 double columns of random data: 32M rows x 12 cols x 8 bytes = 3,072,000,000 bytes
et = empty_table(n_rows).update([f"X{idx} = randomDouble(0.0, 10.0)" for idx in range(1, 13)])

# Model function: reduce the gathered 2D array to its maximum
def model(features):
    return np.max(features)

# Gather function: convert table rows/columns to a 2D NumPy array of doubles
def t_to_np(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)

# Scatter function: map model output back to table rows
def np_to_t(data, idx):
    return data

t = learn(
    table=et,
    model_func=model,
    inputs=[Input([f"X{idx}" for idx in range(1, 13)], t_to_np)],
    outputs=[Output("Y", np_to_t, "double")],
    batch_size=n_rows  # a single batch covering every row
)

Expected results

The query to run successfully.

Actual results

The following error with stack trace:

r-Scheduler-Serial-1 | .c.ConsoleServiceGrpcImpl | Error running script: java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to complete the learn function. : ValueError: offset must be non-negative and no greater than buffer length (-1222967296)
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/table.py", line 764, in update
    return Table(j_table=self.j_table.update(*formulas))
RuntimeError: io.deephaven.engine.exceptions.TableInitializationException: Error while initializing Update([Y]): an exception occurred while performing the initial select or update
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$29(QueryTable.java:1476)
	at io.deephaven.engine.table.impl.perf.QueryPerformanceRecorder.withNugget(QueryPerformanceRecorder.java:369)
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$30(QueryTable.java:1413)
	at io.deephaven.engine.table.impl.QueryTable.memoizeResult(QueryTable.java:3490)
	at io.deephaven.engine.table.impl.QueryTable.selectOrUpdate(QueryTable.java:1412)
	at io.deephaven.engine.table.impl.QueryTable.update(QueryTable.java:1390)
	at io.deephaven.engine.table.impl.QueryTable.update(QueryTable.java:95)
	at io.deephaven.api.TableOperationsDefaults.update(TableOperationsDefaults.java:94)
	at org.jpy.PyLib.executeCode(Native Method)
	at org.jpy.PyObject.executeCode(PyObject.java:138)
	at io.deephaven.engine.util.PythonEvaluatorJpy.evalScript(PythonEvaluatorJpy.java:73)
	at io.deephaven.integrations.python.PythonDeephavenSession.lambda$evaluate$1(PythonDeephavenSession.java:205)
	at io.deephaven.util.locks.FunctionalLock.doLockedInterruptibly(FunctionalLock.java:51)
	at io.deephaven.integrations.python.PythonDeephavenSession.evaluate(PythonDeephavenSession.java:205)
	at io.deephaven.engine.util.AbstractScriptSession.lambda$evaluateScript$0(AbstractScriptSession.java:148)
	at io.deephaven.engine.context.ExecutionContext.lambda$apply$0(ExecutionContext.java:196)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:207)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:195)
	at io.deephaven.engine.util.AbstractScriptSession.evaluateScript(AbstractScriptSession.java:148)
	at io.deephaven.engine.util.DelegatingScriptSession.evaluateScript(DelegatingScriptSession.java:72)
	at io.deephaven.engine.util.ScriptSession.evaluateScript(ScriptSession.java:75)
	at io.deephaven.server.console.ConsoleServiceGrpcImpl.lambda$executeCommand$4(ConsoleServiceGrpcImpl.java:191)
	at io.deephaven.server.session.SessionState$ExportBuilder.lambda$submit$3(SessionState.java:1519)
	at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:992)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)
	at java.base/java.lang.Thread.run(Thread.java:1583)
caused by io.deephaven.engine.table.impl.select.FormulaEvaluationException: In formula: Y = doublePyCast((PyObject) (__scatterer.scatter(0, __FutureOffset)))
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.applyFormulaPerItem(Formula.java:166)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.lambda$fillChunkHelper$4(Formula.java:155)
	at io.deephaven.engine.rowset.RowSequence.lambda$forAllRowKeys$0(RowSequence.java:175)
	at io.deephaven.engine.rowset.impl.singlerange.SingleRangeMixin.forEachRowKey(SingleRangeMixin.java:17)
	at io.deephaven.engine.rowset.RowSequence.forAllRowKeys(RowSequence.java:174)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.fillChunkHelper(Formula.java:153)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.fillChunk(Formula.java:130)
	at io.deephaven.engine.table.impl.sources.ViewColumnSource.fillChunk(ViewColumnSource.java:219)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doApplyUpdate(SelectColumnLayer.java:412)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.lambda$doSerialApplyUpdate$2(SelectColumnLayer.java:264)
	at io.deephaven.engine.util.systemicmarking.SystemicObjectTracker.executeSystemically(SystemicObjectTracker.java:56)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doSerialApplyUpdate(SelectColumnLayer.java:263)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.lambda$onAllRequiredColumnsCompleted$1(SelectColumnLayer.java:212)
	at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.lambda$submit$0(ImmediateJobScheduler.java:37)
	at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.submit(ImmediateJobScheduler.java:51)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.onAllRequiredColumnsCompleted(SelectColumnLayer.java:210)
	at io.deephaven.engine.table.impl.select.analyzers.SelectAndViewAnalyzer$SelectLayerCompletionHandler.onLayerCompleted(SelectAndViewAnalyzer.java:627)
	at io.deephaven.engine.table.impl.select.analyzers.BaseLayer.applyUpdate(BaseLayer.java:76)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.applyUpdate(SelectColumnLayer.java:151)
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$29(QueryTable.java:1463)
	... 29 more
caused by java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to convert rows: {0-31999999} and cols: [Lio.deephaven.engine.table.ColumnSource;@4372ccd to a 2D NumPy array : ValueError: offset must be non-negative and no greater than buffer length (-1222967296)
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 83, in table_to_numpy_2d
    tensor = np.frombuffer(buffer, dtype=np_type)
ValueError: offset must be non-negative and no greater than buffer length (-1222967296)

Line: 92
Namespace: table_to_numpy_2d
File: /opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py
Traceback (most recent call last):
  File "<string>", line 13, in t_to_np
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 92, in table_to_numpy_2d

	at org.jpy.PyLib.callAndReturnObject(Native Method)
	at org.jpy.PyObject.call(PyObject.java:449)
	at io.deephaven.integrations.python.PythonFunctionCaller.apply(PythonFunctionCaller.java:32)
	at io.deephaven.integrations.python.PythonFunctionCaller.apply(PythonFunctionCaller.java:15)
	at io.deephaven.integrations.learn.Future.gather(Future.java:81)
	at io.deephaven.integrations.learn.Future.get(Future.java:59)
	at io.deephaven.integrations.learn.Scatterer.scatter(Scatterer.java:39)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.applyFormulaPerItem(Formula.java:164)
	... 48 more


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/__init__.py", line 148, in learn
    return (table
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/table.py", line 766, in update
    raise DHError(e, "table update operation failed.") from e
deephaven.dherror.DHError: table update operation failed. : ValueError: offset must be non-negative and no greater than buffer length (-1222967296)
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/table.py", line 764, in update
    return Table(j_table=self.j_table.update(*formulas))
RuntimeError: io.deephaven.engine.exceptions.TableInitializationException: Error while initializing Update([Y]): an exception occurred while performing the initial select or update
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$29(QueryTable.java:1476)
	at io.deephaven.engine.table.impl.perf.QueryPerformanceRecorder.withNugget(QueryPerformanceRecorder.java:369)
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$30(QueryTable.java:1413)
	at io.deephaven.engine.table.impl.QueryTable.memoizeResult(QueryTable.java:3490)
	at io.deephaven.engine.table.impl.QueryTable.selectOrUpdate(QueryTable.java:1412)
	at io.deephaven.engine.table.impl.QueryTable.update(QueryTable.java:1390)
	at io.deephaven.engine.table.impl.QueryTable.update(QueryTable.java:95)
	at io.deephaven.api.TableOperationsDefaults.update(TableOperationsDefaults.java:94)
	at org.jpy.PyLib.executeCode(Native Method)
	at org.jpy.PyObject.executeCode(PyObject.java:138)
	at io.deephaven.engine.util.PythonEvaluatorJpy.evalScript(PythonEvaluatorJpy.java:73)
	at io.deephaven.integrations.python.PythonDeephavenSession.lambda$evaluate$1(PythonDeephavenSession.java:205)
	at io.deephaven.util.locks.FunctionalLock.doLockedInterruptibly(FunctionalLock.java:51)
	at io.deephaven.integrations.python.PythonDeephavenSession.evaluate(PythonDeephavenSession.java:205)
	at io.deephaven.engine.util.AbstractScriptSession.lambda$evaluateScript$0(AbstractScriptSession.java:148)
	at io.deephaven.engine.context.ExecutionContext.lambda$apply$0(ExecutionContext.java:196)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:207)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:195)
	at io.deephaven.engine.util.AbstractScriptSession.evaluateScript(AbstractScriptSession.java:148)
	at io.deephaven.engine.util.DelegatingScriptSession.evaluateScript(DelegatingScriptSession.java:72)
	at io.deephaven.engine.util.ScriptSession.evaluateScript(ScriptSession.java:75)
	at io.deephaven.server.console.ConsoleServiceGrpcImpl.lambda$executeCommand$4(ConsoleServiceGrpcImpl.java:191)
	at io.deephaven.server.session.SessionState$ExportBuilder.lambda$submit$3(SessionState.java:1519)
	at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:992)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)
	at java.base/java.lang.Thread.run(Thread.java:1583)
caused by io.deephaven.engine.table.impl.select.FormulaEvaluationException: In formula: Y = doublePyCast((PyObject) (__scatterer.scatter(0, __FutureOffset)))
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.applyFormulaPerItem(Formula.java:166)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.lambda$fillChunkHelper$4(Formula.java:155)
	at io.deephaven.engine.rowset.RowSequence.lambda$forAllRowKeys$0(RowSequence.java:175)
	at io.deephaven.engine.rowset.impl.singlerange.SingleRangeMixin.forEachRowKey(SingleRangeMixin.java:17)
	at io.deephaven.engine.rowset.RowSequence.forAllRowKeys(RowSequence.java:174)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.fillChunkHelper(Formula.java:153)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.fillChunk(Formula.java:130)
	at io.deephaven.engine.table.impl.sources.ViewColumnSource.fillChunk(ViewColumnSource.java:219)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doApplyUpdate(SelectColumnLayer.java:412)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.lambda$doSerialApplyUpdate$2(SelectColumnLayer.java:264)
	at io.deephaven.engine.util.systemicmarking.SystemicObjectTracker.executeSystemically(SystemicObjectTracker.java:56)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doSerialApplyUpdate(SelectColumnLayer.java:263)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.lambda$onAllRequiredColumnsCompleted$1(SelectColumnLayer.java:212)
	at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.lambda$submit$0(ImmediateJobScheduler.java:37)
	at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.submit(ImmediateJobScheduler.java:51)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.onAllRequiredColumnsCompleted(SelectColumnLayer.java:210)
	at io.deephaven.engine.table.impl.select.analyzers.SelectAndViewAnalyzer$SelectLayerCompletionHandler.onLayerCompleted(SelectAndViewAnalyzer.java:627)
	at io.deephaven.engine.table.impl.select.analyzers.BaseLayer.applyUpdate(BaseLayer.java:76)
	at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.applyUpdate(SelectColumnLayer.java:151)
	at io.deephaven.engine.table.impl.QueryTable.lambda$selectOrUpdate$29(QueryTable.java:1463)
	... 29 more
caused by java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to convert rows: {0-31999999} and cols: [Lio.deephaven.engine.table.ColumnSource;@4372ccd to a 2D NumPy array : ValueError: offset must be non-negative and no greater than buffer length (-1222967296)
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 83, in table_to_numpy_2d
    tensor = np.frombuffer(buffer, dtype=np_type)
ValueError: offset must be non-negative and no greater than buffer length (-1222967296)

Line: 92
Namespace: table_to_numpy_2d
File: /opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py
Traceback (most recent call last):
  File "<string>", line 13, in t_to_np
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 92, in table_to_numpy_2d

	at org.jpy.PyLib.callAndReturnObject(Native Method)
	at org.jpy.PyObject.call(PyObject.java:449)
	at io.deephaven.integrations.python.PythonFunctionCaller.apply(PythonFunctionCaller.java:32)
	at io.deephaven.integrations.python.PythonFunctionCaller.apply(PythonFunctionCaller.java:15)
	at io.deephaven.integrations.learn.Future.gather(Future.java:81)
	at io.deephaven.integrations.learn.Future.get(Future.java:59)
	at io.deephaven.integrations.learn.Scatterer.scatter(Scatterer.java:39)
	at io.deephaven.temp.c_a0a81b97cdde7df53eac6e4d23f297c788d36d08b417333327f384c7994d8662v65_0.Formula.applyFormulaPerItem(Formula.java:164)
	... 48 more



Line: 171
Namespace: learn
File: /opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/__init__.py
Traceback (most recent call last):
  File "<string>", line 18, in <module>
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/__init__.py", line 171, in learn

	at org.jpy.PyLib.executeCode(Native Method)
	at org.jpy.PyObject.executeCode(PyObject.java:138)
	at io.deephaven.engine.util.PythonEvaluatorJpy.evalScript(PythonEvaluatorJpy.java:73)
	at io.deephaven.integrations.python.PythonDeephavenSession.lambda$evaluate$1(PythonDeephavenSession.java:205)
	at io.deephaven.util.locks.FunctionalLock.doLockedInterruptibly(FunctionalLock.java:51)
	at io.deephaven.integrations.python.PythonDeephavenSession.evaluate(PythonDeephavenSession.java:205)
	at io.deephaven.engine.util.AbstractScriptSession.lambda$evaluateScript$0(AbstractScriptSession.java:148)
	at io.deephaven.engine.context.ExecutionContext.lambda$apply$0(ExecutionContext.java:196)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:207)
	at io.deephaven.engine.context.ExecutionContext.apply(ExecutionContext.java:195)
	at io.deephaven.engine.util.AbstractScriptSession.evaluateScript(AbstractScriptSession.java:148)
	at io.deephaven.engine.util.DelegatingScriptSession.evaluateScript(DelegatingScriptSession.java:72)
	at io.deephaven.engine.util.ScriptSession.evaluateScript(ScriptSession.java:75)
	at io.deephaven.server.console.ConsoleServiceGrpcImpl.lambda$executeCommand$4(ConsoleServiceGrpcImpl.java:191)
	at io.deephaven.server.session.SessionState$ExportBuilder.lambda$submit$3(SessionState.java:1519)
	at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:992)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Additional details and attachments

Converting et to a NumPy array works when using deephaven.numpy.to_numpy.
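
For reference, a minimal sketch of that workaround, assuming the repro's et table from above is in scope (to_numpy returns a 2D array when all requested columns share a type):

from deephaven.numpy import to_numpy

# The workaround noted above: convert the same table without going through
# deephaven.learn's gather path.
arr = to_numpy(et)           # one column per table column
print(arr.shape, arr.dtype)  # expected: (32000000, 12) float64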

Versions

  • Deephaven: 0.33.3
  • OS: OS X
  • Browser: Chrome
  • Docker: 20.10.13
jjbrosnan added the bug, triage, user-reported, and devrel-watch labels on Apr 24, 2024
@chipkent (Member)

The key error here is:

Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/learn/gather.py", line 83, in table_to_numpy_2d
    tensor = np.frombuffer(buffer, dtype=np_type)
ValueError: offset must be non-negative and no greater than buffer length (-1222967296)

This is being thrown by np.frombuffer. Other reports of the same error confirm the source:
facebookresearch/metaseq#186
microsoft/ProphetNet#9
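
For comparison, the same ValueError is easy to trigger directly by passing an offset past the end of a small buffer; what is unusual in this issue is that the reported buffer length itself is negative:

import numpy as np

# Deliberately request an offset beyond a 3-byte buffer:
np.frombuffer(b"abc", dtype=np.uint8, offset=5)
# ValueError: offset must be non-negative and no greater than buffer length (3)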

@chipkent (Member)

Our problem is specifically happening at https://github.com/deephaven/deephaven-core/blob/main/py/server/deephaven/learn/gather.py#L83. We should add a test that validates the inputs to that call and confirms they look correct.

If there is a DH bug, it is likely in io.deephaven.integrations.learn.gather.NumPy.tensorBuffer2D<x>.

@chipkent (Member)

It could also be a jpy bug in the array conversion.

@alexpeters1208 (Contributor)

I ran some experiments that show the following (a parameterized sketch follows the list):

  1. When n_rows is cut in half to 16_000_000, learn succeeds.
  2. When np_type = np.intc (int) and the random data function is changed appropriately, learn succeeds.
  3. When np_type = np.intc (int), n_rows is doubled to 64_000_000, and the random data function is changed appropriately, learn fails.
  4. When np_type = np.single (float) and the random data function is changed appropriately, learn succeeds.
  5. When np_type = np.single (float), n_rows is doubled to 64_000_000, and the random data function is changed appropriately, learn fails.
  6. When np_type = np.int_ (long) and the random data function is changed appropriately, learn fails.
  7. When np_type = np.int_ (long), n_rows is cut in half to 16_000_000 and the random data function is changed appropriately, learn succeeds.
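
A hedged sketch of how those cases can be parameterized (the helper run_learn and the dtype-to-generator pairings, e.g. Deephaven's built-in randomInt for the int case, are illustrative assumptions, not the exact experiment code):

import numpy as np
from deephaven import empty_table
from deephaven.learn import gather, learn, Input, Output

def run_learn(n_rows, np_type, rand_expr):
    # Build 12 columns of random data with the given generator expression,
    # then run the same gather/model/scatter pipeline as the repro above.
    col_names = [f"X{idx}" for idx in range(1, 13)]
    et = empty_table(n_rows).update([f"{c} = {rand_expr}" for c in col_names])

    def t_to_np(rows, cols):
        return gather.table_to_numpy_2d(rows, cols, np_type=np_type)

    return learn(
        table=et,
        model_func=lambda features: np.max(features),
        inputs=[Input(col_names, t_to_np)],
        outputs=[Output("Y", lambda data, idx: data, "double")],
        batch_size=n_rows,
    )

run_learn(16_000_000, np.double, "randomDouble(0.0, 10.0)")  # case 1: succeeds
run_learn(64_000_000, np.intc, "randomInt(0, 10)")           # case 3: fails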

@chipkent (Member)

New experiment: I think this may be related to int32 indexing limits somewhere in the conversion code path. The int32 max value is 2,147,483,647; a quick arithmetic check follows the cases below.

To test this theory, consider an 8-byte type like long or double.

  • Does the error occur with floor(2,147,483,647/3/8)=89478485 rows and 3 cols?
  • Does the error occur with ceil(2,147,483,647/3/8)=89478486 rows and 3 cols?

Consider a 4-byte type like int or float.

  • Does the error occur with floor(2,147,483,647/3/4)=178956970 rows and 3 cols?
  • Does the error occur with ceil(2,147,483,647/3/4)=178956971 rows and 3 cols?
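
A quick arithmetic check of this theory in plain Python, using the thresholds above and the original repro's dimensions:

INT32_MAX = 2_147_483_647

# 8-byte elements (double/long), 3 columns:
print(INT32_MAX // (3 * 8))      # 89478485 rows -> total bytes fit in int32
print(INT32_MAX // (3 * 8) + 1)  # 89478486 rows -> total bytes overflow int32

# 4-byte elements (int/float), 3 columns:
print(INT32_MAX // (3 * 4))      # 178956970
print(INT32_MAX // (3 * 4) + 1)  # 178956971

# Original repro: 32,000,000 rows x 12 double columns = 3,072,000,000 bytes.
# Reinterpreted as a signed 32-bit value, that is exactly the "buffer length"
# reported in the error:
print(32_000_000 * 12 * 8 - 2**32)  # -1222967296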

@chipkent (Member)

Reading through the NumPy C implementation: https://github.com/numpy/numpy/blob/main/numpy/_core/src/multiarray/ctors.c#L3756

Key variables are:

  • Py_ssize_t ts
  • npy_intp offset

Py_ssize_t is defined by PEP 353 to be:

A new type Py_ssize_t is introduced, which has the same size as the compiler’s size_t type, but is signed. It will be a typedef for ssize_t where available.

size_t is an unsigned 64-bit integer on typical 64-bit platforms (see https://www.geeksforgeeks.org/size_t-data-type-c-language/). ssize_t is the signed counterpart of size_t: it has the same width but can also represent a negative value for errors.

npy_intp is also defined as Py_ssize_t in https://github.com/numpy/numpy/blob/main/numpy/_core/include/numpy/npy_common.h#L201.

The key point of failure is:

        PyErr_Format(PyExc_ValueError,
                     "offset must be non-negative and no greater than buffer "\
                     "length (%" NPY_INTP_FMT ")", (npy_intp)ts);

If the information on types is correct, the (npy_intp) cast does nothing, and ts contains a bad value.
ts is assigned from a buffer view via:

    ts = view.len;

The view is generated using:

    if (PyObject_GetBuffer(buf, &view, PyBUF_WRITABLE|PyBUF_SIMPLE) < 0) {
        writeable = 0;
        PyErr_Clear();
        if (PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE) < 0) {
            Py_DECREF(buf);
            Py_DECREF(type);
            return NULL;
        }
    }

Together, this makes me wonder whether jpy has a bug in its buffer-protocol implementation (what PyObject_GetBuffer calls into) for large arrays. @jmao-denver
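
One hedged way to probe that from the Python side (illustrative only; buffer stands in for the jpy-provided object that gather.py passes to np.frombuffer): requesting a memoryview exercises the same PyObject_GetBuffer machinery, so the length it reports can be inspected before NumPy is involved.

# Illustrative diagnostic, assuming `buffer` is the jpy-backed buffer object:
mv = memoryview(buffer)
print(mv.nbytes)  # a negative or wrapped value here would implicate jpy, not NumPy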

@chipkent (Member)

Testing the 4 cases above with:

from deephaven.learn import gather, learn, Input, Output
from deephaven import empty_table
import numpy as np

n_cols = 3
n_rows = 89478485  # varied per case below

# n_cols double columns of random data
et = empty_table(n_rows).update([f"X{idx} = randomDouble(0.0, 10.0)" for idx in range(n_cols)])

def model(features):
    return np.max(features)

def t_to_np(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)

def np_to_t(data, idx):
    return data

t = learn(
    table=et,
    model_func=model,
    inputs=[Input([f"X{idx}" for idx in range(n_cols)], t_to_np)],
    outputs=[Output("Y", np_to_t, "double")],
    batch_size=n_rows
)

To test this theory, consider an 8-byte type like long or double.

  • Does the error occur with floor(2,147,483,647/3/8)=89478485 rows and 3 cols? WORKS
  • Does the error occur with ceil(2,147,483,647/3/8)=89478486 rows and 3 cols? FAILS

Consider a 4-byte type like int or float:

  • Does the error occur with floor(2,147,483,647/3/4)=178956970 rows and 3 cols? WORKS
  • Does the error occur with ceil(2,147,483,647/3/4)=178956971 rows and 3 cols? FAILS

@jmao-denver (Contributor)

Fixed by jpy-consortium/jpy#145.

chipkent removed the devrel-watch label on May 20, 2024