get_add_actions(False) Throws index out of bounds #1579

Closed
aersam opened this issue Aug 1, 2023 · 6 comments
Labels
binding/python (Issues for the Python package), bug (Something isn't working)

Comments

@aersam
Contributor

aersam commented Aug 1, 2023

Environment

Delta-rs version: 0.10.1

Binding: Python

Environment:

  • Cloud provider: None
  • OS: Windows, could also repro on Linux (Azure Websites)
  • Other:

Bug

What happened:

import deltalake

d = deltalake.DeltaTable("data/delta/mail/path")
print(d.get_add_actions(False)) 

throws:

thread '<unnamed>' panicked at 'index out of bounds: the len is 0 but the index is 0', rust\src\table_state_arrow.rs:494:39
stack backtrace:
   0:     0x7ffde2af5ebc - BrotliDecoderVersion
   1:     0x7ffde2b1f38b - BrotliDecoderVersion
   2:     0x7ffde2af1069 - BrotliDecoderVersion
   3:     0x7ffde2af5c6b - BrotliDecoderVersion
   4:     0x7ffde2af8729 - BrotliDecoderVersion
   5:     0x7ffde2af83df - BrotliDecoderVersion
   6:     0x7ffde2af8c2e - BrotliDecoderVersion
   7:     0x7ffde2af8b1d - BrotliDecoderVersion
   8:     0x7ffde2af6ae9 - BrotliDecoderVersion
   9:     0x7ffde2af8820 - BrotliDecoderVersion
  10:     0x7ffde2c52965 - BrotliDecoderVersion
  11:     0x7ffde2c52ae4 - BrotliDecoderVersion
  12:     0x7ffde01ddcb2 - PyInit__internal
  13:     0x7ffddfd1f43f - BrotliDecoderSetParameter
  14:     0x7ffddfd0d7be - BrotliDecoderSetParameter
  15:     0x7ffddfd16731 - BrotliDecoderSetParameter
  16:     0x7ffe19ad6282 - PyUnicode_ToDecimalDigit
  17:     0x7ffe19a4f52b - PyObject_Vectorcall
  18:     0x7ffe19a508d4 - PyEval_EvalFrameDefault
  19:     0x7ffe19acbbd3 - PyMapping_Check
  20:     0x7ffe19acb453 - PyEval_EvalCode
  21:     0x7ffe19aef83e - PyArena_Free
  22:     0x7ffe19aef7ba - PyArena_Free
  23:     0x7ffe19be4666 - PyThread_tss_is_created
  24:     0x7ffe19b3ac89 - PyRun_SimpleFileObject
  25:     0x7ffe19b71a18 - PyRun_AnyFileObject
  26:     0x7ffe19b7165b - PySys_GetSizeOf
  27:     0x7ffe19b71517 - PySys_GetSizeOf
  28:     0x7ffe19b0442c - Py_RunMain
  29:     0x7ffe19b042bd - Py_RunMain
  30:     0x7ffe19a8e76d - Py_Main
  31:     0x7ff7a1371230 - <unknown>
  32:     0x7ffe970f7614 - BaseThreadInitThunk
  33:     0x7ffe984e26b1 - RtlUserThreadStart
Traceback (most recent call last):
  File "c:\Projects\DataHub\DataAnalyticsAPI\test.py", line 4, in <module>
    print(d.get_add_actions(False))
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\DataHub\DataAnalyticsAPI\.venv\Lib\site-packages\deltalake\table.py", line 612, in get_add_actions
    return self._table.get_add_actions(flatten)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: index out of bounds: the len is 0 but the index is 0

The same does not happen with flatten=True
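Since `flatten=True` works here, one stopgap is to try the nested form first and fall back to the flattened one. The sketch below is only an illustration: `add_actions_with_fallback` is a hypothetical helper, not part of deltalake. It relies on the fact that PyO3's `PanicException` derives from `BaseException`, so a plain `except Exception` would not catch it. Whether the table object is safe to reuse after a Rust panic is an assumption, not something this thread confirms.

```python
def add_actions_with_fallback(table):
    """Return (add_actions, was_flattened) for a DeltaTable-like object.

    Hypothetical helper: tries get_add_actions(flatten=False) first and
    falls back to flatten=True only if a Rust panic surfaces in Python.
    """
    try:
        return table.get_add_actions(flatten=False), False
    except BaseException as exc:
        # pyo3_runtime.PanicException subclasses BaseException, so it has to
        # be caught here. Anything else (KeyboardInterrupt, SystemExit,
        # ordinary errors) is re-raised untouched.
        if type(exc).__name__ != "PanicException":
            raise
        return table.get_add_actions(flatten=True), True


# Usage, assuming deltalake is installed and the path from the report exists:
#   import deltalake
#   d = deltalake.DeltaTable("data/delta/mail/path")
#   actions, was_flattened = add_actions_with_fallback(d)
```

The Rust side still prints the panic message and backtrace to stderr; the helper only keeps the Python script from terminating.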

What you expected to happen:
Well, it should work ;)

@wjones127
Collaborator

@aersam thanks for reporting. Is this table empty?

@aersam
Contributor Author

aersam commented Aug 1, 2023

No. But it has column mapping, could that be the issue? The stats seem to be recorded under the physical column names. I do not query the table using delta-rs, but I need the metadata from it (I use the metadata to generate some SQL for a system that supports Parquet but not Delta).
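A minimal sketch of how one might check for column mapping and relate physical to logical names. The property key `delta.columnMapping.mode` and the field metadata key `delta.columnMapping.physicalName` come from the Delta protocol; the helper names are hypothetical, and the inputs are plain dicts and tuples so nothing here depends on a particular deltalake version. Only top-level fields are handled.

```python
def column_mapping_mode(configuration):
    """Return the table's column mapping mode ("none", "name" or "id").

    `configuration` is the table properties dict; the Delta protocol treats
    a missing "delta.columnMapping.mode" entry as "none".
    """
    return configuration.get("delta.columnMapping.mode", "none")


def physical_to_logical(fields):
    """Map physical column names (as used in the stats) to logical names.

    `fields` is an iterable of (logical_name, field_metadata) pairs.
    Fields without a physical name in their metadata map to themselves.
    """
    mapping = {}
    for logical_name, metadata in fields:
        physical = metadata.get("delta.columnMapping.physicalName", logical_name)
        mapping[physical] = logical_name
    return mapping


# Usage, assuming deltalake exposes these accessors in your version:
#   d = deltalake.DeltaTable("data/delta/mail/path")
#   mode = column_mapping_mode(d.metadata().configuration)
#   names = physical_to_logical((f.name, f.metadata) for f in d.schema().fields)
```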

@wjones127
Collaborator

> But it has column mapping, could that be the issue?

Ah, that's likely. This function hasn't been tested at all with tables that use features we don't support in our readers or writers.

@aersam
Contributor Author

aersam commented Aug 1, 2023

Ok, that's sad ;) I can live with it for now as it works with flatten=True; however, I think there are valid use cases for using delta-rs mostly for metadata. E.g., I could write a complete duckdb view with column mapping and deletion vector support just by having the complete metadata from delta-rs.
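To illustrate the metadata-only use case, here is a rough sketch that turns a file list and a physical-to-logical name map into a DuckDB view definition. `build_view_sql` is a hypothetical helper; it only builds a SQL string around DuckDB's `read_parquet`, and it ignores deletion vectors and partition columns, which a real implementation would need to handle.

```python
def build_view_sql(view_name, parquet_files, physical_to_logical):
    """Build a CREATE VIEW statement that renames physical columns.

    parquet_files: data file paths taken from the table's add actions.
    physical_to_logical: dict of physical column name -> logical name.
    """
    def quote_ident(name):
        # Double any embedded double quotes in SQL identifiers.
        return '"' + name.replace('"', '""') + '"'

    def quote_str(value):
        # Double any embedded single quotes in SQL string literals.
        return "'" + value.replace("'", "''") + "'"

    columns = ", ".join(
        f"{quote_ident(physical)} AS {quote_ident(logical)}"
        for physical, logical in physical_to_logical.items()
    )
    files = ", ".join(quote_str(path) for path in parquet_files)
    return (
        f"CREATE OR REPLACE VIEW {quote_ident(view_name)} AS "
        f"SELECT {columns} FROM read_parquet([{files}])"
    )


# Example with made-up file and column names:
sql = build_view_sql("mail", ["part-0.parquet"], {"col-1": "id"})
```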

@wjones127
Collaborator

> I think there are valid use cases for using delta-rs mostly for metadata

I don't disagree. We might be able to fix this function to work for a wider range of tables.

@rtyler added the binding/python (Issues for the Python package) label on Sep 15, 2023
@aersam
Contributor Author

aersam commented Nov 10, 2023

Closing because now #1835 is a new error with 0.13

@aersam closed this as completed on Nov 10, 2023

3 participants