Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upstream KeyError: ('awkward_ListOffsetArray_reduce_local_nextparents_64', <class 'numpy.int64'>, <class 'numpy.int32'>) #307

Closed
douglasdavis opened this issue Jul 5, 2023 · 15 comments

Comments

@douglasdavis
Copy link
Collaborator

Problem surfaces with awkward==2.3.0

failures:

FAILED tests/test_structure.py::test_argsort[True] - KeyError: ('awkward_ListOffsetArray_reduce_local_nextparents_64', <class 'numpy.int64'>, <class 'numpy.int32'>)
FAILED tests/test_structure.py::test_argsort[False] - KeyError: ('awkward_ListOffsetArray_reduce_local_nextparents_64', <class 'numpy.int64'>, <class 'numpy.int32'>)
FAILED tests/test_structure.py::test_sort[True] - KeyError: ('awkward_ListOffsetArray_reduce_local_nextparents_64', <class 'numpy.int64'>, <class 'numpy.int32'>)
FAILED tests/test_structure.py::test_sort[False] - KeyError: ('awkward_ListOffsetArray_reduce_local_nextparents_64', <class 'numpy.int64'>, <class 'numpy.int32'>)

Some kind of dtype mismatch it looks like? I'll look at the awkward changelog to see if I can find anythin but Cc @agoose77 for visibility!

@douglasdavis
Copy link
Collaborator Author

Looks like the paired awkward-cpp version has an impact as well. I'm actually having a confusing time trying to force installing explicit combinations of awkward and awkward-cpp that either always pass or always fail but I'm getting sometimes passing and sometimes failing 🤷

@agoose77
Copy link
Collaborator

agoose77 commented Jul 5, 2023

I think this counts as an awkward bug, because we don't define any overloads for non int-64 types. We should probably do that. I'm not sure why this suddenly happens since 2.2.x, but something is producing an Index32 instead of Index64, triggering this pre-existing bug. It looks like whatever is in the Parquet file. We might have changed something related to this upstream, but nothing comes to mind.

@martindurant
Copy link
Collaborator

Whether to make int32 or int64 indexes is an options when converting ak to arrow. Perhaps the default changed?

@jpivarski
Copy link
Collaborator

Those KeyErrors are definitely awkward-cpp/awkward mismatches. (They're missing kernel name-signatures, and some kernels have recently been removed.) When Awkward comes from pip or conda, pip or conda must enforce the version match.

I install Awkward from a git-cloned repo, so it's not as easy, and maybe you're doing the same thing. My background package manager is conda (Mambaforge), and I minimize the number of packages installed by pip into this environment, but Awkward is one of them.

Step 1: make sure there are no old awkward or awkward-cpp packages from old versions.

pip uninstall awkward awkward-cpp   # multiple times; each time removes only one

Step 2: with the latest git pull, redo the installation for developer steps.

nox -s prepare
pip install -v ./awkward-cpp
pip install -e .

Just clearing it out and reinstalling it whenever I think there's an awkward-cpp change gets my environment to a state where it works again.

I think this counts as an awkward bug, because we don't define any overloads for non int-64 types.

Not necessarily: output Index types are always int64, but inputs may be specialized to int32, uint32, int64. It may be that this is just the first specialization to fail in the test.

@jpivarski
Copy link
Collaborator

Whether to make int32 or int64 indexes is an options when converting ak to arrow. Perhaps the default changed?

Arrow has traditionally favored int32 over int64. If they changed, it would be in a direction that would make us less likely to error (since we default to int64).

@douglasdavis
Copy link
Collaborator Author

Yea we're getting an i32 offsets in reading parquet; here's some debugging info from dak.from_parquet:

----------------------------------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------------------------------
DEBUG    dask_awkward.lib.io.parquet:parquet.py:123 project_columns received: ['points.x']
DEBUG    dask_awkward.lib.io.parquet:parquet.py:124 new schema is RecordForm([ListOffsetForm('i32', RecordForm([NumpyForm('int64')], ['x']))], ['points'])
DEBUG    dask_awkward.lib.io.parquet:parquet.py:125 new schema columns are: ['points.list.item.x']
DEBUG    dask_awkward.lib.io.parquet:parquet.py:126 
FromParquetFn(
  schema=RecordForm([ListOffsetForm('i32', RecordForm([NumpyForm('int64')], ['x']))], ['points'])
  listsep=list.item
  unnamed_root=False
  columns=['points.list.item.x']

@douglasdavis
Copy link
Collaborator Author

douglasdavis commented Jul 5, 2023

perhaps these arguments which of ..._to32=True should be set to False instead? (this is when we write to parquet and dask_awkward's test suite is using data generated by using its own to_parquet)

t = to_arrow_table(
data,
list_to32=True,
string_to32=True,
bytestring_to32=True,
categorical_as_dictionary=True,
extensionarray=False,

@jpivarski
Copy link
Collaborator

Okay, yeah: @agoose77 is right. awkward_ListOffsetArray_reduce_local_nextparents_64 never had any specializations in which the second array type is int32. (The list of specializations in main-v1 is the same as those in main.) The int32 coming from Parquet should have been identified and converted before reaching this kernel.

@jpivarski
Copy link
Collaborator

to_parquet should be parameterized by list_to32 etc. and pass them through, so that users get to choose whether to push Awkward's int64 arrays through Arrow/Parquet or convert. But it has to work either way: if from_parquet gets int32, it has to deal with it correctly.

@jpivarski
Copy link
Collaborator

I have an up-to-date Awkward, did mamba update --all, and I'm using dask-awkward directly from main (pip install -e .), and removed one of the xfail marks to reproduce the error.

Something (could be pytest) is taking away the most relevant part of the stack trace:

src/dask_awkward/lib/structure.py:167: in __call__
    return ak.argsort(array, **self.kwargs)
        array      = <Array [[6, 9, 9], [], [4, 4], [2], [5, 2, 7]] type='5 * var * int64'>
        self       = <dask_awkward.lib.structure._ArgsortFn object at 0x7f58e7433520>
../awkward/src/awkward/_dispatch.py:51: in dispatch
    return gen_or_result
        args       = (<Array [[6, 9, 9], [], [4, 4], [2], [5, 2, 7]] type='5 * var * int64'>,)
        array_like = <Array [[6, 9, 9], [], [4, 4], [2], [5, 2, 7]] type='5 * var * int64'>
        array_likes = (<Array [[6, 9, 9], [], [4, 4], [2], [5, 2, 7]] type='5 * var * int64'>,)
        dispatch   = <function argsort at 0x7f5932e8f940>
        func       = <function argsort at 0x7f5932e8f8b0>
        gen_or_result = <generator object argsort at 0x7f59200575f0>
        kwargs     = {'ascending': False, 'axis': -1, 'behavior': None, 'stable': True}
../awkward/src/awkward/_errors.py:63: in __exit__
    self.handle_exception(exception_type, exception_value)
        exception_type = <class 'KeyError'>
        exception_value = KeyError(('awkward_ListOffsetArray_reduce_local_nextparents_64', <class 'numpy.int64'>, <class 'numpy.int32'>))
        self       = <awkward._errors.OperationErrorContext object at 0x7f59041a2310>
        traceback  = <traceback object at 0x7f58e76c5940>

It goes from dask-awkward to awkward/_dispatch.py to awkward/errors.py without showing me the step where it's failing in ak.argsort. It does execute ak.argsort, but I had to guess that.

I found my way to the error by inserting raise Exception("meow"), bisecting it by whether the exception was the KeyError or this exception. Indeed, we do get to a point in ListOffsetArray._argsort_next that uses awkward_ListOffsetArray_reduce_local_nextparents_64 without ensuring that the node actually has 64-bit offsets, so that's the bug, and it can be fixed by just ensuring 64-bit right before the kernel call.

A better solution would be to add specializations to the kernel, so that it can be used in these situations without making a new 64-bit array. That solution has a slower turn-around time because it would require a new awkward-cpp, but since you marked these as xskip, you don't mind if it takes the rest of the day, do you?

Unless you tell me otherwise, that's how I'm going to fix this, by adding kernel specializations, deploying a new awkward-cpp, and then a new awkward. (@agoose77 should focus on his viva: I'll be doing all of those steps myself.)

@douglasdavis
Copy link
Collaborator Author

That solution has a slower turn-around time because it would require a new awkward-cpp, but since you marked these as xskip, you don't mind if it takes the rest of the day, do you?

I definitely don't mind whatever the timing may be!

@jpivarski
Copy link
Collaborator

scikit-hep/awkward#2572 should do it.

Since you're not pressed for time, we'll wait until after @agoose77's viva to review that PR and do a pair of awkward-cpp/awkward releases. It will be on the weekend or Monday.

@agoose77
Copy link
Collaborator

agoose77 commented Jul 5, 2023

we'll wait until after @agoose77's viva to review that PR

Didn't need to! Having a quick coffee break and LGTM.

@jpivarski
Copy link
Collaborator

These xfails can now be removed with Awkward 2.3.1:

--- a/tests/test_structure.py
+++ b/tests/test_structure.py
@@ -362,7 +362,6 @@ def test_singletons(daa, L4, tmp_path):
     )
 
 
-@pytest.mark.xfail(reason="upstream awkward dtype mismatch")
 @pytest.mark.parametrize("ascending", [True, False])
 def test_argsort(daa, caa, ascending):
     assert_eq(
@@ -379,7 +378,6 @@ def test_argsort(daa, caa, ascending):
     )
 
 
-@pytest.mark.xfail(reason="upstream awkward dtype mismatch")
 @pytest.mark.parametrize("ascending", [True, False])
 def test_sort(daa, caa, ascending):
     assert_eq(

@douglasdavis
Copy link
Collaborator Author

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants