Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix ray lance sink error #3230

Merged
merged 1 commit into from
Jan 10, 2025
Merged

Conversation

Jay-ju
Copy link
Contributor

@Jay-ju Jay-ju commented Dec 11, 2024

@github-actions github-actions bot added bug Something isn't working python labels Dec 11, 2024
@Jay-ju Jay-ju force-pushed the fix_ray_lance_sink branch from 40166e9 to 96ac308 Compare December 11, 2024 10:10
@Jay-ju Jay-ju force-pushed the fix_ray_lance_sink branch 7 times, most recently from 1c815c3 to 24692b6 Compare December 11, 2024 12:42
Comment on lines 141 to 144
if isinstance(first_element, (pa.Table, pd.DataFrame)):
write_results = [
result["write_result"].iloc[0]["result"] for result in write_results
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, does this work on older versions of Ray as well? It's probably not a deal breaker if it doesn't (we can require a certain version of Ray) but I'm curious to know how we should document this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the changes here are compatible with older versions of Ray. So there is no need for a separate documentation. However, before the code I submitted to the Ray community is merged, only versions prior to Ray 2.38.0 can be used.

@chenkovsky
Copy link
Contributor

Hi @Jay-ju @westonpace ray committer has created a PR ray-project/ray#49251 . maybe fix should be based on that PR.

@westonpace
Copy link
Contributor

I've tried this locally. When I run the tests in python/python/tests/test_ray.py I get the following error:

========================================================================================== FAILURES ==========================================================================================
_______________________________________________________________________________________ test_ray_sink ________________________________________________________________________________________

tmp_path = PosixPath('/tmp/pytest-of-pace/pytest-4/test_ray_sink0')

    def test_ray_sink(tmp_path: Path):
        schema = pa.schema([pa.field("id", pa.int64()), pa.field("str", pa.string())])
    
        sink = LanceDatasink(tmp_path)
        ray.data.range(10).map(
            lambda x: {"id": x["id"], "str": f"str-{x['id']}"}
>       ).write_datasink(sink)

python/tests/test_ray.py:32: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/lance/lib/python3.10/site-packages/ray/data/dataset.py:3857: in write_datasink
    datasink.on_write_complete(raw_write_results)
python/lance/ray/sink.py:143: in on_write_complete
    write_results = [
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

.0 = <list_iterator object at 0x78db2e2ff8b0>

    write_results = [
>       result["write_result"].iloc[0]["result"] for result in write_results
    ]
E   TypeError: 'WriteResult' object is not subscriptable

python/lance/ray/sink.py:144: TypeError

I did some inspecting. result is indeed a pandas dataframe. It has 1 column, 1 row, 1 cell. The value in that cell is an object of type ray.data.datasource.datasink.WriteResult. This matches https://docs.ray.io/en/latest/data/api/doc/ray.data.Datasink.on_write_complete.html

A WriteResult object is not subscriptable (e.g. can't call ["result"] on it). From what I see in https://github.com/ray-project/ray/blob/4f1a30852f2d62b5d7dfae50bddcefbee9dd406b/python/ray/data/datasource/datasink.py#L15 there are only two properties on WriteResult which are num_rows and size_bytes and this is not enough information to commit the write.

I agree with @chenkovsky that we will need to wait for ray-project/ray#49251 to be released.

@westonpace
Copy link
Contributor

(I tested with both ray==2.38 and ray==2.40)

@Jay-ju Jay-ju force-pushed the fix_ray_lance_sink branch 3 times, most recently from 7943f7e to a152077 Compare December 30, 2024 06:49
@Jay-ju
Copy link
Contributor Author

Jay-ju commented Dec 30, 2024

@westonpace I have made changes based on the latest code of ray. Please review it again.

@Jay-ju Jay-ju mentioned this pull request Dec 30, 2024
@Jay-ju Jay-ju force-pushed the fix_ray_lance_sink branch 2 times, most recently from af033ae to 0f22455 Compare January 1, 2025 11:52
Copy link
Contributor

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing to work on this! I did confirm that all tests pass with this change using the nightly release from Ray and the tests continue to pass with older version (2.37) so i think this is good.

I have a few minor warning message / comment suggestions and then we can merge this. Please re-request review when you've been able to address them.

python/python/lance/ray/sink.py Outdated Show resolved Hide resolved
python/python/lance/ray/sink.py Outdated Show resolved Hide resolved
python/python/lance/ray/sink.py Outdated Show resolved Hide resolved
@Jay-ju Jay-ju force-pushed the fix_ray_lance_sink branch 2 times, most recently from 8cd692e to 81c276b Compare January 7, 2025 01:47
@Jay-ju Jay-ju force-pushed the fix_ray_lance_sink branch from 81c276b to 645cb5a Compare January 7, 2025 01:51
@Jay-ju
Copy link
Contributor Author

Jay-ju commented Jan 8, 2025

@westonpace please help me review this PR again? tks

@westonpace westonpace merged commit 8db5943 into lancedb:main Jan 10, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants