Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate with the substrait integration test #10710

Open
Tracked by #5173
alamb opened this issue May 29, 2024 · 14 comments
Open
Tracked by #5173

Integrate with the substrait integration test #10710

alamb opened this issue May 29, 2024 · 14 comments
Assignees
Labels
enhancement New feature or request substrait

Comments

@alamb
Copy link
Contributor

alamb commented May 29, 2024

Is your feature request related to a problem or challenge?

In #10653, @Blizzara added the beginnings of testing for substrait plans that came from other systems.

@richtia noted #10653 (comment)

FYI...The substrait project also has a repo that aims to do integration testing between different substrait producers/consumers. https://github.com/substrait-io/consumer-testing

Describe the solution you'd like

I would like to get DataFusion working with that substrait integration test

Describe alternatives you've considered

No response

Additional context

No response

@Lordworms
Copy link
Contributor

take

@Lordworms
Copy link
Contributor

Lordworms commented Jun 8, 2024

While I was adding this feature, I encountered lots of issues while trying to generate LogicalPlan from substrait
https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans

Writing this to list all these issues

@alamb
Copy link
Contributor Author

alamb commented Jun 9, 2024

Thank you @Lordworms -- indeed it is a great job to find issues when testing

I wonder if there is some way to get the integration test partly working and checked in somewhere (maybe with the failing tests commented out). If you were able to do so, I think we would be able to get many more people to help with fixing the actual gaps

@Lordworms
Copy link
Contributor

Thank you @Lordworms -- indeed it is a great job to find issues when testing

I wonder if there is some way to get the integration test partly working and checked in somewhere (maybe with the failing tests commented out). If you were able to do so, I think we would be able to get many more people to help with fixing the actual gaps

actually, I have made tpch_1 work just now, I'll draft a PR, and hope to get a review from you, Thanks a lot.

@alamb
Copy link
Contributor Author

alamb commented Jun 9, 2024

I added to the substrait support epic: #5173

@alamb
Copy link
Contributor Author

alamb commented Jun 9, 2024

actually, I have made tpch_1 work just now, I'll draft a PR, and hope to get a review from you, Thanks a lot.

Wow -- nice work!

@richtia
Copy link

richtia commented Jun 9, 2024

Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for you as well!

@Lordworms
Copy link
Contributor

Lordworms commented Jun 9, 2024

Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for you as well!

Hi @richtia , I just find in some confusion in tpch_1_json

in general tpch1, the filter looks like
image
but in this repo, the actual tpch_1 you use is like
image

Is there any purpose of changing the expression?

Thanks a lot

@richtia
Copy link

richtia commented Jun 9, 2024

Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for you as well!

Hi @richtia , I just find in some confusion in tpch_1_json

in general tpch1, the filter looks like image but in this repo, the actual tpch_1 you use is like image

Is there any purpose of changing the expression?

Thanks a lot

Ahh...these TPCH queries were pulled from somewhere else. If there's a mistake in any, they should definitely be updated! Feel free to file an issue or make the change yourself!

Thanks for the catch.

@Blizzara
Copy link
Contributor

The - interval part does seem aligned with TPH-H 3.0.1 spec (page 29). The 120 should in theory be a random number betwen 60 and 120 but that's unrelevant to this usage.

@Lordworms
Copy link
Contributor

@richtia hi! when I was trying to do plan 7,8 and 9, I find the substrait json file is empty, any reason for this? https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans

@richtia
Copy link

richtia commented Jul 9, 2024

@richtia hi! when I was trying to do plan 7,8 and 9, I find the substrait json file is empty, any reason for this? https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans

The plans were generated by Isthmus (https://github.com/substrait-io/substrait-java/tree/main/isthmus) with some minor modifications. I think at the time, Isthmus had trouble generating those 3 plans. So I had just left them blank. I'll create an issue in the repo. Feel free to add them. Otherwise I'll try to get to it some time.

@Lordworms
Copy link
Contributor

@richtia hi! when I was trying to do plan 7,8 and 9, I find the substrait json file is empty, any reason for this? https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans

The plans were generated by Isthmus (https://github.com/substrait-io/substrait-java/tree/main/isthmus) with some minor modifications. I think at the time, Isthmus had trouble generating those 3 plans. So I had just left them blank. I'll create an issue in the repo. Feel free to add them. Otherwise I'll try to get to it some time.

sure, I'll add that later.

@Lordworms
Copy link
Contributor

Lordworms commented Jul 16, 2024

Hi, @richtia I just wonder how you get the original substrait plan, since I use duckdb to be the producer and generate the plan, it didn't have some structure like

"local_files": {
                                                  "items": [
                                                    {
                                                      "uri_file": "file://FILENAME_PLACEHOLDER_0",
                                                      "parquet": {}
                                                    }
                                                  ]
                                                }
}

I know it is similar to use namedTable here as what duckDB did, but I still want to keep the consistency with previous plan. Thanks a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request substrait
Projects
None yet
Development

No branches or pull requests

5 participants