SQL statement (UNION + EXCEPT) causes panic
#4837
Comments
Thank you for the report @DDtKey |
@DDtKey are you able to provide the files you used to cause the bug, or at least give an example of their structure/data? I don't seem to be able to reproduce the panic myself |
It seems to be reproducible for any file structure 🤔 But for example, very simple ones:
and
And at least for |
Hmm, still can't reproduce on latest master 83c1026:
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ cat data1.csv
name
Alex
Bob
Alice
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ cat data2.csv
name
Alex
Bob
John
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.10s
Running `/media/jeffrey/1tb_860evo_ssd/.cargo_target_cache/debug/datafusion-cli`
DataFusion CLI v15.0.0
❯ CREATE EXTERNAL TABLE foo1 STORED AS CSV WITH HEADER ROW LOCATION 'data1.csv';
0 rows in set. Query took 0.013 seconds.
❯ CREATE EXTERNAL TABLE foo2 STORED AS CSV WITH HEADER ROW LOCATION 'data2.csv';
0 rows in set. Query took 0.002 seconds.
❯ (select * from foo1 except select * from foo2) union all (select * from foo2 except select * from foo1);
+-------+
| name |
+-------+
| Alice |
| John |
+-------+
2 rows in set. Query took 0.012 seconds.
❯ \q
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ git rev-parse --short HEAD
83c10269
Or should I try different file formats too? |
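For reference, the query in the session above computes a symmetric difference: rows in one table but not the other, taken in both directions and concatenated. The sketch below models that on the two sample CSV files using only the Rust standard library. The function names are hypothetical and this is not how DataFusion evaluates the plan; it only illustrates which rows are expected.

```rust
use std::collections::BTreeSet;

/// Rows of `left` that do not appear in `right`, deduplicated
/// (plain EXCEPT, without ALL, has DISTINCT semantics).
fn except<'a>(left: &[&'a str], right: &[&'a str]) -> BTreeSet<&'a str> {
    let right: BTreeSet<&str> = right.iter().copied().collect();
    left.iter().copied().filter(|row| !right.contains(row)).collect()
}

/// Models `(left EXCEPT right) UNION ALL (right EXCEPT left)`.
fn symmetric_difference<'a>(left: &[&'a str], right: &[&'a str]) -> Vec<&'a str> {
    let mut rows: Vec<&str> = except(left, right).into_iter().collect();
    // UNION ALL simply appends the second branch without deduplicating.
    rows.extend(except(right, left));
    rows
}

fn main() {
    // Contents of data1.csv and data2.csv from the session above.
    let data1 = ["Alex", "Bob", "Alice"];
    let data2 = ["Alex", "Bob", "John"];
    // Prints ["Alice", "John"], matching the rows in the CLI output.
    println!("{:?}", symmetric_difference(&data1, &data2));
}
```

SQL does not guarantee row order without ORDER BY, so a real engine may return the same two rows in a different order.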
@Jefffrey oh, it looks like it isn't reproducible against the main branch for me either (only against the last stable release). Thank you. But the results aren't correct for me 🤔
But if I use data_frame.write_csv("...").await?; it leads to an error: Any thoughts? |
The output seems correct to me, as far as I understand it, unless you're referring to the inconsistent output row ordering? The error seems straightforward enough: it doesn't seem to overwrite the existing output file? |
My bad, sorry. I accidentally ran 3 parallel processes; that's the reason for the 3 outputs and the "file already exists" error. |
After some tests against the current master branch I was able to discover a new bug (broken behavior); see the new issue: #4844 |
It would be a good idea to have more test coverage, though I confess I'm not entirely certain where such a test would be located. Maybe in the sqllogictests? Relevant epic: #4460 |
cc @ygf11 |
I find the bug is in
❯ create table table_2(name text, id INT) as values('Alex',1);
0 rows in set. Query took 0.002 seconds.
❯ create table table_1(name text, id TINYINT) as values('Alex',1);
0 rows in set. Query took 0.002 seconds.
❯ (
SELECT * FROM table_1
EXCEPT
SELECT * FROM table_2
)
UNION ALL
(
SELECT * FROM table_2
EXCEPT
SELECT * FROM table_1
);
SchemaError(FieldNotFound { field: Column { relation: Some("table_2"), name: "id" }, valid_fields: Some([Column { relation: Some("table_1"), name: "name" }, Column { relation: Some("table_1"), name: "id" }]) })
For a union operation, we need to ensure that the data types of the left and right inputs are the same. But it uses To fix this issue, I think we can abandon |
😭, I think this is the only solution to resolve union plan for now. |
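To make the type-matching requirement concrete, here is a minimal sketch of building a union output schema by pairing columns by position and widening `TINYINT` to `INT`. Everything in it (`ColType`, `Field`, `common_type`, `union_schema`) is a hypothetical, simplified stand-in and not DataFusion's actual types or coercion logic. It also shows why position-based pairing avoids the `FieldNotFound` error above: no column qualified with `table_2` is ever looked up in a schema that only contains `table_1` fields.

```rust
/// Hypothetical, simplified column types; not DataFusion's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColType {
    Int8,  // TINYINT
    Int32, // INT
    Text,
}

/// Hypothetical qualified field, loosely mirroring the error message above.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    relation: &'static str,
    name: &'static str,
    ty: ColType,
}

/// Pick a common type for two union inputs, widening integers where needed.
fn common_type(left: ColType, right: ColType) -> Option<ColType> {
    use ColType::*;
    match (left, right) {
        (a, b) if a == b => Some(a),
        (Int8, Int32) | (Int32, Int8) => Some(Int32),
        _ => None,
    }
}

/// Build the union output schema by pairing columns by position.
/// Names and qualifiers are taken from the left input; only the type changes.
fn union_schema(left: &[Field], right: &[Field]) -> Result<Vec<Field>, String> {
    if left.len() != right.len() {
        return Err(format!(
            "union inputs have {} and {} columns",
            left.len(),
            right.len()
        ));
    }
    left.iter()
        .zip(right)
        .map(|(l, r)| {
            common_type(l.ty, r.ty)
                .map(|ty| Field { ty, ..l.clone() })
                .ok_or_else(|| {
                    format!(
                        "cannot unify {:?} and {:?} for column {}.{}",
                        l.ty, r.ty, l.relation, l.name
                    )
                })
        })
        .collect()
}

fn main() {
    let table_1 = [
        Field { relation: "table_1", name: "name", ty: ColType::Text },
        Field { relation: "table_1", name: "id", ty: ColType::Int8 },
    ];
    let table_2 = [
        Field { relation: "table_2", name: "name", ty: ColType::Text },
        Field { relation: "table_2", name: "id", ty: ColType::Int32 },
    ];
    // The `id` column is widened to Int32 and keeps the left-hand qualifier.
    println!("{:?}", union_schema(&table_1, &table_2));
}
```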
Describe the bug
datafusion 15.0.0 panics for a specific SQL query:
To Reproduce
Steps to reproduce the behavior:
It will panic:
thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 2'
(len and index depend on the number of columns in the file)
Expected behavior
It should work correctly or return an error, but not panic.
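As an illustration of the expected behavior, the sketch below contrasts direct slice indexing, which panics on an out-of-range index, with a checked lookup that returns an error. `PlanError` and `column_at` are hypothetical names for this example only; the source does not show where the panic originates in DataFusion.

```rust
/// Hypothetical error type for this example; not a DataFusion type.
#[derive(Debug, PartialEq)]
enum PlanError {
    ColumnIndexOutOfBounds { index: usize, len: usize },
}

/// Checked column lookup: returns an error instead of panicking.
fn column_at<'a>(columns: &[&'a str], index: usize) -> Result<&'a str, PlanError> {
    columns
        .get(index)
        .copied()
        .ok_or(PlanError::ColumnIndexOutOfBounds { index, len: columns.len() })
}

fn main() {
    let columns = ["name", "id"];
    // `columns[2]` would panic with
    // "index out of bounds: the len is 2 but the index is 2".
    match column_at(&columns, 2) {
        Ok(name) => println!("column: {name}"),
        Err(err) => println!("error: {err:?}"),
    }
}
```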