Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace panic with Result in reader.rs #3354

Closed
wants to merge 3 commits into from
Closed

Replace panic with Result in reader.rs #3354

wants to merge 3 commits into from

Conversation

askoa
Copy link
Contributor

@askoa askoa commented Sep 3, 2022

Which issue does this PR close?

Partial fix for #3317

Rationale for this change

Replace panic with Result.

What changes are included in this PR?

Replace panic with replace in below places

"/home/andy/git/apache/arrow-datafusion/datafusion/row/src/reader.rs":302 .unwrap();
"/home/andy/git/apache/arrow-datafusion/datafusion/row/src/reader.rs":310 .unwrap();
"/home/andy/git/apache/arrow-datafusion/datafusion/row/src/reader.rs":337 let to = to.as_any_mut().downcast_mut::().unwrap();
"/home/andy/git/apache/arrow-datafusion/datafusion/row/src/reader.rs":350 let to = to.as_any_mut().downcast_mut::().unwrap();
"/home/andy/git/apache/arrow-datafusion/datafusion/row/src/reader.rs":377 _ => unimplemented!(),
"/home/andy/git/apache/arrow-datafusion/datafusion/row/src/reader.rs":404 _ => unimplemented!(),

Are there any user-facing changes?

Not sure. Few functions are modified to return Result. Not sure if they are part of public API.

@github-actions github-actions bot added the core Core DataFusion crate label Sep 3, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 4, 2022

Codecov Report

Merging #3354 (e239bed) into master (786c319) will decrease coverage by 0.02%.
The diff coverage is 54.83%.

@@            Coverage Diff             @@
##           master    #3354      +/-   ##
==========================================
- Coverage   85.47%   85.45%   -0.03%     
==========================================
  Files         294      294              
  Lines       54059    54079      +20     
==========================================
+ Hits        46209    46212       +3     
- Misses       7850     7867      +17     
Impacted Files Coverage Δ
datafusion/row/src/reader.rs 81.21% <50.00%> (-7.07%) ⬇️
...sion/core/src/physical_plan/aggregates/row_hash.rs 95.78% <100.00%> (ø)
datafusion/core/src/physical_plan/metrics/value.rs 86.93% <0.00%> (-0.51%) ⬇️
datafusion/expr/src/logical_plan/plan.rs 77.28% <0.00%> (-0.34%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @ghubkoa

I think most of these paths are basically impossible to hit in reality, but the Result plumbing is good.

@askoa
Copy link
Contributor Author

askoa commented Sep 4, 2022

@alamb @xudong963 It looks like there are some issues with jit tests and I am looking into it.

@askoa
Copy link
Contributor Author

askoa commented Sep 4, 2022

@alamb @xudong963 @andygrove I need help to fix this PR. After spending some time on this I was able to figure out that the JIT tests are failing with below message.

(signal: 11, SIGSEGV: invalid memory reference)

The error happens when calling the function code_fn

https://github.com/apache/arrow-datafusion/blob/b175f9a48d296ea66a07acc3825f4d6e0def2367/datafusion/row/src/jit/reader.rs#L57

The function code_fn is dynamically assigned in below lines

https://github.com/apache/arrow-datafusion/blob/b175f9a48d296ea66a07acc3825f4d6e0def2367/datafusion/row/src/jit/reader.rs#L48-L53

And signature of the gen_func function is defined below

https://github.com/apache/arrow-datafusion/blob/b175f9a48d296ea66a07acc3825f4d6e0def2367/datafusion/row/src/jit/reader.rs#L71-L75

I need to make changes in two places

  1. While calling reg_fn! the last parameter should represent Result instead of None. I am not able to figure out how to pass Result. Looks like JIT is capable of handling primitive types. I tried Some(PTR) and it did not work.
  2. In mem:: transmute, I think I need to change the signature to return Result. But not sure if this is enough.

Do you guys have any inputs on how to go about this?

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

perhaps @yjshen may have some input

@yjshen
Copy link
Member

yjshen commented Sep 6, 2022

Currently, the use of external read_field_x functions is implemented as b.call_stmts, which are executed for side effects. You need to create a local variable and assign the Expr::Call result to it.

For the reader case we are meeting here, I personally think it's OK to panic here since we are deserializing RecordBatch from Vec<u8> with a known row type. It must be a bug if we create a list of array builders according to the schema but cannot cast them to the corresponding type-specific builders later.

@askoa
Copy link
Contributor Author

askoa commented Sep 6, 2022

For the reader case we are meeting here, I personally think it's OK to panic here since we are deserializing RecordBatch from Vec<u8> with a known row type. It must be a bug if we create a list of array builders according to the schema but cannot cast them to the corresponding type-specific builders later.

I stumbled upon the rust book article which suggests to return Result for recoverable error. If the error is not recoverable then panic is recommended. I agree with @yjshen that the read_field* functions should result in panic as the error does not look recoverable.

If I don't get any opposing views, I'll revert the code for the read_field* functions and send updated PR.

Edit: This PR becomes redundant if the read_field* methods should continue to panic. I'll be closing the PR and will not send any updates.

@askoa askoa marked this pull request as draft September 6, 2022 15:04
@askoa
Copy link
Contributor Author

askoa commented Sep 6, 2022

panic seems acceptable approach for read_field* functions. Not replacing panic with Result. Hence closing the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants