-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent http response between presto-native and presto-spark during error #19609
Comments
cc @mbasmanova |
Proposals 1 and 2 make sense to me. Let's make these changes unless there are objections.
This one I'm not sure about. In Velox, we use VELOX_CHECK and VELOX_USER_CHECK macros to throw exceptions. These generate stacktraces as long as that functionality is enabled via gflags.
|
CC: @miaoever |
CC: @majetideepak |
I see, VELOX_CHECK and VELOX_USER_CHECK indeed captures the stacktrace. In that case, we could extend and provide PRESTO_* macros which mirror their respective VELOX_* functionality. |
If we go ahead with proposal 1 and 2, would we need to modify how presto coordinator handles http status code? |
@vermapratyush Thanks for creating the issue. This is a great finding. Looking at presto/presto-native-execution/presto_cpp/main/TaskResource.cpp Lines 237 to 244 in e1bb41a
It appears the worker is relying on coordinator to classify the error. It creates an empty task and with exception details. I am thinking why we don't use same thing for non velox exception as well. Basically create an empty tasks and put message for following sections. Then co-ordinator needs no change.
|
The assumption here is that coordinator does not rely on http status code to make decisions and we are ok with changing http status code from 5xx to 2xx along with change to payload structure (only for current 5xx response). Also, if we go this route, what do we do for other APIs like TaskResource::deleteTask. They will still be returning 5XX with a non-json response body, which would be different from TaskResource::createOrUpdateTask |
@vermapratyush Makes sense. I think our best option is to read the coordinator side code and decide based on that. Otherwise we end with random items :(. CC: @pranjalssh @rschlussel @arunthirupathi for design help here. |
A quick look at https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/server/remotetask/HttpRemoteTask.java makes me think that we basically only have a failure response when there was some connection issue and execution failures will just show that info in the TaskStatus itself (maybe TaskState and ExecutionFailureInfo fields) https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/execution/TaskStatus.java. But I would defer to @ajaygeorge who knows much more about the worker <-> coordinator communication. |
@vermapratyush
Failure handling is different for the JSON path and thrift path. presto/presto-main/src/main/java/com/facebook/presto/server/SimpleHttpResponseHandler.java Lines 53 to 73 in 3513d07
|
The interaction between presto-native cpp server and presto-spark during unhappy flows is not consistent.
In the API TaskResource#createOrUpdateTaskImpl (
presto/presto-native-execution/presto_cpp/main/TaskResource.cpp
Line 237 in e1bb41a
In TaskResource::deleteTask (
presto/presto-native-execution/presto_cpp/main/TaskResource.cpp
Line 330 in e1bb41a
There are few problems with this approach:
Some proposal to the problem
Use boost::exception to throw exceptions. boost::exception provides a much structured format for exception and includes stacktrace.VELOX_CHECK and VELOX_USER_CHECK already captures stack trace, extend the functionality and surface it as PRESTO_* macros with mirroring funcitonality.If we go ahead with proposal 1 and 2, would we need to modify how presto coordinator handles http status code?
The text was updated successfully, but these errors were encountered: