[ML] Move PyTorch request ID and cache hit indicator to top level #88901
Conversation
This change will facilitate a performance improvement on the C++ side. The request ID and cache hit indicator are the parts that need to be changed when the C++ process responds to an inference request. Having them at the top level means we do not need to parse and manipulate the original response - we can simply cache the inner object of the response and add the outer fields around it when serializing it.
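To illustrate the intent of the description above, here is a hedged Java sketch (not the actual Elasticsearch or ml-cpp code; field and method names are hypothetical) of why top-level placement helps: with the request ID and cache-hit flag outside the cached payload, the inner result can be reused verbatim and only a thin outer wrapper is built per request.

```java
public final class ResponseWrapper {

    /**
     * Wraps a cached inner result with the per-request top-level fields.
     * "request_id" and "cache_hit" follow the PR title; the inner payload
     * is treated as an opaque, pre-serialized JSON string, so no parsing
     * or manipulation of the cached response is needed.
     */
    static String wrap(String requestId, boolean cacheHit, String cachedInnerJson) {
        return "{\"request_id\":\"" + requestId + "\","
             + "\"cache_hit\":" + cacheHit + ","
             + "\"result\":" + cachedInnerJson + "}";
    }

    public static void main(String[] args) {
        String inner = "{\"inference\":[0.1,0.9]}";  // cached once
        // Per request: plain concatenation, no re-serialization of the payload.
        System.out.println(wrap("req-42", true, inner));
    }
}
```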
Pinging @elastic/ml-core (Team:ML)
Ah, the time taken should move up too, as that is calculated differently for cache hits.
@@ -174,10 +192,10 @@ void processErrorResult(PyTorchResult result) {
        errorCount++;
        logger.trace(() -> format("[%s] Parsed error with id [%s]", deploymentId, errorResult.requestId()));
Seeing how processResult is synchronized and processErrorResult isn't, errorCount can be woefully incorrect due to race conditions.
That can be fixed in a different commit. This particular change looks good.
Since it's quite a simple change, I addressed it in 1bdf3a0.
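For readers following along, a hedged sketch of the concurrency issue raised above and one common way to fix it (illustrative only; the class name is hypothetical and the actual fix landed in commit 1bdf3a0): if only one of the two methods holds the monitor, a plain counter can lose increments, whereas an AtomicLong makes the update safe without a lock.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ResultCounters {
    // AtomicLong guarantees lock-free, lost-update-free increments even
    // when callers do not share a monitor (unlike a plain long field
    // incremented from an unsynchronized method).
    private final AtomicLong errorCount = new AtomicLong();

    void processErrorResult() {
        errorCount.incrementAndGet();  // atomic read-modify-write
    }

    long errors() {
        return errorCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ResultCounters c = new ResultCounters();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) c.processErrorResult();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(c.errors());  // prints 4000: no increments are lost
    }
}
```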
These tests will fail without elastic/ml-cpp#2376 if they are unmuted. #88901 will follow up with the Java side changes.
…2376) Previously the inference cache stored complete results, including a request ID and time taken. This was inefficient as it then meant the original response had to be parsed and modified before sending back to the Java side. This PR changes the cache to store just the inner portion of the inference result. Then the outer layer is added per request after retrieving from the cache. Additionally, the result writing functions are moved into a class of their own, which means they can be unit tested. Companion to elastic/elasticsearch#88901
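The caching flow described in the commit message above can be sketched as follows. This is a hedged Java rendering of the C++ scheme (the real implementation lives in ml-cpp; class and method names here are hypothetical): the cache stores only the inner portion of the result, so a hit needs no parsing, just a fresh outer wrapper carrying the caller's request ID and the cache-hit flag.

```java
import java.util.HashMap;
import java.util.Map;

final class InferenceCache {
    // Keyed on the request payload; stores only the inner result JSON.
    private final Map<String, String> innerResults = new HashMap<>();

    String infer(String requestId, String payload) {
        String inner = innerResults.get(payload);
        boolean hit = inner != null;
        if (!hit) {
            inner = runModel(payload);         // expensive path, miss only
            innerResults.put(payload, inner);  // cache the inner portion only
        }
        // Outer fields are added per request; the cached inner JSON is reused as-is.
        return "{\"request_id\":\"" + requestId + "\","
             + "\"cache_hit\":" + hit + ","
             + "\"result\":" + inner + "}";
    }

    // Stand-in for the PyTorch inference call (hypothetical).
    private String runModel(String payload) {
        return "{\"inference\":[" + payload.length() + "]}";
    }

    public static void main(String[] args) {
        InferenceCache cache = new InferenceCache();
        System.out.println(cache.infer("req-1", "doc"));  // cache_hit false
        System.out.println(cache.infer("req-2", "doc"));  // cache_hit true, same inner result
    }
}
```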
@elasticmachine update branch
Replaces #88485
Companion to elastic/ml-cpp#2376