
[ML] Move PyTorch request ID and cache hit indicator to top level #88901

Merged: 10 commits into elastic:main on Aug 3, 2022

Conversation

droberts195 (Contributor) commented on Jul 28, 2022:

This change will facilitate a performance improvement on the C++
side. The request ID and cache hit indicator are the parts that
need to be changed when the C++ process responds to an inference
request. Having them at the top level means we do not need to
parse and manipulate the original response - we can simply cache
the inner object of the response and add the outer fields around
it when serializing it.

Replaces #88485
Companion to elastic/ml-cpp#2376
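To illustrate the idea, here is a minimal, hypothetical Java sketch of why the top-level placement makes answering from the cache cheap. The class name and JSON field names are assumptions for illustration, not the actual Elasticsearch or ml-cpp code:

```java
// Illustrative sketch only: the inner inference result is cached as an
// already-serialized JSON fragment, and each response is produced by writing
// the per-request outer fields around it without ever re-parsing the fragment.
final class CachedInferenceResult {

    // Serialized inner result, e.g. {"inference":[[0.1,0.9]]}, stored verbatim.
    private final String innerJson;

    CachedInferenceResult(String innerJson) {
        this.innerJson = innerJson;
    }

    // Only the request ID, cache hit flag and time taken differ between
    // requests, so they are the only parts generated per request.
    // Field names here are assumed, not the real wire format.
    String toResponseJson(long requestId, boolean cacheHit, long timeMs) {
        return "{"
            + "\"request_id\":" + requestId + ","
            + "\"cache_hit\":" + cacheHit + ","
            + "\"time_ms\":" + timeMs + ","
            + "\"result\":" + innerJson
            + "}";
    }
}
```

With the per-request fields at the top level, only this thin wrapping step runs per request; with them nested inside, the cached document itself would have to be parsed and rewritten each time.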

elasticsearchmachine (Collaborator) commented:
Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine added the Team:ML (Meta label for the ML team) label on Jul 28, 2022
droberts195 (Contributor, Author) commented:
Ah, the time taken should move up too, as that is calculated differently for cache hits.

A review comment thread on this diff hunk:

@@ -174,10 +192,10 @@ void processErrorResult(PyTorchResult result) {

errorCount++;

logger.trace(() -> format("[%s] Parsed error with id [%s]", deploymentId, errorResult.requestId()));
Reviewer (Member) commented:
Seeing how processResult is synchronized and processErrorResult isn't, errorCount can be woefully incorrect due to race conditions.

That can be fixed in a different commit. This particular change looks good.

droberts195 (Contributor, Author) replied:
Since it's quite a simple change I addressed it in 1bdf3a0
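As background on why an unsynchronized increment can drop counts, here is a minimal, hypothetical sketch of one standard way to keep such a counter correct without method-level synchronization. It is illustrative only and is not the actual change made in 1bdf3a0:

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative only: a plain `errorCount++` on a field is not atomic, so an
// unsynchronized caller can race with a synchronized one and lose increments.
// A LongAdder (or AtomicLong) keeps the count correct from any thread.
class ResultCounters {

    private final LongAdder errorCount = new LongAdder();

    void onErrorResult() {
        errorCount.increment(); // safe without synchronization
    }

    long errors() {
        return errorCount.sum();
    }
}
```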

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Aug 3, 2022
These tests will fail if elastic/ml-cpp#2376 is merged with them unmuted. elastic#88901 will follow up with the Java side changes.
droberts195 added a commit that referenced this pull request Aug 3, 2022
These tests will fail if elastic/ml-cpp#2376 is merged with them unmuted. #88901 will follow up with the Java side changes.
droberts195 added a commit to elastic/ml-cpp that referenced this pull request Aug 3, 2022
…2376)

Previously the inference cache stored complete results, including
a request ID and time taken. This was inefficient, as it meant the original response had to be parsed and modified before being sent back to the Java side.

This PR changes the cache to store just the inner portion of the
inference result. Then the outer layer is added per request after
retrieving from the cache.

Additionally, the result writing functions are moved into a class
of their own, which means they can be unit tested.

Companion to elastic/elasticsearch#88901
droberts195 (Contributor, Author) commented:
@elasticmachine update branch

@droberts195 merged commit 8c21d03 into elastic:main on Aug 3, 2022
@droberts195 deleted the refactor_pytorch_results branch on August 3, 2022, 16:09
Labels: :ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v8.5.0