1MB payload latency on localhost #5676
Comments
It's been a long time since we've focused on performance, and I'm not sure what kinds of numbers to expect for a benchmark with those parameters. What is showing up in the client & server CPU profiles?
Attached are the client and server pdf profiles. Hopefully something jumps out as suspicious to someone.
Thanks for the profiles. Nothing really stands out to me there. What are you testing for exactly in your scenario? Typically if you're only doing one RPC at a time (
We are trying to understand the latency as it relates to payload size between two processes running on the same host. We want to add a proxy that talks gRPC to the application (like a sidecar) running on the same host, but currently the extra latency is a show stopper. I am trying to understand whether the times I am seeing are reasonable or whether I have just misconfigured something.
The benchmark proto definition is very simple, so the cost can't be in the marshaling. Times are in nanoseconds.
Maybe not the runtime CPU cost of marshaling, but it could be the cost of the allocations. Our benchmarks will be allocating 3MB per request and 4MB per response.
Request (3MB): (1) marshaling the request message (client; the request proto message is reused), (2) reading the received request (server), (3) unmarshaling the request (server).
Response (4MB): (1) creating the response message (server), (2) marshaling the response message (server), (3) reading the response (client), (4) unmarshaling the response (client).
You're achieving 500QPS * 7MB/Q, which is 3.5GB/sec in allocations. That actually seems pretty reasonable to me, but I'm not sure.
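A quick way to get a feel for whether raw allocation at these sizes matters is a standalone Go micro-benchmark. The sketch below only models the seven ~1MB allocations/copies enumerated above using plain byte slices; it is not the grpc-go benchmark harness and does not measure protobuf marshaling itself.

```go
// allocsketch_test.go - run with: go test -bench=. -benchmem
package allocsketch

import "testing"

const msgSize = 1 << 20 // roughly 1MB, close to -req/-resp 1000000

func BenchmarkSevenCopiesPerRPC(b *testing.B) {
	src := make([]byte, msgSize)
	b.ReportAllocs()
	b.SetBytes(7 * msgSize) // ~7MB allocated per simulated RPC, per the breakdown above
	for i := 0; i < b.N; i++ {
		// Seven fresh 1MB buffers per iteration, mimicking the
		// request/response allocation steps listed in the comment.
		for j := 0; j < 7; j++ {
			dst := make([]byte, msgSize)
			copy(dst, src)
			_ = dst
		}
	}
}
```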
If your real-world use case doesn't involve sending 1MB messages at 500QPS then you might slow down the rate of RPCs and get a more realistic measurement of latency. |
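Pacing the client only requires spacing the calls out. A minimal sketch, assuming a `callUnary` stand-in for the actual generated stub call (the name is hypothetical, not part of the benchmark code):

```go
package pacing

import (
	"context"
	"time"
)

// measurePaced issues one RPC per tick and records each round-trip time,
// instead of firing RPCs back to back.
func measurePaced(ctx context.Context, callUnary func(context.Context) error,
	n int, interval time.Duration) ([]time.Duration, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	latencies := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		<-ticker.C
		start := time.Now()
		if err := callUnary(ctx); err != nil {
			return latencies, err
		}
		latencies = append(latencies, time.Since(start))
	}
	return latencies, nil
}
```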
This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed. |
I slowed the rate of RPCs down, putting various delays in between, but it didn't really seem to make any difference. I get similar performance when I try grpc-java, so I don't think it's anything specific to the Go implementation; it seems to be a limitation of gRPC with large payloads. Does anyone have ideas about what else can be done to speed things up (client and server both on localhost)?
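One avenue (not a confirmed fix, just options to experiment with): grpc-go exposes a few buffer and HTTP/2 flow-control knobs that sometimes matter for large messages. A sketch of setting them on the client side; the values are arbitrary examples, and whether they help on loopback is workload-dependent.

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func dialTuned() (*grpc.ClientConn, error) {
	// Buffer / flow-control knobs that can matter for 1MB messages.
	// The sizes below are examples, not recommendations.
	return grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithWriteBufferSize(1<<20),       // transport write buffer (default 32KB)
		grpc.WithReadBufferSize(1<<20),        // transport read buffer (default 32KB)
		grpc.WithInitialWindowSize(1<<24),     // per-stream HTTP/2 flow-control window
		grpc.WithInitialConnWindowSize(1<<24), // per-connection HTTP/2 flow-control window
	)
}

func main() {
	conn, err := dialTuned()
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```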
This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed. |
Sorry, I meant to update here before it was auto-closed: I think for this, you ultimately will need something like #906. We've had other interest in this recently from some folks who might be able to do the implementation work and also implement a shared memory transport, so it's possible this could happen in the next few months.
hi @dfawley |
I recently heard that a proof of concept for an in-memory transport performed well, but I don't think it's close to landing as a PR any time soon.
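Separately from the proof of concept mentioned above, grpc-go already ships an in-process pipe, google.golang.org/grpc/test/bufconn, which is intended for tests but can be used to separate loopback-TCP cost from the rest of the stack in an experiment. A rough sketch of wiring a client and server to it (service registration omitted):

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/test/bufconn"
)

func main() {
	// In-process listener; no TCP/loopback involved.
	lis := bufconn.Listen(1 << 20)

	srv := grpc.NewServer()
	// Register the benchmark/echo service on srv here.
	go func() {
		if err := srv.Serve(lis); err != nil {
			log.Printf("server exited: %v", err)
		}
	}()
	defer srv.Stop()

	conn, err := grpc.DialContext(context.Background(), "bufconn",
		grpc.WithContextDialer(func(context.Context, string) (net.Conn, error) {
			return lis.Dial()
		}),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Time 1MB unary calls over conn and compare against the TCP numbers.
}
```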
gRPC v1.48
Running the grpc benchmark on my laptop with: ./run_bench.sh -rpc_type unary -req 1000000 -resp 1000000 -r 1
I get results as follows:
The time taken seems a little more than I had anticipated (was hoping to be well under 1ms).
Trying to understand where the time is spent, I had the following suspects:
What do folks think?
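One way to calibrate these numbers is to compare them against a raw loopback round trip of the same payload size, with no gRPC involved. The sketch below echoes a 1MB payload over plain TCP and times each round trip; it assumes a single write/read pair approximates the unary request/response shape.

```go
// Rough baseline: round-trip a 1MB payload over loopback TCP with no gRPC,
// to see how much of the observed latency is the wire vs. the stack.
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"time"
)

const payloadSize = 1_000_000 // matches -req/-resp 1000000

func main() {
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}

	// Echo server: read 1MB, write it back.
	go func() {
		conn, err := lis.Accept()
		if err != nil {
			log.Fatal(err)
		}
		buf := make([]byte, payloadSize)
		for {
			if _, err := io.ReadFull(conn, buf); err != nil {
				return
			}
			if _, err := conn.Write(buf); err != nil {
				return
			}
		}
	}()

	conn, err := net.Dial("tcp", lis.Addr().String())
	if err != nil {
		log.Fatal(err)
	}
	req := make([]byte, payloadSize)
	resp := make([]byte, payloadSize)

	// Time a handful of round trips (first few double as warm-up).
	for i := 0; i < 20; i++ {
		start := time.Now()
		if _, err := conn.Write(req); err != nil {
			log.Fatal(err)
		}
		if _, err := io.ReadFull(conn, resp); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("round trip %d: %v\n", i, time.Since(start))
	}
}
```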