
1MB payload latency on localhost #5676

Closed
tony-clarke-amdocs opened this issue Sep 29, 2022 · 12 comments

Comments

@tony-clarke-amdocs

tony-clarke-amdocs commented Sep 29, 2022

GRPC v1.48

Running the grpc benchmark on my laptop with: `./run_bench.sh -rpc_type unary -req 1000000 -resp 1000000 -r 1`

I get results as follows:

================================================================================
r_1_c_1_req_1000000_resp_1000000_unary_1664417694
qps: 506.4
Latency: (50/90/99 %ile): 1.972018ms/2.313624ms/2.685183ms
Client CPU utilization: 0s
Client CPU profile: /tmp/client_r_1_c_1_req_1000000_resp_1000000_unary_1664417694.cpu
Client Mem Profile: /tmp/client_r_1_c_1_req_1000000_resp_1000000_unary_1664417694.mem
Server CPU utilization: 0s
Server CPU profile: /tmp/Server_r_1_c_1_req_1000000_resp_1000000_unary_1664417694.cpu
Server Mem Profile: /tmp/Server_r_1_c_1_req_1000000_resp_1000000_unary_1664417694.mem

The time taken seems a little more than I had anticipated (I was hoping to be well under 1ms).

Trying to understand where the time is spent, I had the following suspects:

  1. User space -> kernel space -> user space for the network call. But using bufconn (see the sketch just after this list) seems to suggest that this is not an issue.
  2. The proto codec writing the proto to the wire format and back. However, timing this separately, it only seems to account for ~10% of the time.
  3. HTTP/2 overhead. Reading this issue and running the sample app here seem to show that HTTP/2 is much slower than HTTP/1.1 (at least in this use case).
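For reference, a minimal sketch of the bufconn wiring behind suspect 1: an in-memory listener, so RPCs never cross the kernel network stack. Service registration is elided and the names are illustrative, not the benchmark's own code.

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/test/bufconn"
)

func main() {
	// In-memory listener: connections never touch the kernel network stack.
	lis := bufconn.Listen(1 << 20)

	s := grpc.NewServer()
	// pb.RegisterGreeterServer(s, &greeterServer{}) // hypothetical: register your service here
	go s.Serve(lis)
	defer s.Stop()

	// Dial through the bufconn listener instead of TCP; the dialer ignores the target string.
	conn, err := grpc.Dial("bufnet",
		grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
			return lis.DialContext(ctx)
		}),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	// Time RPCs issued on conn exactly as in the TCP run.
}
```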

What do folks think?

@dfawley
Member

dfawley commented Oct 4, 2022

It's been a long time since we've focused on performance, and I'm not sure what kinds of numbers to expect for a benchmark with those parameters. What is showing up in the client & server CPU profiles?

@tony-clarke-amdocs
Author

Attached are the client and server PDF profiles. Hopefully something jumps out as being suspicious to someone.
server.pdf
client.pdf

@dfawley
Member

dfawley commented Oct 4, 2022

Thanks for the profiles. Nothing really stands out to me there.

What are you testing for exactly in your scenario? Typically, if you're only doing one RPC at a time (-r 1), you would be using a small payload to measure latency. If you are testing for throughput, you would use a large payload (e.g. 1MB, like your run), but with many RPCs concurrently. If you are testing for QPS, you would use many RPCs concurrently but small payloads.
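For illustration, the "throughput" shape described here could be measured with a client loop like the rough sketch below. It borrows the grpc-go helloworld example stubs as a stand-in service; the server address and the worker/RPC counts are assumptions.

```go
package main

import (
	"context"
	"log"
	"sync"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	pb "google.golang.org/grpc/examples/helloworld/helloworld"
)

// measureThroughput issues many concurrent ~1MB RPCs and reports aggregate QPS.
func measureThroughput(ctx context.Context, client pb.GreeterClient, workers, perWorker int) {
	start := time.Now()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker builds its own ~1MB request message.
			req := &pb.HelloRequest{Name: string(make([]byte, 1<<20))}
			for i := 0; i < perWorker; i++ {
				if _, err := client.SayHello(ctx, req); err != nil {
					log.Printf("rpc error: %v", err)
				}
			}
		}()
	}
	wg.Wait()

	elapsed := time.Since(start)
	total := workers * perWorker
	log.Printf("%d RPCs in %v (%.1f QPS)", total, elapsed, float64(total)/elapsed.Seconds())
}

func main() {
	// Assumes a Greeter server is listening on localhost:50051.
	conn, err := grpc.Dial("localhost:50051", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	measureThroughput(context.Background(), pb.NewGreeterClient(conn), 50, 100)
}
```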

@tony-clarke-amdocs
Author

> What are you testing for exactly in your scenario?

We are trying to understand the latency as it relates to payload size between two processes running on the same host. We want to add a proxy that talks gRPC to the application (like a sidecar) but runs on the same host; currently, though, the extra latency is a show stopper. I am trying to understand whether the times I am seeing are reasonable or whether I have just misconfigured something.

| Latency (ns) by payload size (bytes) | 0 | 1000 | 10000 | 100000 | 1000000 |
| --- | --- | --- | --- | --- | --- |
| TCP gRPC | 119083 | 114288 | 122452 | 290807 | 1930755 |
| UDS gRPC | 97265 | 101345 | 128182 | 291941 | 1981637 |

Times are in nanoseconds. The benchmark proto definition is very simple, so the cost can't be in the marshaling.
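As an aside, a minimal sketch of how an application (e.g. the sidecar client) might dial gRPC over a Unix domain socket, corresponding to the UDS row above; the socket path is an assumed placeholder.

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	const sock = "/tmp/sidecar.sock" // hypothetical socket path

	// Dial gRPC over a Unix domain socket instead of TCP; the custom dialer
	// ignores the target string and connects to the socket directly.
	conn, err := grpc.Dial("unix-socket",
		grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", sock)
		}),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	// Use conn with the generated client stub exactly as with TCP.
}
```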

@dfawley
Member

dfawley commented Oct 4, 2022

> The benchmark proto definition is very simple, so the cost can't be in the marshaling.

Maybe not the runtime CPU cost of marshaling, but it could be the cost of the allocations. Our benchmarks will be allocating 3MB per request and 4MB per response:

- Request (3MB): 1. marshaling the request message (client; the request proto message is reused), 2. reading the received request (server), 3. unmarshaling the request (server).
- Response (4MB): 1. creating the response message (server), 2. marshaling the response message (server), 3. reading the response (client), 4. unmarshaling the response (client).

You're achieving 500 QPS * 7MB/Q, which is 3.5GB/sec in allocations. That actually seems pretty reasonable to me, but I'm not sure.
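One way to confirm whether allocation volume of that order really dominates is to dump an allocation profile after a run and inspect it with `go tool pprof -alloc_space`; a minimal sketch using runtime/pprof, with an arbitrary output path.

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeAllocProfile dumps cumulative allocation data for later inspection with
// `go tool pprof -alloc_space <binary> <path>`.
func writeAllocProfile(path string) {
	f, err := os.Create(path)
	if err != nil {
		log.Fatalf("create profile: %v", err)
	}
	defer f.Close()

	runtime.GC() // flush the most recent allocation statistics into the profile
	if err := pprof.Lookup("allocs").WriteTo(f, 0); err != nil {
		log.Fatalf("write profile: %v", err)
	}
}

func main() {
	// ... run the RPC workload here ...
	writeAllocProfile("/tmp/grpc_allocs.pb.gz") // arbitrary output path
}
```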

@dfawley
Member

dfawley commented Oct 4, 2022

If your real-world use case doesn't involve sending 1MB messages at 500QPS then you might slow down the rate of RPCs and get a more realistic measurement of latency.
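Concretely, that amounts to timing each RPC on its own with a pause between calls; a rough sketch, reusing the same imports and helloworld stand-in stubs as the throughput sketch earlier in the thread.

```go
// measurePacedLatency issues one RPC at a time, pausing between calls, and
// logs per-RPC latency rather than aggregate QPS.
func measurePacedLatency(ctx context.Context, client pb.GreeterClient, n int, pause time.Duration) {
	req := &pb.HelloRequest{Name: string(make([]byte, 1<<20))} // ~1MB payload
	for i := 0; i < n; i++ {
		start := time.Now()
		if _, err := client.SayHello(ctx, req); err != nil {
			log.Printf("rpc error: %v", err)
			continue
		}
		log.Printf("rpc %d latency: %v", i, time.Since(start))
		time.Sleep(pause) // e.g. 100ms, so calls don't queue behind one another
	}
}
```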

@github-actions

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Oct 10, 2022
@tony-clarke-amdocs
Author

I slowed the rate of RPCs down, putting various delays in between, but it didn't really seem to make any difference. I get similar performance when I try out grpc-java, so I don't think it's anything specific to the Go implementation, but rather a limitation of gRPC with large payloads. Does anyone have ideas about what else can be done to speed this up (client and server both on localhost)?

@github-actions github-actions bot removed the stale label Oct 11, 2022
@github-actions

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@dfawley
Member

dfawley commented Oct 25, 2022

Sorry, I meant to update here before it was auto-closed:

I think for this, you ultimately will need something like #906. We've had other interest in this recently from some folks who might be able to do the implementation work and also implement a shared memory transport, so it's possible this could happen in the next few months.

@vominhtrius

> Sorry, I meant to update here before it was auto-closed:
>
> I think for this, you ultimately will need something like #906. We've had other interest in this recently from some folks who might be able to do the implementation work and also implement a shared memory transport, so it's possible this could happen in the next few months.

Hi @dfawley, do you have any updates on "implement a shared memory transport" for grpc-go?

@dfawley
Member

dfawley commented Mar 9, 2023

I recently heard that a proof of concept for an in-memory transport performed well, but I don't think it's close to landing as a PR any time soon.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 6, 2023