-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Protobuf and gRPC for internal communications #773
Comments
I'm wondering if you need the JSON at all. You might be able to use JavaScript Protobuf on the browser too. Not saying JSON isn't simpler, just an idea if it isn't straightforward. |
For Jaeger UI we might be able to use https://github.com/grpc/grpc-web. But I still think we need native JSON support, if only for jaeger-client-javascript to be able to report traces from the browser. I don't think it would be a good idea to insist on having grpc-web in the browser applications as a dependency, given that JSON is a lot more natural and easily available (possibly for other languages as well). |
There's gRPC Gateway, which can take care of providing transparent REST+JSON support: https://github.com/grpc-ecosystem/grpc-gateway |
Re jaeger-client-javascript, using protobufs instead of JSON is an option: https://github.com/dcodeIO/protobuf.js Has broad browser compatibility: http://dcode.io/protobuf.js/#compatibility |
So I briefly looked into this last night and the consensus was that gzipped JSON will almost always beat Javascript protobuf. |
@isaachier Can you refer to relevant material? And, beat in what sense, speed, size, etc? Also, some perf benchmarks from dcodeIO/protobuf.js (in node, not the browser). There is a very significant difference between the Google implementation and decodeIO's. Which implementation was used in your investigation?
|
OK consensus might be an overstatement. See this article: https://blog.wearewizards.io/using-protobuf-instead-of-json-to-communicate-with-a-frontend. |
For the record, I wouldn't use node as a good indicator of frontend performance. Node C++ extensions can easily add speed ups that aren't available in the browser. |
I believe we need to support JSON format for the following reasons:
|
Regarding Go rendering of duration in JSON (e.g.
Cons:
|
I wouldn't worry too much about the size of the span on the wire. As long as it doesn't hurt performance somewhere, people generally don't care. |
Fun fact:
https://developers.google.com/protocol-buffers/docs/proto3#json |
For duration, why not just use nanoseconds? |
Nanoseconds are fine in 64bit, but Javascript only understands 53bit and that's 100 days in nanos - uncomfortably low number. The trade-off is we either render 64bit duration to JSON as a string, e.g. |
I suppose another option would be to render int64 as a string in JSON, to avoid precision loss in Javascript. |
If this is our canonical storage model, too, end timestamps would make it more difficult to do duration based queries. What about storing the duration as milliseconds as a float? |
FYI, milliseconds as a float is the approach taken by the |
I'm wondering if we'd care about the precision for that. Otherwise, |
That struct is a regular nanos timestamp. It's already supported by protobuf. |
Fwiw, average span size in our deployment is between 400-500 bytes (in thrift), so we'd be adding 1% overhead on the data size by switching from duration to timestamp. |
My 2 cents from Prometheus experience.
Prometheus uses gogoprotobuf with no issue.
Prometheus also uses |
Seems the main benefits of using floats to store the duration in milliseconds is simplicity; ease to support duration based queries; it's a bit more aligned, semantically, with the property we're interested in storing and string parsing is not required in any context. |
This is configurable in the marshaler. I would recommend using Note that committing to a JSON representation limits your options for backwards compatible changes (no field renaming). This is a subtle difference from usual proto3 forward/backward compatibility benefits. |
Btw, GRPC allows client side load balancing by default. Just use GRPCLB. The default discovery is by using DNS naming and point to the load balancer. |
Signed-off-by: Yuri Shkuro <[email protected]> Fix lint Signed-off-by: Yuri Shkuro <[email protected]> Add gRPC endpoint to collector Signed-off-by: Yuri Shkuro <[email protected]> Add proper grpc server Signed-off-by: Yuri Shkuro <[email protected]> Add grpc to all-in-one Signed-off-by: Yuri Shkuro <[email protected]> adds the gRPC port to the docker-all-in-one Dockerfile (jaegertracing#773) (jaegertracing#967) Signed-off-by: Jan Heylen <[email protected]> Simple fixes (jaegertracing#999) * Fix test error Signed-off-by: Isaac Hier <[email protected]> * go fmt Signed-off-by: Isaac Hier <[email protected]> Implement PostSpans for collector gRPC handler Signed-off-by: Isaac Hier <[email protected]> Fix formatting Signed-off-by: Isaac Hier <[email protected]> Remove timeout Signed-off-by: Isaac Hier <[email protected]> Use expectedError variable Signed-off-by: Isaac Hier <[email protected]>
I am working on this now. I want to implement a minimal working solution using grpc communication between agent and collector and get it merged to master as experimental. So far I have rebased @isaachier PR #1042 and added proto for sampling. I will be submitting PR shortly. I will be posting my notes here what should be done afterward:
|
Would it move this along better if we start merging smaller pieces into master, e.g. my branch & Isaac's PR? I think the tests were outstanding. |
I think it would break me right now. I will submit a PR to master based on those two. Now fixing the coverage. |
Hey @pavolloffay. Glad to see this finally happening! Please let me know if you need any help here. |
Signed-off-by: Yuri Shkuro <[email protected]> Fix lint Signed-off-by: Yuri Shkuro <[email protected]> Add gRPC endpoint to collector Signed-off-by: Yuri Shkuro <[email protected]> Add proper grpc server Signed-off-by: Yuri Shkuro <[email protected]> Add grpc to all-in-one Signed-off-by: Yuri Shkuro <[email protected]> adds the gRPC port to the docker-all-in-one Dockerfile (jaegertracing#773) (jaegertracing#967) Signed-off-by: Jan Heylen <[email protected]> Simple fixes (jaegertracing#999) * Fix test error Signed-off-by: Isaac Hier <[email protected]> * go fmt Signed-off-by: Isaac Hier <[email protected]> Implement PostSpans for collector gRPC handler Signed-off-by: Isaac Hier <[email protected]> Fix formatting Signed-off-by: Isaac Hier <[email protected]> Remove timeout Signed-off-by: Isaac Hier <[email protected]> Use expectedError variable Signed-off-by: Isaac Hier <[email protected]> Agent - collector grpc Signed-off-by: Pavol Loffay <[email protected]> Working collector agent grpc Signed-off-by: Pavol Loffay <[email protected]> Remove what is not needed Signed-off-by: Pavol Loffay <[email protected]>
I am wondering whether we should rename
|
why do we need "host"? |
my bad there should not be host. The most important is scoping it under |
For now I suggest keeping it simple without scoping. and serve multiple services on the same port. |
For the query-service, would it be better to multiplex the GRPC and HTTP handlers on the same port as compared to hosting two endpoints? |
yes, it is mentioned in the implementation plan |
Had a discussion with @pavolloffay and he mentioned it would be better from a security perspective to have different ports for the two endpoints. Would make sense if we would want to expose the GRPC port to public traffic and keep tchannel private. |
Why is it better for security? I can see that arguments for endpoints of different functionality, such as query vs. submitting spans, but not so much for the same functionality using different formats. Anyway, we can certainly start with different ports. My preference is in the future to have the flexibility of the main() function recognizing that the same port is given for different protocols and automatically share the single server. |
I meant that if the HTTP endpoint of the query-service does not support SSL+authorisation, we would not be able to expose the multiplexed port publicly (unless of course we run it behind a reverse-proxy). Whereas GRPC would have HTTPS support. :) |
Very old issue, closing |
Problem
At the moment we have the following data models handled by the backend, often with two-way transformers between them:
/model
used by all internal APIs and business logicjaeger.thrift
as the main on-the-wire representation/model/json
is JSON used for communicating with the UI/model/json
with embedded Process used by ElasticSearch storage pluginzipkin.thrift
and Zipkin JSON accepted as input data models as wellWe also have limitations on routing traffic between agents and collectors because they use TChannel as the communication protocol.
Proposed Solution
Converge on a Protobuf 3 Model
By converging on a single Protobuf model we can consolidate some of these models, reduce the number of converters, and unblock / enable several other work streams:
/model/json
(the ElasticSearch plugin will need to retain the model and converters, which should be migrated into the es module)Use gRPC with Protobuf model
User-Facing Deliverables
Phase 1 - MVP
Goal: deprecate TChannel.
Phase 2
Phase 3
Implementation notes
We've talked about this a lot, but there are many moving pieces that must come together, so this is the meta issue to list them all out, in random order.
[]model.KeyValue
instead of[]*model.KeyValue
(gist)oneof
for KeyValue or keep sending the Type field explicitly as we do todaymodel.Span
. We need to decide if we want to keep them separate and do transformation or merge them into one by generating the core skeletons and then adding the additional business methods via companion hand-written source files. I am in favor of merging them to minimize data copying. The risk here is if future changes in proto/gogo would result in differently generated model, but that seems unlikely.KeyValue.VType=STRING
andSpanRef.Type=CHILD_OF
are omitted by default (as zero values)Implementation plan
encoding/json
The text was updated successfully, but these errors were encountered: