Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gRPC query service with OTLP model #76

Merged
merged 12 commits into from
Jun 25, 2021
112 changes: 112 additions & 0 deletions proto/api_v3/query_service.proto
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
// Copyright (c) 2021 The Jaeger Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax="proto3";

package jaeger.api_v3;

import "opentelemetry/proto/trace/v1/trace.proto";
import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";

option go_package = "api_v3";
option java_package = "io.jaegertracing.api_v3";

// Request object to get a trace.
message GetTraceRequest {
// Hex encoded 64 or 128 bit trace ID.
string trace_id = 1;
}

// A single response chunk holds spans from a single trace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my knowledge, a chunk at the moment always represents a single trace (a trace known at the query time).

I think this is not a matter of how it is currently implemented, but what we want to guarantee. The intention of the original API was to NOT have this guarantee, i.e. the service is allowed to mix spans from different traces in a single chunk.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should guarantee it for these reasons:

  • the current API does not mix spans from different traces
  • the API that does not mix spans from different traces is easier to consume (e.g. for typical use-cases that we know)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like this guarantee in the API. The streaming & chunking API was primarily introduced for efficiency, but we're taking away server's ability to optimize its response. You could be loading a ton of small traces, e.g. 2 spans each, so this guarantee in the API would force the server to send tiny chunks, which is going to be suboptimal. On the other hand, there is a max chunk size in the server so there is always a possibility that a large trace will be split across several chunks, that largely takes away your "easier to consume" reason #2.

We can always introduce this guarantee later, but removing it would backwards incompatible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 5 years of the project history we haven't used the feature of streaming chunks with mixed traces. I don't think any other DT system behaves this way.

You could be loading a ton of small traces, e.g. 2 spans each, so this guarantee in the API would force the server to send tiny chunks, which is going to be suboptimal.

This is how Jaeger works right now and we haven't see any complaints/issues/use-case to change it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always introduce this guarantee later, but removing it would backwards incompatible.

It would be a breaking change one way or another. The current consumers do not expect that spans in chunks are mixed (e.g. UI does not).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with this per my comments above but I have removed the guarantee of not mixing spans in one chunk per Yuri's request.

message SpansResponseChunk {
// A list of OpenTelemetry ResourceSpans.
// In case of JSON format the ids (trace_id, span_id, parent_id) are encoded in base64 even though OpenTelemetry specification
// mandates to use hex encoding [2].
// Base64 is chosen to keep compatibility with JSONPb codec.
// [1]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto
// [2]: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlphttp
repeated opentelemetry.proto.trace.v1.ResourceSpans resource_spans = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a resource (maybe this)?

is it possible for jaeger-query to return spans from more than one resource? If so, for my learning, what are some examples?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky to construct this from Jaeger spans because we denormalize the Process into each individual span. We do have some re-assembly logic when we return spans to the UI, but it's probably also valid to just return (resource, span) pairs as denormalized.

}

// Query parameters to find traces.
// Note that some storage implementations do not guarantee correct implementation of all parameters.
message TraceQueryParameters {
string service_name = 1;
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
string operation_name = 2;
// Attributes are matched against Span and Resource attributes.
// At least one span in a trace must match all specified attributes.
map<string, string> attributes = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we clarify which attributes these are supposed to match? We've been pretty loose about it in the original API, leaving the interpretation to the storage. I.e. should all these attributes match on a single span, or could they match across spans? Do they match span attributes only or span logs as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The match should be done on tags and process tags. ES supports match on logs as well when kibana support is enabled (flat schema).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ES they must match on a single span. How is it done in cassandra? I think all storages should follow this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true in Cassandra, because it takes the tag=value string and looks up an index that just gives trace IDs. If more than one tag is provided, it could easily match on different spans.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment on the TraceQueryParameters that clarifies that some storage implementations might deviate.

Alright so C* does deviate as well. What was the original design for attributes? Match any attributes within trace or in a single span?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to see this: what would you expect as a user? If you specify two pairs of attributes, would you expect them to exist throughout the trace, or for all attributes to exist as part of the same span? I'm not quite sure I have an answer here... Here's one case advocating for attributes to exist throughout the trace:

  • root span has "userID=123", no other spans contain this
  • span (not root span) has error=true
  • SRE is looking for traces for user 123 with errors in it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to create an issue to address this, this PR can be merged as is.

// Span min start time.
google.protobuf.Timestamp start_time_min = 4;
// Span max start time.
google.protobuf.Timestamp start_time_max = 5;
// Span min duration.
google.protobuf.Duration duration_min = 6;
// Span max duration.
google.protobuf.Duration duration_max = 7;
jpkrohling marked this conversation as resolved.
Show resolved Hide resolved
// Maximum number of traces in the response.
int32 num_traces = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep this as num_traces or use a more vague term like "search depth"? Because Cassandra storage does not guarantee num_traces in the response, which was often a source of confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use num_traces it is what the query API uses. If C* does not implement it (or any other storage) we should document that.

The goal is to make it clear for users what the parameter means.

}

// Request object to search traces.
message FindTracesRequest {
TraceQueryParameters query = 1;
}

// Request object to get service names.
message GetServicesRequest {}

// Response object to get service names.
message GetServicesResponse {
repeated string services = 1;
}

// Request object to get operation names.
message GetOperationsRequest {
// Required service name.
string service = 1;
// Optional span kind.
string span_kind = 2;
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
}

// Operation encapsulates information about operation.
message Operation {
string name = 1;
string span_kind = 2;
}

// Response object to get operation names.
message GetOperationsResponse {
repeated Operation operations = 1;
}

service QueryService {
// GetTrace returns a single trace.
// Note that the JSON response over HTTP is wrapped into result envelope "{"result": ...}"
// It means that the JSON response cannot be directly unmarshalled using JSONPb.
// This can be fixed by first parsing into user-defined envelope with standard JSON library
// or string manipulation to remove the envelope. Alternatively generate objects using OpenAPI.
rpc GetTrace(GetTraceRequest) returns (stream SpansResponseChunk) {}
joe-elliott marked this conversation as resolved.
Show resolved Hide resolved

// FindTraces searches for traces.
// See GetTrace for JSON unmarshalling.
rpc FindTraces(FindTracesRequest) returns (stream SpansResponseChunk) {}

// GetServices returns service names.
rpc GetServices(GetServicesRequest) returns (GetServicesResponse) {}

// GetOperations returns operation names.
rpc GetOperations(GetOperationsRequest) returns (GetOperationsResponse) {}
}
14 changes: 14 additions & 0 deletions proto/api_v3/query_service_http.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# This is an API configuration to generate an HTTP/JSON -> gRPC gateway for the
# OpenTelemetry service using github.com/grpc-ecosystem/grpc-gateway.
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
type: google.api.Service
config_version: 3
http:
rules:
- selector: jaeger.api_v3.QueryService.GetTrace
get: /v3/traces/{trace_id}
- selector: jaeger.api_v3.QueryService.GetTraces
get: /v3/traces
- selector: jaeger.api_v3.QueryService.GetServices
get: /v3/services
- selector: jaeger.api_v3.QueryService.GetOperations
get: /v3/operations