Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition to 128bit trace ids (still 64bit span ids) #1298

Closed
11 tasks done
codefromthecrypt opened this issue Sep 15, 2016 · 19 comments
Closed
11 tasks done

Transition to 128bit trace ids (still 64bit span ids) #1298

codefromthecrypt opened this issue Sep 15, 2016 · 19 comments
Labels
enhancement model Modeling of traces

Comments

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Sep 15, 2016

This tracks work needed to transition to 128bit trace ids (still 64bit span ids). The overall plan was decided in #1262

Wave 1: tolerate 128-bit trace ids

First step is to make it possible to receive 128-bit ids, even if the high bits are thrown out.
This decouples the rest of the project.

Wave 2: store 128-bit trace ids, but retrieve by lower 64-bits

Next step is to support storage of 128-bit ids. Once this is in, instrumentation can report them to zipkin. Understanding deployments will be mixed and not support 128-bit for a while, query remains 64-bit.

  • Add trace_id_high (default 0) to the thrift definition
  • Add traceIdHigh (default 0) to zipkin.Span and implement it in codecs
    • in json, it is simply a longer traceId field to accommodate the extra 16 hex characters
  • Ensure supported storage components are not lossy with regards to 128-bit ids
  • Ensure supported storage can query by 64-bit trace ids

Wave 3: Propagate and report 128-bit ids

Now that ids are storable, we can propagate and report them safely.

Wave 4: Support query by 128-bit id

While conflict is unlikely when querying trace ids by the lower 64-bits, we should be able to support queries and uniqueness by 128-bit.

  • add SpanStore.getTrace and getRawTrace with a 128-bit id
  • add tests that show traces that share the same lower 64 bits are not returned together
  • change zipkin-UI/api to accept the longer 128-bit (32char) hex id
@codefromthecrypt
Copy link
Member Author

Most libraries now accommodate 128bit trace ids. Next step is to make it possible to store them

@codefromthecrypt
Copy link
Member Author

@openzipkin/elasticsearch we've a bit of a crossroads wrt how to handle json. Right now, we do an equals query on .traceId. There's a prefix query, but no suffix one. What we want for mixed-deployments is right-most 16characters to equal an input. Any ideas on the index template or otherwise options for this?

@codefromthecrypt
Copy link
Member Author

on elasticsearch.. I think we could get by in the interim with a tokenizer that only captures lower bits, then use the existing equals terms query. Something like this..

        "traceId_tokenizer": {
          "type": "pattern",
          "pattern": "([0-9a-f]{1,16})$",
          "group": 1
        }

@mansu
Copy link

mansu commented Oct 25, 2016

Are we doing this on the read path or the write path?

@codefromthecrypt
Copy link
Member Author

Added much more detail to the plan and "wave 2" implementation #1353

@codefromthecrypt
Copy link
Member Author

Are we doing this on the read path or the write path?

Any settings like this would be in the index template, which would be on
the write path.

@codefromthecrypt
Copy link
Member Author

here's the final cut on elasticsearch. This will create 2 tokens for trace ids who are 128-bit and one if 64-bit. This allows existing tools which only support 64-bit ids (like SpanStore) to work, without limiting future tools from doing exact match on 128-bit. The cost is 2x the trace id tokens in sites who are 100% 128-bit. Since this is on a daily index basis, if the latter becomes a problem, one can either revise the index template manually, or we can as a general course. However, I suspect mixed-mode will exist for at least a year in many sites.

Ex this is the snipped of the analyzer in #1353 which supports mixed mode via custom tokenization. When the id is larger that 16 characters, it will end up creating 2 tokens (one for the original text and one for the truncated one). When the id is not larger than 16 characters, only one token is created for search purposes.

--snip--
        "traceId_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "traceId_filter"
        }
      },
      "filter": {
        "traceId_filter": {
          "type": "pattern_capture",
          "patterns": ["([0-9a-f]{1,16})$"],
          "preserve_original": true
        }
      }
--snip--

@codefromthecrypt
Copy link
Member Author

added a couple notes, notably instrumentation will eventually need a flag to generate 128-bit traces (which implies sampling on 128-bits vs 64). Also, this implies adjusting the collector sampler so that it is consistent on 128bit as opposed to just 64. cc @basvanbeek

@codefromthecrypt
Copy link
Member Author

added issue twitter/finagle#564 to track finagle

@basvanbeek
Copy link
Member

the approach I took in zipkin-go-opentracing wrt sampling has been to just keep the sampler on the low 64 bit. There is plenty of resolution in the lower 64 bit to make the sampling decision and keeps it consistent in its behavior between 64bit and 128bit traces. I don't see the need or benefit of sampling on 128bit.

@codefromthecrypt
Copy link
Member Author

@basvanbeek I agree. removed updating sampling to 128-bit from the TODO section

codefromthecrypt pushed a commit to openzipkin/brave that referenced this issue Oct 27, 2016
Traditionally, Zipkin trace IDs were 64-bit. Starting with Zipkin 1.14,
128-bit trace identifiers are supported. This can be useful in sites that
have very large traffic volume, persist traces forever, or are re-using
externally generated 128-bit IDs.

If you want Brave to generate 128-bit trace identifiers when starting new
root spans, set `Brave.Builder.traceId128Bit(true)`

When 128-bit trace ids are propagated, they will be twice as long as
before. For example, the `X-B3-TraceId` header will hold a 32-character
value like `163ac35c9f6413ad48485a3953bb6124`.

Before doing this, ensure your Zipkin setup is up-to-date, and downstream
instrumented services can read the longer 128-bit trace IDs.

Note: this only affects the trace ID, not span IDs. For example, span ids
within a trace are always 64-bit.

See openzipkin/zipkin#1298
See openzipkin-contrib/zipkin-go-opentracing#31
@codefromthecrypt
Copy link
Member Author

working on the final lap in #1385 going under the assumption that we add an overloaded method of SpanStore.getTrace(long traceIdHigh, long traceId). We also need a parameter to tell control group-by, currently named groupByTraceIdHigh. None of this impacts the rest api or changes already made to codec.

If anyone has any style or naming guidance for these changes, do speak up!

@codefromthecrypt
Copy link
Member Author

Here's the other supporting change, which is a builder for StorageComponent. This is needed to move the "compatibility mode" parameter to where it can be generically assigned. This is important because in most storage engines, there's an additional cost to indexing on half of the 128-bit ID, so it should be possible to turn this off

  interface Builder {

    /**
     * Zipkin historically had 64-bit {@link Span#traceId trace IDs}, but it now supports 128-bit
     * trace IDs via {@link Span#traceIdHigh}, or its 32-character hex representation.
     *
     * <p>This setting allows you to look up traces who have 128-bit trace ids by the lower-64 bits
     * using {@link SpanStore#getTrace(long)}. In most implementations, this implies a second index
     * which can be more expensive in terms or storage or memory than only supporting 128-bit
     * lookups via {@link SpanStore#getTrace(long, long)}
     *
     * <p>This should be enabled until all instrumentation report 128-bit trace IDs consistently,
     * and {@link SpanStore#getTrace(long)} is no longer in use.
     */
    Builder mixedTraceIdLength(boolean mixedTraceIdLength);

    StorageComponent build();
  }

@codefromthecrypt
Copy link
Member Author

I've updated the name and the description of this setting. Please review as soon as you can, as the final wave of this effort is nearly complete.

Particularly, @michaelsembwever @mansu @basvanbeek and @shakuzen please review the below if you can, and let me know if there's anything about the name or description that seems confusing.


StorageComponent.Builder.upgradingTo128BitTraceId(boolean upgradingTo128BitTraceId)

Zipkin supports 64 and 128-bit trace identifiers, typically serialized as 16 or 32 character hex strings. When true, this setting only considers the low 64-bits (right-most 16 characters) of a trace ID when grouping or retrieving traces. This should be enabled when transitioning from 64 to 128-bit trace IDs and disabled after the transition is complete.

Details

Zipkin historically had 64-bit trace IDs, but it now supports 128-bit trace IDs via Span.traceIdHigh, or its 32-character hex representation. While instrumentation update to propagate 128-bit IDs, it can be ambiguous whether a 64-bit trace ID was sent intentionally, or as an accident of truncation. This setting allows Zipkin to be usable until application instrumentation are upgraded to support 128-bit trace IDs.

Here are a few trace IDs the help explain this setting.

  • Trace ID A: 463ac35c9f6413ad48485a3953bb6124
  • Trace ID B: 48485a3953bb6124
  • Trace ID C: 463ac35c9f6413adf1a48a8cff464e0e
  • Trace ID D: 463ac35c9f6413ad

In the above example, Trace ID A and Trace ID B might mean they are in the same trace, since the lower-64 bits of the IDs are the same. This could happen if a server A created the trace and propagated it to server B which ran an older tracing library. Server B could have truncated the trace ID to lower-64 bits. When upgradingTo128BitTraceId == true, spans matching either trace ID A or B would be returned in the same trace when searching by ID A or B. Spans with trace ID C or D wouldn't be when searching by ID A or B because trace IDs C and D don't share lower 64-bits (right-most 16 characters) with trace IDs A or B.

It is also possible that all servers are capable of handling 128-bit trace identifiers, but are configured to only send 64-bit ones. In this case, if upgradingTo128BitTraceId == true trace ID A and B would clash and be put into the same trace, causing confusion. Moreover, there is overhead associated with indexing spans both by 64 and 128-bit trace IDs. When a site has finished upgrading to 128-bit trace IDs, they should disable this setting.

See openzipkin/b3-propagation#6 for the status of known open source libraries on 128-bit trace identifiers.

@codefromthecrypt
Copy link
Member Author

one more bikeshed concern is the name of the parameter being described, particularly as we need to describe it as configuration potentially to non-english speakers.

  • upgradingTo128BitTraceId - describes well the act, but it is long and is lengthy converted to snake or upper underscore case
  • stepDownTraceIds - implies "stepping down" from 128 to 64-bit, which is a bit jargony. Ok to encode as an environment variable.
  • mixedTraceIdLength - discusses the problem itself, but not the context. Ex mixed length ids are fine regardless, provided they are in separate traces. Easy to encode as an environment variable.
  • ignoreTraceIdHigh - says what's happening, but not why or when. Easy to encode as an environment variable.

cc @abesto in case you have thoughts or other ideas..

@codefromthecrypt
Copy link
Member Author

strictTraceId is another option

@mansu
Copy link

mansu commented Nov 8, 2016

strictTraceId makes more sense than all the other options.

@codefromthecrypt
Copy link
Member Author

okie dokie

zipkin.storage.strict-trace-id as the config property, mapped to..
STRICT_TRACE_ID as the env variable (because we usually provide unscoped ones to make run statements less heinous).

default will be true as that will be cheaper.

unless I hear otherwise in the next day, these will be so!

@codefromthecrypt
Copy link
Member Author

going out as 1.15!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement model Modeling of traces
Projects
None yet
Development

No branches or pull requests

3 participants