Transition to 128bit trace ids (still 64bit span ids) #1298

codefromthecrypt · 2016-09-15T08:16:42Z

This tracks work needed to transition to 128bit trace ids (still 64bit span ids). The overall plan was decided in #1262

Wave 1: tolerate 128-bit trace ids

First step is to make it possible to receive 128-bit ids, even if the high bits are thrown out.
This decouples the rest of the project.

Tolerate 128bit trace ids when reading json and api requests - Fixed in Zipkin 1.11.1
Tolerate 128bit trace ids when reading X-B3-TraceId

Wave 2: store 128-bit trace ids, but retrieve by lower 64-bits

Next step is to support storage of 128-bit ids. Once this is in, instrumentation can report them to zipkin. Understanding deployments will be mixed and not support 128-bit for a while, query remains 64-bit.

Add trace_id_high (default 0) to the thrift definition
Add traceIdHigh (default 0) to zipkin.Span and implement it in codecs
- in json, it is simply a longer traceId field to accommodate the extra 16 hex characters
Ensure supported storage components are not lossy with regards to 128-bit ids
Ensure supported storage can query by 64-bit trace ids

Wave 3: Propagate and report 128-bit ids

Now that ids are storable, we can propagate and report them safely.

publish B3 library with 128-bit support (including binary representation)
Update instrumentation libraries to optionally start 128-bit traces See Track support of 128 bit X-B3-TraceId b3-propagation#6 for status

Wave 4: Support query by 128-bit id

While conflict is unlikely when querying trace ids by the lower 64-bits, we should be able to support queries and uniqueness by 128-bit.

add SpanStore.getTrace and getRawTrace with a 128-bit id
add tests that show traces that share the same lower 64 bits are not returned together
change zipkin-UI/api to accept the longer 128-bit (32char) hex id

See #1298

codefromthecrypt · 2016-10-11T02:01:29Z

Most libraries now accommodate 128bit trace ids. Next step is to make it possible to store them

codefromthecrypt · 2016-10-25T04:22:52Z

@openzipkin/elasticsearch we've a bit of a crossroads wrt how to handle json. Right now, we do an equals query on .traceId. There's a prefix query, but no suffix one. What we want for mixed-deployments is right-most 16characters to equal an input. Any ideas on the index template or otherwise options for this?

codefromthecrypt · 2016-10-25T04:37:11Z

on elasticsearch.. I think we could get by in the interim with a tokenizer that only captures lower bits, then use the existing equals terms query. Something like this..

        "traceId_tokenizer": {
          "type": "pattern",
          "pattern": "([0-9a-f]{1,16})$",
          "group": 1
        }

mansu · 2016-10-25T06:06:57Z

Are we doing this on the read path or the write path?

codefromthecrypt · 2016-10-25T10:28:14Z

Added much more detail to the plan and "wave 2" implementation #1353

codefromthecrypt · 2016-10-25T18:39:31Z

Are we doing this on the read path or the write path?

Any settings like this would be in the index template, which would be on
the write path.

codefromthecrypt · 2016-10-26T02:12:47Z

here's the final cut on elasticsearch. This will create 2 tokens for trace ids who are 128-bit and one if 64-bit. This allows existing tools which only support 64-bit ids (like SpanStore) to work, without limiting future tools from doing exact match on 128-bit. The cost is 2x the trace id tokens in sites who are 100% 128-bit. Since this is on a daily index basis, if the latter becomes a problem, one can either revise the index template manually, or we can as a general course. However, I suspect mixed-mode will exist for at least a year in many sites.

Ex this is the snipped of the analyzer in #1353 which supports mixed mode via custom tokenization. When the id is larger that 16 characters, it will end up creating 2 tokens (one for the original text and one for the truncated one). When the id is not larger than 16 characters, only one token is created for search purposes.

--snip--
        "traceId_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "traceId_filter"
        }
      },
      "filter": {
        "traceId_filter": {
          "type": "pattern_capture",
          "patterns": ["([0-9a-f]{1,16})$"],
          "preserve_original": true
        }
      }
--snip--

codefromthecrypt · 2016-10-27T03:54:00Z

added a couple notes, notably instrumentation will eventually need a flag to generate 128-bit traces (which implies sampling on 128-bits vs 64). Also, this implies adjusting the collector sampler so that it is consistent on 128bit as opposed to just 64. cc @basvanbeek

codefromthecrypt · 2016-10-27T08:14:44Z

added issue twitter/finagle#564 to track finagle

basvanbeek · 2016-10-27T08:45:35Z

the approach I took in zipkin-go-opentracing wrt sampling has been to just keep the sampler on the low 64 bit. There is plenty of resolution in the lower 64 bit to make the sampling decision and keeps it consistent in its behavior between 64bit and 128bit traces. I don't see the need or benefit of sampling on 128bit.

codefromthecrypt · 2016-10-27T09:05:03Z

@basvanbeek I agree. removed updating sampling to 128-bit from the TODO section

Traditionally, Zipkin trace IDs were 64-bit. Starting with Zipkin 1.14, 128-bit trace identifiers are supported. This can be useful in sites that have very large traffic volume, persist traces forever, or are re-using externally generated 128-bit IDs. If you want Brave to generate 128-bit trace identifiers when starting new root spans, set `Brave.Builder.traceId128Bit(true)` When 128-bit trace ids are propagated, they will be twice as long as before. For example, the `X-B3-TraceId` header will hold a 32-character value like `163ac35c9f6413ad48485a3953bb6124`. Before doing this, ensure your Zipkin setup is up-to-date, and downstream instrumented services can read the longer 128-bit trace IDs. Note: this only affects the trace ID, not span IDs. For example, span ids within a trace are always 64-bit. See openzipkin/zipkin#1298 See openzipkin-contrib/zipkin-go-opentracing#31

codefromthecrypt · 2016-11-03T14:03:55Z

working on the final lap in #1385 going under the assumption that we add an overloaded method of SpanStore.getTrace(long traceIdHigh, long traceId). We also need a parameter to tell control group-by, currently named groupByTraceIdHigh. None of this impacts the rest api or changes already made to codec.

If anyone has any style or naming guidance for these changes, do speak up!

codefromthecrypt · 2016-11-05T02:39:18Z

Here's the other supporting change, which is a builder for StorageComponent. This is needed to move the "compatibility mode" parameter to where it can be generically assigned. This is important because in most storage engines, there's an additional cost to indexing on half of the 128-bit ID, so it should be possible to turn this off

  interface Builder {

    /**
     * Zipkin historically had 64-bit {@link Span#traceId trace IDs}, but it now supports 128-bit
     * trace IDs via {@link Span#traceIdHigh}, or its 32-character hex representation.
     *
     * <p>This setting allows you to look up traces who have 128-bit trace ids by the lower-64 bits
     * using {@link SpanStore#getTrace(long)}. In most implementations, this implies a second index
     * which can be more expensive in terms or storage or memory than only supporting 128-bit
     * lookups via {@link SpanStore#getTrace(long, long)}
     *
     * <p>This should be enabled until all instrumentation report 128-bit trace IDs consistently,
     * and {@link SpanStore#getTrace(long)} is no longer in use.
     */
    Builder mixedTraceIdLength(boolean mixedTraceIdLength);

    StorageComponent build();
  }

codefromthecrypt · 2016-11-08T04:31:45Z

I've updated the name and the description of this setting. Please review as soon as you can, as the final wave of this effort is nearly complete.

Particularly, @michaelsembwever @mansu @basvanbeek and @shakuzen please review the below if you can, and let me know if there's anything about the name or description that seems confusing.

StorageComponent.Builder.upgradingTo128BitTraceId(boolean upgradingTo128BitTraceId)

Zipkin supports 64 and 128-bit trace identifiers, typically serialized as 16 or 32 character hex strings. When true, this setting only considers the low 64-bits (right-most 16 characters) of a trace ID when grouping or retrieving traces. This should be enabled when transitioning from 64 to 128-bit trace IDs and disabled after the transition is complete.

Details

Zipkin historically had 64-bit trace IDs, but it now supports 128-bit trace IDs via Span.traceIdHigh, or its 32-character hex representation. While instrumentation update to propagate 128-bit IDs, it can be ambiguous whether a 64-bit trace ID was sent intentionally, or as an accident of truncation. This setting allows Zipkin to be usable until application instrumentation are upgraded to support 128-bit trace IDs.

Here are a few trace IDs the help explain this setting.

Trace ID A: 463ac35c9f6413ad48485a3953bb6124
Trace ID B: 48485a3953bb6124
Trace ID C: 463ac35c9f6413adf1a48a8cff464e0e
Trace ID D: 463ac35c9f6413ad

In the above example, Trace ID A and Trace ID B might mean they are in the same trace, since the lower-64 bits of the IDs are the same. This could happen if a server A created the trace and propagated it to server B which ran an older tracing library. Server B could have truncated the trace ID to lower-64 bits. When upgradingTo128BitTraceId == true, spans matching either trace ID A or B would be returned in the same trace when searching by ID A or B. Spans with trace ID C or D wouldn't be when searching by ID A or B because trace IDs C and D don't share lower 64-bits (right-most 16 characters) with trace IDs A or B.

It is also possible that all servers are capable of handling 128-bit trace identifiers, but are configured to only send 64-bit ones. In this case, if upgradingTo128BitTraceId == true trace ID A and B would clash and be put into the same trace, causing confusion. Moreover, there is overhead associated with indexing spans both by 64 and 128-bit trace IDs. When a site has finished upgrading to 128-bit trace IDs, they should disable this setting.

See openzipkin/b3-propagation#6 for the status of known open source libraries on 128-bit trace identifiers.

codefromthecrypt · 2016-11-08T05:27:41Z

one more bikeshed concern is the name of the parameter being described, particularly as we need to describe it as configuration potentially to non-english speakers.

upgradingTo128BitTraceId - describes well the act, but it is long and is lengthy converted to snake or upper underscore case
stepDownTraceIds - implies "stepping down" from 128 to 64-bit, which is a bit jargony. Ok to encode as an environment variable.
mixedTraceIdLength - discusses the problem itself, but not the context. Ex mixed length ids are fine regardless, provided they are in separate traces. Easy to encode as an environment variable.
ignoreTraceIdHigh - says what's happening, but not why or when. Easy to encode as an environment variable.

cc @abesto in case you have thoughts or other ideas..

codefromthecrypt · 2016-11-08T05:44:17Z

strictTraceId is another option

mansu · 2016-11-08T06:22:58Z

strictTraceId makes more sense than all the other options.

codefromthecrypt · 2016-11-09T00:53:40Z

okie dokie

zipkin.storage.strict-trace-id as the config property, mapped to..
STRICT_TRACE_ID as the env variable (because we usually provide unscoped ones to make run statements less heinous).

default will be true as that will be cheaper.

unless I hear otherwise in the next day, these will be so!

codefromthecrypt · 2016-11-12T04:27:32Z

going out as 1.15!

codefromthecrypt mentioned this issue Sep 15, 2016

RFC: Transition to 128bit trace ids (still 64bit span ids) #1262

Closed

codefromthecrypt pushed a commit that referenced this issue Sep 15, 2016

Tolerates 128bit trace ids in ZipkinRule

97e061e

See #1298

codefromthecrypt mentioned this issue Sep 15, 2016

Tolerates 128bit trace ids in ZipkinRule #1299

Merged

codefromthecrypt pushed a commit that referenced this issue Sep 15, 2016

Tolerates 128bit trace ids in ZipkinRule (#1299)

2b17d95

See #1298

basvanbeek mentioned this issue Oct 11, 2016

WIP: Support for 128 bit TraceIDs openzipkin-contrib/zipkin-go-opentracing#31

Merged

codefromthecrypt mentioned this issue Oct 25, 2016

128-bit trace ID part 2: store 128-bit trace ids, but retrieve by lower 64-bits #1353

Merged

codefromthecrypt mentioned this issue Oct 27, 2016

Supports 128-bit trace IDs, generating them on traceId128Bit(true) openzipkin/brave#258

Merged

codefromthecrypt closed this as completed Nov 12, 2016

mclarke47 mentioned this issue Jul 26, 2017

Zipkin follow ups envoyproxy/envoy#960

Closed

3 tasks

codefromthecrypt added the enhancement label Oct 26, 2018

codefromthecrypt added the model Modeling of traces label Oct 26, 2018

basvanbeek mentioned this issue Jul 4, 2024

OpenZipkin B3 Header behavior inconsistent to OpenTelemetry Propagator B3 openzipkin/b3-propagation#47

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transition to 128bit trace ids (still 64bit span ids) #1298

Transition to 128bit trace ids (still 64bit span ids) #1298

codefromthecrypt commented Sep 15, 2016 •

edited

Loading

codefromthecrypt commented Oct 11, 2016

codefromthecrypt commented Oct 25, 2016

codefromthecrypt commented Oct 25, 2016

mansu commented Oct 25, 2016

codefromthecrypt commented Oct 25, 2016

codefromthecrypt commented Oct 25, 2016

codefromthecrypt commented Oct 26, 2016

codefromthecrypt commented Oct 27, 2016

codefromthecrypt commented Oct 27, 2016

basvanbeek commented Oct 27, 2016

codefromthecrypt commented Oct 27, 2016

codefromthecrypt commented Nov 3, 2016

codefromthecrypt commented Nov 5, 2016

codefromthecrypt commented Nov 8, 2016

codefromthecrypt commented Nov 8, 2016

codefromthecrypt commented Nov 8, 2016

mansu commented Nov 8, 2016 •

edited

Loading

codefromthecrypt commented Nov 9, 2016

codefromthecrypt commented Nov 12, 2016

Transition to 128bit trace ids (still 64bit span ids) #1298

Transition to 128bit trace ids (still 64bit span ids) #1298

Comments

codefromthecrypt commented Sep 15, 2016 • edited Loading

Wave 1: tolerate 128-bit trace ids

Wave 2: store 128-bit trace ids, but retrieve by lower 64-bits

Wave 3: Propagate and report 128-bit ids

Wave 4: Support query by 128-bit id

codefromthecrypt commented Oct 11, 2016

codefromthecrypt commented Oct 25, 2016

codefromthecrypt commented Oct 25, 2016

mansu commented Oct 25, 2016

codefromthecrypt commented Oct 25, 2016

codefromthecrypt commented Oct 25, 2016

codefromthecrypt commented Oct 26, 2016

codefromthecrypt commented Oct 27, 2016

codefromthecrypt commented Oct 27, 2016

basvanbeek commented Oct 27, 2016

codefromthecrypt commented Oct 27, 2016

codefromthecrypt commented Nov 3, 2016

codefromthecrypt commented Nov 5, 2016

codefromthecrypt commented Nov 8, 2016

Details

codefromthecrypt commented Nov 8, 2016

codefromthecrypt commented Nov 8, 2016

mansu commented Nov 8, 2016 • edited Loading

codefromthecrypt commented Nov 9, 2016

codefromthecrypt commented Nov 12, 2016

codefromthecrypt commented Sep 15, 2016 •

edited

Loading

mansu commented Nov 8, 2016 •

edited

Loading