This document provides rationale for the decisions made in mapping the
traceparent and tracestate fields to HTTP headers.
While HTTP headers are conventionally delimited by hyphens, the trace context
header names are not. Rather, they are the lowercase concatenated names
traceparent and tracestate respectively. The departure from convention is due
to practical concerns of propagation. Trace context is unlike typical HTTP
headers, which are point-to-point and do not propagate through other systems
like messaging. Different systems have different constraints. For example, some
cannot read names case-insensitively, and others forbid the hyphen character.
Even if we could suggest a different format for such systems, we know many
systems transparently copy HTTP headers into fields. This class of concerns
only exists when we choose to support mixed case with hyphens. By choosing not
to, we open trace context integration beyond HTTP at the cost of a departure
from convention.
All parts of traceparent are required
We have discussed making parts of the traceparent header optional. One proposal we declined was to allow trace-id-only traceparent headers. The intended use was to save size for small clients (like mobile devices) initiating the call. We declined it to avoid abuse and confusion. One size-saving suggestion that we still want to discuss is a binary format.
Making trace-flags optional doesn’t save much, but it makes the specification more complicated. It can also lead to incompatible implementations which do not expect trace-flags.
tracestate
traceparent. Arbitrary non-tracing system entries are a non-use case.
traceparent format or an opaque string.
tracestate
The specification calls for ordering of values in tracestate. This requirement allows better interoperability between tracing vendors.
A typical distributed trace is clustered: components calling each other are often monitored
by the same tracing vendor. So information supplied by the tracing system which originated a
request will typically be less and less important deeper in a distributed trace. The immediate
caller’s information, on the other hand, is typically more valuable, as the caller is more likely
to be monitored by the same tracing vendor. Thus, it is logical to move the immediate caller’s
information to the beginning of the tracestate list, so that less important values are
pushed to the end of the list.
This prioritization of tracestate values improves the performance of querying
tracestate: typically you only need the first pair. It also allows you to meaningfully truncate
tracestate when required instead of dropping the entire list of values.
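
The ordering rule can be illustrated with a minimal sketch. The helper names `parse_tracestate` and `update_tracestate`, the vendor keys, and the values are hypothetical and chosen only for illustration; the sketch assumes a simple comma-separated list of key=value members and does not validate keys or values.

```python
def parse_tracestate(header: str) -> list[tuple[str, str]]:
    """Split a tracestate header into ordered (key, value) pairs."""
    pairs = []
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        key, sep, value = member.partition("=")
        if sep:  # ignore members that have no '=' at all
            pairs.append((key, value))
    return pairs


def update_tracestate(header: str, key: str, value: str) -> str:
    """Put (key, value) first and drop any older entry with the same key."""
    others = [(k, v) for k, v in parse_tracestate(header) if k != key]
    return ",".join(f"{k}={v}" for k, v in [(key, value)] + others)


# Hypothetical vendors "congo" and "rojo"; "rojo" is the current component.
incoming = "congo=t61rcWkgMzE,rojo=00f067aa0ba902b7"
outgoing = update_tracestate(incoming, "rojo", "b7ad6b7169203331")
print(outgoing)  # rojo=b7ad6b7169203331,congo=t61rcWkgMzE

# The callee usually needs only the first pair.
first_key, first_value = parse_tracestate(outgoing)[0]
```

With this layout, a callee monitored by the same vendor as its caller finds the relevant entry without scanning the whole list.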
tracestate
Two questions that come up frequently are whether the tracestate header has to
be mutated on every mutation of span-id to identify the vendor which made this
change, and whether two different vendors can modify the tracestate entries in
a single component.
This requirement may improve interoperability between vendors. For instance,
a vendor may check the first tracestate key and provide some additional value
for the customer by adjusting data collection in the current component based on
knowledge of the caller’s behavior, for example by applying specific sampling policies
or by providing an experience for customers to get data from the caller’s vendor. There
are more scenarios that might be simplified by strict mutation requirements.
Even though improved interoperability will enable more scenarios, the specification
does not restrict the number of mutations of tracestate and doesn’t require the
mutation.
The main reason for not requiring the mutation is generic tracers. Generic tracers
are tracers which don’t need to carry any information via tracestate and/or
don’t have a single back-end where this data will be stored. The only thing a
generic tracer can set in tracestate is a key with some constant, an
empty value, or a copy of traceparent. None of those details is particularly
interesting for the callee, but a requirement puts an extra burden and complexity on
implementors. Another reason for not requiring a mutation is that allowing
multiple mutations may require vendors to check for more than one key anyway.
Some back-end neutral SDKs may be implemented so that the destination back-end
is decided via side-car or out-of-process agent configuration. In such cases
a customer may decide to enable the mutation logic for more than one header in
the in-process SDK. Another reason to allow multiple mutations is that, in a similar
environment where the back-end destination is decided via out-of-process
configuration, certain header mutations may still be required. An example may be
smart sampling mechanisms that rely on additional data propagation in
tracestate.
The header should be small so that providers can satisfy the requirement to always pass the value.
512 bytes looks like a reasonable compromise.
TODO: put more thoughts into it
Here are some rationales and assumptions:
msft, goog, etc).
Based on these assumptions and rationales, a maximum of 32 elements looks like a reasonable compromise.
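
As a rough illustration of how the two limits interact with the ordering of values, the sketch below keeps leading tracestate members until either limit would be exceeded. The function name `truncate_tracestate` and the drop-from-the-end policy are assumptions made for this example, not something the specification prescribes.

```python
MAX_BYTES = 512    # size limit discussed in the text
MAX_MEMBERS = 32   # element limit discussed in the text


def truncate_tracestate(header: str) -> str:
    """Keep leading list members until either limit would be exceeded."""
    members = [m.strip() for m in header.split(",") if m.strip()]
    kept = []
    size = 0
    for member in members[:MAX_MEMBERS]:
        # Every member after the first costs one extra byte for the comma.
        cost = len(member.encode("utf-8")) + (1 if kept else 0)
        if size + cost > MAX_BYTES:
            break
        kept.append(member)
        size += cost
    return ",".join(kept)


# Hypothetical oversized header with 40 members.
long_header = ",".join(f"vendor{i}=value{i}" for i in range(40))
short_header = truncate_tracestate(long_header)
print(len(short_header.encode("utf-8")), len(short_header.split(",")))
```

Because the most recent entries sit at the front of the list, dropping from the end discards the values that are least likely to matter to the next callee.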
Lowercase names have a few benefits:
URL encoding is a low-overhead way to encode Unicode characters for non-Latin characters in the values. URL encoding keeps single words in Latin script unchanged and easily readable.
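
A small sketch of this property, using Python's standard `urllib.parse` module; the sample values are made up for illustration.

```python
from urllib.parse import quote, unquote

latin_value = "checkout"   # plain Latin word
mixed_value = "café"       # contains one non-ASCII character

print(quote(latin_value))  # checkout
print(quote(mixed_value))  # caf%C3%A9

# List and pair delimiters inside a value are escaped as well.
print(quote("a=b,c"))      # a%3Db%2Cc

# Decoding restores the original text.
assert unquote(quote(mixed_value)) == mixed_value
```

Only the characters that need escaping grow in size, so values that are already plain Latin words pay no overhead.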
The @ sign is allowed in a key so that the vendor name can be easily parsed out of the tracestate key. The idea is that, with a registry of trace vendors, one can easily identify the vendor and understand how to parse its trace state. Without the @ sign, parsing would be more complicated. The @ sign also has known semantics in
addressing for protocols like FTP and e-mail.
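
To show why a dedicated separator simplifies parsing, here is a minimal sketch. It assumes a key layout of `tenant@vendor`, which is an assumption made for this example; the function name `split_vendor_key` and the sample keys are hypothetical.

```python
from typing import Optional, Tuple


def split_vendor_key(key: str) -> Tuple[Optional[str], str]:
    """Return (tenant, vendor) for a tracestate key.

    Assumes the layout tenant@vendor; a key without '@' is treated
    as a bare vendor name with no tenant part.
    """
    tenant, sep, vendor = key.rpartition("@")
    return (tenant if sep else None, vendor)


print(split_vendor_key("tenant42@msft"))  # ('tenant42', 'msft')
print(split_vendor_key("goog"))           # (None, 'goog')
```

A single split on a reserved character is enough to look the vendor up in a registry, with no vendor-specific key grammar required.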
Versioning options are:
One variation is whether the original or new header that you cannot recognize is preserved in tracestate.
Option 2 is better. It’s easy, doesn’t restrict future versions in any way, and a restarted trace should be understood by new systems, so only one “connection” is lost. The lost connection issue can be solved by storing the original header in tracestate. The drawbacks are also obvious. First, a single old component always breaks traces. Second, it’s harder to transition without customer dissatisfaction over broken traces.
Storing the original value also has negative effects. A valid traceparent is 55 characters (out of the 512 allowed for tracestate), and “bad” headers could be much longer, pushing valuable tracestate pairs out. This requirement also increases the chance of abuse. A bad actor could start sending a header with version 99 that only that actor understands. Because every system would pass the original value through, this actor could build a complete solution based on this header.
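
The 55-character figure and the restart behavior can be sketched as follows. This is a simplified illustration, not a full validator: it assumes the version `00` layout of a 2-character version, a 32-character trace-id, a 16-character span-id, and 2-character trace-flags joined by hyphens (2 + 1 + 32 + 1 + 16 + 1 + 2 = 55). The function names are hypothetical, and the restarted trace is arbitrarily given flags `00`.

```python
import re
import secrets

# version "00", then trace-id, span-id, trace-flags, all lowercase hex
TRACEPARENT_V00 = re.compile(r"00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}")


def new_traceparent() -> str:
    """Start a fresh trace with random identifiers."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-00"


def accept_or_restart(header):
    """Return (traceparent, restarted); restart when the header is unusable."""
    if header is not None and TRACEPARENT_V00.fullmatch(header):
        return header, False
    return new_traceparent(), True


valid = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(len(valid))                      # 55
print(accept_or_restart(valid)[1])     # False: header is kept

unknown = "99-" + "x" * 200            # unrecognized, oversized header
print(accept_or_restart(unknown)[1])   # True: trace is restarted
```

In this sketch the unrecognized header is simply discarded, which avoids carrying an arbitrarily long value forward at the cost of losing one connection in the trace.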
trace-id than span-id. Assuming span-id size or format may change without changing trace-id. However, the majority sees potential for abuse here. So we suggest forcing future versions to be additive to the current format. If parsing fails at any stage, simply restart the trace.
TL;DR: There are many scenarios where collaboration between distributed tracing vendors requires writing and reading response headers.
We can see that this can have value, but we don’t think now is the right time to standardize. We decided we would rather wait for
individual vendors to start collaborating over response headers and later decide which scenarios are worth standardizing. Use of the traceparent and
tracestate headers is not forbidden in response headers.
traceparent can be used for use cases 1 and 3 (identity and deferred sampling).
A tracestate-like header can be used for all use cases.