distributed-tracing

Trace context HTTP header format rationale

This document provides rationale for the decisions made, mapping the traceparent and tracestate fields to HTTP headers.

Lowercase concatenated header names

While HTTP headers are conventionally delimited by hyphens, the trace context header names are not. Rather, they are lowercase concatenated traceparent and tracestate respectively. The departure from convention is due to practical concerns of propagation. Trace context is unlike typical HTTP headers, which are point-to-point and do not propagate through other systems like messaging. Different systems have different constraints. For example, some cannot read case insensitively, and others forbid the hyphen character. Even if we could suggest not using the same format for such systems, we know many systems transparently copy HTTP headers into fields. This class of concerns only exist when we choose to support mixed case with hyphens. By choosing not to, we open trace context integration beyond HTTP at the cost of a conventional distraction.

All parts of traceparent are required

We’ve been discussing to make parts of the traceparent header optional. One proposal we declined was to allow trace-id-only traceparent headers. The intended use was to save size for small clients (like mobile devices) initiating the call. The rationale for declining it was to avoid abuse and confusion. A suggestion that we want to discuss on saving size is to use binary format.

Making trace-flags optional doesn’t save a lot, but makes specification more complicated. And it potentially can lead to incompatible implementations which do not expect trace-flags.

tracestate

Ordering of keys in tracestate

The specification calls for ordering of values in tracestate. This requirement allows better interoperability between tracing vendors.

A typical distributed trace is clustered - components calling each other are often monitored by the same tracing vendor. So information supplied by the tracing system which originated a request will typically be less and less important deeper in a distributed trace. Immediate caller’s information on the other hand typically is more valuable as it is more likely being monitored by the same tracing vendor. Thus, it is logical to move immediate caller’s information to the beginning of the tracestate list. So less important values will be pushed to the end of the list.

This prioritization of tracestate values improves performance of querying the value of tracestate - typically you only need a first pair. It also allows you to meaningfully truncate tracestate when required instead of dropping the entire list of values.

Mutations of tracestate

Two questions that comes up frequently is whether the tracestate header HAVE TO be mutated on every mutation of span-id to identify the vendor which made this change and whether two different vendors can modify the tracestate entries in a single component.

This requirement may improve interoperability between vendors. For instance, a vendor may check the first tracestate key and provide some additional value for the customer by adjusting data collection in the current component via the knowledge of a caller’s behavior. For instance, applying specific sampling policies or providing an experience for customers to get data from the caller’s vendor. There are more scenarios that might be simplified by strict mutation requirements.

Even though improved interoperability will enable more scenarios, the specification does not restrict the number of mutations of tracestate and doesn’t require the mutation.

The main reason for not requiring the mutation is generic tracers. Generic tracers are tracers which don’t need to carry any information via tracestate and/or don’t have a single back-end where this data will be stored. The only thing a generic tracer can set in tracestate is either a key with some constant, an empty value or a copy of traceparent. Neither of those details is particularly interesting for the callee. But a requirement puts an extra burden and complexity on implementors. Another reason for not requiring a mutation is that allowing multiple mutations may require vendors to check for more than one key anyway.

Some back-end neutral SDKs may be implemented so that destination back-end is decided via side-car or out-of-process agent configuration. In such cases a customer may decide to enable more than one headers’ mutation logic in in-process SDK. Another requirement for multiple mutations is that in a similar environment where the back-end destination is decided via out-of-process configuration - certain header mutations may still be required. An example may be smart sampling mechanisms that rely on additional data propagation in tracestate.

Size limits

Total size limit

Header should be small so providers can satisfy the requirement to pass the value all the time.

512 bytes looks like a reasonable compromise.

TODO: put more thoughts into it

Maximum number of elements

Here are some rationals and assumptions:

Based on these assumptions and rationals a maximum number of elements of 32 looks like a reasonable compromise.

Forcing lower case tracestate names

Lowercase names have a few benefits:

String encoding of names

Url encoding is low-overhead way to encode unicode characters for non-latin characters in the values. Url encoding keeps a single words in latin unchanged and easily readable.

Vendor name in a key

Sign @ is allowed in a key for easy parsing of vendor name out of the tracestate key. The idea is that with the registry of trace vendors one can easily understand the vendor name and how to parse it’s trace state. Without @ sign parsing will be more complicated. Also @ sign has known semantics in addressing for protocols like ftp and e-mails.

Versioning

Versioning options are:

  1. Pass thru unknown headers
  2. Re-start trace when you see unknown header
  3. Try to parse trace following some rules when you see unknown header

One variation is whether original or new header that you cannot recognize is preserved in tracestate.

Response headers

TL;DR; There are many scenarios where collaboration between distributed tracing vendors require writing and reading response headers. We can see that this can have value, but don’t think right now is the right time to standardize. We decided we would rather wait for individual vendors to start to collaborate over response headers and later decide which scenarios are worth standardizing. Use of traceparent and tracestate headers is not forbidden in response headers.

Use Cases

  1. Restart a trace and return new trace identification information to caller.
  2. Send Tenant ID/identity of the service so caller knows where to query telemetry from.
  3. Notify upstream to sample trace sending a sampling flag (+ sampling score) for delegated sampling.
  4. Report back data to the caller (like server timing, method name, application type and version). E.g. for HTTP call - caller only knows the url when server knows route information. Route information may be helpful to caller to group outgoing requests.

Open Issues

Potential Content of a Header

Problems*