Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CP-52744: Thread TraceContext as JSON inside debug_info #6230

Open
wants to merge 3 commits into
base: feature/perf
Choose a base branch
from

Conversation

GabrielBuica
Copy link
Contributor

@GabrielBuica GabrielBuica commented Jan 15, 2025

Adds functionality to marshal and unmarshal TraceContext in the tracing library.
Instead of passing only the traceparent in debug_info, the entire TraceContext is now passed as JSON.

This change enables the transfer of baggage across xenopsd boundaries, improving tracing propagation and debugging capabilities. This should also enable later use of baggage as means of passing the thread classification between components.

Latest BVT:211240

@GabrielBuica GabrielBuica force-pushed the private/dbuica/tracecontext-in-debuginfo branch from 9bc7086 to 39d5dfc Compare January 15, 2025 10:01

type baggage = (string * string) list
type baggage = (string * string) list [@@deriving yojson]
Copy link
Contributor

@edwintorok edwintorok Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually use rpcty for deriving (and then we can transform that into json with Rpcmarshal)
(and we can then also choose a more efficient serialization format in the future, I think when I last tested 'csexp' was faster than json)

Copy link
Contributor Author

@GabrielBuica GabrielBuica Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here ff91579.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that I agree with @edwintorok here; this is clearly unrelated to RPC, so I think deriving JSON is fine. However, I am not sure what happens here - what JSON are we expecting here and what is the relation with (string * string) list (a key-value list, I assume).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON is only used internally by xapi to serialize this field across calls between xapi and xenopsd.
The reason that using RPC instead of JSON would be better is that we can then more easily switch to more efficient formats if needed.
Nothing external to XAPI/xenopsd needs to parse these, so we can use whichever format is more efficient.

Although we could also use deriving sexp, and the csexp serialization directly too without going through RPC, in fact that would probably be more efficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of baggage is metadata propagation across services, generally. Unless I'm mistaken, the line-separated traceparent stuff is also propagated, via xapi-storage-script, to SMAPIv3 storage plugins (?). I think JSON is the right choice, as it is more standard. There's serialisation efficiency problems in other parts of the product, but I don't think it's a huge concern here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we pass JSON, why do we need an assoc list? That is one my confusions. Or: are we representing the assoc list as JSON, so only using a small part of JSON to represent it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal format used by code is an assoc-list. The propagation of tracing across service boundaries involves marshalling these values in some way: in HTTP headers, you get traceparent and baggage entries (where baggage is encoded as a semicolon-delimited key-value list, e.g. k1=v1;k2=v2;...;kN=vN, see here).

For debug info, I think that's our own ad-hoc format for propagating across service boundaries - used for talking to both xenopsd and xapi-storage-script. Prior to baggage, only the traceparent was propagated, so I believe it was always just line-delimited (for example, it is passed via the dbg parameter for the methods used in SMAPIv3 - see here - for allowing storage plugins to submit spans that get properly parented). Moving towards a more standard format is a good idea generally.

The internal structure used to represent baggage is up for discussion (it was string Map.Make(String).t in an earlier effort, as far as I recall). The format used to propagate it across boundaries, using various transport mechanisms, is somewhat standardised for mainstream things like HTTP headers - for our own boundaries, like our rpclib interface dbg parameters, it can be whatever we want (I think we should favour mainstream formats).

Copy link
Contributor

@edwintorok edwintorok Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From mirage/ocaml-rpc@master...edwintorok:ocaml-rpc:csexp here is how to use CSexp instead of JSON.
In my benchmarks CSexp was faster than JSON:

(** {1 Convert S-expressions to and from strings using {!module:Csexp}, an efficient, yet still human-readable encoding.} *)

module Wire = Csexp.Make (Sexplib0.Sexp)

let wire_of_sexp = Wire.to_string

let sexp_of_wire input =
  match Wire.parse_string input with
  | Ok r -> r
  | Error (offset, message) -> raise (Parse_error { offset; message; input })

let to_string t = t |> sexp_of_t |> wire_of_sexp
let of_string s = s |> sexp_of_wire |> t_of_sexp

And then we can use @deriving sexp and wire_of_sexp to create the dbg string, and sexp_of_wire in the other direction.

(Note that Sexpressions in general are slower to serialize than JSON according to same benchmarks, so if we use SExp we need to ensure we use CSexp)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look over using CSexp

@GabrielBuica GabrielBuica force-pushed the private/dbuica/tracecontext-in-debuginfo branch from 5aec6e6 to ff91579 Compare January 16, 2025 09:55
ocaml/libs/tracing/tracing.ml Outdated Show resolved Hide resolved
let tracing =
Option.map (fun tp -> Tracer.span_of_span_context tp log) spancontext
Option.map (Fun.flip Tracer.span_of_span_context log) spancontext
in
{log; tracing}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if possible, tracing here should have type TraceContext.t instead of Span.t. Otherwise, let's keep this in mind for a future PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make a separate PR. The underlying type is exposed in the mli file and it needs a few more changes.

Adds functionality to marshal and unmarshal `TraceContext` in the tracing
library.
Instead of passing only the `traceparent` in `debug_info`, the entire
`TraceContext` is now passed as JSON.

This change enables the transfer of baggage across xenopsd boundaries,
improving tracing propagation and debugging capabilities. This should
also enable later use of `baggage` as means of passing the thread
classification between components.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Refresh the trace_context with the correct traceparent when creating a
span with `start_tracing_helper` in `context.ml`.

This ensures the tracing of the context has the correct parent.
@GabrielBuica GabrielBuica force-pushed the private/dbuica/tracecontext-in-debuginfo branch from ff91579 to fc6d6e0 Compare January 23, 2025 09:29
@edwintorok
Copy link
Contributor

Note that we should merge the current feature/perf into master before merging more changes into feature/perf (otherwise we need to start from scratch with testing feature/perf), so lets wait with merging this PR a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants