SAGA pattern observability

AndriiA
Visitor

Hello all,

One of our projects implements the SAGA pattern (choreography) where 3rd-party services in the service chain ignore or remove tracing headers, leading to the loss of a unified trace and the fragmentation of the trace into separate parts.

Instead, we would like to be able to track the entire lifecycle of such a transaction, skipping 3rd parties where possible. Therefore, we are considering using identifiers from the payload, which are available at any point in the transaction, as a binding component. We have considered two main options:

  1. User sessions: Each of the services available to us supports the same user session, using the payload ID as the session identifier.
  2. Distributed traces: Each of the services available to us generates a traceId based on a hash function of the payload ID if there is no tracing header in the incoming request.
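
To make option 2 concrete, here is a minimal sketch of deriving a trace ID from the payload ID. It assumes the W3C Trace Context header format (`traceparent`) and uses the first 16 bytes of a SHA-256 digest as the hash; both the class name `PayloadTraceId` and its methods are hypothetical and not part of any Dynatrace or OpenTelemetry API. It only helps where a service controls its own trace context.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: derives a deterministic trace ID from a payload ID.
final class PayloadTraceId {

    private PayloadTraceId() {}

    // Returns 32 lowercase hex characters: the first 16 bytes of SHA-256(payloadId).
    static String traceIdFor(String payloadId) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(payloadId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        StringBuilder hex = new StringBuilder(32);
        for (int i = 0; i < 16; i++) {
            hex.append(String.format("%02x", digest[i] & 0xff));
        }
        String traceId = hex.toString();
        // An all-zero trace ID is invalid in W3C Trace Context; guard the edge case.
        return traceId.equals("00000000000000000000000000000000")
                ? "00000000000000000000000000000001"
                : traceId;
    }

    // Returns a random, non-zero span ID as 16 lowercase hex characters.
    static String newSpanId() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0L);
        return String.format("%016x", id);
    }

    // Builds a traceparent value: version 00, derived trace ID, fresh span ID, sampled flag 01.
    static String traceparentFor(String payloadId) {
        return "00-" + traceIdFor(payloadId) + "-" + newSpanId() + "-01";
    }
}
```

A service that receives a request without a tracing header would call `PayloadTraceId.traceparentFor(payloadId)` and continue the trace from that value. Every fragment then shares one trace ID, although the parent/child links across the 3rd-party hop would still be missing.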

However, all our experiments have been unsuccessful: OpenKit does not seem to support distributed sessions, and OneAgent does not allow generating a traceId manually. If we disable OneAgent and send tracing information using OpenTelemetry, Dynatrace can display it as a unified trace, but it is unable to build any graphs due to the lack of metrics, making the analysis of such traces impossible.

Perhaps someone has ideas on how this can still be achieved, as Dynatrace is a powerful and flexible tool, and I hope I might be missing something?

 

 

3 REPLIES

Nick-Montana
Helper

Hi @AndriiA ,

 

I think this is a well-written and thought-provoking post! What programming language are you using? If the backend is written in Java/Spring, have you looked through the OneAgent SDK (Java)? It offers a high degree of customization of the PurePath through the functions on the SDK instance you create. I suggest you continue testing and experimenting with an active OneAgent and the OneAgent SDK to get the most out of Dynatrace.

 

Are you deploying these services into containerized environments? Have you tried using the sendBizEvent() function in OpenKit? You can send bizevents from each service, create a Business Flow application with Dynatrace Grail, and correlate the bizevents with a manually created ID.

 

" If we disable OneAgent and send tracing information using OpenTelemetry, Dynatrace can display it as a unified trace, but it is unable to build any graphs due to the lack of metrics, making the analysis of such traces impossible"

While it's recommended to use both Agent and Extension metrics, you can still get a large number of metrics through open-source metric providers like OTel and OpenKit. If you disable OneAgent and export only your manual traces, I suggest seeing what you can do with custom metrics via Meter Registries.

 

 

OpenKit Metrics:

Client-Side Metric Expressions

Action Count - builtin:apps.other.uaCount.geoAndApdex

Action Count - builtin:apps.other.uaCount.osAndApdex

Action Count - builtin:apps.other.uaCount.osAndVersion

Action Duration - builtin:apps.other.uaDuration.osAndVersion

Apdex - builtin:apps.other.apdex.osAndVersion

Apdex - builtin:apps.other.apdex.osAndGeo

Crash Count - builtin:apps.other.crashCount.osAndVersion

Crash Count - builtin:apps.other.crashCount.osAndVersion-std

Crash Count - builtin:apps.other.crashCount.osAndGeo

New Users - builtin:apps.other.newUsers.os

Request Times - builtin:apps.other.requestTimes.osAndVersion

Request Error Rate - builtin:apps.other.requestErrorRate.osAndVersion

Estimated Crash Free Users - builtin:apps.other.crashFreeUsersRate.os

Estimated Users affected by Crashes - builtin:apps.other.crashFreeUsersRate.os

Reported Error Count (by key user action, OS) - builtin:apps.other.apdex.osAndVersion

AndriiA
Visitor

Hello @Nick-Montana 

We use Java/Kotlin/JS-based services, both containerized and serverless. 
And we do not have a subscription to the Business Flow product.

My original question was more about how to correlate several separate traces into one. I was expecting that the SDK would allow us to override the traceId value so that we could correlate it with a unique value from the payload.

But it seems that at the moment this is not possible.



Hey,

Yeah, so as you mention, I don't think distributed traces will give you what you're looking for.

 

I do think your best option remains OpenKit-instrumented user sessions. You mentioned that none of your experiments worked out. Can you provide more details on what you've done so far?
