
Diagnosing Performance Issues Through Logs, Metrics, and Traces — Patterns I Keep Seeing (and You Probably Do Too) by Andi Grabner, Performance Enthusiast 

 

Over the years, I’ve spent a good chunk of my life staring at logs, metrics, and traces. I remember the early days with Dynatrace AppMon and the “3 clicks to root cause!” promise. And honestly? While the tools have adapted and changed, I still love digging through all those signals. Every slowdown has its own story, every anomaly hides a pattern, and every distributed trace is like a treasure map for performance engineers like us. 

In my latest Observability Lab (which I formerly called Clinics), I walked through one of these diagnostic journeys using the Dynatrace Services App. What I love most about our platform is how quickly it helps you go from “something feels slow” to “ah, that’s the root cause.” 

 


 

👉 Here’s the link to the video if you want to follow along: Why Is My Service Slow? Root Cause Analysis with Dynatrace Distributed Tracing   

 

But even without the UI in front of us, there are universal performance patterns that keep showing up across applications, teams, and technologies. I want to highlight some of these because once you learn to spot them, performance analysis becomes less guesswork and more… well… fun. 

And since I always enjoy learning from the community, I’d love to hear which patterns you keep bumping into. 

 

Patterns That Jump Out When You Combine Logs, Metrics & Traces 

Some of the patterns every performance engineer should know how to identify

 

  1. Inefficient Algorithms — The Silent Response Time Killers 

I’ve seen it countless times: response time histograms shift to the right while throughput stays flat. That’s usually the moment my inner performance detective whispers: “We’ve got an inefficient code path somewhere.” 

Identifying the slow methods or algorithms in a trace is easy!

 

Whether it’s: 

  • O(n²) loops 
  • Heavy data processing 
  • A slow library call 

…you’ll always find a single function or method dominating the trace. 
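
To make that concrete, here’s a minimal Java sketch (hypothetical helper methods, not code from the video) of what such a dominating method often looks like, next to the linear rewrite that makes the hot spot disappear:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    // O(n^2): every element is compared against every other element.
    // In a trace, this is the single method dominating all the self-time.
    static boolean hasDuplicatesQuadratic(List<String> items) {
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (items.get(i).equals(items.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    // O(n): one pass with a HashSet removes the hot spot entirely.
    static boolean hasDuplicatesLinear(List<String> items) {
        Set<String> seen = new HashSet<>();
        for (String item : items) {
            if (!seen.add(item)) { // add() returns false if already present
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> skus = List.of("A-1", "B-2", "C-3", "B-2");
        System.out.println(hasDuplicatesQuadratic(skus)); // true
        System.out.println(hasDuplicatesLinear(skus));    // true
    }
}
```

At 100 items the difference is invisible; at 100,000 items the quadratic version *is* the response time.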

 

  2. Recursive or Repetitive Work 

Nothing drains CPU (and patience) faster than recursion gone wild. 

In the video example, thousands of image requests triggered the same resizing logic over and over again—even though the resized result should have been cached. The traces made this obvious in seconds. 

If you ever see a repeated call pattern stacking up like a staircase in a trace: congratulations, you’ve found your culprit. 
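
As an illustration (the classic naive Fibonacci, not the actual resizing code from the video), here is what that staircase looks like in code, and how memoizing the result collapses it:

```java
import java.util.HashMap;
import java.util.Map;

public class StaircasePattern {

    // Naive recursion: the same subproblem is recomputed again and again.
    // fibNaive(35) triggers roughly 30 million calls; in a trace this is
    // the staircase of identical frames stacking up.
    static long fibNaive(int n) {
        return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
    }

    // Memoized: each value is computed exactly once and then reused.
    static long fibMemo(int n, Map<Integer, Long> cache) {
        if (n < 2) return n;
        Long cached = cache.get(n);
        if (cached != null) return cached;
        long result = fibMemo(n - 1, cache) + fibMemo(n - 2, cache);
        cache.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fibNaive(35));                 // millions of calls
        System.out.println(fibMemo(35, new HashMap<>())); // a few dozen calls
    }
}
```

The same trick applies to the resizing example: compute once, remember the result, reuse it.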

 

  3. The (In)Famous N+1 Query Problem 

Ah yes… the classic. 

N+1 to the Database is very common. Just look at the number of SQL Executions

 

You know you’ve hit an N+1 pattern when you see: 

  • A single request spawning dozens or hundreds of identical DB queries 
  • Spikes in DB time during load 
  • Log spam with repeated SELECTs 

Even after all these years, it’s still one of the most common performance issues in microservices. 
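
If you want to see the shape in code, here is a hedged JDBC sketch (hypothetical order_items table and column names) of the N+1 pattern next to the single-round-trip fix:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class OrderItemLoader {

    // N+1: one query for the orders, then one extra query per order.
    // In a trace this is the wall of identical SELECTs under one request.
    static void loadOneByOne(Connection conn, List<Long> orderIds) throws SQLException {
        String sql = "SELECT * FROM order_items WHERE order_id = ?";
        for (long orderId : orderIds) {
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, orderId);
                try (ResultSet rs = ps.executeQuery()) { /* map rows */ }
            }
        }
    }

    // One round trip: fetch the items for all orders with a single IN query.
    static void loadBatched(Connection conn, List<Long> orderIds) throws SQLException {
        String placeholders = String.join(",", Collections.nCopies(orderIds.size(), "?"));
        String sql = "SELECT * FROM order_items WHERE order_id IN (" + placeholders + ")";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < orderIds.size(); i++) {
                ps.setLong(i + 1, orderIds.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) { /* map rows */ }
        }
    }
}
```

With an ORM, the same fix usually means switching from lazy per-entity loading to an eager join or batch fetch.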

 

  4. Too Many Unnecessary Downstream Calls 

Sometimes services call other services simply because… they always have. Legacy reasons. Historical reasons. “We didn’t question it” reasons. Or a misconfigured API gateway! 

Traces reveal these instantly: 

  • Waterfall-style call chains 
  • Outgoing requests that don’t change the result 
  • Chains of dependencies that slow everything down 

Removing one unnecessary downstream call can speed up an entire user journey. 
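
As a minimal sketch (hypothetical internal service URLs, not the topology from the video), this is what the waterfall looks like when one of the calls never influences the response:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProductPageRenderer {

    private static final HttpClient client = HttpClient.newHttpClient();

    static String get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Waterfall: three sequential downstream calls. The third result is
    // never used -- pure legacy baggage that a trace exposes immediately.
    static String renderProduct(String productId) throws Exception {
        String product = get("https://catalog.internal/products/" + productId);
        String price   = get("https://pricing.internal/prices/" + productId);
        get("https://recommendations.internal/for/" + productId); // result ignored!
        return product + price;
    }
}
```

Deleting that one unused call removes an entire network hop from every single request.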

 

  5. Missing (or Broken) Caching Layers 

You would not believe how often I see this. 

Whenever I notice: 

  • Repeated identical incoming requests 
  • Constant recomputation in logs 
  • Cache hit ratios at zero 

…I know we’re about to uncover a missing or misconfigured cache. 

In the video case, the frontend continuously resized product images—thousands of times—because the resized version wasn’t being cached properly. 
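
A minimal sketch of the fix, assuming a hypothetical ThumbnailService rather than the actual shop code: memoize the expensive resize per image ID so it runs once, not once per request:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThumbnailService {

    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    // CPU-heavy work that should happen once per image, not per request.
    private byte[] resize(String imageId) {
        // ...image decoding and scaling would live here...
        return new byte[0];
    }

    // computeIfAbsent() runs resize() only on the first request for an
    // image ID; every later request is a cheap map lookup.
    public byte[] thumbnail(String imageId) {
        return cache.computeIfAbsent(imageId, this::resize);
    }
}
```

A production cache also needs eviction and size limits (for example via Caffeine or an external cache), but the shape of the fix is the same.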

From the YouTube tutorial on detecting the reasons for slow requests!

 

  6. Unhandled Exceptions and Error Loops 

Nothing derails performance like an exception silently looping in the background. 

While exceptions are useful for diagnostics they have an impact on performance

 

These typically show up as: 

  • Sudden spikes in error logs 
  • Traces abruptly terminating 
  • Fallback logic being executed more often than expected 

Even when these errors aren’t breaking anything, they’re usually slowing everything down. Also remember: exceptions with full stack traces have to be built and kept in memory first. That drives up memory utilization and garbage-collection activity, which in turn costs extra CPU to clean up the unused data! 
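
To illustrate the shape (a hypothetical poller, not code from the video): an exception swallowed inside a retry loop keeps allocating stack traces and burning CPU while nothing visibly fails:

```java
public class InventoryPoller {

    // Each iteration throws, builds a full stack trace (allocation plus
    // later GC work), and retries -- an invisible CPU and memory drain.
    public int availableStock(String sku) {
        for (int attempt = 0; attempt < 1000; attempt++) {
            try {
                return queryInventory(sku);
            } catch (RuntimeException e) {
                // swallowed: shows up only as wasted CPU and memory churn
            }
        }
        return 0; // the silent fallback masks the real failure
    }

    private int queryInventory(String sku) {
        throw new IllegalStateException("inventory service unreachable");
    }
}
```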

 

Your Turn — What Patterns Do You Look For? 

These are just some of the recurring patterns I’ve collected over years of analyzing distributed systems. But I know every engineer has their own mental shortcuts and “go-to” indicators. 

So I’d love to hear from you: 

👉 What’s the first pattern you look for when something gets slow? 

👉 What’s the most interesting performance issue you’ve ever diagnosed? 

Drop a comment, send me a message, or share a story—let’s turn this into a growing list of performance wisdom from the community. 

 

And if you want to try out everything I showed in the video hands-on, remember: 

You can explore all of this in the Dynatrace Playground. 

🎥 For more inspiration, also watch my video on How to detect bad patterns in logs and traces! 


 

  

Looking forward to your stories—and see you in the next Observability Lab! 

— Andi