
Diagnosing Performance Issues Through Logs, Metrics, and Traces — Patterns I Keep Seeing (and You Probably Do Too) by Andi Grabner, Performance Enthusiast 

 

Over the years, I’ve spent a good chunk of my life staring at logs, metrics, and traces. I remember the early days with Dynatrace AppMon and the “3 clicks to root cause!” promise. And honestly? While the tools have adapted and changed, I still love digging through all those signals. Every slowdown has its own story, every anomaly hides a pattern, and every distributed trace is like a treasure map for performance engineers like us. 

In my latest Observability Lab (which I formerly called Clinics), I walked through one of these diagnostic journeys using the Dynatrace Services App. What I love most about our platform is how quickly it helps you go from “something feels slow” to “ah, that’s the root cause.” 

 


 

👉 Here’s the link to the video if you want to follow along: Why Is My Service Slow? Root Cause Analysis with Dynatrace Distributed Tracing   

 

But even without the UI in front of us, there are universal performance patterns that keep showing up across applications, teams, and technologies. I want to highlight some of these because once you learn to spot them, performance analysis becomes less guesswork and more… well… fun. 

And since I always enjoy learning from the community, I’d love to hear which patterns you keep bumping into. 

 

Patterns That Jump Out When You Combine Logs, Metrics & Traces 

Some of the patterns every performance engineer should know how to identify

 

  1. Inefficient Algorithms — The Silent Response Time Killers 

I’ve seen it countless times: response time histograms shift to the right while throughput stays flat. That’s usually the moment my inner performance detective whispers: “We’ve got an inefficient code path somewhere.” 

Identifying the slow methods or algorithms in a trace is easy!

 

Whether it’s: 

  • O(n²) loops 
  • Heavy data processing 
  • A slow library call 

…you’ll always find a single function or method dominating the trace. 
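
To make that concrete, here’s a minimal Java sketch (hypothetical helper methods, not code from the video) of what such a dominating method often looks like, next to the linear rewrite that makes the hot spot disappear:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    // O(n^2): every element is compared against every other element.
    // In a trace, this is the single method dominating all the self-time.
    static boolean hasDuplicatesQuadratic(List<String> items) {
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (items.get(i).equals(items.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    // O(n): one pass with a HashSet removes the hot spot entirely.
    static boolean hasDuplicatesLinear(List<String> items) {
        Set<String> seen = new HashSet<>();
        for (String item : items) {
            if (!seen.add(item)) { // add() returns false if already present
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> skus = List.of("A-1", "B-2", "C-3", "B-2");
        System.out.println(hasDuplicatesQuadratic(skus)); // true
        System.out.println(hasDuplicatesLinear(skus));    // true
    }
}
```

At 100 items the difference is invisible; at 100,000 items the quadratic version *is* the response time.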

 

  2. Recursive or Repetitive Work 

Nothing drains CPU (and patience) faster than recursion gone wild. 

In the video example, thousands of image requests triggered the same resizing logic over and over again—even though the resized result should have been cached. The traces made this obvious in seconds. 

If you ever see a repeated call pattern stacking up like a staircase in a trace: congratulations, you’ve found your culprit. 
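
As an illustration (the classic naive Fibonacci, not the actual resizing code from the video), here is what that staircase looks like in code, and how memoizing the result collapses it:

```java
import java.util.HashMap;
import java.util.Map;

public class StaircasePattern {

    // Naive recursion: the same subproblem is recomputed again and again.
    // fibNaive(35) triggers roughly 30 million calls; in a trace this is
    // the staircase of identical frames stacking up.
    static long fibNaive(int n) {
        return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
    }

    // Memoized: each value is computed exactly once and then reused.
    static long fibMemo(int n, Map<Integer, Long> cache) {
        if (n < 2) return n;
        Long cached = cache.get(n);
        if (cached != null) return cached;
        long result = fibMemo(n - 1, cache) + fibMemo(n - 2, cache);
        cache.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fibNaive(35));                 // millions of calls
        System.out.println(fibMemo(35, new HashMap<>())); // a few dozen calls
    }
}
```

The same trick applies to the resizing example: compute once, remember the result, reuse it.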

 

  3. The (In)Famous N+1 Query Problem 

Ah yes… the classic. 

N+1 to the Database is very common. Just look at the number of SQL Executions

 

You know you’ve hit an N+1 pattern when you see: 

  • A single request spawning dozens or hundreds of identical DB queries 
  • Spikes in DB time during load 
  • Log spam with repeated SELECTs 

Even after all these years, it’s still one of the most common performance issues in microservices. 
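
If you want to see the shape in code, here is a hedged JDBC sketch (hypothetical order_items table and column names) of the N+1 pattern next to the single-round-trip fix:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class OrderItemLoader {

    // N+1: one query for the orders, then one extra query per order.
    // In a trace this is the wall of identical SELECTs under one request.
    static void loadOneByOne(Connection conn, List<Long> orderIds) throws SQLException {
        String sql = "SELECT * FROM order_items WHERE order_id = ?";
        for (long orderId : orderIds) {
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, orderId);
                try (ResultSet rs = ps.executeQuery()) { /* map rows */ }
            }
        }
    }

    // One round trip: fetch the items for all orders with a single IN query.
    static void loadBatched(Connection conn, List<Long> orderIds) throws SQLException {
        String placeholders = String.join(",", Collections.nCopies(orderIds.size(), "?"));
        String sql = "SELECT * FROM order_items WHERE order_id IN (" + placeholders + ")";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < orderIds.size(); i++) {
                ps.setLong(i + 1, orderIds.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) { /* map rows */ }
        }
    }
}
```

With an ORM, the same fix usually means switching from lazy per-entity loading to an eager join or batch fetch.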

 

  4. Too Many Unnecessary Downstream Calls 

Sometimes services call other services simply because… they always have. Legacy reasons. Historical reasons. “We didn’t question it” reasons. Or a misconfigured API gateway! 

Traces reveal these instantly: 

  • Waterfall-style call chains 
  • Outgoing requests that don’t change the result 
  • Chains of dependencies that slow everything down 

Removing one unnecessary downstream call can speed up an entire user journey. 
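
As a minimal sketch (hypothetical internal service URLs, not the topology from the video), this is what the waterfall looks like when one of the calls never influences the response:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProductPageRenderer {

    private static final HttpClient client = HttpClient.newHttpClient();

    static String get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Waterfall: three sequential downstream calls. The third result is
    // never used -- pure legacy baggage that a trace exposes immediately.
    static String renderProduct(String productId) throws Exception {
        String product = get("https://catalog.internal/products/" + productId);
        String price   = get("https://pricing.internal/prices/" + productId);
        get("https://recommendations.internal/for/" + productId); // result ignored!
        return product + price;
    }
}
```

Deleting that one unused call removes an entire network hop from every single request.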

 

  5. Missing (or Broken) Caching Layers 

You would not believe how often I see this. 

Whenever I notice: 

  • Repeated identical incoming requests 
  • Constant recomputation in logs 
  • Cache hit ratios at zero 

…I know we’re about to uncover a missing or misconfigured cache. 

In the video case, the frontend continuously resized product images—thousands of times—because the resized version wasn’t being cached properly. 
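
A minimal sketch of the fix, assuming a hypothetical ThumbnailService rather than the actual shop code: memoize the expensive resize per image ID so it runs once, not once per request:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThumbnailService {

    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    // CPU-heavy work that should happen once per image, not per request.
    private byte[] resize(String imageId) {
        // ...image decoding and scaling would live here...
        return new byte[0];
    }

    // computeIfAbsent() runs resize() only on the first request for an
    // image ID; every later request is a cheap map lookup.
    public byte[] thumbnail(String imageId) {
        return cache.computeIfAbsent(imageId, this::resize);
    }
}
```

A production cache also needs eviction and size limits (for example via Caffeine or an external cache), but the shape of the fix is the same.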

From the YouTube tutorial on detecting the reasons for slow requests!

 

  6. Unhandled Exceptions and Error Loops 

Nothing derails performance like an exception silently looping in the background. 

While exceptions are useful for diagnostics they have an impact on performance

 

These typically show up as: 

  • Sudden spikes in error logs 
  • Traces abruptly terminating 
  • Fallback logic being executed more often than expected 

Even when these errors aren’t breaking anything, they’re usually slowing everything down. Also remember: exceptions with full stack traces have to be built and kept in memory first. That drives up memory utilization and garbage-collection activity, which in turn costs extra CPU to clean up the unused data! 
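
To illustrate the shape (a hypothetical poller, not code from the video): an exception swallowed inside a retry loop keeps allocating stack traces and burning CPU while nothing visibly fails:

```java
public class InventoryPoller {

    // Each iteration throws, builds a full stack trace (allocation plus
    // later GC work), and retries -- an invisible CPU and memory drain.
    public int availableStock(String sku) {
        for (int attempt = 0; attempt < 1000; attempt++) {
            try {
                return queryInventory(sku);
            } catch (RuntimeException e) {
                // swallowed: shows up only as wasted CPU and memory churn
            }
        }
        return 0; // the silent fallback masks the real failure
    }

    private int queryInventory(String sku) {
        throw new IllegalStateException("inventory service unreachable");
    }
}
```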

 

Your Turn — What Patterns Do You Look For? 

These are just some of the recurring patterns I’ve collected over years of analyzing distributed systems. But I know every engineer has their own mental shortcuts and “go-to” indicators. 

So I’d love to hear from you: 

👉 What’s the first pattern you look for when something gets slow? 

👉 What’s the most interesting performance issue you’ve ever diagnosed? 

Drop a comment, send me a message, or share a story—let’s turn this into a growing list of performance wisdom from the community. 

 

And if you want to try out everything I showed in the video hands-on, remember: 

You can explore all of this in the Dynatrace Playground. 

🎥 For more inspiration, also watch my video on How to detect bad patterns in logs and traces! 


 

  

Looking forward to your stories—and see you in the next Observability Lab! 

— Andi