24 Apr 2026 02:18 PM - last edited on 27 Apr 2026 07:20 AM by Michal_Gebacki
Inspired by this excellent video on the Dynatrace YouTube channel, I started to put into practice something I had always wanted to do but never had the guts to complete: creating my very first App in Dynatrace.
I am not even close to being a Developer. My background has always been that of a tech guy who solves problems by applying Observability practices. But I am an SRE, and one of my competencies should be Software Engineering. I can read code, I can troubleshoot and sometimes fix bugs, but I cannot code. It would not be fair to say that I can.
But thanks to AI, I can now vibe code.
Last year I discovered the power of MCP servers during a presentation at SREDay in Brazil (https://sreday.com/2025-campinas-q4/), and then found out that Dynatrace has its own MCP Server ready to be used.
I first played with the MCP Server and MS Copilot, asking simple questions about the tenants I administer, but I never thought I could use it for more complex tasks, such as creating an entire app.
And then I saw the YT video where, as always, @andreas_grabner presents amazing features that people are creating within Dynatrace.
I had to try it myself and see if it would really be that easy. And it was.
I started by creating the default app, following this doc.
The idea was to replicate a Dashboard I had already created, which provides a history of CPU and Memory Usage vs Requests and Totals from 2 Kubernetes Clusters, allowing the results to be filtered by Node Role. So I already had the required DQL queries.
timeseries usage_result = sum(dt.kubernetes.container.cpu_usage, default: 0, rollup: sum, rate: 1m),
...
| append[timeseries request_result = sum(dt.kubernetes.container.requests_cpu, rollup: sum, rate: 1m),
...
| append[timeseries total_result = sum(dt.kubernetes.node.cpu_allocatable, rollup: sum, rate: 1m),
...
| lookup [smartscapeNodes {K8S_NODE}
| fields name,
role =`tags:k8s.labels`[`node-role/$NodeRole:noquote`]
| filter isNotNull(role)
], sourceField:k8s.node.name, lookupField:name
| filterOut isNull(lookup.name)
| fieldsRemove lookup.name, k8s.node.name
| summarize { Usage = sum(usage_result[]), Request = sum(request_result[]), Total = sum(total_result[])}, by:{timeframe,interval}
In this DQL, I have some appends for new metrics and a lookup function to allow filtering by Node Role.
So I asked Copilot to create a new page that displays the result of this query, allowing the user to update the values based on the variable selection for the node role.
And just like that, I got the first custom page created:
I am not totally sure whether Copilot could have produced the same or a similar query just from a prompt describing the needs, since it had to add different metrics and filter by a node entity property, while the metrics refer to the cluster entity instead. Maybe it could, if I explained it all in the prompt. But since I already had the query, I just used it.
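To give an idea of how the Node Role selection can drive the query from an app page, here is a minimal TypeScript sketch. The helper name is hypothetical and the query is shortened to the lookup stage from the DQL above; the selected role is interpolated directly instead of using the dashboard's `$NodeRole:noquote` variable placeholder:

```typescript
// Hypothetical helper (not the generated app code): builds the DQL string
// for the currently selected node role. Only the first timeseries and the
// lookup stage from the full query are kept, for brevity.
function buildNodeRoleQuery(nodeRole: string): string {
  return [
    "timeseries usage_result = sum(dt.kubernetes.container.cpu_usage, default: 0, rollup: sum, rate: 1m)",
    "| lookup [smartscapeNodes",
    "  | fields name,",
    // The role label key is assembled from the user's selection,
    // e.g. "node-role/worker".
    `    role = \`tags:k8s.labels\`[\`node-role/${nodeRole}\`]`,
    "  | filter isNotNull(role)",
    "], sourceField:k8s.node.name, lookupField:name",
    "| filterOut isNull(lookup.name)",
  ].join("\n");
}
```

The resulting string would then be sent to the tenant by whatever query-execution mechanism the app uses; that part is omitted here to avoid guessing at SDK signatures.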
Until now I just had a Dashboard that is a bit hard to edit. Apps should be smarter.
So I asked Copilot to act as a Performance & Capacity specialist, analyze the data, compare it with the previous period, and generate a report with improvement recommendations.
And I got this:
Now, when we load the app and choose a Node Role, we get a graph with the CPU and Memory consumption history, and we can quickly understand whether the values we are seeing are good or not and what actions we can take to improve them, whether saving cloud provisioning money or adjusting the requests to avoid future resource congestion.
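The kind of reasoning behind such a report can be sketched in a few lines of TypeScript. This is an illustrative assumption, not the code Copilot generated: given the summed Usage, Request and Total points from the DQL result, it derives utilization ratios and a coarse recommendation. The interface names and the 40%/90% thresholds are invented for the example:

```typescript
// One data point from the summarized DQL result (all values in CPU cores).
interface CapacityPoint {
  usage: number;   // actual CPU usage
  request: number; // sum of container CPU requests
  total: number;   // allocatable CPU on the nodes
}

interface CapacityFinding {
  avgUsageVsRequest: number; // fraction of requested CPU actually used
  avgRequestVsTotal: number; // fraction of allocatable CPU reserved
  recommendation: string;
}

function analyzeCapacity(points: CapacityPoint[]): CapacityFinding {
  const avg = (f: (p: CapacityPoint) => number) =>
    points.reduce((sum, p) => sum + f(p), 0) / points.length;

  const avgUsageVsRequest = avg((p) => p.usage / p.request);
  const avgRequestVsTotal = avg((p) => p.request / p.total);

  // Thresholds are assumptions chosen for illustration only.
  let recommendation: string;
  if (avgUsageVsRequest < 0.4) {
    recommendation =
      "Requests look over-provisioned: lowering them could save cloud cost.";
  } else if (avgUsageVsRequest > 0.9 || avgRequestVsTotal > 0.9) {
    recommendation =
      "Close to reserved/allocatable capacity: raise requests or add nodes to avoid congestion.";
  } else {
    recommendation = "Usage and requests look balanced.";
  }

  return { avgUsageVsRequest, avgRequestVsTotal, recommendation };
}
```

Comparing these ratios against the previous period, as the report does, would just mean running the same function over both timeframes and diffing the results.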
Of course, it was not as simple as asking and getting it. Sometimes I was not clear enough about my needs, and some hallucinations happened, but Copilot managed to get back on track after a few tries.
I will now enhance this by adding recommendations per Namespace and Workload, with suggestions for limits and requests and even for pod counts. Let's see.
Here are some lessons I learned along the way:
RTFM is still required. WTFV (watch the f video) is a must.