MCP Server Challenge entry #5: Observability Maturity Auditor

tracegazer
Helper

AI-Powered Tenant Assessment via Dynatrace MCP

 

Hi everyone! I built an automated observability maturity auditor that uses Claude AI + Dynatrace MCP to run a 15-agent audit across infrastructure, configuration, DEM, operations, and security — producing a scored HTML report with root cause analysis and actionable next steps. The entire audit runs from a single command: "audit tenant uhv42169".

See it in action:


The Problem

As a consultant, I regularly audit Dynatrace tenants for clients across Latin America. Each audit follows a repeatable pattern:

  1. Connect to the tenant
  2. Check 12-15 observability dimensions
  3. Score each dimension by severity
  4. Write an SRE analysis with root causes and recommendations
  5. Generate a professional report for the client

This process used to take 1-3 days manually. With the Dynatrace MCP server, I automated it down to ~15 minutes.


The Solution: AI-as-Auditor via MCP

The key insight is using CLAUDE.md as an executable playbook. Instead of writing Python code to call APIs, I wrote a markdown file that instructs Claude AI to execute the audit step by step, using the Dynatrace MCP server as its data source.

Architecture

```
User: "audit tenant uhv42169"
    ↓
Claude AI reads CLAUDE.md playbook
    ↓
Claude AI reads 15 agent definitions (agents/*.md)
    ↓
Dynatrace MCP Server ← execute_dql, list_problems, list_vulnerabilities, ...
    ↓
Claude AI evaluates findings, calculates scores, writes SRE analysis
    ↓
HTML Report with scores, findings, root cause, recommendations
```

 

MCP Tools Used (7 out of 20 available)

| MCP Tool | Used By Agents | Purpose |
|---|---|---|
| get_environment_info | Setup | Verify connectivity, get tenant ID |
| execute_dql | 01-09, 12, 15 | Query entities, tags, management zones, services |
| list_problems | 10, 11, 13 | Problem history, MTTR, noise analysis, custom alerts |
| list_davis_analyzers | 10 | Verify Davis AI capabilities |
| list_vulnerabilities | 14 | Security posture assessment |
| get_kubernetes_events | 15 | K8s cluster health and event analysis |
| chat_with_davis_copilot | Exploration | Settings discovery (dt.setting workaround) |

The 15 Audit Agents

Each agent is a markdown file defining: DQL queries or MCP tool calls, checks with PASS/WARN/FAIL/INFO criteria, blast radius (CRITICAL/HIGH/MEDIUM/LOW), remediation text, and analysis guidelines with root cause/recommendations/next steps.

Infrastructure
01. OneAgent & ActiveGate
02. Host Groups
15. Kubernetes Health
Configuration
03. Management Zones
04. Auto Tags
05. Manual Tags
06. Ownership
07. Security Context
10. Anomaly Detection
11. Problem Notifications
12. SLOs
DEM
08. Real User Monitoring
09. Synthetic Monitors
Operations
13. Problem History
Security
14. Vulnerabilities

Scoring System

Each finding is weighted by blast radius:

| Blast Radius | Weight |
|---|---|
| CRITICAL | 4.0 |
| HIGH | 3.0 |
| MEDIUM | 2.0 |
| LOW | 1.0 |

Status scoring: PASS = 100% of weight, WARN = 50%, FAIL = 0%, INFO = excluded.
Agent score = (earned_weight / total_weight) × 100. Global score = average of all agents with data.
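The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; in the actual tool, Claude performs the calculation by following the playbook, and the sample findings here are hypothetical:

```python
# Minimal sketch of the blast-radius-weighted scoring described above.
# The sample findings are hypothetical; the real audit is driven by the
# CLAUDE.md playbook, not by Python.

WEIGHTS = {"CRITICAL": 4.0, "HIGH": 3.0, "MEDIUM": 2.0, "LOW": 1.0}
STATUS_FACTOR = {"PASS": 1.0, "WARN": 0.5, "FAIL": 0.0}  # INFO is excluded

def agent_score(findings):
    """findings: list of (status, blast_radius) tuples for one agent."""
    scored = [(s, b) for s, b in findings if s != "INFO"]
    if not scored:
        return None  # N/A — agent produced no scorable data
    total = sum(WEIGHTS[b] for _, b in scored)
    earned = sum(WEIGHTS[b] * STATUS_FACTOR[s] for s, b in scored)
    return round(earned / total * 100, 1)

def global_score(agent_scores):
    """Average over agents that produced a score; N/A agents are skipped."""
    with_data = [s for s in agent_scores if s is not None]
    return round(sum(with_data) / len(with_data), 1)

findings = [("PASS", "CRITICAL"), ("WARN", "HIGH"),
            ("FAIL", "MEDIUM"), ("INFO", "LOW")]
print(agent_score(findings))  # (4.0 + 1.5 + 0.0) / 9.0 × 100 = 61.1
```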


Real Audit Results: Tenant uhv42169

Global Score 36.6/100 — Critical gaps detected
34.5 Infra
28.6 Config
27.3 DEM
45.5 Ops
100 Security

 

Per-Agent Scores

| # | Agent | Score | Data Source |
|---|---|---|---|
| 01 | OneAgent & ActiveGate | 73.6 | execute_dql |
| 02 | Host Groups | 0.0 | execute_dql |
| 03 | Management Zones | 0.0 | execute_dql |
| 04 | Auto Tags | 0.0 | execute_dql |
| 05 | Manual Tags | 50.0 | execute_dql |
| 06 | Ownership | 0.0 | execute_dql |
| 07 | Security Context | N/A | dt.setting unavailable |
| 08 | RUM | N/A | No apps (intentional) |
| 09 | Synthetic Monitors | 27.3 | execute_dql |
| 10 | Anomaly Detection | 83.3 | list_davis_analyzers + list_problems |
| 11 | Problem Notifications | 66.7 | list_problems (CUSTOM_ALERT inference) |
| 12 | SLOs | 0.0 | execute_dql |
| 13 | Problem History | 45.5 | list_problems + execute_dql |
| 14 | Vulnerabilities | 100.0 | list_vulnerabilities |
| 15 | Kubernetes | 30.0 | get_kubernetes_events + execute_dql |

How It Works: Step by Step

Step 1: CLAUDE.md as Executable Playbook

The core innovation is that CLAUDE.md IS the automation. No Python, no scripts, no SDK wrappers. The AI reads the playbook and follows it:

```markdown
# CLAUDE.md — audit_mcp Playbook

When the user says "audit [tenant]", follow this sequence:

### Step 1: SETUP
1. Call get_environment_info to verify connectivity

### Step 2: COLLECT DATA (for each of the 15 agents)
1. Read the agent file from agents/NN_name.md
2. Execute each query or MCP tool call listed

### Step 3: EVALUATE — Apply checks, determine PASS/WARN/FAIL/INFO
### Step 4: CALCULATE SCORES — Weighted by blast_radius
### Step 5: SRE ANALYSIS — Root Cause + Recommendations + Next Steps
### Step 6: GENERATE REPORT — Interactive HTML with Mainsoft branding
```

Step 2: Agent Definitions as Markdown

Each agent is a self-contained markdown file. Here's a simplified example of Agent 10 (Anomaly Detection), which was redesigned to use MCP tools instead of the unavailable dt.setting:

```markdown
# Agent 10: Anomaly Detection
Category: configuration | Blast Radius: HIGH

## MCP Tools
- list_davis_analyzers — verify AI capabilities
- list_problems(timeframe="30d") — check if detection fires

## Checks
- davis_analyzers_available: PASS if ≥3 analyzers
- anomaly_detection_firing: PASS if SLOWDOWN/RESOURCE problems exist
- anomaly_problem_ratio: WARN if ≥60% anomaly-based (noisy)
```
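Under the hood, these checks amount to simple threshold logic. A hedged Python sketch of how Agent 10's criteria could be evaluated (in the actual tool Claude does this evaluation itself; the function and the problem-record field names are assumptions for illustration, not the MCP server's schema):

```python
# Illustrative evaluation of Agent 10's three checks. The real evaluation
# is performed by Claude AI reading the agent markdown; the "category"
# field name on problem records is an assumption.

def evaluate_agent10(analyzers, problems):
    """analyzers: analyzer names from list_davis_analyzers.
    problems: problem records from list_problems(timeframe="30d")."""
    checks = {}

    # davis_analyzers_available: PASS if at least 3 analyzers are exposed
    checks["davis_analyzers_available"] = "PASS" if len(analyzers) >= 3 else "FAIL"

    # anomaly_detection_firing: PASS if SLOWDOWN/RESOURCE problems exist
    anomaly = [p for p in problems if p.get("category") in ("SLOWDOWN", "RESOURCE")]
    checks["anomaly_detection_firing"] = "PASS" if anomaly else "WARN"

    # anomaly_problem_ratio: WARN if ≥60% of all problems are anomaly-based
    ratio = len(anomaly) / len(problems) if problems else 0.0
    checks["anomaly_problem_ratio"] = "WARN" if ratio >= 0.6 else "PASS"

    return checks
```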

Step 3: The Report

The generated HTML report features:

  • Global score gauge with semaphore coloring (green/yellow/red)
  • Category breakdown cards (Infrastructure, Configuration, DEM, Operations, Security)
  • Per-agent sections with mini-gauges, findings tables with sort/filter, and collapsible AI analysis
  • Each analysis includes Root Cause, Recommendations (prioritized), and Next Steps (with effort estimates)
  • Dark/light theme toggle, global search, print-to-PDF support
  • Responsive design for mobile viewing

Key Discovery: Working Around dt.setting

A significant challenge: fetch dt.setting is not available as a DQL data object, which initially blocked 4 agents (Security Context, Anomaly Detection, Problem Notifications, and partially Management Zones/Auto Tags/Ownership/SLOs).

The solution was to leverage other MCP tools creatively:

  • Anomaly Detection: Instead of querying settings, we check if list_davis_analyzers returns analyzers AND if list_problems shows anomaly-based problems are being generated. If both are true, anomaly detection is working.
  • Problem Notifications: If list_problems returns CUSTOM_ALERT category problems, it proves both alerting rules AND notification channels are configured (you can't have a CUSTOM_ALERT without both).
  • Security: list_vulnerabilities provides runtime security assessment without needing settings access.
  • Kubernetes: get_kubernetes_events reveals cluster health that entity queries alone can't show.

This turned a limitation into a feature — the audit now uses 7 different MCP tools instead of relying solely on DQL, making it more resilient and comprehensive.
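The Problem Notifications inference in particular reduces to a one-line existence check. A minimal sketch, assuming problem records carry a `category` field (the field name is an assumption, not the MCP server's exact schema):

```python
# Sketch of the CUSTOM_ALERT inference: such a problem can only exist if
# both an alerting rule and a notification pipeline are configured, so
# its mere presence proves both. The "category" field is an assumption.

def notifications_configured(problems):
    """problems: problem records returned by list_problems."""
    return any(p.get("category") == "CUSTOM_ALERT" for p in problems)

print(notifications_configured([{"category": "CUSTOM_ALERT"}]))  # True
```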


Impact

| Metric | Before (Manual) | After (MCP) |
|---|---|---|
| Audit duration | 1-3 days | ~20 minutes |
| Dimensions checked | 8-10 | 15 (5 categories) |
| MCP tools used | N/A | 7 tools |
| Report format | Google Slides / PDF | Interactive HTML (dark mode, search, filter) |
| Consistency | Varies by analyst | 100% repeatable |

What's Next

  • Settings API v2 integration: When the MCP server adds a settings tool, the remaining N/A agents will become fully functional
  • Multi-tenant comparison: Run audits across tenants and compare maturity scores
  • Trend tracking: Store historical scores to show improvement over time
  • Dynatrace Notebook export: Use create_dynatrace_notebook to push findings directly into the tenant
  • Automated remediation: Use send_event to create deployment events tracking fixes

Stack: Claude AI + Dynatrace MCP Server + Markdown playbooks
Source: Available on request — the entire framework is ~15 markdown files + 1 HTML template

Logs, Traces, Metrics... and a bit of sanity.
3 REPLIES

Julius_Loman
DynaMight Legend

Hey @tracegazer,

Would you mind sharing it? It reminds me of the tenant review https://github.com/dynatrace-oss/CustomerSuccess/tree/main

I tried internally to build a similar solution for Dynatrace Managed using Claude, but not with the agent approach: instead, I had it generate a utility that produces such reports (Managed installations are typically air-gapped, and customers often have AI regulations).

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

Hi @Julius_Loman. The current solution runs with the help of Claude and Dynatrace's MCP server. However, I have an earlier solution that used the Dynatrace APIs directly, which, as I understand it, should also work in both SaaS and Managed environments. Let me check whether I have it committed and published on Git.

Logs, Traces, Metrics... and a bit of sanity.

Hi @Julius_Loman , I’ve finally uploaded the repository. Here’s the URL:

```shell
git clone https://github.com/alanfuentes92/observability-auditor.git
cd observability-auditor/audit-mcp
cp mcp-config.example.json .mcp.json
# Edit .mcp.json with your tenant URL and token
claude .
```

Then say: "audit this tenant" → an HTML report will be generated in the output/ folder.

Feel free to reach out with any questions, feedback, suggestions, or ideas—everything is welcome.

Logs, Traces, Metrics... and a bit of sanity.
