MCP Server Challenge entry #5: Observability Maturity Auditor

tracegazer
Helper

AI-Powered Tenant Assessment via Dynatrace MCP

 

Hi everyone! I built an automated observability maturity auditor that uses Claude AI + Dynatrace MCP to run a 15-agent audit across infrastructure, configuration, DEM, operations, and security — producing a scored HTML report with root cause analysis and actionable next steps. The entire audit runs from a single command: "audit tenant uhv42169".

See it in action:


The Problem

As a consultant, I regularly audit Dynatrace tenants for clients across Latin America. Each audit follows a repeatable pattern:

  1. Connect to the tenant
  2. Check 12-15 observability dimensions
  3. Score each dimension by severity
  4. Write an SRE analysis with root causes and recommendations
  5. Generate a professional report for the client

This process used to take 1-3 days manually. With the Dynatrace MCP server, I automated it down to ~15 minutes.


The Solution: AI-as-Auditor via MCP

The key insight is using CLAUDE.md as an executable playbook. Instead of writing Python code to call APIs, I wrote a markdown file that instructs Claude AI to execute the audit step by step, using the Dynatrace MCP server as its data source.

Architecture

```
User: "audit tenant uhv42169"
    ↓
Claude AI reads CLAUDE.md playbook
    ↓
Claude AI reads 15 agent definitions (agents/*.md)
    ↓
Dynatrace MCP Server ← execute_dql, list_problems, list_vulnerabilities, ...
    ↓
Claude AI evaluates findings, calculates scores, writes SRE analysis
    ↓
HTML Report with scores, findings, root cause, recommendations
```

 

MCP Tools Used (7 out of 20 available)

| MCP Tool | Used By Agents | Purpose |
|---|---|---|
| get_environment_info | Setup | Verify connectivity, get tenant ID |
| execute_dql | 01-09, 12, 15 | Query entities, tags, management zones, services |
| list_problems | 10, 11, 13 | Problem history, MTTR, noise analysis, custom alerts |
| list_davis_analyzers | 10 | Verify Davis AI capabilities |
| list_vulnerabilities | 14 | Security posture assessment |
| get_kubernetes_events | 15 | K8s cluster health and event analysis |
| chat_with_davis_copilot | Exploration | Settings discovery (dt.setting workaround) |

The 15 Audit Agents

Each agent is a markdown file defining: DQL queries or MCP tool calls, checks with PASS/WARN/FAIL/INFO criteria, blast radius (CRITICAL/HIGH/MEDIUM/LOW), remediation text, and analysis guidelines with root cause/recommendations/next steps.

Infrastructure
01. OneAgent & ActiveGate
02. Host Groups
15. Kubernetes Health
Configuration
03. Management Zones
04. Auto Tags
05. Manual Tags
06. Ownership
07. Security Context
10. Anomaly Detection
11. Problem Notifications
12. SLOs
DEM
08. Real User Monitoring
09. Synthetic Monitors
Operations
13. Problem History
Security
14. Vulnerabilities

Scoring System

Each finding is weighted by blast radius:

| Blast Radius | Weight |
|---|---|
| CRITICAL | 4.0 |
| HIGH | 3.0 |
| MEDIUM | 2.0 |
| LOW | 1.0 |

Status scoring: PASS = 100% of weight, WARN = 50%, FAIL = 0%, INFO = excluded.
Agent score = (earned_weight / total_weight) × 100. Global score = average of all agents with data.
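The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; in the actual tool, Claude performs the calculation by following the playbook, and the sample findings here are hypothetical:

```python
# Minimal sketch of the blast-radius-weighted scoring described above.
# The sample findings are hypothetical; the real audit is driven by the
# CLAUDE.md playbook, not by Python.

WEIGHTS = {"CRITICAL": 4.0, "HIGH": 3.0, "MEDIUM": 2.0, "LOW": 1.0}
STATUS_FACTOR = {"PASS": 1.0, "WARN": 0.5, "FAIL": 0.0}  # INFO is excluded

def agent_score(findings):
    """findings: list of (status, blast_radius) tuples for one agent."""
    scored = [(s, b) for s, b in findings if s != "INFO"]
    if not scored:
        return None  # N/A — agent produced no scorable data
    total = sum(WEIGHTS[b] for _, b in scored)
    earned = sum(WEIGHTS[b] * STATUS_FACTOR[s] for s, b in scored)
    return round(earned / total * 100, 1)

def global_score(agent_scores):
    """Average over agents that produced a score; N/A agents are skipped."""
    with_data = [s for s in agent_scores if s is not None]
    return round(sum(with_data) / len(with_data), 1)

findings = [("PASS", "CRITICAL"), ("WARN", "HIGH"),
            ("FAIL", "MEDIUM"), ("INFO", "LOW")]
print(agent_score(findings))  # (4.0 + 1.5 + 0.0) / 9.0 × 100 = 61.1
```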


Real Audit Results: Tenant uhv42169

Global Score 36.6/100 — Critical gaps detected
34.5 Infra
28.6 Config
27.3 DEM
45.5 Ops
100 Security

 

Per-Agent Scores

| # | Agent | Score | Data Source |
|---|---|---|---|
| 01 | OneAgent & ActiveGate | 73.6 | execute_dql |
| 02 | Host Groups | 0.0 | execute_dql |
| 03 | Management Zones | 0.0 | execute_dql |
| 04 | Auto Tags | 0.0 | execute_dql |
| 05 | Manual Tags | 50.0 | execute_dql |
| 06 | Ownership | 0.0 | execute_dql |
| 07 | Security Context | N/A | dt.setting unavailable |
| 08 | RUM | N/A | No apps (intentional) |
| 09 | Synthetic Monitors | 27.3 | execute_dql |
| 10 | Anomaly Detection | 83.3 | list_davis_analyzers + list_problems |
| 11 | Problem Notifications | 66.7 | list_problems (CUSTOM_ALERT inference) |
| 12 | SLOs | 0.0 | execute_dql |
| 13 | Problem History | 45.5 | list_problems + execute_dql |
| 14 | Vulnerabilities | 100.0 | list_vulnerabilities |
| 15 | Kubernetes | 30.0 | get_kubernetes_events + execute_dql |

How It Works: Step by Step

Step 1: CLAUDE.md as Executable Playbook

The core innovation is that CLAUDE.md IS the automation. No Python, no scripts, no SDK wrappers. The AI reads the playbook and follows it:

```markdown
# CLAUDE.md — audit_mcp Playbook

When the user says "audit [tenant]", follow this sequence:

### Step 1: SETUP
1. Call get_environment_info to verify connectivity

### Step 2: COLLECT DATA (for each of the 15 agents)
1. Read the agent file from agents/NN_name.md
2. Execute each query or MCP tool call listed

### Step 3: EVALUATE — Apply checks, determine PASS/WARN/FAIL/INFO
### Step 4: CALCULATE SCORES — Weighted by blast_radius
### Step 5: SRE ANALYSIS — Root Cause + Recommendations + Next Steps
### Step 6: GENERATE REPORT — Interactive HTML with Mainsoft branding
```

Step 2: Agent Definitions as Markdown

Each agent is a self-contained markdown file. Here's a simplified example of Agent 10 (Anomaly Detection), which was redesigned to use MCP tools instead of the unavailable dt.setting:

```markdown
# Agent 10: Anomaly Detection
Category: configuration | Blast Radius: HIGH

## MCP Tools
- list_davis_analyzers — verify AI capabilities
- list_problems(timeframe="30d") — check if detection fires

## Checks
- davis_analyzers_available: PASS if ≥3 analyzers
- anomaly_detection_firing: PASS if SLOWDOWN/RESOURCE problems exist
- anomaly_problem_ratio: WARN if ≥60% anomaly-based (noisy)
```
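Under the hood, these checks amount to simple threshold logic. A hedged Python sketch of how Agent 10's criteria could be evaluated (in the actual tool Claude does this evaluation itself; the function and the problem-record field names are assumptions for illustration, not the MCP server's schema):

```python
# Illustrative evaluation of Agent 10's three checks. The real evaluation
# is performed by Claude AI reading the agent markdown; the "category"
# field name on problem records is an assumption.

def evaluate_agent10(analyzers, problems):
    """analyzers: analyzer names from list_davis_analyzers.
    problems: problem records from list_problems(timeframe="30d")."""
    checks = {}

    # davis_analyzers_available: PASS if at least 3 analyzers are exposed
    checks["davis_analyzers_available"] = "PASS" if len(analyzers) >= 3 else "FAIL"

    # anomaly_detection_firing: PASS if SLOWDOWN/RESOURCE problems exist
    anomaly = [p for p in problems if p.get("category") in ("SLOWDOWN", "RESOURCE")]
    checks["anomaly_detection_firing"] = "PASS" if anomaly else "WARN"

    # anomaly_problem_ratio: WARN if ≥60% of all problems are anomaly-based
    ratio = len(anomaly) / len(problems) if problems else 0.0
    checks["anomaly_problem_ratio"] = "WARN" if ratio >= 0.6 else "PASS"

    return checks
```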

Step 3: The Report

The generated HTML report features:

  • Global score gauge with semaphore coloring (green/yellow/red)
  • Category breakdown cards (Infrastructure, Configuration, DEM, Operations, Security)
  • Per-agent sections with mini-gauges, findings tables with sort/filter, and collapsible AI analysis
  • Each analysis includes Root Cause, Recommendations (prioritized), and Next Steps (with effort estimates)
  • Dark/light theme toggle, global search, print-to-PDF support
  • Responsive design for mobile viewing

Key Discovery: Working Around dt.setting

A significant challenge: fetch dt.setting is not available as a DQL data object, which initially blocked 4 agents (Security Context, Anomaly Detection, Problem Notifications, and partially Management Zones/Auto Tags/Ownership/SLOs).

The solution was to leverage other MCP tools creatively:

  • Anomaly Detection: Instead of querying settings, we check if list_davis_analyzers returns analyzers AND if list_problems shows anomaly-based problems are being generated. If both are true, anomaly detection is working.
  • Problem Notifications: If list_problems returns CUSTOM_ALERT category problems, it proves both alerting rules AND notification channels are configured (you can't have a CUSTOM_ALERT without both).
  • Security: list_vulnerabilities provides runtime security assessment without needing settings access.
  • Kubernetes: get_kubernetes_events reveals cluster health that entity queries alone can't show.

This turned a limitation into a feature — the audit now uses 7 different MCP tools instead of relying solely on DQL, making it more resilient and comprehensive.
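The Problem Notifications inference in particular reduces to a one-line existence check. A minimal sketch, assuming problem records carry a `category` field (the field name is an assumption, not the MCP server's exact schema):

```python
# Sketch of the CUSTOM_ALERT inference: such a problem can only exist if
# both an alerting rule and a notification pipeline are configured, so
# its mere presence proves both. The "category" field is an assumption.

def notifications_configured(problems):
    """problems: problem records returned by list_problems."""
    return any(p.get("category") == "CUSTOM_ALERT" for p in problems)

print(notifications_configured([{"category": "CUSTOM_ALERT"}]))  # True
```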


Impact

| Metric | Before (Manual) | After (MCP) |
|---|---|---|
| Audit duration | 1-3 days | ~20 minutes |
| Dimensions checked | 8-10 | 15 (5 categories) |
| MCP tools used | N/A | 7 tools |
| Report format | Google Slides / PDF | Interactive HTML (dark mode, search, filter) |
| Consistency | Varies by analyst | 100% repeatable |

What's Next

  • Settings API v2 integration: When the MCP server adds a settings tool, the remaining N/A agents will become fully functional
  • Multi-tenant comparison: Run audits across tenants and compare maturity scores
  • Trend tracking: Store historical scores to show improvement over time
  • Dynatrace Notebook export: Use create_dynatrace_notebook to push findings directly into the tenant
  • Automated remediation: Use send_event to create deployment events tracking fixes

Stack: Claude AI + Dynatrace MCP Server + Markdown playbooks
Source: Available on request — the entire framework is ~15 markdown files + 1 HTML template

Logs, Traces, Metrics... and a bit of sanity.
3 REPLIES

Julius_Loman
DynaMight Legend

Hey @tracegazer,

Would you mind sharing it? It reminds me of the tenant review https://github.com/dynatrace-oss/CustomerSuccess/tree/main

I tried internally to build a similar solution for Dynatrace Managed using Claude, but not with the agent approach: instead, I had it generate a utility that produces such reports (Managed installations are typically air-gapped, and customers often have AI regulations).

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

Hi @Julius_Loman. The current solution runs with the help of Claude and Dynatrace's MCP server. However, I have an earlier solution that used the Dynatrace APIs directly, which, as I understand it, should also work in both SaaS and Managed environments. Let me check whether I have it committed and published on Git.

Logs, Traces, Metrics... and a bit of sanity.

Hi @Julius_Loman , I’ve finally uploaded the repository. Here’s the URL:

```shell
git clone https://github.com/alanfuentes92/observability-auditor.git
cd observability-auditor/audit-mcp
cp mcp-config.example.json .mcp.json
# Edit .mcp.json with your tenant URL and token
claude .
```

Then say: "audit this tenant" → an HTML report will be generated in the output/ folder.

Feel free to reach out with any questions, feedback, suggestions, or ideas—everything is welcome.

Logs, Traces, Metrics... and a bit of sanity.
