
MCP Server Challenge entry #7: SDF Governance Guard v2


SDF Governance Guard — Ready for Federal Scale v2

How We Built a Signal–Defect–Failure Classification Framework, and Extended It to Govern Remote Model Context Protocol (MCP) Server Interactions — Ready for Federal Scale

Author: Randy Chambers

Role: Dynatrace Practice Lead

Organization: Discipline Consulting Group LLC

Contact: rchambers@disciplineconsulting.com  |  540-645-1149

Submitted for the MCP Server Challenge — April 2026

――――――――――――――――――――――――――――――――――――――――

1. The Scenario — The Problem We Set Out to Solve

When Dynatrace redesigned its certification exam around scenario-based diagnostic reasoning in October 2024, it exposed a gap we saw firsthand. Our candidates were knowledgeable — they understood OneAgent, Davis Artificial Intelligence (AI), Smartscape, Grail, and the platform's architecture. But they couldn't solve scenario-based problems consistently because they lacked a structured diagnostic methodology. The training pipeline taught what things are; the exam tested how to use them to solve problems. Nothing bridged that gap.

For organizations operating in federal environments — where every automated action must be auditable, classified, and authorized — ungoverned AI agent access is a deployment blocker. You can't hand an AI agent 14 tools and say "figure it out." You need a classification framework that determines what the agent can see, what it can do, and what it cannot do, based on the data it's working with. This isn't hypothetical. Dynatrace is Federal Risk and Authorization Management Program (FedRAMP) authorized and actively pursuing federal market expansion. Federal agencies are already deploying Dynatrace in production environments governed by NIST, FISMA, and FedRAMP continuous monitoring requirements. When these agencies adopt the MCP Server for agentic operations, every AI agent action must satisfy the same compliance requirements as every human operator action. The governance gap isn't a future problem — it's a deployment prerequisite.

The Integrated Continuous Security Methodology (CSM)

We already understood this problem. For our federal customers operating hybrid scientific environments, we built the Integrated Continuous Security Methodology (CSM) — an operational cycle that keeps these environments defensible, auditable, and mission-ready. CSM treats security as an ongoing, measurable process rather than a one-time project. The cycle operates across five continuous phases: Detect, Respond, Remediate, Verify, Report. CSM connects telemetry, people, engineering, and governance so that threats are found quickly, handled consistently, and lessons are fed back into controls and documentation.

Naming the Classification Chain — Signal–Defect–Failure

At the same time, we noticed something about Davis AI itself. Internally, the platform operates on a classification pipeline that nobody had named:

  • Raw telemetry flows in from OneAgent — we called this Signal
  • Davis detects baseline deviations and generates events — we called this Defect
  • Davis correlates events into confirmed problems with root cause — we called this Failure

This classification pipeline — the baseline calculation, event correlation, and topology-aware root cause analysis engine that Wolfgang Beer has architected inside Davis AI — is the foundation that SDF formalizes. We didn't invent the classification logic. We named what was already there and extended it to govern the MCP Server boundary.

The MCP Server Launch

When the Dynatrace MCP Server launched at Perform 2026 — announced as "the connective tissue between agentic systems and Dynatrace Intelligence" and deployed internally as Customer Zero — it gave AI agents direct access to 14 tools and 6 agent-level capabilities. That's powerful. It's also ungoverned. When an AI agent connects to the MCP Server, it can call get_environment_info AND list_problems AND send_slack_message in the same session. Nothing in the MCP protocol itself distinguishes between reading baseline metrics and triggering remediation workflows. The agent sees tools. It doesn't see boundaries.

We set out to change that — first for certification, then for production operations, and now for AI agent governance through the MCP Server. To complement the CSM and extend governance to agentic AI, we built the SDF Governance Guard Framework.

CSM–Davis AI–SDF Alignment

| CSM Phase | What Happens Operationally | Davis AI Function | SDF Layer | MCP Agent Permission |
|---|---|---|---|---|
| Detect | Continuous monitoring identifies anomalies in the hybrid environment | OneAgent ingests telemetry; Davis establishes and monitors baselines | Signal | OBSERVE — read metrics, topology, logs |
| Respond | Qualified anomaly triggers investigation and stakeholder notification | Davis generates events when baselines are breached; anomaly qualified but impact not confirmed | Defect | INVESTIGATE — create tickets, send notifications, query related entities |
| Remediate | Confirmed problem triggers pre-approved corrective action | Davis correlates events into problems with confirmed root cause and affected entity chain | Failure | REMEDIATE — execute pre-approved playbooks within defined blast radius |
| Verify | Confirm resolution; validate metrics return to baseline | Davis monitors for auto-resolution; telemetry confirms baseline recovery | Signal (return) | OBSERVE — confirm baseline recovery, validate remediation effectiveness |
| Report | Document the full lifecycle; feed lessons into controls | Complete audit trail across all classification layers; governance artifacts updated | All layers | AUDIT — full chain: tool call → data → SDF classification → permission → action → outcome |

This alignment means the CSM cycle our federal customers already operate becomes enforceable through the MCP Server. When an AI agent calls an MCP tool, the SDF Governance Guard classifies the data, resolves the permission, and maps the action to the corresponding CSM phase — creating a single, unified governance model for both human operations and agentic AI.

NIST IR 8011 Structural Isomorphism

The National Institute of Standards and Technology Interagency Report (NIST IR) 8011 defines an automated security assessment methodology built on defect checks — systematic evaluations that determine whether a security control is operating as intended. NIST IR 8011's assessment pipeline follows a progression that is structurally isomorphic to SDF:

  • NIST IR 8011 collects security-relevant telemetry from automated data feeds = SDF Signal
  • NIST IR 8011 applies defect checks to identify control deviations = SDF Defect
  • NIST IR 8011 correlates defect findings into assessment determinations with root cause = SDF Failure

This structural isomorphism means SDF classification doesn't just align with Dynatrace's internal architecture — it aligns with the federal government's own methodology for automated security assessment.

The Customer Zero Mesh Point

Dynatrace deployed the MCP Server internally as Customer Zero. This is where three frameworks converge:

  1. CSM provides the operational cycle (Detect, Respond, Remediate, Verify, Report)
  2. SDF provides the data classification layer (Signal, Defect, Failure)
  3. The MCP Server provides the agent access layer (14 tools + 6 agents)

At the Customer Zero mesh point, every AI agent action flows through all three: the MCP tool call retrieves data, SDF classifies it, and the classification maps to a CSM phase — ensuring the agent operates within the same operational governance that human operators follow. This is what makes the framework ready for federal scale.

――――――――――――――――――――――――――――――――――――――――

2. What We Built — The Steps We Took

Step 1: We Built the Integrated Continuous Security Methodology (CSM)

Before the MCP Server existed, we built the CSM operational cycle for our federal customers' hybrid scientific environments. CSM established the governance baseline: Detect, Respond, Remediate, Verify, Report. Every automated action — whether performed by a human operator, a script, or an AI agent — must map to a CSM phase and produce an auditable record.

Step 2: We Named the Classification Chain — Signal–Defect–Failure (SDF)

We formalized what Davis AI already does internally as a three-layer classification taxonomy:

  • Signal = raw telemetry at baseline. Metrics, topology, logs — no anomaly detected. Maps to CSM Detect phase.
  • Defect = qualified anomaly. Davis has detected a deviation but hasn't confirmed impact. Maps to CSM Respond phase.
  • Failure = confirmed impact. Root cause identified, affected entity chain mapped. Maps to CSM Remediate phase.

Every piece of data accessible through the MCP Server sits on one of these three layers.
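As a sketch, the taxonomy and its CSM phase mapping reduce to a small enumeration. The names here are our own labels for the classification, not a Dynatrace API:

```python
from enum import Enum

class SDFLayer(Enum):
    """Signal-Defect-Failure classification layers (our naming)."""
    SIGNAL = "Signal"    # raw telemetry at baseline, no anomaly detected
    DEFECT = "Defect"    # qualified anomaly, impact not yet confirmed
    FAILURE = "Failure"  # confirmed impact with identified root cause

# Each SDF layer maps to exactly one CSM phase.
CSM_PHASE = {
    SDFLayer.SIGNAL: "Detect",
    SDFLayer.DEFECT: "Respond",
    SDFLayer.FAILURE: "Remediate",
}
```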

Step 3: We Built a Human-Executable Version of Davis AI's Causal Pipeline — LOCATE

We built the LOCATE diagnostic protocol — a six-step reasoning framework that mirrors how Davis AI root-causes problems:

  • Layer the observability data — identify what telemetry is relevant
  • Origin — identify where the anomaly originated
  • Context — establish baselines and determine if the deviation is real
  • Architecture — map the Smartscape topology and dependency chain
  • Trigger — identify the event that initiated the causal chain
  • Eliminate — confirm root cause by eliminating false leads

LOCATE is the human-executable version of Davis AI's deterministic fault-tree analysis. We use it to train practitioners AND to validate the reasoning path an AI agent should follow when operating through the MCP Server. When an agent follows the LOCATE protocol through MCP tools, its investigation maps to the CSM Respond and Remediate phases — creating a traceable reasoning chain.
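A minimal sketch of how an agent's LOCATE reasoning could be kept traceable: each step must be recorded in order, so the investigation path is auditable end to end. The `LocateChain` helper is hypothetical, not part of any Dynatrace tooling:

```python
from dataclasses import dataclass, field

# The six LOCATE steps, in the order they must be executed.
LOCATE_STEPS = ["Layer", "Origin", "Context", "Architecture", "Trigger", "Eliminate"]

@dataclass
class LocateChain:
    """Records one finding per LOCATE step, in order, so the
    reasoning path stays traceable (hypothetical helper)."""
    findings: list = field(default_factory=list)

    def record(self, step: str, finding: str) -> None:
        # Enforce the protocol's ordering: the next step recorded
        # must be the next step in the LOCATE sequence.
        expected = LOCATE_STEPS[len(self.findings)]
        if step != expected:
            raise ValueError(f"LOCATE order violation: expected {expected}, got {step}")
        self.findings.append((step, finding))

chain = LocateChain()
chain.record("Layer", "Response-time metrics and database logs are relevant")
chain.record("Origin", "Anomaly originates at Database Y")
```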

Step 4: We Engineered a Complete Training Ecosystem — Seven Pillars

We didn't build a study guide — we built a systems-engineered training architecture with the same rigor Dynatrace applies to its own platform:

  • Pillar 0 (P0): The SDF Taxonomy and Lifecycle — the classification backbone
  • Pillar 1 (P1): The LOCATE Diagnostic Framework v4.0 — the reasoning engine
  • Pillar 2 (P2): LOCATE integrated with the 12-module Dynatrace Learning Path Framework (DLPF)
  • Pillar 3 (P3): Operationalizing LOCATE for deployment across 4 delivery models
  • Pillar 4 (P4): 53 Scenario Drills — a practice laboratory mapped to all 12 DLPF modules
  • Pillar 5 (P5): A structured 30-Day Mastery Guide
  • Pillar 6 (P6): The Complete Service Delivery Framework Lifecycle (Design, Build, Deploy, Operate, Evolve). Aligned with Dynatrace's Perform 2026 strategic direction of "closed-loop autonomous outcomes," the SDF Governance Guard ecosystem operates as a self-improving lifecycle, with feedback loops that continuously absorb platform updates, MCP Server changes, exam format evolution, and practitioner performance data. It is not a static governance product that depreciates; it is a compounding asset that gains value with every platform release, every MCP tool addition, every partner deployment, and every federal customer who requires classification governance for autonomous operations.

Supporting infrastructure: 27 unified taxonomies in a Master Reference Catalogue, 9 governance artifacts aligned to the Department of Homeland Security (DHS) Systems Engineering Lifecycle (SELC), and a Cross-Pillar Traceability Matrix.

Step 5: We Deployed with Candidates and Measured Results

We deployed the SDF/LOCATE ecosystem with Dynatrace Practitioner exam candidates through Discipline Consulting Group. Candidates followed the 30-day structured mastery path, working through the SDF classification framework and the 53 scenario drills. The key differentiator: instead of memorizing platform features, candidates learned to classify observability data (Signal, Defect, or Failure), then apply the LOCATE protocol to reason through scenarios diagnostically. Candidates achieved exam scores of 85 and above. The ecosystem runs without requiring the original architect to deliver every session — facilitator-independent deployment.

Step 6: We Extended SDF to the Dynatrace–ServiceNow Integration Boundary

We mapped the SDF classification framework to the Dynatrace x ServiceNow strategic partnership (announced October 2025). Every one of the 6 certified ServiceNow integrations operates on SDF-classified data:

| ServiceNow Integration | SDF Layer | CSM Phase | Function |
|---|---|---|---|
| Service Graph Connector | Signal | Detect | Topology synchronization |
| Event Management Connector | Signal → Defect | Detect → Respond | Qualified anomalies cross the boundary |
| Incident Integration App | Defect → Failure | Respond → Remediate | Confirmed problems with root cause context |
| Dynatrace Workflows for ServiceNow | All layers | All phases | Orchestration across SDF/CSM spectrum |
| Service Observability Connector | Signal | Detect | Context enrichment |
| Analysis AI Agent Connector | Failure | Remediate | Agentic root cause analysis |

Step 7: We Extended SDF to the MCP Server — The Governance Guard

When the Dynatrace MCP Server launched at Perform 2026, we asked: does the same SDF classification framework that governs our training and the ServiceNow integration boundary also govern what AI agents can see and do through the MCP Server? The answer was yes. We mapped every one of the 14 MCP Server tools and 6 agent-level tools to their SDF classification layer and CSM phase, and built the SDF Governance Guard.

Step 8: We Validated with Three Practical Scenarios

We walked through three end-to-end MCP interaction patterns — Signal monitoring (CSM Detect), Defect investigation (CSM Respond), and Failure remediation (CSM Remediate) — demonstrating that SDF governance prevents both over-action (remediating noise) and under-action (merely alerting on confirmed outages). Each scenario includes the Verify and Report phases to complete the CSM cycle.

――――――――――――――――――――――――――――――――――――――――

3. The SDF Agent Permission Matrix

| SDF Layer | CSM Phase | What the Agent Sees | What the Agent Can Do | What It CANNOT Do |
|---|---|---|---|---|
| Signal (Observe) | Detect | Baseline metrics, entity topology, logs, environment info | Read data, generate reports, explain trends, compare to baselines | Cannot create alerts, send notifications, or trigger workflows |
| Defect (Investigate) | Respond | Davis events, vulnerabilities, Kubernetes warning/error events | Create investigation tickets, send notifications, recommend actions, query related entities for root cause hypotheses | Cannot execute remediation, modify infrastructure, or auto-resolve |
| Failure (Remediate — Guardrailed) | Remediate | Davis problems with confirmed root cause and affected entity chain | Execute pre-approved remediation playbooks, create P1 incidents, trigger notification workflows | Cannot execute novel remediation without human approval, cannot exceed defined blast radius |

Note: The Verify and Report phases close the CSM loop: after any Failure-level action, the agent re-queries metrics (Signal) to confirm baseline recovery (Verify), and the full classification chain is logged for audit (Report).
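The permission matrix can be expressed as a small resolver with a default-deny posture. This is a minimal sketch: the action names are labels we chose for illustration, not Dynatrace MCP identifiers:

```python
# Permission matrix from the table above; action names are
# hypothetical labels, not Dynatrace MCP identifiers.
PERMISSIONS = {
    "Signal": {
        "allowed": {"read_data", "generate_report", "explain_trend", "compare_baseline"},
        "denied": {"create_alert", "send_notification", "trigger_workflow"},
    },
    "Defect": {
        "allowed": {"create_ticket", "send_notification", "recommend_action",
                    "query_related_entities"},
        "denied": {"execute_remediation", "modify_infrastructure", "auto_resolve"},
    },
    "Failure": {
        "allowed": {"run_preapproved_playbook", "create_p1_incident",
                    "trigger_notification_workflow"},
        "denied": {"novel_remediation", "exceed_blast_radius"},
    },
}

def is_authorized(sdf_layer: str, action: str) -> bool:
    """Rule 1: classification determines permission. Anything not
    explicitly allowed is denied (default-deny)."""
    return action in PERMISSIONS[sdf_layer]["allowed"]
```

Note that `send_notification` is authorized at the Defect layer but denied at the Signal layer: the same action is governed differently depending on the classification of the data in hand.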

――――――――――――――――――――――――――――――――――――――――

4. MCP Server Tool Classification Map

4.1 MCP Server Tools (14 Tools)

| MCP Tool | SDF Layer | CSM Phase | Permission Level | Governance Rule |
|---|---|---|---|---|
| get_environment_info | Signal | Detect | OBSERVE | Read-only. No action constraints. |
| get_entity_details | Signal | Detect | OBSERVE | Read-only. Returns topology context. |
| get_ownership | Signal | Detect | OBSERVE | Read-only. Returns ownership for notification routing. |
| get_logs_for_entity | Signal | Detect | OBSERVE | Read-only. Rate limiting recommended for large log volumes. |
| verify_dql | Signal (meta) | Detect | OBSERVE | Validates DQL syntax only. No data exposure. |
| execute_dql | Depends on query | Detect → Respond → Remediate | OBSERVE → REMEDIATE | Classification depends on query results: metrics = Signal, events = Defect, problems = Failure. Agent must classify results before acting. |
| get_kubernetes_events | Signal + Defect | Detect + Respond | OBSERVE + INVESTIGATE | Normal K8s events = Signal; warning/error events = Defect. Agent must distinguish before acting. |
| list_vulnerabilities | Defect | Respond | INVESTIGATE | Returns CVEs — qualified anomalies requiring investigation, not immediate remediation. |
| get_vulnerability_details | Defect | Respond | INVESTIGATE | Deep vulnerability context. Investigation only — remediation requires change management. |
| list_problems | Failure | Remediate | INVESTIGATE + REMEDIATE | Returns confirmed problems with root cause. Agent can recommend and (if pre-approved) execute remediation. |
| get_problem_details | Failure | Remediate | INVESTIGATE + REMEDIATE | Deep problem context. Full governance rules apply. |
| send_slack_message | Defect + Failure | Respond + Remediate | NOTIFY | Channel routing must match SDF layer: Defect → investigation channel, Failure → incident channel. |
| create_workflow_for_notification | All layers | All phases | ORCHESTRATE | Created workflows must embed SDF classification checks. |
| update_workflow | Signal (meta) | Report | ADMINISTER | Administrative action — governance review recommended. |
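A fragment of the tool map above, expressed as a lookup. The entries mirror the table; `classify_tool` and the special-casing of `execute_dql` are our illustrative sketch, not part of the MCP Server:

```python
# Partial, illustrative lookup built from the classification table.
TOOL_CLASSIFICATION = {
    "get_environment_info": ("Signal", "OBSERVE"),
    "get_logs_for_entity": ("Signal", "OBSERVE"),
    "list_vulnerabilities": ("Defect", "INVESTIGATE"),
    "list_problems": ("Failure", "INVESTIGATE + REMEDIATE"),
    "send_slack_message": ("Defect + Failure", "NOTIFY"),
}

def classify_tool(tool: str) -> tuple:
    """Resolve an MCP tool call to (SDF layer, permission level).
    execute_dql is deliberately absent: its classification depends on
    the returned data (metrics = Signal, events = Defect,
    problems = Failure), not on the tool name."""
    if tool == "execute_dql":
        raise ValueError("execute_dql: classify the query results, not the tool")
    return TOOL_CLASSIFICATION[tool]
```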

 

4.2 Agent-Level Tools (6 Agents)

| Agent Tool | SDF Layer | CSM Phase | Permission Level |
|---|---|---|---|
| Grail Query Agent | All (query-dependent) | All phases | OBSERVE → REMEDIATE — depends on what's queried |
| DQL Explanation Agent | Signal (meta) | Detect | OBSERVE — explains queries, no data exposure |
| Help Agent | Signal (meta) | Detect | OBSERVE — product information only |
| Data Analysis Agent | All (query-dependent) | All phases | OBSERVE → REMEDIATE — depends on results classification |
| Root Cause Agent | Failure | Remediate | INVESTIGATE + REMEDIATE — specifically designed for problem analysis |
| Forecasting Agent | Signal + Defect | Detect + Respond | OBSERVE + INVESTIGATE — predicts future anomalies |

――――――――――――――――――――――――――――――――――――――――

5. Three Practical Scenarios — How SDF Governance Works in Practice

These three scenarios demonstrate how SDF classification governs real MCP Server interactions through the complete CSM cycle — preventing both over-action and under-action.

Scenario 1: Signal-Level Monitoring — CSM Detect Phase

  1. AI agent calls get_environment_info — learns environment context.
  2. Agent calls execute_dql with a metrics query — receives CPU, memory, response time data.
  3. SDF Classification: Signal — all values within baseline.
  4. CSM Phase: Detect — continuous monitoring, no anomaly identified.
  5. Agent action: Reports "All systems nominal. CPU at 68%, within baseline range of 55–75%."
  6. Governance: OBSERVE permission only. Agent does NOT send notifications, create tickets, or trigger workflows.
  7. CSM Verify: Baseline confirmed — no deviation to resolve.
  8. CSM Report: Monitoring record logged with SDF classification: Signal. No escalation required.

Why this matters: Without SDF governance, an eager agent might flag 68% CPU as "high" and send an unnecessary Slack alert. SDF classification prevented that false positive from becoming operational noise. In the CSM model, the Detect phase completed cleanly — no escalation to Respond.
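Scenario 1's guardrail (a value inside the baseline band stays Signal, so no alert fires) can be sketched as a trivial check. The band comparison here is illustrative only; Davis's actual baselining is far more sophisticated:

```python
def classify_metric(value: float, baseline_low: float, baseline_high: float) -> str:
    """Inside the baseline band -> Signal; outside -> Defect.
    Illustrative sketch, not Davis's baselining algorithm."""
    return "Signal" if baseline_low <= value <= baseline_high else "Defect"

# Scenario 1: CPU at 68% against the 55-75% baseline band.
layer = classify_metric(68.0, 55.0, 75.0)
assert layer == "Signal"  # OBSERVE permission only: no Slack alert, no ticket
```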

Scenario 2: Defect-Level Investigation — CSM Respond Phase

  1. Agent calls list_problems — no confirmed problem is active, but a recent Davis event exists that has not yet been correlated into a problem.
  2. Agent calls execute_dql to query related events — finds a Slowdown event on Service X.
  3. Agent calls get_entity_details to check Service X topology — discovers Service X depends on Database Y.
  4. Agent calls get_logs_for_entity on Database Y — finds connection timeout errors.
  5. SDF Classification: Defect — qualified anomaly detected, but not confirmed as a problem by Davis.
  6. CSM Phase: Respond — anomaly qualified, investigation initiated, stakeholders notified.
  7. Agent action: Sends Slack message: "Investigating: Service X slowdown potentially caused by Database Y connection timeouts."
  8. Agent creates investigation ticket but does NOT trigger remediation.
  9. CSM Verify: Agent monitors for Davis correlation — if the anomaly self-resolves or Davis escalates to a Problem, the classification updates accordingly.
  10. CSM Report: Investigation record logged with full LOCATE reasoning chain: Layer (response time + DB logs), Origin (Database Y), Context (timeout pattern outside baseline), Architecture (Service X → Database Y dependency).

Governance: INVESTIGATE permission. Agent alerts and investigates but cannot remediate. The LOCATE protocol guided the investigation path, and the CSM Respond phase ensures the investigation is documented and traceable.

Scenario 3: Failure-Level Remediation — CSM Remediate Phase

  1. Agent calls list_problems — finds an active P1 Davis problem.
  2. Agent calls get_problem_details — confirms root cause: memory leak in Container Z, affecting Service X and all downstream consumers.
  3. SDF Classification: Failure — Davis has confirmed root cause and mapped the impact chain.
  4. CSM Phase: Remediate — confirmed problem triggers pre-approved corrective action.
  5. Agent checks: Does a pre-approved remediation playbook exist for "container memory leak"? Yes — restart affected pods.
  6. Agent calls create_workflow_for_notification to create a P1 incident notification workflow.
  7. Agent calls send_slack_message to the incident channel: "P1 CONFIRMED: Memory leak in Container Z. Root cause verified by Davis. Executing pre-approved remediation: pod restart."
  8. CSM Verify: Agent re-queries metrics via execute_dql after remediation window. CPU and memory return to baseline. Davis auto-resolves the problem. SDF classification returns to Signal.
  9. CSM Report: Full lifecycle documented — Detection (memory spike observed), Response (Davis event generated), Remediation (pre-approved pod restart executed), Verification (baseline recovery confirmed), Report (audit chain: tool calls, SDF classifications, permissions, actions, outcomes).

Governance: REMEDIATE permission within pre-approved scope. Agent acts but stays within guardrails. Novel remediation requires human approval. The complete CSM cycle executed through MCP tools, governed by SDF classification, and fully auditable.

These three scenarios demonstrate a principle that Wolfgang Heider will recognize from his work on progressive delivery and CI/CD pipeline architecture: classification-driven progression. Just as progressive delivery gates software releases through staged validation — ensuring each promotion is earned — SDF gates agent actions through staged classification. Signal → Defect → Failure. Each escalation is validated, each action is authorized, each outcome is auditable. The same engineering rigor that governs how code moves through delivery pipelines now governs how AI agents move through observability data.

――――――――――――――――――――――――――――――――――――――――

6. The Seven Governance Rules We Established

Every SDF-governed MCP interaction follows these rules. Each rule maps to a CSM principle — ensuring that agent governance and operational governance are unified.

Rule 1 — Classification Determines Permission. The SDF layer of the data determines the agent's authorized action scope. Signal = observe (CSM Detect). Defect = investigate (CSM Respond). Failure = remediate, guardrailed (CSM Remediate).

Rule 2 — No Escalation Without Classification. An agent cannot jump from observation to remediation without confirming the SDF classification changed. Every escalation must be traceable to a classification transition — mirroring the CSM requirement that every phase transition is documented.

Rule 3 — Guardrailed Remediation Only. Even at the Failure level, remediation is limited to pre-approved playbooks with a defined blast radius. Novel remediation requires human approval. This enforces the CSM principle that corrective actions must be authorized and bounded.

Rule 4 — Classification Auditability. Every agent action traces back to the SDF classification that authorized it. Full audit chain: tool call → data returned → SDF classification → permission resolved → action taken. This directly supports CSM Report phase requirements and federal audit compliance.

Rule 5 — Severity Mapping Consistency. Davis event severity maps consistently to notification routing. Defect-level events route to investigation channels (CSM Respond). Failure-level problems route to incident channels (CSM Remediate). No cross-routing.

Rule 6 — Topology-Aware Classification. SDF classification requires topology context. Agents must query entity relationships (via get_entity_details or Smartscape data) before making classification-dependent decisions. This ensures the CSM Respond phase includes full dependency analysis.

Rule 7 — Feedback Loop Integration. Every agent action outcome feeds back into classification refinement. If a Defect-classified event escalates to Failure, the classification history informs future pattern matching. This is the CSM Verify-to-Report feedback loop — lessons learned are fed back into controls and documentation. The framework self-improves.
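Rules 2 and 4 together can be sketched as an escalation guard that refuses to widen scope without a recorded classification transition, and appends every decision to the audit chain. All names and fields here are illustrative assumptions, not Dynatrace APIs:

```python
from dataclasses import dataclass

# Ordering of the SDF layers for escalation checks.
ORDER = {"Signal": 0, "Defect": 1, "Failure": 2}

@dataclass
class AuditEntry:
    """One link in the Rule 4 audit chain (illustrative fields):
    tool call -> classification -> permission -> action."""
    tool_call: str
    classification: str
    permission: str
    evidence: str

audit_log: list = []

def escalate(current: str, new: str, evidence: str) -> str:
    """Rule 2 sketch: the agent's scope may widen only when the SDF
    classification of the data actually changed; the transition and
    its evidence are logged per Rule 4."""
    if ORDER[new] > ORDER[current] and not evidence:
        raise PermissionError(f"{current} -> {new} requires a recorded classification transition")
    audit_log.append(AuditEntry("execute_dql", new, "per-matrix", evidence or "no escalation"))
    return new

escalate("Signal", "Defect", evidence="Davis slowdown event on Service X")
```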

――――――――――――――――――――――――――――――――――――――――

7. The Results We Achieved

Certification Results — Proving the Classification Framework

  • Candidates trained on SDF classification achieved exam scores of 85 and above — substantially outperforming the community baseline.
  • The SDF framework gave candidates a reasoning methodology applicable to any scenario-based question.
  • The LOCATE protocol eliminated the "I know the platform but can't solve the scenario" gap reported since the October 2024 exam redesign.
  • The 30-day structured mastery path proved repeatable across multiple candidate cohorts.
  • Facilitator-independent deployment: the ecosystem runs without requiring the original architect to deliver every session.
  • The Seven Pillars training architecture (P0–P6), the 53 scenario drills, and the 27 unified taxonomies provided the structural depth that made certification outcomes consistent and measurable.

Integration Architecture Results

The ServiceNow integration mapping (all 6 certified integrations classified by SDF layer and CSM phase) appears in the Step 6 table.

  • Mapped all 17 Dynatrace workflow connectors (Kubernetes, AWS, Slack, GitHub, Jira, Microsoft 365, Microsoft Teams, PagerDuty, Red Hat Ansible, Microsoft Entra ID, Microsoft Azure, ServiceNow, GitLab, Snowflake, Jenkins, Email, Ownership) to SDF layers and CSM phases.
  • Validated structural isomorphism between SDF classification and NIST IR 8011's automated compliance assessment methodology.

MCP Server Governance Results

  • Classified all 14 MCP Server tools and 6 agent-level tools by SDF layer and CSM phase.
  • Developed 7 governance rules binding agent permissions to data classification and CSM operational governance.
  • Designed the SDF Governance Guard architecture: Classification Engine, Permission Resolver, Action Validator, Audit Logger, Feedback Collector.
  • Validated three practical scenarios demonstrating the complete CSM cycle executing through MCP tools, governed by SDF classification.

The Strategic Finding

The CSM operational cycle and the SDF classification taxonomy are structurally isomorphic — they describe the same governance logic at different layers. CSM governs operational process. SDF governs data classification. The MCP Server provides the agent access layer. At the Customer Zero mesh point, all three converge: AI agents operate through MCP tools, SDF classifies the data to determine permissions, and every action maps to a CSM phase — creating a unified governance model that makes autonomous operations auditable, deterministic, and ready for federal deployment.

――――――――――――――――――――――――――――――――――――――――

8. Why This Matters for Dynatrace — The Bigger Picture

Dynatrace Intelligence is fusing deterministic AI with agentic AI for autonomous operations. The MCP Server is the connective tissue that makes this accessible to every AI ecosystem. But autonomous operations without classification governance is like giving someone the keys to the car without traffic laws.

Andreas Grabner has championed observability-driven engineering and autonomous cloud operations for over a decade — the vision that organizations should be able to deploy, operate, and remediate with confidence because observability makes the invisible visible. SDF Governance Guard is the governance layer that makes that vision safe for agentic AI. When AI agents operate autonomously through the MCP Server, every action they take must be as observable, classified, and auditable as the telemetry they're analyzing. The autonomous cloud future that Andreas Grabner has been building toward doesn't just need observability — it needs classification governance. SDF provides it.

Signal–Defect–Failure classification IS those traffic laws. The Continuous Security Methodology IS the rules of the road.

What we built is not just a governance framework for the MCP Server — it's a unified operational model where human governance (CSM) and agent governance (SDF) operate on the same classification logic. The Customer Zero deployment proved it works. The 85+ exam scores proved the classification framework is learnable by humans. The three scenarios proved it governs agent behavior through MCP tools.

For federal environments — where FedRAMP authorization and NIST compliance require every automated action to be classified and auditable — the CSM + SDF Governance Guard makes the MCP Server deployable.

The platform already speaks Signal–Defect–Failure. The CSM cycle already governs how our federal customers operate. We unified both — and now the AI agents operate within the same governance model as the humans they support.

――――――――――――――――――――――――――――――――――――――――

9. The Federal Compliance Case

Dynatrace is FedRAMP authorized. Federal agencies are deploying the platform in environments governed by NIST, FISMA, and continuous monitoring mandates. As these agencies adopt the MCP Server for agentic operations, a compliance requirement emerges: All observability data in federal environments should be classifiable using a taxonomy that maps to NIST IR 8011's defect check methodology. SDF provides that mapping.

This is not a theoretical position — it's a structural fact. SDF classification is isomorphic to NIST IR 8011's automated assessment pipeline. Every MCP tool call that returns SDF-classified data produces output that is directly assessable under federal compliance frameworks. No translation layer required. No compliance bolt-on needed. The governance is native to the classification.

The implication for Dynatrace is strategic: if federal customers begin requiring SDF classification as a deployment standard for MCP-connected environments, the platform has a business incentive to formalize it — in documentation, in certification, and in the MCP Server architecture itself. Federal compliance requirements create top-down institutional pressure that complements the bottom-up community adoption already underway through initiatives like this MCP Server Challenge.

SDF Governance Guard isn't asking Dynatrace to build something new. It's asking Dynatrace to name and formalize what Davis AI already does — the same way we did. The classification pipeline exists. The compliance mapping exists. The MCP Server exists. SDF connects them into a governance model that makes autonomous operations deployable at federal scale.

――――――――――――――――――――――――――――――――――――――――

10. Repeatability and Resources

This framework is designed to be adopted by any organization connecting AI agents to Dynatrace via the MCP Server:

  • Repeatable: Apply SDF classification and CSM phase mapping to any MCP tool interaction using the permission matrix and governance rules in this submission.
  • Proven: Candidate scores of 85+ on the certification exam validate the classification framework for human reasoning; the same logic governs agent reasoning and CSM operational cycles. The Seven Pillars architecture (P0–P6), the 53 scenario drills, the 30-day mastery path, and facilitator-independent deployment show that this is a production-grade training and governance ecosystem, not a prototype.
  • Practical: Every MCP tool has a defined SDF classification, CSM phase mapping, and permission level — ready to implement.
  • Extensible: As new MCP tools are added, the SDF classification decision tree applies — classify the data layer, resolve the permission, map to the CSM phase, validate the action.
  • Auditable: Every agent action traces back to a classification decision and a CSM phase, producing a full audit chain that is structurally isomorphic to NIST IR 8011's defect check methodology and ready for FedRAMP continuous monitoring.
  • Aligned: Dynatrace Intelligence + MCP Server + CSM + SDF + NIST IR 8011 = autonomous operations governed at every layer — classification, permission, action, audit, and federal compliance.
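The decision tree in the Extensible bullet above can be sketched in code. This is a hypothetical illustration only: the tool names, SDF layer assignments, permission matrix entries, and CSM phase labels below are illustrative assumptions, not the actual MCP Server tool registry or the permission levels defined in the submission's matrix.

```python
# Hypothetical sketch of the SDF Governance Guard decision tree:
# classify the data layer, resolve the permission, map to the CSM
# phase, validate the action. All mappings below are illustrative.

# Illustrative tool registry: each MCP tool tagged with an SDF layer.
TOOL_CLASSIFICATION = {
    "query_metrics": "signal",
    "list_problems": "defect",
    "execute_workflow": "failure",
}

# Illustrative permission matrix: SDF layer -> allowed agent actions.
PERMISSION_MATRIX = {
    "signal": {"read"},
    "defect": {"read", "annotate"},
    "failure": {"read", "annotate", "remediate"},
}

# Illustrative mapping of SDF layers to CSM phases.
CSM_PHASE = {
    "signal": "observe",
    "defect": "diagnose",
    "failure": "respond",
}

def govern(tool: str, action: str) -> dict:
    """Classify the data layer, resolve the permission, map the CSM
    phase, and validate the requested action, returning an audit record."""
    layer = TOOL_CLASSIFICATION.get(tool)
    if layer is None:
        # Unclassified tools are rejected rather than defaulted.
        raise ValueError(f"unclassified tool: {tool}")
    return {
        "tool": tool,
        "sdf_layer": layer,
        "csm_phase": CSM_PHASE[layer],
        "action": action,
        "allowed": action in PERMISSION_MATRIX[layer],
    }

record = govern("list_problems", "remediate")
print(record["allowed"])  # prints False under this illustrative matrix
```

Every call produces an audit record whether or not the action is allowed, which is the property the Auditable bullet depends on: the classification decision itself is the audit trail.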

The full ecosystem documentation — including the Integrated CSM Model, the SDF Connector Classification Map, the SDF for Agentic Operations Governance Framework, the NIST IR 8011 Compliance Crosswalk, the Seven Pillars Training Architecture, and the 53 Scenario Drill Library — is available upon request.

――――――――――――――――――――――――――――――――――――――――

11. A Note to the Judging Panel

To Wolfgang Beer, Wolfgang Heider, Gabriele HB, and Andreas Grabner — this submission is built on a conviction: the classification logic that already runs inside Davis AI is too important to remain unnamed and informal. Wolfgang Beer built the engine — the baseline calculation, event correlation, and root cause analysis pipeline — that makes SDF possible. Wolfgang Heider's work on progressive delivery and CI/CD architecture demonstrates that classification-driven progression is already a proven engineering pattern. Andreas Grabner's decade-long advocacy for observability-driven autonomous operations defines the exact future that needs classification governance. And Gabriele HB's commitment to product quality and community engagement is precisely the lens through which frameworks like SDF move from community innovation to platform capability.

We named what was already there. We formalized it. We proved it works — with 85+ exam scores, with a complete CSM operational cycle, with NIST IR 8011 structural isomorphism, and with three practical MCP scenarios that demonstrate classification-governed agent behavior. SDF Governance Guard is ready for the platform. We hope this submission demonstrates why.

――――――――――――――――――――――――――――――――――――――――

 

Randy Chambers

Dynatrace Practice Lead  |  Discipline Consulting Group LLC

rchambers@disciplineconsulting.com


RWC
Participant

Here's my submission for the MCP Server Challenge: [paste your post URL here]
At Discipline Consulting Group, we built a Signal–Defect–Failure (SDF) classification framework that started as a certification training methodology — our candidates now score 85+ on the Dynatrace Practitioner exam using it. We then discovered that SDF classification governs every integration boundary between Dynatrace and its strategic partners (all 6 ServiceNow connectors, all 17 workflow connectors). So we extended it to the MCP Server: we classified all 14 MCP tools and 6 agent-level tools by SDF layer, and built a governance framework — the SDF Governance Guard — that uses data classification to determine what AI agents can see and do. For review or comment: @wolfgang_beer, @wolfgang_heider, @GabrieleHB, @andreas_grabner

https://community.dynatrace.com/t5/AI/MCP-Server-Challenge-entry-6-SDF-Governance-Guard-v2/m-p/29838...


Full write-up with the Agent Permission Matrix, MCP tool governance map, three practical scenarios, and seven governance rules in the post. Looking forward to feedback!
— Randy Chambers, Dynatrace Practice Lead, Discipline Consulting Group LLC
