27 Apr 2026 02:01 PM
I'm monitoring a Windows Server 2025 host with OpenTelemetry Collector Contrib (v0.150.1) and sending metrics to Dynatrace via OTLP. Most metrics are flowing correctly (CPU, memory, disk, filesystem), but I have two open questions, about process metrics and host health status.
✅ system.cpu.utilization (10s)
✅ system.memory.utilization (10s)
✅ system.processes.count (10s, 5m, 1h)
✅ system.processes.created (10s, 5m, 1h)
✅ system.uptime (1h)
✅ system.cpu.load_average.1m/5m/15m (across intervals)
✅ system.memory.limit (1h)
✅ system.cpu.logical.count & system.cpu.physical.count (1h)
✅ All Windows perfcounter metrics (transformed to windows.* namespace)
✅ All system resource attributes (host.name, host.arch, os.type, os.description, os.version, os.build.id, etc.)
Verified: Metrics are reaching Dynatrace (visible in [...]
1. ✅ [...] (configured in 3 separate scraper intervals)
2. ✅ Removed cardinality filtering to ensure all metrics pass through
3. ✅ Verified both system.processes.count and system.processes.created metrics are in the config
4. ✅ Verified resource detection is enabled with 13 attributes
5. ✅ Confirmed sending_queue and batch processors are configured
6. ✅ Checked that metrics have correct data types and values
7. ✅ Ensured all metrics flow without errors in collector log
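For anyone who wants to reproduce the "flows without errors" check, here is a minimal sketch of one way to do it, assuming the `debug` exporter is available in the Contrib build in use. It adds a second exporter to the metrics pipeline so that every exported data point, including system.processes.count and system.processes.created, is also printed to the collector's own log output. The fragment is meant to be merged into the existing config, not used on its own, and should be removed again after the check because detailed verbosity is very noisy.

```yaml
# Diagnostic fragment (assumption: merged into the existing config file).
exporters:
  debug:
    # "detailed" prints each metric with its data points and attributes
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics/10s, hostmetrics/5m, hostmetrics/1h, windowsperfcounters]
      processors: [batch, filter, transform, filter/delete-metrics, cumulativetodelta, metricstransform, resourcedetection]
      # Same pipeline as before, with debug added next to the Dynatrace exporter
      exporters: [otlp_http/dynatrace, debug]
```

If both process metrics show up in the debug output with plausible values, that would suggest the data leaves the collector as expected and the gap is on the ingest or display side.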
For Process metrics: system.processes.count and system.processes.created are configured, exported, and flow without errors. Why don't they appear in Dynatrace UI?
For Health Status: What does Dynatrace need to calculate and display health status for a Windows host?
Processors pipeline:
[batch] → [filter (idle CPU)] → [transform (cardinality cleanup)] → [cumulativetodelta] → [metricstransform (perfcounter renaming)] → [resourcedetection] → [otlp_http/dynatrace]
Exporter:
otel_dynatrace_only.yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  hostmetrics/10s:
    collection_interval: 10s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
          system.cpu.time:
            enabled: false
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
          system.memory.usage:
            enabled: false
      network:
        metrics:
          system.network.io:
            enabled: true
          system.network.packets:
            enabled: true
          system.network.errors:
            enabled: true
          system.network.dropped:
            enabled: true
          system.network.connections:
            enabled: true
      processes:
        metrics:
          system.processes.count:
            enabled: true
          system.processes.created:
            enabled: true
      load:
        metrics:
          system.cpu.load_average.1m:
            enabled: true

  hostmetrics/5m:
    collection_interval: 5m
    scrapers:
      cpu:
        metrics:
          system.cpu.time:
            enabled: true
      memory:
        metrics:
          system.memory.usage:
            enabled: true
      disk:
        metrics:
          system.disk.io:
            enabled: true
          system.disk.operations:
            enabled: true
          system.disk.io_time:
            enabled: true
          system.disk.operation_time:
            enabled: true
      network:
        metrics:
          system.network.io:
            enabled: true
          system.network.packets:
            enabled: true
          system.network.errors:
            enabled: true
          system.network.connections:
            enabled: true
          system.network.dropped:
            enabled: true
      filesystem:
        include_devices:
          match_type: strict
          devices: ["C:", "D:"]
        metrics:
          system.filesystem.utilization:
            enabled: true
          system.filesystem.inodes.usage:
            enabled: true
      paging:
        metrics:
          system.paging.usage:
            enabled: true
          system.paging.operations:
            enabled: true
      process:
        mute_process_all_errors: true
        metrics:
          process.cpu.utilization:
            enabled: true
          process.cpu.time:
            enabled: true
          process.memory.usage:
            enabled: true
          process.memory.virtual:
            enabled: true
          process.disk.io:
            enabled: true
      processes:
        metrics:
          system.processes.count:
            enabled: true
          system.processes.created:
            enabled: true
      load:
        metrics:
          system.cpu.load_average.5m:
            enabled: true

  hostmetrics/1h:
    collection_interval: 1h
    scrapers:
      memory:
        metrics:
          system.memory.limit:
            enabled: true
      cpu:
        metrics:
          system.cpu.logical.count:
            enabled: true
          system.cpu.physical.count:
            enabled: true
      system:
        metrics:
          system.uptime:
            enabled: true
      load:
        metrics:
          system.cpu.load_average.15m:
            enabled: true
      processes:
        metrics:
          system.processes.count:
            enabled: true
          system.processes.created:
            enabled: true

  windowsperfcounters:
    collection_interval: 30s
    perfcounters:
      - object: "Processor"
        instances: ["_Total"]
        counters:
          - name: "% Processor Time"
          - name: "% Privileged Time"
          - name: "Interrupts/sec"
      - object: "Memory"
        counters:
          - name: "Available MBytes"
          - name: "% Committed Bytes In Use"
          - name: "Cache Bytes"
      - object: "Process"
        instances:
          - "svchost"
          - "lsass"
          - "csrss"
          - "services"
          - "sqlservr"
          - "w3wp"
          - "otelcol-contrib"
        counters:
          - name: "% Processor Time"
          - name: "% Privileged Time"
          - name: "Working Set - Private"
          - name: "Private Bytes"
          - name: "Thread Count"
          - name: "Handle Count"
      - object: "PhysicalDisk"
        instances: ["_Total"]
        counters:
          - name: "% Disk Time"
          - name: "Disk Bytes/sec"
          - name: "Avg. Disk Queue Length"

  windows_event_log:
    channel: System
    start_at: end

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

  filter:
    metrics:
      datapoint:
        - metric.name == "system.cpu.utilization" and attributes["state"] == "idle"

  transform:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - delete_key(resource.attributes, "process.cgroup") where IsMatch(metric.name, "^process\\..*")
          - delete_key(resource.attributes, "process.command") where IsMatch(metric.name, "^process\\..*")
          - delete_key(resource.attributes, "process.executable.path") where IsMatch(metric.name, "^process\\..*")
          - delete_key(resource.attributes, "process.owner") where IsMatch(metric.name, "^process\\..*")
          - delete_key(resource.attributes, "process.parent_pid") where IsMatch(metric.name, "^process\\..*")
          - delete_key(resource.attributes, "process.command_args") where IsMatch(metric.name, "^process\\..*")
          - delete_key(datapoint.attributes, "device") where datapoint.attributes["device"] == ""

  filter/delete-metrics:
    metric_conditions:
      - datapoint.attributes["low-memory-process"] != nil

  cumulativetodelta:
    max_staleness: 25h

  metricstransform:
    transforms:
      - include: '^\\\\Processor\(_Total\)\\\\% Processor Time$'
        action: update
        new_name: windows.processor.time
      - include: '^\\\\Processor\(_Total\)\\\\% Privileged Time$'
        action: update
        new_name: windows.processor.privileged_time
      - include: '^\\\\Processor\(_Total\)\\\\Interrupts/sec$'
        action: update
        new_name: windows.processor.interrupts
      - include: '^\\\\Memory\\\\Available MBytes$'
        action: update
        new_name: windows.memory.available
      - include: '^\\\\Memory\\\\% Committed Bytes In Use$'
        action: update
        new_name: windows.memory.committed_usage
      - include: '^\\\\Memory\\\\Cache Bytes$'
        action: update
        new_name: windows.memory.cache
      - include: '^\\\\Process\(.*\)\\\\% Processor Time$'
        action: update
        new_name: windows.process.cpu_time
      - include: '^\\\\Process\(.*\)\\\\% Privileged Time$'
        action: update
        new_name: windows.process.privileged_time
      - include: '^\\\\Process\(.*\)\\\\Working Set - Private$'
        action: update
        new_name: windows.process.working_set_private
      - include: '^\\\\Process\(.*\)\\\\Private Bytes$'
        action: update
        new_name: windows.process.private_bytes
      - include: '^\\\\Process\(.*\)\\\\Thread Count$'
        action: update
        new_name: windows.process.threads
      - include: '^\\\\Process\(.*\)\\\\Handle Count$'
        action: update
        new_name: windows.process.handles
      - include: '^\\\\PhysicalDisk\(_Total\)\\\\% Disk Time$'
        action: update
        new_name: windows.disk.time
      - include: '^\\\\PhysicalDisk\(_Total\)\\\\Disk Bytes/sec$'
        action: update
        new_name: windows.disk.throughput
      - include: '^\\\\PhysicalDisk\(_Total\)\\\\Avg\. Disk Queue Length$'
        action: update
        new_name: windows.disk.queue_length

  resourcedetection:
    detectors: ["system"]
    system:
      resource_attributes:
        host.arch:
          enabled: true
        host.id:
          enabled: true
        host.name:
          enabled: true
        host.ip:
          enabled: true
        host.interface:
          enabled: true
        host.mac:
          enabled: true
        host.cpu.model.name:
          enabled: true
        os.type:
          enabled: true
        os.description:
          enabled: true
        os.name:
          enabled: true
        os.version:
          enabled: true
        os.build.id:
          enabled: true

exporters:
  otlp_http/dynatrace:
    endpoint: "https://{environmentid}.live.dynatrace.com/api/v2/otlp"
    headers:
      Authorization: "Api-Token {DYNATRACE_API_TOKEN}"
    sending_queue:
      batch:
        min_size: 3000
        max_size: 3000
        flush_timeout: 60s

service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [hostmetrics/10s, hostmetrics/5m, hostmetrics/1h, windowsperfcounters]
      processors: [batch, filter, transform, filter/delete-metrics, cumulativetodelta, metricstransform, resourcedetection]
      exporters: [otlp_http/dynatrace]
    logs:
      receivers: [windows_event_log, otlp]
      processors: [batch, resourcedetection]
      exporters: [otlp_http/dynatrace]
The host metrics themselves are healthy and complete. The question is whether Dynatrace requires additional signals, different metric frequencies, or specific configuration to calculate overall host health status.
Any guidance appreciated! 🙏
27 Apr 2026 04:09 PM
UPDATE: I also tried the Dynatrace OpenTelemetry Collector Distribution. However, it seems that this distribution cannot collect Windows Event Logs.