Dynatrace Managed Q&A

Proactive alerting for Dynatrace Managed Backup failures (Cassandra/Elasticsearch)

Hugo1984
Participant

Recently, our Dynatrace Managed on-prem backup failed for several days. While the status tile in the Cluster Management Console (CMC) turned red, we did not receive any proactive notifications, leading to a gap in our backup history.

I am looking for the most reliable way to alert our team when a backup fails. I have the following questions:

Native Alerting: Is there a built-in 'Event' or 'Problem' type in Dynatrace Managed that can be triggered specifically by backup failures (for both Cassandra and Elasticsearch)?

Log-based Alerting: If we choose to use logs for alerting:

In which specific log files (on which nodes) is the backup status (success/failure) recorded?

What is the recommended method to ingest these local cluster logs back into Dynatrace for log-based alerting without creating a loop?

API Approach: Is it considered best practice to poll the /clusterapi/v1.0/backups/status endpoint for monitoring, and can this be integrated into a standard Dynatrace Alerting Profile?

Alternative: Are there specific 'Self-Monitoring' metrics available that we can use to create a Custom Chart and Alert for backup health?

3 REPLIES

rastislav_danis
DynaMight Pro

I haven't found any source of cluster backup status other than the Server.0.0.log logs on the cluster nodes.

Alanata a.s.

t_pawlak
Leader

I’m not aware of a native built-in Problem/Event type dedicated specifically to failed Cassandra/Elasticsearch backups in Dynatrace Managed that you can directly wire into a standard Alerting Profile. From the documentation and Community threads I could find, backup health is mainly checked through CMC status, node logs, and administrative endpoints/commands, rather than through a ready-made backup-failure problem type.
Here is a topic related to backup troubleshooting:
Managed Cluster backup troubleshooting - most common issues 

From a practical perspective, the most reliable options today are:

  • Polling /clusterapi/v1.0/backups/status on a schedule, or running your own health-check script.
  • Log-based monitoring, if you want to detect explicit backup failures from the cluster logs. Community feedback points mainly to Server.0.0.log on the cluster nodes as the place where backup-related information can be found.

All of the endpoints are listed here:
Perform Infrastructure Health Checks in Dynatrace Managed 

Based on these topics, my recommendation would be:

  • poll the backup status via the API or a script,
  • convert the result into a custom metric or custom event for alerting,
  • and alert on that.

This is also consistent with older Managed health-check guidance in Community, which relies on scripts, REST checks, and node-level commands rather than on a built-in “backup failed” metric/event.
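
The poll-and-convert approach recommended here could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the URLs are placeholders, and the response field name ("state") and its failure values are hypothetical, because the JSON schema of /clusterapi/v1.0/backups/status is not described in this thread. Inspect a real response and adjust STATE_FIELD and FAILED_VALUES before relying on it. The event is sent to the environment Events API v2 ingest endpoint as a CUSTOM_ALERT.

```python
import json
import urllib.request

# Placeholders: replace with your own cluster and environment URLs.
CLUSTER_URL = "https://managed.example.com"
ENV_URL = "https://managed.example.com/e/ENVIRONMENT_ID"

# ASSUMPTION: the status JSON is an object with a state-like field.
# These names are hypothetical, not the documented schema.
STATE_FIELD = "state"
FAILED_VALUES = {"FAILED", "ERROR"}


def get_json(url, token):
    """GET a URL with an API token and decode the JSON response body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Api-Token {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def backup_failed(status):
    """Return True if the (assumed) state field reports a failure."""
    return str(status.get(STATE_FIELD, "")).upper() in FAILED_VALUES


def build_event(status):
    """Build a CUSTOM_ALERT payload for the Events API v2 ingest endpoint."""
    return {
        "eventType": "CUSTOM_ALERT",
        "title": "Dynatrace Managed backup failed",
        # Property values must be strings, so the raw status is serialized.
        "properties": {"backup.status": json.dumps(status)},
    }


def send_event(event, token):
    """POST the event to the environment and return the HTTP status code."""
    req = urllib.request.Request(
        f"{ENV_URL}/api/v2/events/ingest",
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Api-Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def check_once(cluster_token, env_token):
    """Poll the backup status once and raise an event if it looks failed."""
    status = get_json(f"{CLUSTER_URL}/clusterapi/v1.0/backups/status", cluster_token)
    if backup_failed(status):
        send_event(build_event(status), env_token)
        return True
    return False
```

Running check_once from cron or a similar scheduler gives a problem that a standard Alerting Profile can route. A request that raises an exception (for example, the cluster is unreachable) is not handled here; in practice you would probably want to treat that as an alert condition too.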

For logs:

  • check Server.0.0.log on the cluster nodes,
  • and also keep in mind that Managed backup troubleshooting often comes down to backup path/NFS visibility, permissions, and node-to-node communication, so it’s worth monitoring those conditions too, not only the final backup result.
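
Both log checks and path checks can be scripted on each node. The sketch below is a minimal example under assumptions: the exact wording of backup messages in Server.0.0.log is not given in this thread, so the keyword match is a placeholder to tune against your real log lines, and the log and backup paths are left as arguments because they depend on your installation.

```python
import os

# ASSUMPTION: failure lines mention "backup" plus one of these keywords.
# This is a guess to be tuned against real Server.0.0.log content.
FAILURE_KEYWORDS = ("fail", "error", "exception")


def find_backup_failures(lines):
    """Return lines that mention 'backup' together with a failure keyword."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if "backup" in lowered and any(k in lowered for k in FAILURE_KEYWORDS):
            hits.append(line.rstrip("\n"))
    return hits


def scan_log_file(log_path):
    """Scan one log file; undecodable bytes are replaced rather than fatal."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return find_backup_failures(handle)


def check_backup_path(path):
    """Report whether the backup path is visible and usable by the current user."""
    return {
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }
```

Note that os.access answers for the user running the script, so the path check is only meaningful when it runs as the same user as the Dynatrace services. A missing NFS mount typically shows up as "exists": False or as an unexpectedly empty directory.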

Regarding self-monitoring metrics, I don't think a dedicated backup-health metric that supports out-of-the-box alerting exists. IMO, API-based monitoring plus custom alerting is a better option than searching for a hidden built-in metric.
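
If you prefer a chartable signal, one option is to publish your own metric through the Metrics API v2 ingest endpoint, which accepts plain-text lines of the form "key,dimension=value number". The sketch below assumes a hypothetical metric key (custom.managed.backup.ok), a placeholder environment URL, and a cluster name without spaces or special characters; none of these come from Dynatrace itself.

```python
import urllib.request

# Placeholders and hypothetical names: adjust to your environment.
ENV_URL = "https://managed.example.com/e/ENVIRONMENT_ID"
METRIC_KEY = "custom.managed.backup.ok"


def metric_line(backup_ok, cluster_name):
    """Format one ingest line: 1 means the last backup check passed, 0 means it failed."""
    value = 1 if backup_ok else 0
    # ASSUMPTION: cluster_name needs no quoting (no spaces, commas, or quotes).
    return f"{METRIC_KEY},cluster={cluster_name} {value}"


def push_metric(line, token):
    """POST one metric line to the environment and return the HTTP status code."""
    req = urllib.request.Request(
        f"{ENV_URL}/api/v2/metrics/ingest",
        data=line.encode("utf-8"),
        headers={
            "Authorization": f"Api-Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

For example, metric_line(False, "prod") yields "custom.managed.backup.ok,cluster=prod 0". A metric event with a static threshold on this key (alert when the value drops below 1, and ideally also when data stops arriving) would then cover both a failed backup and a dead check script.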

 

Hugo1984
Participant

Dear  t_pawlak

Thank you for your detailed answer. I will analyze it, and hopefully, I can easily implement the solution in our organization.
