16 May 2023 09:14 AM - last edited on 20 Jun 2023 02:01 PM by Karolina_Linda
Dear All,
Since yesterday, a new message has been appearing in the CMC events.
After that, the platform was down for more than 20 hours.
Has anyone faced the same issue?
The support team is not helping on this.
BRs,
16 May 2023 09:30 AM - edited 16 May 2023 09:31 AM
Hi @Malaik, this is a fairly new detection that we introduced in version 1.258. See here:
To better help you in the event of an unsuccessful and incomplete start of all Dynatrace Managed services on a cluster node, we've added additional alerting mechanisms. If you're alerted, please try to carry out the suggested action before reaching out to support. This should generally reduce problem resolution time. At first, when a cluster node can't receive OneAgent traffic, the affected node is highlighted with a red tile on the cluster deployment overview page and in the corresponding row in the cluster node deployment page. Additionally, a cluster event message is generated with the following content:
Summary: "A cluster node can't receive OneAgent traffic"
Description: "The cluster node id can’t receive OneAgent traffic. Try to restart the cluster node. If this doesn’t fix the problem, generate the support archive and provide it to the Support team."
----
Have you tried to restart the node? Did all services come up? Especially the ActiveGate?
16 May 2023 09:38 AM
Thanks a lot @Radoslaw_Szulgo
After restarting the nodes, the whole platform was down for more than 20 hours; the services (Cassandra and Elastic) had great difficulty coming up.
16 May 2023 09:39 AM
It was probably not writing data and was malfunctioning anyway. The event made you aware that something was wrong. Have you checked whether you have enough disk space and RAM for the services to start?
Is there already a ticket?
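For a quick look at disk space and RAM on a node, something like the following might help. It is only a minimal sketch using the Python standard library on Linux, not a Dynatrace tool; the data path `/var/opt/dynatrace-managed` and the helper names are assumptions, so adjust them to your installation.

```python
import os
import shutil


def disk_free_gib(path="/"):
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {key: value} dict (values mostly in kiB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def mem_available_gib(meminfo_path="/proc/meminfo"):
    """Memory available to new processes, in GiB, or None if it cannot be read."""
    if not os.path.exists(meminfo_path):
        return None  # not a Linux host, or /proc is not mounted
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    kib = info.get("MemAvailable")
    return None if kib is None else kib / 2**20


if __name__ == "__main__":
    # Assumed data directory; fall back to "/" if it does not exist on this host.
    data_dir = "/var/opt/dynatrace-managed"
    path = data_dir if os.path.exists(data_dir) else "/"
    print(f"free disk on {path}: {disk_free_gib(path):.1f} GiB")
    print(f"available RAM: {mem_available_gib()} GiB")
```

Run it on each cluster node and compare the numbers against the hardware requirements for your Dynatrace Managed version.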
16 May 2023 09:53 AM
Thanks again,
Everything was checked and now all nodes are working.
Yes, we have a ticket open.
17 May 2023 09:52 AM
Can you share the root cause with the community?
17 May 2023 10:21 AM
We are not sure yet, but we suspect the storage (NFS).
The storage was not available at that time, so Elastic and Cassandra were not available.
BRs,
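In case it is useful to others who suspect their storage: a rough way to see whether a mounted path still responds is to probe it with a timeout, since a stalled NFS mount can make a plain file access hang indefinitely. This is only a sketch under assumptions; the function name and the example mount point are hypothetical, and a probe like this shows that the path is unresponsive, not why.

```python
import os
import threading


def path_responds(path, timeout_s=5.0):
    """Return True if os.stat(path) succeeds within timeout_s seconds."""
    result = {}

    def probe():
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    # Daemon thread: if the stat call hangs on a stalled mount,
    # it will not keep the script from exiting.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return result.get("ok", False)


if __name__ == "__main__":
    mount_point = "/mnt/nfs-data"  # hypothetical mount point, replace with yours
    print(f"{mount_point} responds: {path_responds(mount_point)}")
```

A result of False means the path either does not exist, returned an error, or did not answer within the timeout.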