
Adding cluster node fails with "Cannot prepare network settings..."

andre_vdveen
DynaMight Champion

Hi, we get the following error when trying to add a new node.


Adding new node with ip 10.36.0.222 failed. Reason: Cannot prepare network settings on Dynatrace cluster nodes. Error: Adding IP of this machine ("10.36.0.222") to cluster node "10.36.22.240" failed.
Does anyone have any idea what causes this and how to get around it?

BertEvo
Visitor

Hi Andre,

Is it possible that your node has multiple NICs?

If so, it can happen that the installer tries to use e.g. the backup or management interface.

Maybe try forcing the IP address to the correct one?

bindToNetworkInterface = <IP.address>
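
If you want to confirm which addresses the host actually exposes before forcing one, standard Linux tooling shows it (nothing Dynatrace-specific assumed; 10.36.22.240 is just the existing node from the error message above):

# List all IPv4 addresses per interface
ip -4 addr show

# Show which source address this host would pick to reach the existing cluster node
ip route get 10.36.22.240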

Grtz,

Bert

Hi @BertEvo, no, that was one of the first things I checked.

We managed to get past the original error, but the node installation failed again with various other errors. Each time, we managed to get past those after the firewall team allowed open communication between all node IPs instead of restricting it to the specific cluster ports listed in the docs, but we just faced yet another failed installation - this time, after almost 2.5 hours, it failed stating the API token is invalid!? :mind_blown:

Managed is going to be the death of me, honestly! Wish I could move the client to SaaS already 🤣

I agree it is highly annoying that, if you try the installation again the next day, the token is no longer valid 😭

It would help to have a good prerequisites check instead of failing at every next step. Educational it is 🙂

Kind regards, Frans Stekelenburg                 Certified Dynatrace Associate | measure.works, Dynatrace Partner

Peter_Youssef
Champion

Hi @andre_vdveen 

Kindly ensure the cluster nodes are in the same data center/subnet, that the required cluster node ports are properly configured, and that the multi-node installation requirements are fulfilled.
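
As a quick local check of the port part, if the nodes run firewalld you can list what the host firewall currently allows (default zone assumed; adjust to your setup):

# Show the ports currently opened in the active zone
sudo firewall-cmd --list-ports

# Or the full picture: zones, services, ports, rich rules
sudo firewall-cmd --list-all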

Hoping it helps.

BR, 

Peter

Peter_Youssef
Champion

Hello @andre_vdveen 

Please find below a troubleshooting path for resolution:

Node:

  • Checking the configuration files on the first stable cluster node.
  • Ensuring the cluster token is valid.
  • Through the UI, ensuring the first cluster node's configuration matches the actual server details (IP address, hostname, cluster token, etc.).
  • Validating there is no IP conflict.
  • Checking the log entries under "/var/log/dynatrace/" to see what might be blocking the node from joining the cluster; the logs are the shortest path to the actual root cause (a quick way to scan them is sketched right after this list).
  • Checking the cluster initialization logs under "/var/log/dynatrace/cluster/".
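
A minimal way to scan those locations for recent failures (assuming standard grep/tail; paths as listed above):

# Grep the Dynatrace Managed logs for errors and keep the last 50 hits
sudo grep -riE "error|fail" /var/log/dynatrace/ | tail -n 50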

Network:

  • Checking network connectivity between the nodes.
  • Checking the firewall rules configured between the nodes that might block connectivity.
  • Checking DNS resolution via nslookup; if resolution fails, you may need to adjust your DNS configuration, or add entries to "/etc/hosts" so the hostnames resolve locally (a quick check is sketched right after this list).
  • Checking for load balancer or proxy issues; if any exist, they must be configured properly to ensure stable connectivity and constant synchronization between the nodes.
  • Restarting the Dynatrace nodes after applying the highlighted options.
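
A small sketch for the DNS point - the hostnames are placeholders, substitute your actual node names:

# Verify every cluster node's hostname resolves; run from each node
for h in node1.example.com node2.example.com node3.example.com; do echo "# $h"; nslookup "$h"; done

# If resolution fails, a local /etc/hosts entry is a workable fallback, e.g.:
# 10.36.0.222  node3.example.com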

Hoping it adds value.

KR, 

Peter

fstekelenburg
DynaMight Pro

It looks like the new node is in another network segment and there may be one or more firewalls between them.
Check the install logs, and especially this page: Cluster node ports — Dynatrace Managed Docs

Make sure all nodes can communicate with each other on the indicated ports.

(You do not notice this requirement when the nodes are in the same "rack" :-))

Kind regards, Frans Stekelenburg                 Certified Dynatrace Associate | measure.works, Dynatrace Partner

andre_vdveen
DynaMight Champion

Thanks to everyone who's responded with ideas and suggestions.
We had covered all those points already, even before your replies 😉 but the installation still fails... The network and firewall teams are adamant there's nothing wrong on their side, yet everything points to comms being blocked.

Yes, some of the node hosts are in a different network segment, but the firewall rules seem to be correct and we even see the comms in the firewall logs, yet something interferes with the traffic. Apart from the required firewalld and nftables, there's nothing on the hosts that could cause this.
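
For completeness, the full nftables rule set (including anything added outside firewalld) can be dumped on each host as a sanity check:

# Dump every nftables rule currently loaded on this host
sudo nft list ruleset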

It is the strangest situation I've come across in my career...

So what's the current status? Node still can't join the cluster?

Maybe share the installation logs?

Does the new node have the same specs (CPU, storage, mem) as the existing node?

What OS version are you using, @andre_vdveen?
Not by any chance related to the RHEL9 issue? Is SELinux in permissive or enforcing mode?
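
Checking takes seconds on RHEL (standard SELinux tooling):

# Print the current SELinux mode: Enforcing, Permissive, or Disabled
getenforce

# Temporarily switch to permissive for a test install (reverts on reboot)
sudo setenforce 0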

Kind regards, Frans Stekelenburg                 Certified Dynatrace Associate | measure.works, Dynatrace Partner

I wish that was the case, @fstekelenburg! But alas, it is not that either... this host is on RHEL 8.8, same as the other two new nodes that installed without any issues.

What does the log file say? My first thought is still a (firewall) connection thing. Either somewhere in between or on the nodes themselves.

Another thing to maybe look at and try is the time-out value: setting a higher time-out limit. Sometimes when the transfer is slower, it times out. Solved: Re: Failed to install secondary node - Dynatrace Managed - Dynatrace Community

And with the local firewall off, and/or SELinux off or in permissive mode?
Solved: Re: Dynatrace Managed installation failed with error : Installation of Nodekeeper failed wit...

Kind regards, Frans Stekelenburg                 Certified Dynatrace Associate | measure.works, Dynatrace Partner

I was in a split-datacenter situation with firewalls where getting the right changes made was a bit of a challenge. I used these simple scripts to check the ports and traffic on both sides.

Test if the ports are open. This can be run on both sides, from the new host and from an existing cluster node. It should result in a connect or a refusal, not a time-out. For instance, if you are testing from 10.10.10.1 to 10.10.20.11:

t="10.10.20.11"; for i in 8019 8020 8021 8022 5701 5711 9042 7000 7001 9200 9300; do echo "# $t $i"; (nc -v -z -w5 $t $i 2>&1| egrep -v "Version") ;echo "###"; done

Mind you, the ports are only open on running cluster nodes and only open to other cluster node members (internal firewall). Not all are open/in use.

So to actually tell if the connection attempts reach the other server, you can use this command on the other/receiving node:

tcpdump -nn -i ens192 src host 10.10.10.1

 
The 'port test' can also be placed in a loop and, for instance, run from a tmux session on both ends, so the firewall guys can check their logs and see the pass/drop 🙂 And for you to see if anything changed.

t="10.10.20.11"; while true; do for i in 8019 8020 8021 8022 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 9042 7000 7001 9200 9300; do echo "# $t $i"; (nc -v -z -w5 $t $i 2>&1| egrep -v "Version") ;echo "###"; done; echo "#################"; sleep 60; done

 
P.S. use "sudo" when applicable 😉

Kind regards, Frans Stekelenburg                 Certified Dynatrace Associate | measure.works, Dynatrace Partner
