27 Nov 2024 03:27 PM
Hi, we get the following error when trying to add a new node.
Adding new node with ip 10.36.0.222 failed. Reason: Cannot prepare network settings on Dynatrace cluster nodes. Error: Adding IP of this machine ("10.36.0.222") to cluster node "10.36.22.240" failed.
06 Dec 2024 06:26 PM
Hi Andre,
Is it possible that your node has multiple NICs?
If so, it can happen that the installer tries to use e.g. the backup or management interface.
Maybe try forcing the IP address to the correct one?
bindToNetworkInterface = <IP.address>
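If you are not sure which address that should be, a quick way to list the host's IPv4 addresses per interface first (plain iproute2 commands, nothing Dynatrace-specific) is:
ip -4 -brief addr show
hostname -I
Whichever address the existing cluster nodes can actually reach is the one to bind to.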
Grtz,
Bert
06 Dec 2024 07:40 PM
Hi @BertEvo, no, that was one of the first things I checked.
We managed to get past the original error, but the node installation failed again with various other errors. Each time we managed to get past those after the firewall team allowed open communication between all node IPs, instead of restricting to the specific cluster ports listed in the docs. But we just faced yet another failed installation - this time, after almost 2.5 hours, it failed stating the API token is invalid!?
Managed is going to be the death of me, honestly! Wish I could move the client to SaaS already 🤣
10 Dec 2024 10:23 AM
I agree it is highly annoying that, if you try the installation again the next day, the token is no longer valid 😭
It would help to have a good prerequisites check instead of failing at every next step. Educational it is 🙂
07 Dec 2024 04:28 PM
Kindly ensure the cluster nodes are in the same data center / subnet, the required cluster ports are properly configured, and the multi-node installation requirements are fulfilled.
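One quick check to see whether traffic from the new node to an existing node stays on the local subnet or is routed through a gateway (and thus likely a firewall), using the cluster node IP from the error message as an example:
ip route get 10.36.22.240
If the output contains "via <gateway>", the nodes are in different subnets and there is routing (and possibly filtering) in between.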
Hoping it helps.
BR,
Peter
07 Dec 2024 04:53 PM
Hello @andre_vdveen
Please find attached a troubleshooting path for resolution, covering Node and Network checks.
Hoping it adds value.
KR,
Peter
10 Dec 2024 10:05 AM
It looks like the new node is in another network segment and there may be one or more firewalls between them.
Check the install logs, and especially this page: Cluster node ports — Dynatrace Managed Docs
Make sure all nodes can communicate with each other on the indicated ports.
(You do not notice this requirement when the nodes are in the same "rack" :-))
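To locate and scan the installer log quickly (the exact path depends on where the installer ran and how it was configured, so treat this as a starting point rather than the official location):
sudo find /var/log /var/opt /tmp -iname "*dynatrace*install*" 2>/dev/null
sudo grep -iE "error|fail" <path-to-install-log> | tail -50
The second command just pulls the recent failed steps out of whatever log file the first one turns up.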
10 Dec 2024 03:23 PM
Thanks to everyone who's responded with ideas and suggestions.
We covered all those points already, even before your replies 😉 but the installation still fails... The network and firewall teams are adamant there's nothing wrong on their side, but everything points to comms being blocked.
Yes, some of the node hosts are in a different network segment, but the firewall rules seem to be correct and we even see the comms in the firewall logs, yet something interferes with the traffic. Apart from the required firewalld and nftables, there's nothing on the hosts that could cause this.
It is the strangest situation I've come across in my career...
10 Dec 2024 03:51 PM
So what's the current status? Node still can't join the cluster?
Maybe share the installation logs?
Does the new node have the same specs (CPU, storage, mem) as the existing node?
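For a quick spec comparison, run these on the new node and on an existing node and compare the output:
nproc        # CPU cores
free -h      # memory
df -h        # disk space on the filesystems that will hold the Dynatrace data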
11 Dec 2024 03:39 PM
What is the OS version you are using, @andre_vdveen?
Not by any chance related to the RHEL9 issue? SELinux in permissive/enforcing mode?
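Both are quick to verify on the new node:
cat /etc/redhat-release    # OS release
getenforce                 # SELinux mode: Enforcing, Permissive or Disabled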
11 Dec 2024 04:11 PM
I wish that was the case, @fstekelenburg! But alas, it is not that either...this host is on RHEL 8.8, same as the other two new nodes that installed without any issues.
12 Dec 2024 10:20 AM
What does the log file say? My first thought is still a (firewall) connection issue, either somewhere in between or on the nodes themselves.
Another thing that comes to mind is the time-out value; it may be worth trying a higher time-out limit. Sometimes when the transfer is slower, it times out. See: Solved: Re: Failed to install secondary node - Dynatrace Managed - Dynatrace Community
Have you also tried with the local firewall off and/or SELinux off or in permissive mode?
Solved: Re: Dynatrace Managed installation failed with error : Installation of Nodekeeper failed wit...
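For the local firewall / SELinux test mentioned above, and only if your security policy allows it, you could temporarily rule out the local layers for one installation attempt (and re-enable them afterwards):
sudo systemctl stop firewalld    # stop the local firewall for the duration of the test
sudo setenforce 0                # switch SELinux to permissive until the next boot
getenforce                       # should now report Permissive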
13 Dec 2024 01:15 PM
I was in a split datacenter situation with firewalls where getting the right changes was a bit of a challenge. I used these simple scripts to check the ports and traffic on both sides.
Test if the ports are open. This can be run on both sides, from the new host and from an existing cluster node. It should result in a connect or a refusal, not a time-out. For instance, if you are testing from 10.10.10.1 to 10.10.20.11:
t="10.10.20.11"; for i in 8019 8020 8021 8022 5701 5711 9042 7000 7001 9200 9300; do echo "# $t $i"; (nc -v -z -w5 $t $i 2>&1| egrep -v "Version") ;echo "###"; done
Mind you, the ports are only open on running cluster nodes and only open to other cluster node members (internal firewall). Not all are open/in use.
So to actually tell if the connection attempts reach the other server, you can run this command on the other/receiving node:
tcpdump -nn -i ens192 src host 10.10.10.1
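You can also narrow the capture down to one specific cluster port (using the same example interface and source host) to see whether that particular traffic arrives, e.g.:
tcpdump -nn -i ens192 src host 10.10.10.1 and tcp port 8021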
The 'port test' can also be placed in a loop and, for instance, run from a tmux session on both ends, so the firewall guys can check the logs and see the pass/drop 🙂 and you can see if anything changes.
t="10.10.20.11"; while true; do for i in 8019 8020 8021 8022 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 9042 7000 7001 9200 9300; do echo "# $t $i"; (nc -v -z -w5 $t $i 2>&1| egrep -v "Version") ;echo "###"; done; echo "#################"; sleep 60; done
P.S. use "sudo" when applicable 😉