27 Nov 2024 03:27 PM
Hi, we get the following error when trying to add a new node.
Adding new node with ip 10.36.0.222 failed. Reason: Cannot prepare network settings on Dynatrace cluster nodes. Error: Adding IP of this machine ("10.36.0.222") to cluster node "10.36.22.240" failed.
06 Dec 2024 06:26 PM
Hi Andre,
Is it possible that your node has multiple NICs?
If so, it can happen that the installer tries to use e.g. the backup or management interface.
Maybe try forcing the IP address to the correct one?
bindToNetworkInterface = <IP.address>
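If you are not sure which address that should be, a quick way to list the host's IPv4 addresses per interface first (plain iproute2 commands, nothing Dynatrace-specific) is:
ip -4 -brief addr show
hostname -I
Whichever address the existing cluster nodes can actually reach is the one to bind to.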
Grtz,
Bert
06 Dec 2024 07:40 PM
Hi @BertEvo, no, that was one of the first things I checked.
We managed to get past the original error, but the node installation failed again with various other errors. Each time we managed to get past those after the firewall team allowed open communication between all node IPs, instead of restricting to the specific cluster ports listed in the docs. But we just faced yet another failed installation - this time, after almost 2.5 hours, it failed stating the API token is invalid!?
Managed is going to be the death of me, honestly! Wish I could move the client to SaaS already 🤣
10 Dec 2024 10:23 AM
I agree it is highly annoying that, if you try the installation again the next day, the token is no longer valid 😭
It would help to have a good prerequisites check instead of failing at every next step. Educational it is 🙂
07 Dec 2024 04:28 PM
Kindly ensure the cluster nodes are in the same data center / subnet, the required cluster ports are properly configured, and the multi-node installation requirements are fulfilled.
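One quick check to see whether traffic from the new node to an existing node stays on the local subnet or is routed through a gateway (and thus likely a firewall), using the cluster node IP from the error message as an example:
ip route get 10.36.22.240
If the output contains "via <gateway>", the nodes are in different subnets and there is routing (and possibly filtering) in between.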
Hoping it helps.
BR,
Peter
07 Dec 2024 04:53 PM
Hello @andre_vdveen
Please find attached a troubleshooting path for resolution, covering Node and Network checks.
Hoping it adds value.
KR,
Peter
10 Dec 2024 10:05 AM
It looks like the new node is in another network segment and there may be one or more firewalls between them.
Check the install logs, and especially this page: Cluster node ports — Dynatrace Managed Docs
Make sure all nodes can communicate with each other on the indicated ports.
(You do not notice this requirement when the nodes are in the same "rack" :-))
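To locate and scan the installer log quickly (the exact path depends on where the installer ran and how it was configured, so treat this as a starting point rather than the official location):
sudo find /var/log /var/opt /tmp -iname "*dynatrace*install*" 2>/dev/null
sudo grep -iE "error|fail" <path-to-install-log> | tail -50
The second command just pulls the recent failed steps out of whatever log file the first one turns up.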
10 Dec 2024 03:23 PM
Thanks to everyone who's responded with ideas and suggestions.
We covered all those points already, even before your replies 😉 but the installation still fails... The network and firewall teams are adamant there's nothing wrong on their side, but everything points to comms being blocked.
Yes, some of the node hosts are in a different network segment, but the firewall rules seem to be correct and we even see the comms in the firewall logs, yet something interferes with the traffic. Apart from the required firewalld and nftables, there's nothing on the hosts that could cause this.
It is the strangest situation I've come across in my career...
10 Dec 2024 03:51 PM
So what's the current status? Node still can't join the cluster?
Maybe share the installation logs?
Does the new node have the same specs (CPU, storage, mem) as the existing node?
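For a quick spec comparison, run these on the new node and on an existing node and compare the output:
nproc        # CPU cores
free -h      # memory
df -h        # disk space on the filesystems that will hold the Dynatrace data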
11 Dec 2024 03:39 PM
What is the OS version you are using, @andre_vdveen?
Not by any chance related to the RHEL9 issue? SELinux in permissive/enforcing mode?
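Both are quick to verify on the new node:
cat /etc/redhat-release    # OS release
getenforce                 # SELinux mode: Enforcing, Permissive or Disabled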
11 Dec 2024 04:11 PM
I wish that was the case, @fstekelenburg! But alas, it is not that either...this host is on RHEL 8.8, same as the other two new nodes that installed without any issues.
12 Dec 2024 10:20 AM
What does the log file say? My first thought is still a (firewall) connection issue, either somewhere in between or on the nodes themselves.
Another thing that comes to mind is the time-out value; it may be worth trying a higher time-out limit. Sometimes when the transfer is slower, it times out. See: Solved: Re: Failed to install secondary node - Dynatrace Managed - Dynatrace Community
Have you also tried with the local firewall off and/or SELinux off or in permissive mode?
Solved: Re: Dynatrace Managed installation failed with error : Installation of Nodekeeper failed wit...
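For the local firewall / SELinux test mentioned above, and only if your security policy allows it, you could temporarily rule out the local layers for one installation attempt (and re-enable them afterwards):
sudo systemctl stop firewalld    # stop the local firewall for the duration of the test
sudo setenforce 0                # switch SELinux to permissive until the next boot
getenforce                       # should now report Permissive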
13 Dec 2024 01:15 PM
I was in a split datacenter situation with firewalls where getting the right changes was a bit of a challenge. I used these simple scripts to check the ports and traffic on both sides.
Test if the ports are open. This can be run on both sides, from the new host and from an existing cluster node. It should result in a connect or a refusal, not a time-out. For instance, if you are testing from 10.10.10.1 to 10.10.20.11:
t="10.10.20.11"; for i in 8019 8020 8021 8022 5701 5711 9042 7000 7001 9200 9300; do echo "# $t $i"; (nc -v -z -w5 $t $i 2>&1| egrep -v "Version") ;echo "###"; done
Mind you, the ports are only open on running cluster nodes and only open to other cluster node members (internal firewall). Not all are open/in use.
So to actually tell if the connection attempts reach the other server, you can run this command on the other/receiving node:
tcpdump -nn -i ens192 src host 10.10.10.1
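You can also narrow the capture down to one specific cluster port (using the same example interface and source host) to see whether that particular traffic arrives, e.g.:
tcpdump -nn -i ens192 src host 10.10.10.1 and tcp port 8021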
The 'port test' can also be placed in a loop and, for instance, run from a tmux session on both ends, so the firewall guys can check the logs and see the pass/drop 🙂 and you can see if anything changes.
t="10.10.20.11"; while true; do for i in 8019 8020 8021 8022 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 9042 7000 7001 9200 9300; do echo "# $t $i"; (nc -v -z -w5 $t $i 2>&1| egrep -v "Version") ;echo "###"; done; echo "#################"; sleep 60; done
P.S. use "sudo" when applicable 😉