Solved: Dynatrace Managed NAM ICMP (ping)

hyperdev · 02 Dec 2024

Looking for tips on how to monitor a large number of hosts (15k) through NAM.

We need to generate an event for each host going down or up.

PedroSantos · 02 Dec 2024

Well, as far as I know you can do this one of three ways:
Option 1: Add their IPs manually.

I assume this isn't a very attractive option given the number of hosts.

---

Option 2:

Use a filter expression. If all of these hosts belong to one or more host groups, for example, you can add that to the config:

More info on filter expressions can be found here.

You'll likely find something there that will serve your purpose, but if not...

---

Option 3:

Create a script that uses the V2 API to create NAM monitors en masse. On the old interface, go here:

And find this:

You'll need a token with the right permissions, of course. You can test the API there and then write a local script (Python maybe?) that calls the API repeatedly, adjusting the JSON body of the request for each of your monitors.
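A minimal sketch of such a script, in Python. The base URL, the endpoint path /api/v2/synthetic/monitors, the token value and the batch size of 100 are assumptions for illustration, so check them in the API explorer before use. The field names follow the monitor JSON posted later in this thread, and the body is abbreviated. The script only prepares the requests; sending them is left commented out.

```python
import json
import urllib.request

# Placeholders: none of these values are confirmed by this thread.
BASE_URL = "https://dynatrace.example.com/e/ENVIRONMENT_ID"  # hypothetical
MONITORS_PATH = "/api/v2/synthetic/monitors"  # assumed path, verify in the API explorer
API_TOKEN = "REPLACE_WITH_YOUR_TOKEN"


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_monitor(name, location_id, targets):
    """Return an abbreviated JSON body for one ICMP monitor."""
    return {
        "name": name,
        "description": name,
        "type": "MULTI_PROTOCOL",
        "enabled": True,
        "frequencyMin": 5,
        "locations": [location_id],
        "steps": [{
            "name": name,
            "requestType": "ICMP",
            "targetList": list(targets),
            "properties": {
                "ICMP_NUMBER_OF_PACKETS": 3,
                "ICMP_TIMEOUT_FOR_REPLY": "PT2S",
            },
        }],
    }


def build_all(hosts, location_id, batch_size=100, prefix="NAM_ICMP"):
    """Build one monitor body per batch of hosts."""
    return [
        build_monitor(f"{prefix}_{index}", location_id, chunk)
        for index, chunk in enumerate(batches(hosts, batch_size))
    ]


def build_request(body):
    """Prepare (but do not send) the POST request for one monitor."""
    return urllib.request.Request(
        BASE_URL + MONITORS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Api-Token " + API_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    hosts = [f"HOST_{n:03d}" for n in range(250)]
    for body in build_all(hosts, "SYNTHETIC_LOCATION-03A5657E489F280A"):
        request = build_request(body)
        print(request.get_method(), request.full_url, body["name"])
        # urllib.request.urlopen(request)  # uncomment to actually create the monitor
```

Keeping the payload builder separate from the sending step makes it easy to inspect the generated JSON before anything is created.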

To make an error is human. To spread the error across all servers in an automated way is DevOps.

hyperdev · 02 Dec 2024

We just did the V2 API version.

2 ActiveGates

50 endpoints

Ping cycle: 5 min

Each with at most 700 hosts per targetList

Result:

Only 10% of hosts are pinged

For those 10%, the ping cycle ranges from 300 seconds (OK) to 2400 seconds (not OK)

Method of investigation, on each ActiveGate:

tcpdump --immediate-mode -l -i eth0 "icmp[0] == 8" | tee tcpdump/icmp_request_active_gate_X.txt

The expected flow from tcpdump should be 15000/300/2 per ActiveGate, i.e. 25 hosts per second (75 lines per second with 3 packets per host).
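The arithmetic behind that estimate can be written out as a short Python sketch. It assumes the load is spread evenly over the 5-minute cycle and over both ActiveGates, which nothing in this thread confirms, and it takes the 3 packets per host from ICMP_NUMBER_OF_PACKETS in the monitor JSON. Since tcpdump prints one line per captured echo request, the packet rate is the number to compare against.

```python
def expected_rates(hosts, cycle_seconds, active_gates, packets_per_host):
    """Return (hosts per second, echo requests per second) for one ActiveGate."""
    hosts_per_second = hosts / cycle_seconds / active_gates
    packets_per_second = hosts_per_second * packets_per_host
    return hosts_per_second, packets_per_second


if __name__ == "__main__":
    hosts_rate, packet_rate = expected_rates(
        hosts=15_000, cycle_seconds=300, active_gates=2, packets_per_host=3
    )
    print(hosts_rate)   # 25.0
    print(packet_rate)  # 75.0
```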

Actual printout was sporadic.
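To put a number on "sporadic", the capture file written by tee could be summarized per second. The sketch below is a suggestion only: it assumes tcpdump's default output, where each line starts with an HH:MM:SS.ffffff timestamp, and the sample lines are made up for illustration. Seconds with no traffic do not appear in the counts at all.

```python
from collections import Counter


def requests_per_second(lines):
    """Count tcpdump output lines per wall-clock second (HH:MM:SS prefix)."""
    counts = Counter()
    for line in lines:
        stamp = line[:8]
        # Skip anything that does not start with an HH:MM:SS timestamp.
        if len(stamp) == 8 and stamp[2] == ":" and stamp[5] == ":":
            counts[stamp] += 1
    return counts


def summarize(counts):
    """Return min, max and mean lines per second, or None if nothing was counted."""
    if not counts:
        return None
    values = list(counts.values())
    return {
        "seconds_with_traffic": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


if __name__ == "__main__":
    # Illustrative lines, not taken from a real capture.
    sample = [
        "10:15:01.000123 IP 10.0.0.1 > 10.0.0.2: ICMP echo request, id 1, seq 1, length 16",
        "10:15:01.400456 IP 10.0.0.1 > 10.0.0.3: ICMP echo request, id 2, seq 1, length 16",
        "10:15:02.100789 IP 10.0.0.1 > 10.0.0.4: ICMP echo request, id 3, seq 1, length 16",
    ]
    print(summarize(requests_per_second(sample)))
```

For a real capture, pass the open file object instead of the sample list.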

Any idea what we did wrong?

One JSON endpoint:

{
  "description": "NON_PRODUCTION_0",
  "enabled": true,
  "entityId": "MULTIPROTOCOL_MONITOR-69CDF7C646DF6400",
  "frequencyMin": 5,
  "locations": [
    "SYNTHETIC_LOCATION-03A5657E489F280A"
  ],
  "name": "NON_PRODUCTION_0",
  "performanceThresholds": {
    "enabled": true,
    "thresholds": null
  },
  "steps": [{
      "constraints": [{
          "properties": {
            "operator": "=",
            "value": 100
          },
          "type": "SUCCESS_RATE_PERCENT"
        }
      ],
      "name": "NON_PRODUCTION_0",
      "properties": {
        "EXECUTION_TIMEOUT": "PT3S",
        "ICMP_NUMBER_OF_PACKETS": 3,
        "ICMP_PACKET_SIZE": 8,
        "ICMP_TIMEOUT_FOR_REPLY": "PT2S",
        "ICMP_TIME_TO_LIVE": 255
      },
      "requestConfigurations": [{
          "constraints": [{
              "properties": {
                "operator": "=",
                "value": 100
              },
              "type": "ICMP_SUCCESS_RATE_PERCENT"
            }
          ]
        }
      ],
      "requestType": "ICMP",
      "targetFilter": null,
      "targetList": [
        "HOST_000",
        ....,
        "HOST_700"
      ]
    }
  ],
  "syntheticMonitorOutageHandlingSettings": {
    "globalConsecutiveOutageCountThreshold": 1,
    "globalOutages": true,
    "localConsecutiveOutageCountThreshold": null,
    "localLocationOutageCountThreshold": null,
    "localOutages": false
  },
  "tags": [{
      "context": "CONTEXTLESS",
      "key": "TAG_NON_PRODUCTION",
      "source": "USER",
      "value": null
    }
  ],
  "type": "MULTI_PROTOCOL"
}

Mizső · 03 Dec 2024

Hi @hyperdev,

Have you checked the NAM limitations?

https://docs.dynatrace.com/docs/shortlink/network-availability-monitoring#limitations

If you have and the problem persists, you should raise a support ticket.

Best regards,

Mizső

Dynatrace Community RockStar 2024, Certified Dynatrace Professional

hyperdev · 03 Dec 2024

I need some explanations on the limitations:

The maximum number of network activities executed per network availability monitor is 1,000. Network activity is a single DNS request, single TCP request, or single ICMP packet.

How can I keep the JSON example above under 1,000 single ICMP packets?

Do I need to specify one target per step and fewer than 1,000 steps per JSON file?

Thanks.

Jacek_Janowicz · 03 Dec 2024

Hi @hyperdev
Thanks for raising that question.

For NAM ping tests, you can define the number of packets sent to a single target during a single test execution.
In your file, it is 3:
"ICMP_NUMBER_OF_PACKETS": 3,
Your target list contains 700 (to be precise, 701) targets:
"targetList": [
"HOST_000",
....,
"HOST_700"
]
That means sending 3 packets to each target, which is 701 * 3 = 2103 packets, more than twice the limit of 1,000. So, to meet the condition, we recommend breaking down the configuration into 3 NAM ICMP monitors.
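The split can be worked out mechanically. The Python sketch below is only an illustration of the arithmetic above, not an official tool: it treats the documented maximum of 1,000 network activities per monitor as "targets times packets per target", and the function names are made up for this example.

```python
MAX_ACTIVITIES = 1000  # documented maximum of network activities per monitor


def network_activities(target_count, packets_per_target):
    """Number of ICMP packets one monitor execution sends."""
    return target_count * packets_per_target


def split_targets(targets, packets_per_target, limit=MAX_ACTIVITIES):
    """Split targets into chunks that each stay within the activity limit."""
    size = limit // packets_per_target
    if size < 1:
        raise ValueError("packets_per_target exceeds the limit on its own")
    return [targets[start:start + size] for start in range(0, len(targets), size)]


if __name__ == "__main__":
    hosts = [f"HOST_{n:03d}" for n in range(701)]  # HOST_000 .. HOST_700
    print(network_activities(len(hosts), 3))       # 2103
    chunks = split_targets(hosts, 3)
    print([len(chunk) for chunk in chunks])        # [333, 333, 35]
```

With 3 packets per target, at most 333 targets fit into one monitor (999 packets), so 701 targets need 3 monitors. Lowering ICMP_NUMBER_OF_PACKETS raises the number of targets that fit.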

However, perhaps we need to go one step further. In your initial post, you have mentioned that your goal is:

@hyperdev wrote:

Looking for tips on how to monitor a large number of hosts (15k) through NAM.

We need to generate an event for each host going down or up.

I understand this as expecting a separate Problem (and notification) whenever any one of your hosts fails. That makes me think that creating a separate NAM monitor for each of your hosts may work better. That may require increasing the limit of NAM monitors for a single environment, but we can handle that.

As 15k hosts is a really huge number, it may require a special approach and planning. I was wondering if you'd be interested in having a call with me and the team to discuss the details of your use case. I believe that after that we'll be able to propose the most accurate approach.

Best Regards,

Jacek

hyperdev · 04 Dec 2024

Would love to.

I am situated in UTC+2

Jacek_Janowicz · 06 Dec 2024

@hyperdev, I have sent you an email to the address you used when registering on our community. Letting you know, just in case it ended up in a spam folder or something similar 🙂
I'll also send it again to an alternative address that I will try to guess. If my message still does not reach your inbox, please ask the DT folks you're working with to contact me. Alternatively, give me the name of your CSM, and I will ask them to help organize the call for us.

Best Regards,

Jacek