02 Dec 2024 03:13 PM - last edited on 03 Dec 2024 07:28 AM by MaciejNeumann
Looking for tips on how to monitor a large number of hosts (15k) through NAM.
I need to generate an event each time a host goes down or comes back up.
02 Dec 2024 04:02 PM
Well, as far as I know you can do this one of three ways:
Option 1: Add their IPs manually.
I assume this isn't a very attractive option given the number of hosts.
--
Option 2:
Use a filter expression. If all of these hosts belong to one or more host groups, for example, you can add the filter expression in the config:
More info on filter expressions can be found here.
You'll likely find something there that will serve your purpose, but if not...
---
Option 3:
Create a script that uses the V2 API to create NAM monitors en masse. On the old interface, go here:
And find this:
You'll need a token with the right permissions, of course. You can test the API there and then write a local script (Python, maybe?) that calls the API repeatedly, adjusting the JSON body of the request for each of your monitors.
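A minimal sketch of such a script, under stated assumptions: the environment URL, the token and the endpoint path are placeholders to be checked in the API explorer, and `TEMPLATE` is a stand-in for whatever JSON body you validated there. The helper names (`build_body`, `build_request`, `build_requests`) are hypothetical. The sketch only prepares the requests; nothing is sent.

```python
import copy
import json
import urllib.request

# Assumptions, not taken from the thread: replace all three with real values.
ENV_URL = "https://example.invalid"           # hypothetical environment URL
MONITORS_PATH = "/api/v2/synthetic/monitors"  # assumed V2 endpoint path
API_TOKEN = "REPLACE_ME"                      # hypothetical token

# Stand-in template; use the body you tested in the API explorer instead.
TEMPLATE = {
    "name": "",
    "description": "",
    "type": "MULTI_PROTOCOL",
    "frequencyMin": 5,
    "steps": [{"name": "", "requestType": "ICMP", "targetList": []}],
}


def build_body(template, name, hosts):
    """Return a copy of the template with the name and target list replaced."""
    body = copy.deepcopy(template)
    body["name"] = name
    body["description"] = name
    for step in body["steps"]:
        step["name"] = name
        step["targetList"] = list(hosts)
    return body


def build_request(body):
    """Prepare (but do not send) a POST request for one monitor body."""
    return urllib.request.Request(
        ENV_URL + MONITORS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Api-Token " + API_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_requests(template, prefix, host_groups):
    """Build one request per group of hosts, named prefix_0, prefix_1, ..."""
    return [
        build_request(build_body(template, f"{prefix}_{i}", group))
        for i, group in enumerate(host_groups)
    ]
```

Each prepared request could then be passed to `urllib.request.urlopen` once the URL, path and token have been confirmed against your environment.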
02 Dec 2024 04:37 PM
We just tried the V2 API approach.
2 ActiveGates
50 endpoints
Ping cycle: 5 min
Each with at most 700 hosts per targetList
Result:
10% of the hosts are pinged
For that 10%, the cycle ranges from 300 seconds (OK) to 2400 seconds (not OK)
How we investigated, on each ActiveGate:
tcpdump --immediate-mode -l -i eth0 "icmp[0] == 8" | tee tcpdump/icmp_request_active_gate_X.txt
The expected flow from tcpdump should be 15000/300/2 per ActiveGate i.e. 25 lines per second.
Actual printout was sporadic.
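A small sketch for checking that expectation against the capture files. It assumes the load is spread evenly over the cycle and over both ActiveGates, and that tcpdump uses its default timestamp format (each line starting with HH:MM:SS.ffffff). Note that 25 per second counts hosts; with ICMP_NUMBER_OF_PACKETS set to 3, as in the JSON further down, each host would account for three echo requests, so the capture would show roughly 75 lines per second. The function names are hypothetical.

```python
from collections import Counter


def expected_rate(hosts, cycle_seconds, gates, packets_per_host=1):
    """Echo requests per second per ActiveGate, assuming an even spread."""
    return hosts * packets_per_host / cycle_seconds / gates


def requests_per_second(lines):
    """Count captured lines per wall-clock second.

    Assumes each tcpdump line starts with HH:MM:SS.ffffff; lines that do
    not start with such a timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        stamp = line.split(" ", 1)[0]
        second = stamp.split(".", 1)[0]
        if len(second) == 8 and second[2] == ":" and second[5] == ":":
            counts[second] += 1
    return counts


print(expected_rate(15000, 300, 2))     # 25.0 hosts per second
print(expected_rate(15000, 300, 2, 3))  # 75.0 packets per second
```

Feeding the lines of one of the tee'd capture files into `requests_per_second` gives a per-second count that can be compared with these figures.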
Any idea what we did wrong?
One JSON endpoint:
{
  "description": "NON_PRODUCTION_0",
  "enabled": true,
  "entityId": "MULTIPROTOCOL_MONITOR-69CDF7C646DF6400",
  "frequencyMin": 5,
  "locations": [
    "SYNTHETIC_LOCATION-03A5657E489F280A"
  ],
  "name": "NON_PRODUCTION_0",
  "performanceThresholds": {
    "enabled": true,
    "thresholds": null
  },
  "steps": [{
      "constraints": [{
          "properties": {
            "operator": "=",
            "value": 100
          },
          "type": "SUCCESS_RATE_PERCENT"
        }
      ],
      "name": "NON_PRODUCTION_0",
      "properties": {
        "EXECUTION_TIMEOUT": "PT3S",
        "ICMP_NUMBER_OF_PACKETS": 3,
        "ICMP_PACKET_SIZE": 8,
        "ICMP_TIMEOUT_FOR_REPLY": "PT2S",
        "ICMP_TIME_TO_LIVE": 255
      },
      "requestConfigurations": [{
          "constraints": [{
              "properties": {
                "operator": "=",
                "value": 100
              },
              "type": "ICMP_SUCCESS_RATE_PERCENT"
            }
          ]
        }
      ],
      "requestType": "ICMP",
      "targetFilter": null,
      "targetList": [
        "HOST_000",
        ....,
        "HOST_700"
      ]
    }
  ],
  "syntheticMonitorOutageHandlingSettings": {
    "globalConsecutiveOutageCountThreshold": 1,
    "globalOutages": true,
    "localConsecutiveOutageCountThreshold": null,
    "localLocationOutageCountThreshold": null,
    "localOutages": false
  },
  "tags": [{
      "context": "CONTEXTLESS",
      "key": "TAG_NON_PRODUCTION",
      "source": "USER",
      "value": null
    }
  ],
  "type": "MULTI_PROTOCOL"
}
03 Dec 2024 07:31 AM
Hi @hyperdev,
Have you checked the NAM limitations?
https://docs.dynatrace.com/docs/shortlink/network-availability-monitoring#limitations
If you have, please raise a support ticket.
Best regards,
Mizső
03 Dec 2024 11:39 AM
I need some explanation of the limitations:
The maximum number of network activities executed per network availability monitor is 1,000. Network activity is a single DNS request, single TCP request, or single ICMP packet.
How can I keep the JSON example above under 1,000 single ICMP packets?
Do I need to specify one target per step and fewer than 1,000 steps per JSON file?
Thanks.
03 Dec 2024 12:24 PM
Hi @hyperdev
Thanks for raising that question.
For NAM ping tests, you can define the number of packets sent to a single target during a single test execution.
In your file, it is 3:
"ICMP_NUMBER_OF_PACKETS": 3,
As your target list contains 700 (to be precise, 701) targets,
"targetList": [
"HOST_000",
....,
"HOST_700"
]
That means sending 3 packets to each target, or 701 * 3 = 2103 packets in total, which exceeds the limit of 1,000. To stay within it, we recommend splitting the configuration into 3 NAM ICMP monitors.
However, perhaps we need to go one step further. In your initial post, you mentioned that your goal is:
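The split can be sketched as follows. This is only an illustration: it assumes every ICMP packet counts as one network activity, as in the quoted limitation, and that nothing else in the monitor counts towards the limit. The helper names are hypothetical.

```python
import math

MAX_ACTIVITIES = 1000  # limit quoted from the NAM documentation in this thread


def max_hosts_per_monitor(packets_per_host, limit=MAX_ACTIVITIES):
    """Largest target count whose total packets stay within the limit."""
    return limit // packets_per_host


def split_targets(hosts, packets_per_host, limit=MAX_ACTIVITIES):
    """Split hosts into evenly sized groups that each respect the limit."""
    if not hosts:
        return []
    per_monitor = max_hosts_per_monitor(packets_per_host, limit)
    if per_monitor < 1:
        raise ValueError("packets_per_host alone exceeds the limit")
    n_groups = math.ceil(len(hosts) / per_monitor)
    size = math.ceil(len(hosts) / n_groups)
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


hosts = [f"HOST_{i:03d}" for i in range(701)]  # HOST_000 .. HOST_700
groups = split_targets(hosts, packets_per_host=3)
print([len(g) for g in groups])      # [234, 234, 233]
print([len(g) * 3 for g in groups])  # [702, 702, 699]
```

With 3 packets per target, at most 333 targets fit into one monitor, so 701 targets need 3 monitors.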
@hyperdev wrote:
Looking for tips on how to monitor a large number of hosts (15k) through NAM.
I need to generate an event each time a host goes down or comes back up.
I understand this as expecting a separate Problem (and notification) whenever any one of your hosts fails. That makes me think that creating a separate NAM monitor for each of your hosts may work better. This may require raising the limit on NAM monitors for a single environment, but we can handle that.
Since 15k hosts is a very large number, it may require a special approach and planning. Would you be interested in a call with me and the team to discuss the details of your use case? I believe that after that we'll be able to propose the most suitable approach.
Best Regards,
Jacek
04 Dec 2024 10:57 AM
Would love to.
I am in UTC+2.
06 Dec 2024 08:17 AM
@hyperdev, I have sent you an email to the address you used when registering on our community. I'm letting you know in case it ended up in a spam folder or something similar 🙂
I'll also send it to an alternative address that I'll try to guess. If my message still isn't in your inbox, please ask the DT folks you're working with to contact me. Alternatively, give me the name of your CSM, and I will ask them to help organize the call for us.
Best Regards,
Jacek