There are multiple ways to validate SSL certificates and alert on their expiry. I've taken a look at all of them and found they lacked automation. I therefore created another one that hopefully overcomes some of their limitations and is easier to use in large environments.
As we have been missing this feature out of the box for a long time, this might be useful.
Summarizing the various attempts and threads from:
SSL Certification expiration checks out of the box - Details? (@Larry R.)
Does Dynatrace monitor SSL certificate validation (@Akshay S.)
Monitor SSL certificate expiry and generate alert (@Dario C.)
(also the contributors @Július L., @Leon Van Z.)
What is different in this plugin?
Where to find it?
You can find the plugin at my personal github repository.
Thanks for sharing! Our services team created a similar one, but instead of using synthetic tests as input it took a CSV file and also automatically discovered all HTTPS endpoints from incoming/outgoing service calls.
That one does require some configuration, though, and consumes DDUs to track the endpoints.
Using the synthetic monitor configuration seemed logical. I can't rely too much on the services, as they are more likely to change; eventually there may be no services at all while one still runs synthetic tests (or tests something that is not even covered by DT on the backend).
That said, I do use the detected-services approach to configure RUM applications automatically at scale... while waiting for DT to bring back automatic application detection 🙂
Thanks for sharing!
I will try it.
The best solution (in my opinion) was to develop our own AG plugin (based on the ssl and OpenSSL libraries) so that we could manage our own groups of certificates and their associated alert thresholds. (I don't really like using synthetics for this kind of monitoring.)
Another concern (for me) with using events is that the problem is automatically closed after 15 minutes (120 minutes at most), so it would not be compatible with an execution schedule longer than 120 minutes (unless we manage this in the script, in which case many events would be created every day until the certificate is renewed).
Hi @Aymeric B.,
actually, this AG plugin uses OpenSSL in the background to fetch the certificates. IMO, any solution that retrieves certificates from remote servers is some kind of synthetic monitoring, unless you check certificates locally on the filesystem (which is hard to control in large heterogeneous environments).
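Python's ssl module is itself built on OpenSSL, so fetching a remote certificate and computing how many days it has left can be sketched as below. This is an illustration, not the plugin's actual code; the host and timeout are placeholders.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the 'notAfter' field of the certificate presented by host:port.

    The default context verifies the chain and hostname, so a certificate
    that has ALREADY expired makes the handshake fail with
    ssl.SSLCertVerificationError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]  # format like 'Jun  1 12:00:00 2030 GMT'


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (epoch seconds, default: current time) and notAfter."""
    if now is None:
        now = time.time()
    expires_at = ssl.cert_time_to_seconds(not_after)  # epoch seconds, UTC
    return (expires_at - now) / 86400
```

Usage: `days_until_expiry(fetch_not_after("example.com"))` returns the remaining days, which can then be compared against an alert threshold.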
Problems closing after 15 minutes is not an issue. You can set the timeout duration higher and simply refresh the problem when needed. My approach is to set the timeout longer than the check interval; the problem is then simply refreshed and no additional ones are created.
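The "timeout longer than the check interval" rule can be expressed as a small helper. This is a sketch: the 120-minute cap is the maximum quoted in this thread (check the current Events API docs), and the 15-minute margin is an arbitrary assumption.

```python
from typing import Tuple

# Maximum event timeout as quoted in this thread; verify against current docs.
MAX_EVENT_TIMEOUT_MINUTES = 120


def plan_event_timeout(check_interval_minutes: int,
                       margin_minutes: int = 15) -> Tuple[int, bool]:
    """Return (timeout_minutes, needs_separate_refresh).

    The timeout is set a little longer than the check interval, so the next
    regular run re-sends (refreshes) the same problem before it auto-closes.
    If that would exceed the maximum, the timeout is capped and the caller
    has to refresh the open event on its own between full checks.
    """
    wanted = check_interval_minutes + margin_minutes
    if wanted <= MAX_EVENT_TIMEOUT_MINUTES:
        return wanted, False
    return MAX_EVENT_TIMEOUT_MINUTES, True
```

With an hourly check this gives `(75, False)`; a daily check gives `(120, True)`, i.e. the situation @Aymeric B. describes, where a separate refresh is needed.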
Hi @Reinhard W.
We had specific needs for this AG plugin (management of assignment groups for the ticketing tool, different thresholds depending on the type of certificate, ...).
Regarding event management, the documentation indicated that the maximum timeout for an event was 120 minutes, so I decided to configure a custom event rather than handle cases where the execution interval is greater than the maximum timeout.
(But you're absolutely right: it is indeed possible to manage the refresh of the event in the script. Maybe I've been a little lazy. ^^)
Hi @Aymeric B.
you just gave me a great idea on how to do the refresh better; I will include that in my plugin.
As for different thresholds for different groups of certificates: this could be covered with different instances of the plugin. If you know which sites (synthetic monitors) use which certificates, you could assign different tags in DT, and the different plugin instances would then pick up those sites with separate thresholds.
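A minimal sketch of the "one instance per tag, each with its own threshold" idea. The `Monitor` and `PluginInstance` shapes and the tag names are hypothetical, chosen only to show the selection and threshold logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Monitor:
    """A synthetic monitor as a plugin instance might see it (hypothetical shape)."""
    name: str
    host: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class PluginInstance:
    """One plugin instance: a tag filter plus its own threshold (hypothetical)."""
    tag: str
    threshold_days: int


def select_hosts(instance: PluginInstance, monitors: List[Monitor]) -> List[str]:
    # Each instance only checks the monitors carrying its tag.
    return [m.host for m in monitors if instance.tag in m.tags]


def hosts_to_alert(instance: PluginInstance, monitors: List[Monitor],
                   days_left: Dict[str, float]) -> List[str]:
    # Alert when fewer days remain than this instance's threshold.
    return [h for h in select_hosts(instance, monitors)
            if days_left[h] < instance.threshold_days]
```

With both certificates 45 days from expiry, an instance for `cert:internal` with a 30-day threshold raises nothing, while an instance for `cert:public` with a 60-day threshold alerts on the public site.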
Hello @Reinhard W.
In the "How to use" section, the following sentence is written:
"that is able to access the sites you want to monitor."
I am a bit confused about this and need your help to clarify my understanding before using the plugin. We have a few eChannel applications monitored with synthetic browser monitors.
Do you mean the Environment AG should be able to reach that publicly available DNS name?
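Whatever the answer for a given setup, one quick way to see whether the machine running the plugin can open a connection to a monitored site is a plain TCP test run on that host. This is a sketch; the host name is a placeholder, and it does not go through a proxy.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port can be opened from this machine."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections and timeouts
        return False
```

For example, running `can_reach("www.example.com")` on the ActiveGate host shows whether that site is directly reachable on port 443.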
Thanks for the feedback (@Aymeric B.). I've added functionality to the plugin so that it now also checks previously created problems/events and whether their condition still holds (outside of the normal, long certificate check interval). It does so by fetching the event/problem and checking whether it is close to timing out (the max. 120 minutes). If this is the case, it re-checks those hosts and either refreshes the problem or, if the failure condition no longer exists, closes the problem.
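The refresh-or-close decision described above can be sketched as a pure function. Times are epoch seconds; the 10-minute safety margin and the function name are assumptions for illustration, not taken from the plugin.

```python
from typing import Optional


def next_action(last_sent: float, timeout_minutes: int, now: float,
                still_failing: bool, margin_minutes: int = 10) -> Optional[str]:
    """Decide what to do with an open certificate problem.

    Returns None while the event is not yet close to timing out. Once it is
    within `margin_minutes` of auto-closing, returns 'refresh' if the
    certificate condition still holds, or 'close' if it no longer does.
    """
    closes_at = last_sent + timeout_minutes * 60
    if now < closes_at - margin_minutes * 60:
        return None
    return "refresh" if still_failing else "close"
```

Only events that return a non-None action need their hosts re-checked, which keeps the extra work between full runs small.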
Additionally, I added proxy support to the plugin. This can be useful when the sites to check can't be accessed directly. For security reasons, the plugin will only use TLSv1.2.
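The thread doesn't show how the plugin talks to the proxy. A common approach (an assumption here) is an HTTP CONNECT tunnel, followed by a TLS handshake over it with the context pinned to TLSv1.2. All names below are illustrative.

```python
import socket
import ssl


def tls12_context() -> ssl.SSLContext:
    """Verifying context restricted to TLSv1.2 only."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def connect_request(host: str, port: int) -> bytes:
    """HTTP CONNECT request asking the proxy to open a tunnel to host:port."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def open_tls_via_proxy(proxy_host: str, proxy_port: int, host: str,
                       port: int = 443, timeout: float = 10.0) -> ssl.SSLSocket:
    """Tunnel through an HTTP proxy, then start TLS with the target host."""
    sock = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    try:
        sock.sendall(connect_request(host, port))
        response = b""
        while b"\r\n\r\n" not in response:  # read the proxy's response headers
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("proxy closed the connection")
            response += chunk
        parts = response.split(b"\r\n", 1)[0].split()
        if len(parts) < 2 or parts[1] != b"200":
            raise ConnectionError(f"proxy refused tunnel: {parts!r}")
        return tls12_context().wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```

The returned socket's `getpeercert()["notAfter"]` can then be used exactly as with a direct connection.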
@r_weber this looks good; it makes sense to leverage the Synthetic checks. However, there are cases where a client doesn't use Synthetic, and being able to add domains/FQDNs manually still makes sense. I guess for those, I'll continue to leverage @leon_vanzyl's plugin 😎
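For the manual case, a list of domains/FQDNs could be kept as plain text, one `host` or `host:port` per line. The format and the parser below are hypothetical; IPv6 literals are not handled.

```python
from typing import List, Tuple


def parse_targets(text: str, default_port: int = 443) -> List[Tuple[str, int]]:
    """Parse lines of 'host' or 'host:port'; skip blank lines and '#' comments."""
    targets = []
    for raw in text.splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        host, sep, port = entry.rpartition(":")
        if sep and port.isdigit():
            targets.append((host, int(port)))
        else:
            targets.append((entry, default_port))
    return targets
```

Each resulting `(host, port)` pair can then be checked the same way as hosts taken from synthetic monitors.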
Question: does the plugin work for both SaaS and Managed? I've set it up on a Managed instance, but I'm not sure what to enter in the 'Dynatrace Tenant UUID to report to' field. The environment ID? Including or excluding the /e/? Or the entire environment URL, including https://FQDN/e/?
Hi @andre_vdveen ,
you can always configure your synthetic monitors (but disable them) and the plugin would pick them up as well 🙂
As the plugin uses the remote execution engine, it pushes the notification via the AG API. This API is always at https://localhost/e/<tenantid>; the URL is the same regardless of Managed or SaaS, so you only need the tenant ID & API key.
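Since only the tenant ID is needed, a helper could accept any of the forms asked about above and reduce them to the bare ID before building the local AG API URL. The helpers are hypothetical; they handle the Managed-style `/e/<tenantid>` forms only, and the `Api-Token` header is the usual Dynatrace token scheme.

```python
import re
from typing import Dict


def normalize_tenant_id(value: str) -> str:
    """Accept 'abc12345', '/e/abc12345' or 'https://fqdn/e/abc12345/'; return 'abc12345'."""
    value = value.strip().rstrip("/")
    match = re.search(r"/e/([^/]+)$", value)
    return match.group(1) if match else value


def ag_api_base(tenant: str) -> str:
    """Local ActiveGate API base URL, as described in this thread."""
    return f"https://localhost/e/{normalize_tenant_id(tenant)}"


def auth_headers(api_token: str) -> Dict[str, str]:
    """Request headers for calls authenticated with an API token."""
    return {"Authorization": f"Api-Token {api_token}",
            "Content-Type": "application/json"}
```

So entering either `abc12345` or the full environment URL would end up at `https://localhost/e/abc12345`.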
Hi @r_weber, thanks for the feedback; I didn't realize it would still work even with the monitors disabled! 🙂
I've deployed the plugin as I normally would, but I couldn't get it working, no matter what I tried. Upon checking the logs, I noticed this:
Token is missing required scope. Use one of: ExternalSyntheticIntegration (Create and read synthetic monitors, locations, and nodes)
It turns out the token permissions in the documentation need some updating, as it only mentions:
After adding the missing token scope, it is working now - very cool, thank you! 😉
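To catch this before deployment, the token's scopes can be compared against the required set. This is a sketch: only `ExternalSyntheticIntegration` comes from the log message above, and the other required scopes should be taken from the plugin's own documentation.

```python
from typing import Iterable, List, Set

# Only ExternalSyntheticIntegration is confirmed by the log in this thread;
# add the other scopes listed in the plugin's documentation.
REQUIRED_SCOPES: Set[str] = {"ExternalSyntheticIntegration"}


def missing_scopes(token_scopes: Iterable[str],
                   required: Iterable[str] = REQUIRED_SCOPES) -> List[str]:
    """Return the required scopes the token lacks, sorted for stable output."""
    return sorted(set(required) - set(token_scopes))
```

An empty result means the token has every scope in the required set.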
You said the Synthetic monitors can be disabled and the certificate checks should still work, correct?
Perhaps I'm doing something wrong, but if I set up a Synthetic test and then disable it, the certificate check never seems to execute/check that monitor, even though the threshold condition is met (85 days until expiry, threshold set to 90 days).
If I enable the Synthetic monitor, let it run once, and then disable it, it seems to work OK (it raises a problem for the certificate's expiry date), but I can't confirm whether it will close the problem once the certificate is renewed while the monitor is disabled.