How do I write a simple Python plugin that is invoked from Dynatrace Managed (SaaS) and restarts a service (let's say IIS) on a Windows VM that has Dynatrace OneAgent installed? Do I need to know a bunch of Dynatrace Managed APIs, etc.? Is there a step-by-step guide for beginners who are new to Dynatrace Managed? Does anyone have an example of this? I am new to Dynatrace Managed and Python development. Thanks!
This page explains the steps required to create your first plugin:
If you're stuck or you want a quick start, you can reach out to our services department and they can help you out. The Dynatrace knowledge required is quite limited, but the more Python knowledge you have, the better.
I have looked at the above documentation and was able to play with it, but I am not sure it helps with what I am trying to do, described below:
I have a manual tag configured on the IIS process/services running on my Windows server, which has a Dynatrace agent installed, and when IIS goes down I get an email notification that IIS is down. GREAT!! On the other hand, I have a Python script that I successfully tested on that Windows server, and it is able to start IIS on the server locally. My question is: where do I need to add/configure that script in Dynatrace OneAgent and in the Dynatrace Managed server, so that when IIS goes down on the Windows server the Python script executes and starts IIS? I have a feeling that I am overlooking some key piece of information that would help me solve my puzzle :).
I believe this is currently not possible with Dynatrace Python plugins. Those plugins are used exclusively for pushing data (metrics) into Dynatrace using the OneAgent API. OneAgent is not aware of any "problems" that occur in the environment. I'm also not aware of any documented way to execute commands on monitored systems using Dynatrace OneAgent. And I hope there is no such option, as it would be a huge security concern.
What you can do is write your own (daemon) script that is called via an HTTP request as soon as a problem occurs, using problem notifications. Your custom URL will be called with the payload you define as soon as Dynatrace detects a problem. It behaves just like the email notification you already have. This, however, implies that the server running your script must be reachable from the tenant. Another option is to poll the Dynatrace tenant regularly for IIS problems, but API calls are (or will be) limited.
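To make the daemon idea concrete, here is a minimal sketch of a script that listens for the problem notification and starts IIS. It is only an illustration under a few assumptions: the payload keys `State` and `ProblemTitle` are whatever you define in the custom payload of the notification (the names here are hypothetical), port 8080 is arbitrary, and `iisreset /start` stands in for whatever local script you already tested. There is no authentication here, so you would want to restrict access before using anything like it.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def should_restart(payload: dict) -> bool:
    """Return True only for an open problem whose title mentions IIS.

    Assumption: the notification payload is JSON with "State" and
    "ProblemTitle" keys; adjust to whatever payload you configured.
    """
    state = str(payload.get("State", "")).upper()
    title = str(payload.get("ProblemTitle", "")).upper()
    return state == "OPEN" and "IIS" in title


def start_iis() -> int:
    """Start IIS with the stock Windows command and return its exit code."""
    return subprocess.run(["iisreset", "/start"], check=False).returncode


class ProblemHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body sent by the notification.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        if isinstance(payload, dict) and should_restart(payload):
            start_iis()
        self.send_response(200)
        self.end_headers()


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Block forever, handling incoming notifications."""
    HTTPServer((host, port), ProblemHandler).serve_forever()
```

Calling `serve()` on the Windows server (for example from a scheduled task or a service wrapper) keeps the listener running. Keeping the decision in `should_restart` separate from the HTTP handling makes it easy to test without sending real notifications.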
You could have a python plugin which runs on the host level (so it's always active, and not only whenever a specific process is up and running). The plugin would check for IIS, and if it isn't up and running it will start IIS. That script will then run every minute.
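The check itself, independent of how it gets scheduled, could look roughly like the sketch below. It assumes the IIS service is `W3SVC` (World Wide Web Publishing Service), that `sc query` prints its usual English `STATE` line, and that the script runs with enough privileges to start services. The `run` parameter is only there so the logic can be exercised without touching a real service.

```python
import subprocess

SERVICE = "W3SVC"  # assumption: the IIS World Wide Web Publishing Service


def is_running(sc_output: str) -> bool:
    """Parse `sc query` output and report whether the STATE line says RUNNING."""
    for line in sc_output.splitlines():
        if line.strip().startswith("STATE"):
            return "RUNNING" in line
    return False


def ensure_running(service: str = SERVICE, run=subprocess.run) -> bool:
    """Start the service if it is not running.

    Returns True if a start was attempted, False if it was already running.
    """
    result = run(["sc", "query", service], capture_output=True, text=True)
    if is_running(result.stdout):
        return False
    run(["sc", "start", service], capture_output=True, text=True)
    return True
```

Calling `ensure_running()` once per minute, whether from a plugin's polling cycle or from a scheduled task, gives the behaviour described above.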
I'd suggest looking into a PowerShell script instead, though, as this is not what the plugins are meant to do.
If you want to start it based on a Dynatrace notification, have a look at the answer from Julius.
Thanks, Julius and Mike. If I understand both of you correctly, your responses suggest setting up a cron job or scheduled task, or polling the server for the problem notification, to start IIS if it's down.
We get a notification email when IIS is down. In a similar way, why can't the Dynatrace server send instructions to the OneAgent running on the IIS server to execute a Python script that starts IIS on demand?
From what you guys mentioned, it seems like communication is one way only, from OneAgent to the Dynatrace server. What needs to be done for communication from the Dynatrace server to OneAgent, so it can perform actions? Poking holes in the firewalls is one thing I can see would be needed. But does OneAgent have the smarts to take orders from the Dynatrace server and perform actions accordingly? I know it can be a security risk, but if we are willing to take that risk, does the functionality exist: a communication channel from the Dynatrace server to OneAgent to perform actions on demand?
The out-of-the-box communication is one way only. You'd have to create your own communication link to the agent, for example by manually pushing a file to the server that runs the agent and then polling for it from the plugin. It seems like a bit of a hassle, though.
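As a rough illustration of the file-based approach, the polling side could be as small as the sketch below. The trigger file name and location are hypothetical, and how the file gets pushed to the server is left entirely to you.

```python
from pathlib import Path


def consume_trigger(path) -> bool:
    """Return True once if the trigger file exists, removing it in the process.

    Deleting the file is what marks the request as handled, so the same
    trigger is never acted on twice.
    """
    trigger = Path(path)
    try:
        trigger.unlink()
    except FileNotFoundError:
        return False
    return True
```

On each polling cycle the plugin (or any scheduled script) would call `consume_trigger(...)` with the agreed path and start IIS only when it returns `True`.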
I'm quite sure OneAgent does not have any functionality for triggering actions from the tenant. This is in contrast with legacy monitoring tools, most of which had this functionality. I can imagine that having such functionality would raise huge security concerns in traditional enterprises, especially if you use Dynatrace SaaS.
I personally have always had problems with triggering automatic repair actions based on monitoring. If possible, I'd recommend focusing on the root cause of the IIS crash instead of restarting it automatically.
While exploring other options, I noticed that Dynatrace Application Monitoring (AppMon) version 6.5 offers a way to create a generic execution plugin. Since I haven't used AppMon at all, can you summarize in a nutshell what I would lose or gain by shifting my focus to auto-healing through a Python script using AppMon? Is there a URL you can point me to that covers the differences between Dynatrace Managed (SaaS) and Dynatrace Application Monitoring (AppMon) version 6.5 when it comes to auto-healing through a generic execution plugin? I appreciate your time!
Isn't what you describe here a kind of auto-remediation, Rehan? IIS goes down -> problem -> restart action. I suggest taking a look at the "custom integration" / webhook option within the Problem Notification settings.