One of the features most of my clients have been requesting lately involves, in some form, "Business Hours". These needs typically arise with internal applications, which matter most during, e.g., 9 to 5.
I have created this question so you can share how you have approached this issue. Below are some notes I have compiled over the last few weeks while working with two of our clients.
Dynatrace has the concept of "Maintenance windows", and while it is a different concept from Business Hours, the two are not radically different. Dynatrace has positioned "Maintenance windows" for notifications and alerting, which makes total sense. Business Hours are more typically associated with reporting, and no equivalent option for reporting exists at the moment.
Business Hours is a concept that exists in several solutions, and since I'm an old Keynote Systems user, I was quite used to producing reports for periods like 8 AM to 8 PM, Monday through Friday. Of course, "Business Hours" is not even the best name for this need, as someone might be interested in performance only on, e.g., Sundays.
A few parts of the platform already include somewhat related features. The best example I can remember is the "Peak activity intervals" graph in the User behavior section of Applications, which always produces hourly values, even when you select, e.g., last month of data:
For some of my clients' requests, I have worked around this through the API. This works, and I have done it, for metrics that can be averaged from the values the API returns. These include most Host metrics, which are generally averages. But this approach does not work for anything based on medians, or for derived values like Apdex.
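To make the averaging workaround concrete, here is a minimal sketch. It assumes you already have hourly (timestamp, value) datapoints from a metrics query, converted to the application's local time zone; the datapoints and the 9-to-5, Monday-Friday range are hypothetical. The second half shows why the same trick fails for medians: the median of hourly medians is not the median of the raw values.

```python
from datetime import datetime, timezone
from statistics import mean, median


def in_business_hours(ts: datetime, start_hour: int = 9, end_hour: int = 17) -> bool:
    """True for Monday-Friday datapoints whose hour lies in [start_hour, end_hour)."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def business_hours_average(points):
    """Average the hourly values that fall inside business hours.

    Only exact when every hour carries the same weight (e.g. a host metric
    sampled at a constant rate); it is not valid for medians or Apdex.
    """
    values = [value for ts, value in points if in_business_hours(ts)]
    return mean(values) if values else None


# Hypothetical hourly datapoints (timestamp, CPU %). UTC is used here only to
# keep the example simple; real data should be in the application's local zone.
points = [
    (datetime(2024, 1, 8, 10, tzinfo=timezone.utc), 40.0),  # Monday 10:00, kept
    (datetime(2024, 1, 8, 20, tzinfo=timezone.utc), 10.0),  # Monday 20:00, after hours
    (datetime(2024, 1, 9, 16, tzinfo=timezone.utc), 60.0),  # Tuesday 16:00, kept
    (datetime(2024, 1, 13, 10, tzinfo=timezone.utc), 5.0),  # Saturday, weekend
]
print(business_hours_average(points))  # 50.0

# Why medians can't be rebuilt this way: the median of hourly medians differs
# from the median of the underlying raw response times (ms).
hour_a = [100, 110, 120]
hour_b = [300, 310, 320, 330, 340]
median_of_medians = median([median(hour_a), median(hour_b)])
true_median = median(hour_a + hour_b)
print(median_of_medians, true_median)  # 215.0 305.0
```

The same problem applies to Apdex, which has to be recomputed from the underlying satisfied/tolerating/frustrated counts rather than from per-hour scores.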
I'm pretty sure it can't be done in USQL either. There is an HOUR() function, but it can't be used in a WHERE clause. I have submitted an RFE for this at:
Some things can be done with USQL, though, using DATETIME(). A query like the following can produce a graph similar to the one above for an application:
SELECT DISTINCT DATETIME(starttime, 'HH') as hour, count(*) FROM useraction order by hour asc
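Since HOUR() can't be used in a WHERE clause, one possible workaround (my assumption, not a documented approach) is to take the per-hour buckets and apply the business-hours filter client-side. This sketch mirrors the hourly grouping above in Python; the epoch-millisecond start times and the UTC time zone are hypothetical, so check which time zone your own query results use.

```python
from collections import Counter
from datetime import datetime, timezone


def actions_per_hour(start_times_ms, tz=timezone.utc):
    """Count user actions per hour of day ("00".."23"), mirroring DATETIME(starttime, 'HH')."""
    hours = (datetime.fromtimestamp(ms / 1000, tz).strftime("%H") for ms in start_times_ms)
    return sorted(Counter(hours).items())


def business_hours_total(per_hour, start_hour=9, end_hour=17):
    """Sum the hourly buckets that fall inside [start_hour, end_hour)."""
    return sum(count for hour, count in per_hour if start_hour <= int(hour) < end_hour)


# Hypothetical start times in epoch milliseconds (00:00, 01:00, 01:01, 09:00, 17:00 UTC).
starts = [0, 3_600_000, 3_660_000, 32_400_000, 61_200_000]
per_hour = actions_per_hour(starts)
print(per_hour)  # [('00', 1), ('01', 2), ('09', 1), ('17', 1)]
print(business_hours_total(per_hour))  # 1
```

This only works for counts and other additive values; medians and Apdex run into the same limitation as the API approach.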
With Synthetics, "Maintenance windows" can be excluded from availability calculations, as described at the URL below. However, this has a side effect: if you configure in Settings that "Maintenance windows" are excluded from availability calculations, users can no longer calculate 24/7 availability...
For Synthetics, another approximation of Business Hours is to schedule measurements to run only during the period when they really matter for the application in question. At the moment, this can only be done through the API, by enabling and disabling monitors with API calls. In my opinion this has drawbacks, as it introduces one more point of failure that wouldn't exist if scheduling were available in the platform. On the other hand, it has the added benefit of reducing DEM consumption.
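A toy calculation makes the trade-off visible. This is a sketch with hypothetical data (one run per hour, failing during a 01:00-05:00 window), not how Dynatrace computes availability internally: from the same executions you get two quite different numbers, and a single global setting only gives you one of them.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    hour: int             # hour of day the synthetic execution ran (0-23)
    success: bool
    in_maintenance: bool  # True if the run fell inside a maintenance window


def availability(executions, exclude_maintenance: bool) -> float:
    """Percentage of successful executions, optionally ignoring maintenance-window runs."""
    considered = [e for e in executions if not (exclude_maintenance and e.in_maintenance)]
    if not considered:
        return float("nan")
    return 100.0 * sum(e.success for e in considered) / len(considered)


# Hypothetical day: one run per hour, failing during a 01:00-05:00 maintenance window.
runs = [Execution(h, success=not 1 <= h < 5, in_maintenance=1 <= h < 5) for h in range(24)]
print(round(availability(runs, exclude_maintenance=False), 2))  # 83.33 (24/7 view)
print(availability(runs, exclude_maintenance=True))             # 100.0 (window excluded)
```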
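The scheduling logic itself is small; the actual Dynatrace API call that enables or disables a monitor is deliberately left out here, so check the Synthetic API documentation for that. This sketch assumes a 9-to-5, Monday-Friday schedule and hypothetical monitor IDs. One way to reduce the "extra point of failure" is to reconcile the desired state on every run instead of relying on a single enable at 9:00 and a single disable at 17:00. The execution count at the end is only a rough proxy for DEM consumption, since the cost per execution depends on the monitor type.

```python
from datetime import datetime

BUSINESS_DAYS = range(0, 5)  # Monday=0 .. Friday=4, as in datetime.weekday()
START_HOUR, END_HOUR = 9, 17


def should_be_enabled(now: datetime) -> bool:
    """Whether a monitor should be running at `now` under a 9-to-5, Monday-Friday schedule."""
    return now.weekday() in BUSINESS_DAYS and START_HOUR <= now.hour < END_HOUR


def pending_changes(current_state: dict, now: datetime) -> dict:
    """Monitors whose enabled flag differs from what the schedule wants right now.

    Reconciling on every run means a single missed run does not leave a
    monitor stuck in the wrong state.
    """
    wanted = should_be_enabled(now)
    return {mid: wanted for mid, enabled in current_state.items() if enabled != wanted}


def executions_per_week(frequency_minutes: int, business_hours_only: bool, locations: int = 1) -> int:
    """Rough execution count per week, a proxy for DEM consumption."""
    hours = 5 * (END_HOUR - START_HOUR) if business_hours_only else 7 * 24
    return hours * 60 // frequency_minutes * locations


# Hypothetical monitor IDs; applying each change would be one API call (not shown).
state = {"monitor-a": False, "monitor-b": True}
print(pending_changes(state, datetime(2024, 1, 8, 9, 30)))  # {'monitor-a': True}
print(executions_per_week(15, business_hours_only=False))   # 672
print(executions_per_week(15, business_hours_only=True))    # 160
```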
One important aspect should be noted, though. Many of the aggregated values Dynatrace provides might not differ greatly between how they are calculated today and how they would be calculated with Business Hours. Median response time is one example. Imagine an application with 1,000,000 requests from 9 to 5 and 10,000 requests after hours. The median would be at:
• 9 to 5: position 500,000 in the sorted array of 1,000,000 values
• all day: position 505,000 in the sorted array of 1,010,000 values
Given the distribution of most application response times, I would expect the difference to be small. This certainly does not hold in many other cases, though.
In my opinion, the timeframe selector should ideally support this, so that it would be available platform-wide. I have submitted a new RFE for this at:
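The median argument can be bounded without knowing anything about the after-hours values. With 10,000 extra values, the all-day median (rank 505,000) must lie between the business-hours values at ranks 495,000 and 505,000, i.e. between the 49.5th and 50.5th percentiles of business-hours response times. The sketch below computes that range under an assumed log-normal response-time model (500 ms median, sigma 0.8); the model is purely illustrative.

```python
from math import exp, log
from statistics import NormalDist


def median_bounds_after_mixing(n_in, n_out, quantile_fn):
    """Range the combined median can take, expressed via in-hours quantiles.

    The combined median sits at rank (n_in + n_out) // 2. The extremes are all
    n_out extra values sorting below, or above, every in-hours value; any mix
    in between lands within these bounds.
    """
    rank = (n_in + n_out) // 2
    q_low = (rank - n_out) / n_in   # all after-hours values faster
    q_high = rank / n_in            # all after-hours values slower
    return quantile_fn(q_low), quantile_fn(q_high)


# Assumed response-time model: log-normal with a 500 ms median and sigma = 0.8.
def lognormal_quantile(q, median_ms=500.0, sigma=0.8):
    return exp(log(median_ms) + sigma * NormalDist().inv_cdf(q))


low, high = median_bounds_after_mixing(1_000_000, 10_000, lognormal_quantile)
print(f"{low:.1f} ms .. {high:.1f} ms")  # roughly 495 ms .. 505 ms
```

So even in the worst case, the all-day median moves by about 1% under this model, which matches the intuition above.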
How do you deal with this Business Hours issue on your side?
@Antonio S. These are great points and valid arguments. I totally agree that the timeframe selector should work platform-wide. In our industry, we are a 24/7, 365-days-a-year service provider. Our business hours are 24 hours, so we don't narrow the scope to 9 to 5; we want to see everything at all hours. We do use Maintenance Windows, and during them we both suppress alerts and turn a blind eye to detection. I have found that if you only suppress the alert notification, you run the risk of NOT alerting on an issue that continues past the Maintenance Window.
We had an issue where Host A was being taken down for an update. A window was put in place from 1 AM to 5 AM, more than enough time for the update to be applied and the host restarted. During that time we didn't want to be alerted on high CPU, the host restarting, and so on, so we set the Maintenance Window. But the host ran into complications at restart. Dynatrace noticed the host was offline at 4:30 but suppressed the alert; 5 AM came and went, and at 6 AM the host was still down but teams hadn't been notified. Further investigation showed that it would be best to disable anomaly detection for that time frame, in essence putting a cloak over the agent. When the window ends, the cloak comes off: the OneAgent then looks at the current data, compares it to prior data, and triggers the alert because the host is still down after the 5 AM window expired.
I will be sure to give your RFE an upvote!
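The difference between the two approaches in the Host A story can be sketched as a toy model. This is not how Dynatrace implements maintenance windows; it only assumes that a problem opens once, at the first check that sees the host down, and that the host never recovers. With notification-only suppression, the single notification falls inside the window and is lost; with detection paused, the first check after the window opens the problem.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: float  # hours since midnight, inclusive
    end: float    # hours since midnight, exclusive

    def covers(self, t: float) -> bool:
        return self.start <= t < self.end


def alert_with_notification_suppression(down_since, window, check_times):
    """Problem opens at the first check that sees the host down; if that happens
    inside the window its notification is dropped and never re-sent."""
    first = next((t for t in check_times if t >= down_since), None)
    if first is None or window.covers(first):
        return None
    return first


def alert_with_detection_paused(down_since, window, check_times):
    """Detection is switched off during the window; the first check after it that
    still sees the host down opens the problem and notifies."""
    return next((t for t in check_times if t >= down_since and not window.covers(t)), None)


checks = [i * 0.5 for i in range(17)]  # every 30 minutes from 00:00 to 08:00
window = Window(1.0, 5.0)               # 01:00-05:00 maintenance window
print(alert_with_notification_suppression(4.5, window, checks))  # None: nobody is told
print(alert_with_detection_paused(4.5, window, checks))          # 5.0: alert at 05:00
```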
@AntonioSousa hey, do I understand correctly that you want to be able to define a business hours range, which is then taken into account when looking at, for example, metrics, so that a metric only shows data within the defined business hours?
Or did I miss anything here?
I believe the ideal would be to define "filters" for timeframe selector periods, like the following:
Since these would be filters, you could select "last month" with, say, a "Monday to Friday" filter, and only values from last month's weekdays would be counted...
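To illustrate the intended semantics (not any existing feature), here is a minimal sketch of what "last month" plus a "Monday to Friday, 09:00-17:00" filter would expand to: a list of intervals that queries and aggregations would then be restricted to. The function name and the January 2024 example are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def filtered_intervals(first_day, last_day, weekdays, start, end):
    """Yield (from, to) pairs for each selected weekday in [first_day, last_day]."""
    day = first_day
    while day <= last_day:
        if day.weekday() in weekdays:
            yield datetime.combine(day, start), datetime.combine(day, end)
        day += timedelta(days=1)


# "Last month" = January 2024, filtered to Monday-Friday, 09:00-17:00.
intervals = list(filtered_intervals(date(2024, 1, 1), date(2024, 1, 31),
                                    weekdays={0, 1, 2, 3, 4},
                                    start=time(9), end=time(17)))
hours = sum((to - frm).total_seconds() for frm, to in intervals) / 3600
print(len(intervals), hours)  # 23 184.0
```

A "Sundays only" view, as mentioned at the start of the thread, is just a different `weekdays` set.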
If usage outside business hours is low, Apdex will not vary much. I have done precise calculations for a client, and its formula is, in a way, "robust" to such changes. We even had one case where a certain downtime, when taken out, made the Apdex better. Suspecting a flaw in the calculation, we discovered that the outage was network related, so the clients that managed to connect had a better experience than usual... So you have to be careful about what the expectations are, and what the results will tell you, if this gets implemented.
To be honest, I don't think this is a RUM- or application-only topic, and I am currently checking internally who would be the right person to approach about it, since IMO it's a cross-solution and cross-capability topic.
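The robustness of Apdex to low after-hours traffic follows from its formula, (satisfied + tolerating / 2) / total, which makes the all-day score a traffic-weighted average of the business-hours and after-hours scores. This sketch uses the standard Apdex buckets (satisfied up to T, tolerating up to 4T) and hypothetical counts; Dynatrace's own thresholds and treatment of errors may differ.

```python
def classify(response_ms: float, t_ms: float) -> str:
    """Standard Apdex buckets: satisfied <= T, tolerating <= 4T, frustrated above."""
    if response_ms <= t_ms:
        return "satisfied"
    if response_ms <= 4 * t_ms:
        return "tolerating"
    return "frustrated"


def apdex(satisfied: int, tolerating: int, frustrated: int) -> float:
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    return (satisfied + tolerating / 2) / (satisfied + tolerating + frustrated)


# Hypothetical counts: 1,000,000 business-hours actions and 10,000 after hours
# with a much worse mix.
business = apdex(850_000, 100_000, 50_000)
after_hours = apdex(5_000, 2_000, 3_000)
all_day = apdex(855_000, 102_000, 53_000)
print(round(business, 3), round(after_hours, 3), round(all_day, 3))  # 0.9 0.6 0.897
```

Even with a far worse after-hours score, 1% of the traffic moves the all-day Apdex by only 0.003 here.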
My personal thoughts regarding the idea:
To move this forward, my suggestion is to convert this conversation into a product idea and let people vote on it so that it gets prioritized!
The idea of this post was to discuss how Dynatrace users approach the issue I described. At the time, I submitted an RFE, and I'm glad that the old link I posted above redirects to the correct RFE in this new Community platform:
I do believe it's a platform-wide issue, probably best handled in the timeframe selector. But it would have a profound impact across the platform, so I don't think it's an easy one. Still, the idea is not to remove data from the platform, only to filter it when certain users ask for it.