FYI, the topics covered in this deck apply more to Dynatrace anomaly detection than to baselining in AppMon (though I'd bet they share a number of aspects around baselining).
So there is a difference between Anomaly Detection and Baselining? Can you please shed some light, James? Also, is predictive analysis (a buzzword I hear a lot these days) possible in future releases?
Just noticed this when coming back looking for those slides. I don't have much on what is planned going forward, but my thinking is that baselining is just looking at past trends and creating an expected range going forward. At its simplest: we'll have between 90 and 100 users at 12 PM and 5-10 users at 12 AM, or something like that (the same could be applied to response times or failure rates). This is possible in AppMon for specific BTs, and nearly everything is baselined in Dynatrace.
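To make that concrete, here's a tiny sketch of a range baseline: learn an expected range for a time slot from past samples, then flag anything outside it. All numbers are made up for illustration.

```python
# Sketch of a simple range baseline: learn an expected range for a time
# slot from past samples, then flag values outside it. Numbers are made up.
past_noon_users = [92, 95, 97, 90, 99, 94, 96, 98, 91, 100]

low, high = min(past_noon_users), max(past_noon_users)

def in_baseline(value, low=low, high=high):
    """Return True if the observed value falls inside the learned range."""
    return low <= value <= high

print(in_baseline(95))   # inside the 90-100 range
print(in_baseline(130))  # outside the range -> baseline violation
```

A real implementation would of course keep a range per time slot and refresh it as new data arrives, but the principle is the same.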
Anomaly detection is more like root cause or fault domain isolation: it considers things like baseline violations, but also servers not responding and other events. Instead of stopping there, it tries to determine where within a potentially very complex system the problem started, whether that's a bad deploy or some random backend server having network problems. So baselining and anomaly detection can be separate, but baselines can be one factor considered within anomaly detection.
Thanks Alex. This is a great slide by @Alois R. I hope Dynatrace gets these intelligent components in the future.
Why am I asking? Because I want to create an intelligent plugin for anomaly detection. I am aware that I can include transactions in baselining out of the box, but I don't want that (BTs with a high split count are not being stored). Also, if there were a separate program for baselining, I could use it to baseline almost any BT.
Currently I am thinking about a Kalman filter, but that slide would help me.
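For reference, this is roughly the kind of thing I mean: a minimal 1-D Kalman filter that tracks a slowly drifting level (say, a mean response time) from noisy observations. The process/measurement noise values `q` and `r` here are purely illustrative.

```python
# Minimal 1-D Kalman filter: track a slowly drifting level from noisy
# observations. q (process noise) and r (measurement noise) are illustrative.
def kalman_1d(observations, q=0.01, r=4.0):
    x, p = observations[0], 1.0   # initial state estimate and its variance
    estimates = [x]
    for z in observations[1:]:
        p += q                    # predict: variance grows by process noise
        k = p / (p + r)           # Kalman gain
        x += k * (z - x)          # update estimate toward the observation
        p *= (1 - k)              # shrink variance after the update
        estimates.append(x)
    return estimates

smoothed = kalman_1d([10, 12, 11, 50, 11, 10, 12])
# The spike at 50 is damped rather than followed outright, so a large gap
# between observation and estimate could be treated as an anomaly signal.
```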
I'm not sure if Kalman-Bucy is the right tool for all the things you could monitor with AppMon. It basically assumes that there is some model, some machine, governed by a set of parameters, and that these parameters change over time to make the model behave differently. I think you'll have a hard time modelling all the different possible kinds of metrics like this.
Our approach is purely statistical.
For response times there is basically no sensible model; the values are pretty much random, depending on client, network and server state. There isn't really a single (theoretical) machine behind them that could explain the values, so we use percentiles for this measure, to stay distribution-independent.
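A minimal sketch of what a percentile-based, distribution-independent check could look like (nearest-rank percentile, illustrative data; this is not the actual product implementation):

```python
# Distribution-independent baseline via percentiles: flag a response time
# as slow when it exceeds, say, the 90th percentile of a reference window.
def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Reference window of past response times in ms; the 500 is an outlier.
reference = [120, 130, 110, 150, 140, 125, 135, 145, 500, 115]
threshold = percentile(reference, 90)

print(threshold)          # learned 90th-percentile threshold
print(600 > threshold)    # True -> this sample would count as a violation
```

Because percentiles make no assumption about the shape of the distribution, the check works the same way for heavily skewed response-time data.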
The failure rate is in general binomially distributed, so we use this distribution in this case.
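As an illustration of such a binomial check (not the product's actual code): compute the upper-tail probability of the observed failure count under the expected rate, and treat a tiny probability as an anomaly. Numbers are illustrative.

```python
import math

# Upper-tail binomial probability: chance of seeing >= k failures in n
# requests if the true failure rate is p. A tiny tail probability suggests
# the observed failure rate is anomalously high.
def binom_tail(k, n, p):
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# 12 failures in 100 requests against an expected 5% failure rate:
tail = binom_tail(12, 100, 0.05)
print(tail < 0.01)  # very unlikely under the baseline rate -> flag it
```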
The throughput is Poisson-distributed.
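And a corresponding sketch for Poisson-distributed throughput (again illustrative, not the product implementation): a very small lower-tail probability for the observed request count signals a traffic drop.

```python
import math

# Poisson probability of observing exactly k requests in an interval when
# the baseline rate is lam; the lower tail flags a throughput drop.
def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_lower_tail(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# Only 2 requests in a minute that normally sees ~20:
print(poisson_lower_tail(2, 20) < 1e-4)  # True -> likely a traffic drop
```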
These three metrics are all that we monitor with a baseline for now.
I'm not sure if it's possible to provide meaningful models for other kinds of measures, but I'd be interested in hearing your thoughts and seeing your plugin. Thanks,
So I did some self-learning and came to understand that a baseline is some sort of moving average. The relevant topic to study for prediction would be the ARIMA model, and for anomaly detection a well-regarded technique is STL (Seasonal-Trend decomposition using Loess).
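To illustrate just the moving-average part of that idea (a full ARIMA or STL approach would need a stats library; this sketch only shows a moving-average baseline with a tolerance band, and the window size and band width are illustrative choices):

```python
# Moving-average baseline with a relative tolerance band: a point is
# flagged when it deviates from the average of the preceding window by
# more than `band` (a fraction of that average).
def moving_average_baseline(series, window=5, band=0.2):
    """Yield (value, is_anomaly) pairs once the window has filled up."""
    for i in range(window, len(series)):
        avg = sum(series[i - window:i]) / window
        value = series[i]
        yield value, abs(value - avg) > band * avg

series = [100, 102, 98, 101, 99, 100, 250, 101]
flags = [anom for _, anom in moving_average_baseline(series)]
print(flags)  # [False, True, True]: the 250 spike, plus the next point
              # because the spike still sits inside its window
```

The second `True` shows a known weakness of plain moving averages: an outlier pollutes the window it falls into, which is one reason decomposition methods like STL are attractive.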