Out of curiosity, has anyone here considered or even attempted an integration of Dynatrace as a monitoring data source into Stackstate? If so, could you share any insights/experience?
We are currently discussing the potential benefits of such an integration with regard to improving Dynatrace's predictive capabilities (which IMO are rather lacking) and establishing the basis for true, data-agnostic AIOps.
The question is what kind of data you would like to push there from Dynatrace. In general, your only option is to send metrics via the API. I don't think that in many cases you will be able to predict what will happen in the environment. From my experience, most issues show up rapidly, without many clues right beforehand. Something like that cannot be predicted. For example, because of issues with a time server, some servers will have problems with time synchronization. After those servers have been running for a while, the gap becomes large enough to break Netflix Zuul, and requests stop working 🙂
Thanks a lot for the feedback, Sebastian.
As for the data being pushed, I would expect it to contain metrics, events and topology information, but that is certainly a topic to be discussed and one to which I don't have a clear answer at this time.
Also, I'm well aware that it's not possible to predict every failure - it would be naive to assume that.
I think one of the assumptions here is that there might be significant untapped potential in combining/augmenting APM data with "foreign" domains such as marketing, social and even physical domains. In this case data-agnostic AIOps solutions would act as an aggregator for different domain-centric data. In the end one could perhaps imagine a solution implementing a general business operations support model not unlike that used for weather forecasts 🙂 After all, it's not "only" about predicting application outages but for example also user behavior and market dynamics.
But even if we stick to the APM domain there are scenarios where outages could be predicted with some confidence using very simple algorithms. Think for example of linear timeseries extrapolation to detect possible future threshold breaches (e.g. increasing usage of resources such as disk, memory, object pools etc.). Of course there is a risk of over-alerting unless the AI model has deeper knowledge about the observed system (e.g. code analysis, configuration, common fault patterns etc.), but this is only meant to illustrate a simple "predictive capability" that would have given us a head start in some cases (admittedly one would not need an AI for that)...
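To make the linear extrapolation idea a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration: the function name `predict_breach_time`, the sample values and the 90% threshold are all made up for this post and have nothing to do with the Dynatrace or Stackstate APIs. It fits a least-squares line through the samples and returns the time at which that line would reach the threshold.

```python
from typing import Optional, Sequence


def predict_breach_time(
    timestamps: Sequence[float],
    values: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Fit a least-squares line to (timestamp, value) samples and return the
    time at which that line reaches `threshold`.

    Returns None if the input is unusable (fewer than two samples, mismatched
    lengths, all timestamps identical) or if the fitted trend is not rising.
    The returned time may lie in the past if the line is already above the
    threshold; the caller has to compare it with the current time.
    """
    n = len(timestamps)
    if n < 2 or n != len(values):
        return None

    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n

    # Ordinary least squares: slope = cov(t, v) / var(t)
    var_t = sum((t - mean_t) ** 2 for t in timestamps)
    if var_t == 0:
        return None
    cov_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    slope = cov_tv / var_t
    intercept = mean_v - slope * mean_t

    if slope <= 0:
        return None  # usage is flat or shrinking, no breach expected

    return (threshold - intercept) / slope


# Hypothetical disk usage in percent, sampled once per hour (hours 0..4).
hours = [0, 1, 2, 3, 4]
disk_used_pct = [70.0, 72.0, 74.0, 76.0, 78.0]
breach_hour = predict_breach_time(hours, disk_used_pct, threshold=90.0)
```

With these made-up samples the usage grows by 2 points per hour from 70%, so the fitted line reaches 90% at hour 10, i.e. six hours after the last sample. A real implementation would probably need a minimum confidence check (e.g. goodness of fit) to keep the over-alerting risk mentioned above in check.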
Have you tried the API's predict mode for timeseries? Of course, it only returns a set of data points and you need some engine to process them, but in general some predictions are made in Dynatrace, and baselines are built from them. I'm just curious whether an external platform will really add anything beyond what we have right now. I don't have experience with Stackstate, so this is only my assumption 🙂
IMO the current baseline predictions are not really predictions in the sense that staff is alerted BEFORE an issue is expected to occur but instead "only" a way to calculate baselines/thresholds smarter (based on learned patterns of the last 7 days).
I haven't tried the "predict" flag for querying timeseries via the API and, to be honest, this is the first time I've heard about this parameter, so thanks for that 🙂 But as you already pointed out, this requires separate code/tools for querying and alerting, which might not be feasible or desirable in many cases.
Stackstate now provides a built-in integration option for ingesting Smartscape topology information via its Agent Stackpack:
We already made some good progress while testing this against a tenant:
One use case we are pursuing going forward is the automatic merging and correlation of/with arbitrary event, topology and CMDB data streams obtained from different "silos" and domains within our company (Dynatrace being only one of them) and thus enabling true domain-agnostic AIOps.