03 Jun 2024 02:27 PM - last edited on 28 Jun 2024 10:14 AM by Michal_Gebacki
Our organization uses Dynatrace in a rather unique way when it comes to Real User Monitoring (RUM). We keep a dedicated focus on the Visually Complete Time (VCT) of our customers' actions. Watching VCT lets us raise an early flag/alert when customers are experiencing a slowdown in how quickly things load on their screens. Coupling VCT with Dynatrace's anomaly detection puts us at the forefront of knowing what our customers are experiencing. As great as this method is, it is also flawed. RUM data is wholly dependent on the end user, from device OS and version to their ISP and even their connection. Think about your everyday life: as you walk into your local grocery store, your cellular connection might be limited simply because of the building you are in. Next thing you know, you find yourself waiting for that text message to go through just to clarify which pasta sauce your significant other needed.
The Issue at hand – Is it really an Issue?
We noticed a trend where teams were being brought into triage calls due to a spike in VCT. These spikes would occur during off hours, which disturbed not only staff work-life balance but also their sleep. Upon investigation, many of these spikes turned out to start with a user's poor connection on their device. Our charts focused on the average VCT over the given timeframe, which can be both good and bad. As you can imagine, during the ‘After Business Hours’ window we see a drastic decline in the number of users, and as a result, each poorly performing user has a greater impact on our average VCT.
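To make the effect of a low user count concrete, here is a small sketch with made-up numbers (the user counts and load times are hypothetical, not taken from our data). It shows how one slow user barely moves the average during a busy period but dominates it after hours.

```typescript
// Hypothetical VCT samples in seconds; all numbers are illustrative only.
function repeatValue(value: number, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) {
    out.push(value);
  }
  return out;
}

function averageVct(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("need at least one sample");
  }
  let total = 0;
  for (const value of samples) {
    total += value;
  }
  return total / samples.length;
}

// Business hours: 200 users at 2 s plus one user on a poor connection at 30 s.
const businessHours = repeatValue(2, 200).concat([30]);
// After hours: only 4 users at 2 s plus the same slow user.
const afterHours = repeatValue(2, 4).concat([30]);

const businessAvg = averageVct(businessHours); // (400 + 30) / 201, about 2.14 s
const afterAvg = averageVct(afterHours);       // (8 + 30) / 5 = 7.6 s
console.log(businessAvg.toFixed(2), afterAvg.toFixed(2));
```

The same 30-second outlier shifts the average by roughly 0.14 s in the busy window and by 5.6 s in the quiet one, which is enough to trip an alert on its own.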
Our RUM Achilles heel – Users' poor downlink speeds.
Upon joining these triage calls for high VCT, we found a commonality. Users with a poor downlink connection experienced drastically longer load times, which makes sense. In one example, a drastic spike in VCT was caused by a single user with a drastically poor connection during the ‘After Business Hours’ time frame.
Solutioning – Keeping the same design, just altering the data sets.
We were determined to find a solution for this recurring issue. Initially we thought of USQL, since it lets us directly target the downlink values and connections. However, USQL is for completed sessions, which puts you about 30 minutes behind the initial slowdown event. USQL would have been great, as we could have put all the data we needed together and built a single tile for VCT, just like the ones our teams are used to seeing and interacting with. With that, we went back to the drawing board, determined to find a solution.
Drawing Board – Brainstorming leads to a lightning strike!
I drew a mockup of what the ideal solution would be for us, and I set out to find a way to achieve it: simply a table of VCT values and the associated downlinks. I knew that if I could create the table, I could create a single value tile as well. The basic idea was to take the RUM session value for VCT and the associated downlink for live users, not just completed sessions.
Building the solution – a firm foundation.
I turned to User Sessions and the associated application, and then to its defined session properties. We defined a custom property that reads the JavaScript variable navigator.connection.downlink.
We created this not just as an action string but also as a double property, which gives us added query capabilities such as numeric ranges.
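For readers who have not used this variable before, the sketch below shows what the property is reading. `navigator.connection.downlink` comes from the browser's Network Information API and reports an estimated bandwidth in megabits per second. It is not exposed by every browser, so the value can be missing. The `NavigatorLike` type and the `readDownlinkMbps` helper are illustrative names only, not something Dynatrace provides.

```typescript
// Minimal shape of the object we read from; in a browser this would be `navigator`.
interface NavigatorLike {
  connection?: { downlink?: number };
}

// Returns the estimated downlink in Mbit/s, or null when the browser
// does not expose the Network Information API.
function readDownlinkMbps(nav: NavigatorLike): number | null {
  const downlink = nav.connection ? nav.connection.downlink : undefined;
  return typeof downlink === "number" ? downlink : null;
}

// Stand-ins for a browser that supports the API and one that does not.
const supportedBrowser: NavigatorLike = { connection: { downlink: 9.5 } };
const unsupportedBrowser: NavigatorLike = {};

console.log(readDownlinkMbps(supportedBrowser));   // 9.5
console.log(readDownlinkMbps(unsupportedBrowser)); // null
```

Sessions from browsers without the API will simply have no downlink value, so they drop out of any range filter built on the double property.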
Granted, creating the session property alone didn't solve our need, because the out-of-the-box VCT metrics do not let you filter on session properties in the Data Explorer. So we shifted our attention back to the application page in Dynatrace. Dynatrace already had a perfect table set up there, basically exactly what I had drawn out days prior.
With our double property we were even able to define a range of downlink values to include or exclude, which lets us remove poor connections from our data sets. For example, we can look only at users with 8 Mbit/s and greater. While this table is great, it cannot be placed as a tile on a dashboard, but the solution is right there: “Create Metric”. We defined our downlink range and created a metric based on it. Now we have not only our VCT but also the VCT of premium connections, which allows us to say, “If the best connection is having high VCT, then investigate on our end.”
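Conceptually, the range-based metric does something like the sketch below. This is only an illustration of the filtering logic, not how Dynatrace computes the metric internally. The `SessionSample` shape, the function name, and the sample values are all hypothetical; the 8 Mbit/s cutoff is the example range mentioned above.

```typescript
// Hypothetical per-action sample: VCT in seconds and reported downlink in Mbit/s.
interface SessionSample {
  vctSeconds: number;
  downlinkMbps: number | null; // null when the browser reported no downlink
}

// Average VCT over samples whose downlink is at or above the cutoff.
// Returns null when no sample qualifies.
function averageVctForDownlink(
  samples: SessionSample[],
  minDownlinkMbps: number
): number | null {
  let total = 0;
  let count = 0;
  for (const sample of samples) {
    if (sample.downlinkMbps !== null && sample.downlinkMbps >= minDownlinkMbps) {
      total += sample.vctSeconds;
      count += 1;
    }
  }
  return count === 0 ? null : total / count;
}

// Illustrative data: three good connections and one very poor one.
const sessionSamples: SessionSample[] = [
  { vctSeconds: 2.1, downlinkMbps: 10 },
  { vctSeconds: 1.9, downlinkMbps: 25 },
  { vctSeconds: 14.0, downlinkMbps: 0.4 },
  { vctSeconds: 2.0, downlinkMbps: 8 },
];

const overallVct = averageVctForDownlink(sessionSamples, 0); // all four samples
const premiumVct = averageVctForDownlink(sessionSamples, 8); // downlink >= 8 only
console.log(overallVct, premiumVct);
```

With these numbers the overall average is about 5 s, while the premium-connection average stays at about 2 s, which is the gap the second metric is meant to expose.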
As you can imagine, you will need to set the session property for each of your defined applications and create the metric for each one as you desire. Make sure you include your application name and any other identifying information in the metric name, because if you copy and paste the same name across 5 applications, you will have 5 metrics without any clear text as to which application the data represents.
We have been doing data validation for the past few weeks, verifying the validity of the current alerts. We have not changed team members' dashboards, but created our own dashboard to do an apples-to-apples check. If we see a spike in the original VCT metric, do we see that same spike at the same time in the new VCT metric for the best connections? The validation data has shown us that poorly connected sessions often raise our VCT, whereas the better connections are unaffected and show no rise in VCT. In one case, the check reaffirmed that the spike was valid, since the new metric reflected the same rise in VCT.
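The apples-to-apples check can be written down as a simple rule. The sketch below is one way to express it, assuming a single VCT threshold in seconds applied to both metrics; the function name, the labels, and the 4-second threshold are hypothetical and are not our actual alert settings.

```typescript
type SpikeVerdict = "no-spike" | "connection-noise" | "investigate";

// Compare the original VCT metric with the good-connection VCT metric
// for the same time window against one hypothetical threshold.
function classifySpike(
  originalVct: number,
  goodConnectionVct: number,
  thresholdSeconds: number
): SpikeVerdict {
  if (originalVct <= thresholdSeconds) {
    return "no-spike";
  }
  // The original metric spiked: it is only worth a triage call when
  // users on good connections are slow as well.
  return goodConnectionVct > thresholdSeconds ? "investigate" : "connection-noise";
}

const assumedThreshold = 4; // seconds, illustrative only
console.log(classifySpike(7.6, 2.0, assumedThreshold)); // "connection-noise"
console.log(classifySpike(7.6, 6.8, assumedThreshold)); // "investigate"
console.log(classifySpike(2.1, 2.0, assumedThreshold)); // "no-spike"
```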
Single Value Result: (Granted the average is less than the actual rise due to the timeframe)
Over the next several weeks we will continue to collect data sets and work on getting buy-in from senior leadership to adopt this new methodology.
I hope you find this interesting and of value 😊
03 Jun 2024 02:53 PM
Great tip!
I would only add that, besides bandwidth, older mobile devices can also lead to much higher times. I have seen alarms generated by these kinds of low-end devices...
Clients sometimes say that these devices shouldn't be considered 🤣 but I tell them they should check their sites/apps so they can eventually address these types of users with limited resources.
03 Jun 2024 03:27 PM
Agreed, and this method can be used for all the 'negates' you want. I have tracked down users before who were using an old BlackBerry 🤣.
03 Jun 2024 03:49 PM
Please share where you found the JavaScript property pack; I don't see that as an option on my instance.
03 Jun 2024 06:15 PM
Correction - it's not a property pack, it's a custom-defined property.
05 Jun 2024 04:11 PM
It is part of the browser's JavaScript Web APIs (the Network Information API), though not every browser supports it.
NetworkInformation - Web APIs | MDN (mozilla.org)
04 Jun 2024 07:52 PM
Great work, thanks for the write-up and explanation! I can see us using this with many of our clients.
I'm curious: how did you figure out the JS variable needed? Is that documented in the DT docs, via Google searches or Dev tools in the browser?