Solved: Windows 10 VDI Process & 18,000 concurrent sessions

StoleSursum · 13 Jan 2023

I searched the forums for an answer beforehand, to no avail.

I'm asking about options for dashboarding a compiled and installed enterprise application consisting of two primary binaries running across 18,000 concurrent non-persistent Windows 10 VDI machines. The standard and advanced process rules in Settings seem limited to Java or environment variables, with no option to define mybinary1.exe or mybinary2.exe.

I can pull process information from each VDI, but what I need is a usable dashboard that shows, for example, the top 100 machines where MyBinary1.exe consumes the most memory and CPU.

Better yet -

# of MyBinary1.exe processes running in my environment between these hours and over this timeframe

# of crashes in Windows OS for this specific process

# of instances running across the enterprise as single value

The list goes on, given what we could do if it were possible to create a custom monitor, object, group, or whatever for specific processes that run on Windows.

Dynatrace captures this information per host (that is, per VDI), but in my case the data is only valuable if I can see what that process is doing across nearly 20,000 non-persistent Windows 10 VDIs. The DT agent resides on every machine, so it's a huge amount of potentially valuable data.

One simple example is new code updates. We generally push these out with SCCM, and it might be a minor or major release. I want to show the difference for the same-named executable before and after we push a new release.

Next, I need to see the overall impact on CPU and memory consumption against my overall capacity, and for future capacity planning.

Next, another example: beta testing. Business Unit A wants to deploy "PotentialNewAppXXX" to the environment. Initially, we deploy PotentialNewApp to 1000 machines. Why 1000 machines? Well, 100 machines is better than 10 when you have nearly 20,000 VDI users and want to get a better feel for the performance impact. I can't get a feel for the performance of that new software deployment unless I can measure that Windows executable across all 100.

Next, discovery, or security around the undefined. Ideally, I want to catalog all of our known processes and create a list of KNOWN and APPROVED processes running on 20,000 VDI desktops. I would like to pull reports that show UNKNOWN processes, where the EXE files do not match the approved list.

I've given a few basic examples and I'm sure that others can see the value of this capability. Other tools provide this out of the box. But I'm not using other tools and do not wish to purchase additional tools to do something I believe should be doable. Perhaps I just don't, yet, know how to do it.

I'm hoping someone in this forum has done it, and perhaps we can exchange dashboard JSON files, or someone can point me in the right direction. This doesn't seem to be covered in the existing documentation; I've gone through all of it, including Dynatrace University.

To clarify, I'm using the word "process" to mean a Windows executable process that would show up in Process Monitor on the Windows machine. I can easily view that process on any one VDI out of the 20,000 with no problem. Sure, I can pull up that process group as well, sort by CPU usage toward the bottom, and sift through 2000 pages, but that has no value relative to my ask.

I have deep monitoring enabled on that process, along with more metrics for it. Unfortunately, none of the available views of the running Windows process offers an option to pull it into a dashboard, and I have full administrative rights.

I appreciate the time.

Any help is appreciated. If someone on the forum is looking to do the same, or has done it already, please reach out. I'm open to a collaborative effort, given the value of measuring these things across every monitored machine, including side discussions on how we might take it further and share knowledge.

As I stated earlier, perhaps I'm just missing it and overlooked something.

-Murphy

linkedin.com/in/vcissgroup

Julius_Loman · 15 Jan 2023

Too many questions in a single post 😁 Nevertheless: with Data Explorer you are limited to 100 series within a single chart. So, for example, to find the top 100 hosts where your process consumed the most CPU, the following metric selector should do it:

builtin:tech.generic.cpu.usage
:filter(and(or(in("dt.entity.process_group_instance",entitySelector("type(process_group_instance),entityName.equals(~"My Awesome Java Process~")")))))
:parents
:splitBy("dt.entity.host")
:sort(value(avg,descending))
:limit(100)

Just adapt the entitySelector to cover your application. For other metrics, you can use similar queries for memory or suspension (since you mention you have deep monitoring enabled).
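For anyone adapting this to other executables or metrics, here is a minimal Python sketch that assembles the same selector from a process name and wraps it in a Metrics API v2 query URL. The helper names, the placeholder tenant URL, and the `now-2h` timeframe are my own illustration, not anything Dynatrace ships; passing a memory metric key as `metric_key` is assumed to work the same way.

```python
from urllib.parse import urlencode


def top_hosts_selector(process_name, metric_key="builtin:tech.generic.cpu.usage", limit=100):
    """Build the metric selector from the post for an arbitrary process name."""
    # Quotes or tildes in the name would need extra escaping; this sketch rejects them.
    if '"' in process_name or "~" in process_name:
        raise ValueError("process name must not contain quotes or tildes")
    entity_selector = (
        "type(process_group_instance),"
        f'entityName.equals(~"{process_name}~")'
    )
    return (
        f"{metric_key}"
        ':filter(and(or(in("dt.entity.process_group_instance",'
        f'entitySelector("{entity_selector}")))))'
        ":parents"
        ':splitBy("dt.entity.host")'
        ":sort(value(avg,descending))"
        f":limit({limit})"
    )


def metrics_query_url(base_url, selector, time_from="now-2h"):
    """Return a GET URL for the Metrics API v2 query endpoint."""
    query = urlencode({"metricSelector": selector, "from": time_from})
    return f"{base_url.rstrip('/')}/api/v2/metrics/query?{query}"


if __name__ == "__main__":
    selector = top_hosts_selector("MyBinary1.exe")
    print(selector)
    # Placeholder tenant URL (hypothetical); replace with your environment.
    print(metrics_query_url("https://your-environment.example", selector))
```

The printed selector can be pasted into the Data Explorer code view, or the URL can be called with an API token that has permission to read metrics.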

Number of crashes: this is a little trickier. As far as I know, there is no built-in metric for that. It's only an event, which unfortunately cannot be added to a dashboard (yet). But you can query the number of such events using the Events API v2 and push a metric back to Dynatrace using the Metrics API v2.
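A rough sketch of that events-to-metric round trip in Python, using only the standard library. Several things here are assumptions to verify against your own tenant: the event type name `PROCESS_CRASHED`, the use of the `entitySelector` query parameter to narrow events to one process, the `totalCount` field in the response, and the custom metric key and `process` dimension, which are names I made up for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed / hypothetical names; check them against your environment.
CRASH_EVENT_TYPE = "PROCESS_CRASHED"
CRASH_METRIC_KEY = "custom.vdi.process.crashes"


def crash_events_url(base_url, process_name, time_from="now-1h"):
    """Build the Events API v2 query URL for crash events of one process."""
    params = {
        "from": time_from,
        "eventSelector": f'eventType("{CRASH_EVENT_TYPE}")',
        "entitySelector": (
            f'type(PROCESS_GROUP_INSTANCE),entityName.equals("{process_name}")'
        ),
    }
    return f"{base_url.rstrip('/')}/api/v2/events?{urlencode(params)}"


def count_events(payload):
    """Prefer the reported total; fall back to counting the returned page."""
    return payload.get("totalCount", len(payload.get("events", [])))


def ingest_line(process_name, count):
    """Format one metrics ingestion line: key,dimension="value" number."""
    if '"' in process_name or "\\" in process_name:
        raise ValueError("process name must not contain quotes or backslashes")
    return f'{CRASH_METRIC_KEY},process="{process_name}" {count}'


def push_crash_count(base_url, token, process_name):
    """Query crash events, then post the count as a custom metric."""
    headers = {"Authorization": f"Api-Token {token}"}
    query = urllib.request.Request(
        crash_events_url(base_url, process_name), headers=headers
    )
    with urllib.request.urlopen(query) as response:
        count = count_events(json.load(response))

    ingest = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/metrics/ingest",
        data=ingest_line(process_name, count).encode("utf-8"),
        headers={**headers, "Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(ingest) as response:
        return response.status
```

Run on a schedule (for example once per hour with a matching `time_from`), this would give a chartable crash count per process; the token would need scopes for reading events and ingesting metrics.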

Having the number of running instances - I believe you can use the builtin metric builtin:tech.generic.processCount directly for that (with filtering for your process groups).
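To collapse that metric into a single enterprise-wide number, one option is an empty `:splitBy()` followed by `:sum`. The small Python helper below only builds the selector string; the helper name is hypothetical, and filtering on `dt.entity.process_group_instance` is an assumption carried over from the CPU selector above.

```python
def instance_count_selector(process_name):
    """Build a selector that sums the process count over all matching instances."""
    # Quotes or tildes in the name would need extra escaping; this sketch rejects them.
    if '"' in process_name or "~" in process_name:
        raise ValueError("process name must not contain quotes or tildes")
    entity_selector = (
        "type(process_group_instance),"
        f'entityName.equals(~"{process_name}~")'
    )
    return (
        "builtin:tech.generic.processCount"
        ':filter(in("dt.entity.process_group_instance",'
        f'entitySelector("{entity_selector}")))'
        ":splitBy()"  # no dimensions: merge every series into one
        ":sum"        # add the per-instance counts together
    )


if __name__ == "__main__":
    print(instance_count_selector("MyBinary1.exe"))
```

The result is meant for a single-value tile in Data Explorer.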

All the filters depend on how your process groups are organized, for example whether you use host groups. You can leverage tags, or release monitoring with version detection, probably based on environment variables. It's not designed for Windows VDI monitoring, but it may fit your case. For example, it directly gives you an overview that includes crashes.

If you use environment variables as the version detection strategy, the tags will be populated. Then you can use filtering such as:

type(PROCESS_GROUP_INSTANCE),name("My Awesome App"),tag("[Environment]DT_RELEASE_VERSION:1.1.2")

Use this in Data Explorer to restrict metrics to a particular release.
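For the before-and-after comparison from the original question, a small Python sketch can generate one entity selector per release so the same chart can be filtered twice. The function names and version numbers are hypothetical, and I use the `entityName.equals(...)` form from the CPU selector earlier in this reply for the name match, which is my choice here rather than something confirmed for this filter.

```python
def release_entity_selector(app_name, version):
    """Entity selector for the process group instances of one release."""
    for value in (app_name, version):
        # Quotes or tildes would need escaping; this sketch rejects them.
        if '"' in value or "~" in value:
            raise ValueError("values must not contain quotes or tildes")
    return (
        "type(PROCESS_GROUP_INSTANCE),"
        f'entityName.equals("{app_name}"),'
        f'tag("[Environment]DT_RELEASE_VERSION:{version}")'
    )


def before_after_selectors(app_name, old_version, new_version):
    """Pair of selectors for comparing a metric across two releases."""
    return {
        "before": release_entity_selector(app_name, old_version),
        "after": release_entity_selector(app_name, new_version),
    }


if __name__ == "__main__":
    for label, selector in before_after_selectors("My Awesome App", "1.1.1", "1.1.2").items():
        print(label, selector)
```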

Getting data for all processes (from the OS perspective), in order to evaluate approved and unapproved processes, is not really feasible right now. Dynatrace collects detailed data only about important processes or the ones you specifically detect with declarative process grouping (not deep monitored). For other processes, you might not have the information collected, and they just go into a placeholder (other processes).

I hope this helps you achieve your goals. Even though Dynatrace is not designed to monitor workstations, I believe you can get most of the answers you need.

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

StoleSursum · 17 Jan 2023

Julius-

I appreciate the response; this is what I needed to get started. The initial snippet is working and returned the top 100. The subsequent questions relate to the first one, and I appreciate the information provided. I'll close this thread.

Appreciate the time,

-Murphy

linkedin.com/in/vcissgroup