I am working with a customer that has a lot of applications running on very old .NET CLR versions. The customer's environment also has a special characteristic: the applications work like microservices, so a single server hosts 100+ .NET processes. This makes it impossible for our agents to collect measures from WMI.
Historically, these applications have had high WaitOne and recv times, which makes me think this is a capacity problem. Since I cannot collect all the information about the environment (threads and memory), I cannot tell for sure what is going on.
This application is highly dependent on remote services, all of them accessed through VIPs. I noticed that in most cases one of the VIPs is responsible for 44% of the transaction time.
The image below shows that the PRAPP observed tier contributes 44%. This tier has been configured to measure all calls to http://prapp.
My main question is this: PRAPP is the VIP that handles the remote calls to all the microservices (it is called several times in the same PurePath), and in several cases I can see a lot of time in WaitOne() and recv(). Can I tell the customer that they have a load-balancing problem for this application tier, or should I be more cautious? I can see that the WaitOne() and recv() times are attributed to other tiers and not to the PRAPP observed tier. My main concern is that the high observed-tier contribution may be related to the WaitOne() and recv() times and not to the load-balancing configuration itself.
Could someone please help me understand how I could investigate this further? Please see the attached single PurePath session, which should make it clearer what is going on: prapp-observed-tier.dts
Can you share the PurePath (after removing confidential strings)?
Please note that the transaction flow alone does not give the whole picture. You need to look at the individual PurePath nodes to pinpoint which nodes on which agent are causing which kind of response-time breakdown (GC, I/O, CPU, sync/async wait, etc.).
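As a rough illustration of that per-node view, here is a minimal Python sketch. It assumes you have copied the node timings out by hand into (agent, category, milliseconds) tuples; that layout, the agent names and the numbers are all made up for the example and are not a Dynatrace export format.

```python
from collections import defaultdict

# Hypothetical records, one per PurePath node: (agent, category, milliseconds).
# The layout and values are assumptions for illustration only.
nodes = [
    ("frontend", "Sync", 420.0),
    ("frontend", "CPU", 35.0),
    ("orders", "IO", 310.0),
    ("orders", "Sync", 90.0),
    ("orders", "GC", 12.0),
]

def breakdown_by_agent(records):
    """Return {agent: {category: share of that agent's total time}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for agent, category, ms in records:
        totals[agent][category] += ms
    shares = {}
    for agent, cats in totals.items():
        agent_total = sum(cats.values())
        if agent_total > 0:  # skip agents with no recorded time
            shares[agent] = {c: ms / agent_total for c, ms in cats.items()}
    return shares

shares = breakdown_by_agent(nodes)
for agent, cats in shares.items():
    print(agent, {c: round(s, 2) for c, s in cats.items()})
```

If one agent's time is dominated by sync wait while the agent it calls is dominated by I/O, that points at waiting on the downstream call rather than at the VIP itself.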
Also, the transaction flow depends heavily on how the tiers are grouped. (Two functionally different agents must never be grouped together.)
Further, to pinpoint an issue related to load balancing, you need to analyze a particular timeframe.
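To make the timeframe idea concrete, here is a minimal Python sketch. It assumes you can find out which backend node behind the VIP served each call (for example from backend logs), which the VIP itself normally hides; the record layout, host names and numbers are hypothetical.

```python
from collections import Counter

# Hypothetical call records: (timestamp in seconds, backend host, response ms).
# Knowing the backend host per call is an assumption; a VIP usually hides it.
calls = [
    (0, "node-a", 120.0), (5, "node-a", 130.0), (9, "node-b", 110.0),
    (12, "node-a", 900.0), (14, "node-a", 950.0), (18, "node-a", 870.0),
]

def backend_share(records, start, end):
    """Share of calls per backend host within the window [start, end)."""
    counts = Counter(host for ts, host, _ in records if start <= ts < end)
    total = sum(counts.values())
    return {host: n / total for host, n in counts.items()} if total else {}

early = backend_share(calls, 0, 10)
late = backend_share(calls, 10, 20)
print("0-10s:", early)
print("10-20s:", late)
```

A distribution that is even in one window and skewed in another, with response times rising in the skewed window, would support a balancing problem. An even distribution with high times on every node would point more toward capacity.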
Let me know if you need more help (maybe a net meeting ;)).