We have an issue with some J2EE containers: every time they call a web service, AppMon detects an exception. The exception does not block the web service call or cause any visible problem, but it does create hundreds of thousands of entries in the Exception dashlet every hour.
I have forwarded the issue to the vendor of the J2EE container, but they pushed back, saying there is no proof that these exceptions have any impact on performance or anything else.
I am not quite sure myself whether I should be concerned. I've been told that processing these exceptions consumes a lot of CPU when this many are thrown, but I am not sure that is the case.
The exception stack traces are not being written to the J2EE logs, so I assume the J2EE container is catching and handling them.
I would like to know whether I should be concerned about the resources wasted by such a high number of exceptions, and whether it is worth trying to have these exceptions removed or fixed.
There is no data that will tell you directly how much of your response time is overhead from exceptions. But each exception takes some time to throw, for example 20 ms. If it is thrown 100,000 times in an hour (I know of such cases), you are wasting 2,000 s of processing time in total. Everything depends on the scale you are talking about 🙂
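To make that arithmetic reusable, here is a minimal sketch in Java. The class and method names are hypothetical, and the 20 ms per-throw cost is just the example figure from this post, not a measured value; plug in your own count and cost.

```java
// Hypothetical helper: back-of-the-envelope cost of a recurring exception.
final class ExceptionCost {
    private ExceptionCost() {}

    /** Total seconds spent throwing, given a count and an assumed per-throw cost in ms. */
    public static double totalSeconds(long exceptionCount, double costPerExceptionMs) {
        return exceptionCount * costPerExceptionMs / 1000.0;
    }

    public static void main(String[] args) {
        // 100,000 throws per hour at an assumed 20 ms each.
        System.out.println(totalSeconds(100_000, 20.0)); // prints 2000.0
    }
}
```

Note that this is total processing time summed across all requests in the hour, not time added to any single request.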
I support everything Sebastian said. I did a series of tests a few years ago on the cost of throwing an exception and concluded it's only an issue at scale. However, there is an additional cost to having AppMon capture these exceptions. This is small for an individual exception, but it can add up at scale. So perhaps you could put in an ignore for this specific exception and see how that affects the overall response time. Of course, you'd only be measuring the delta of having AppMon capture it; the original exception still exists. Another option is to unplace the entire Exception sensor. Of course there are downsides to this, since you lose visibility into exceptions, so I only suggest it as a test to determine the AppMon overhead of monitoring large numbers of exceptions. My conclusion was that you would not see any measurable difference in transaction time if you're only talking about thousands of exceptions, but if you got into 10k/100k+ then there would be a measurable difference.
For what it's worth, we do see lots of vendors throw exceptions and swallow them, never to be seen by the calling application. Perhaps this is a philosophical point, but this approach raises questions about the right way to use exceptions in controlling execution flow.
Circling back on this question, as I think there is some value in it when trying to consider the TCO of an application. I'm seeing one particular exception occur roughly 150k/hour during peak periods. I'm relatively sure it's a misconfiguration in the way an external app was integrated, but I'm at a loss as to how to quantify the impact and sell "fixing" this to the development group that owns it, especially as we look to move these applications into hosted environments that have a dollar cost associated with disk/CPU utilization.
So what is best practice for those of us using Dynatrace? How do we best quantify the impact and, indirectly, the cost associated with that impact?
During one of my Exception studies, I quantified an exception at about 10ms. That's not small when considering something that could be handled much faster using a different coding approach.
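If you want a rough figure for your own environment, a naive timing loop like the following sketch can give an order of magnitude. This is not the method used in the study mentioned above; the class name is hypothetical, and a plain `System.nanoTime()` loop is sensitive to JIT warm-up and stack depth, so treat the result as indicative only (a harness such as JMH would be more rigorous).

```java
// Hypothetical probe: average cost of throwing and catching one exception.
final class ThrowCostProbe {
    private ThrowCostProbe() {}

    /** Returns the average nanoseconds per throw-and-catch over the given iterations. */
    public static long averageThrowNanos(int iterations) {
        int caught = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                // Constructing the exception fills in the stack trace,
                // which is typically the expensive part.
                throw new IllegalStateException("probe " + i);
            } catch (IllegalStateException e) {
                caught++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (caught != iterations) {
            throw new AssertionError("not every exception was caught");
        }
        return elapsed / iterations;
    }

    public static void main(String[] args) {
        averageThrowNanos(100_000); // warm-up pass so the JIT has compiled the loop
        System.out.println("avg ns per throw: " + averageThrowNanos(100_000));
    }
}
```

A shallow test stack like this one will usually report far less than a throw from deep inside a container call chain, so the number is a lower bound rather than a substitute for measuring the real transaction.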
As for 'justifying' the fix, that might depend on what motivates your dev team and PM team. For example, is 'clean code' a goal, or are they comfortable with a 'don't touch it, it works...' approach? What is the opportunity cost of fixing this issue? In other words, what other issue is not going to get fixed? I assume you don't have infinite development resources in your organization.
Using AppMon, look at the number of times this exception is thrown on a per-user-action basis. Then you can quantify the total cost to the user's experience. This might add up to a value worth arguing needs fixing to give the user a better experience. Is the associated user transaction performing within acceptable parameters, or does the business office want it faster?
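As a sketch of that per-user-action calculation in Java: the names are hypothetical, the 10 ms per-throw cost is the figure quoted earlier in this thread, and the 5 throws per action and 800 ms response time are made-up inputs you would replace with values read from AppMon.

```java
// Hypothetical helper: exception overhead expressed per user action.
final class PerActionExceptionCost {
    private PerActionExceptionCost() {}

    /** Milliseconds added to one user action by the exceptions thrown during it. */
    public static double addedMsPerAction(double throwsPerAction, double costPerThrowMs) {
        return throwsPerAction * costPerThrowMs;
    }

    /** Fraction of the action's response time attributable to those exceptions. */
    public static double shareOfResponseTime(double addedMs, double responseTimeMs) {
        return addedMs / responseTimeMs;
    }

    public static void main(String[] args) {
        double addedMs = addedMsPerAction(5, 10.0);          // assumed 5 throws at 10 ms each
        double share = shareOfResponseTime(addedMs, 800.0);  // assumed 800 ms response time
        System.out.println(addedMs + " ms, share " + share); // prints 50.0 ms, share 0.0625
    }
}
```

Expressing the overhead as a share of response time tends to be an easier number to argue with than a raw exception count.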
Finally, is the exception being thrown in code you don't control, but being caught and swallowed in your code? Most of the cost of the exception is in the throwing, not the handling, so if you don't control the throwing code, you're not going to gain much, unless you talk to the code vendor who's throwing it.
Hope this helps spawn some ideas.