cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Many skipped events in dynaTrace 6.0 and SocketTimedOutException for FrontEnd server

kedar_saravate
Newcomer

Hi,

We are using dynaTarce 6.0 to monitor performance of our application.

However we have encountered 2 issues when dynaTrace server was up and monitoring our application.

1) Many skipped events found in Corelation.0.0.log file in log files of system information.

 

2014-05-26 18:03:22 INFO [com.dynatrace.diagnostics.server.datacenter.g] CorrelationEngine statistics from Mon May 26 17:03:19 CEST 2014 to Mon May 26 18:03:22 CEST 2014:

Total skipped events on server:               3381417

Skipped events because path was already complete:                     0

Skipped events because path was already removed:                     1676732

Skipped events because of missing transformation:                       0

Skipped events because of an invalid first node:                              0

Skipped events because of event protocol errors:                           0

Skipped exit events because it cannot be paired:                             1289305

Skipped nodes because of PurePath truncation:                                              0

Skipped end path events as PurePath did not start before:         207691

Skipped start path events because PurePath was already complete:      164416

Skipped start path events because PurePath was already removed:       43273

Skipped start path events because root agent did not exist:                       0

 

2014-05-27 04:05:51 INFO [com.dynatrace.diagnostics.server.datacenter.g] CorrelationEngine statistics from Tue May 27 03:05:46 CEST 2014 to Tue May 27 04:05:51 CEST 2014:

Total skipped events on server:               921840

Skipped events because path was already complete:                     0

Skipped events because path was already removed:                     368736

Skipped events because of missing transformation:                       0

Skipped events because of an invalid first node:                              0

Skipped events because of event protocol errors:                           0

Skipped exit events because it cannot be paired:                             184368

Skipped nodes because of PurePath truncation:                                              0

Skipped end path events as PurePath did not start before:         184368

Skipped start path events because PurePath was already complete:      184368

Skipped start path events because PurePath was already removed:       0

Skipped start path events because root agent did not exist:                       0

***********************************************************************************************************************************************************************************

2) We have noticed SocketTimedOutException in FrontendServer.0.0.log file in log files of system information

 

2014-05-26 08:12:24 INFO [DefaultSessionManager] successfully created session (passive): WS002971:Encrypted:Deflate:Client

2014-05-26 18:02:50 WARNING [BaseRequestListener] error while handling request: class java.net.SocketTimeoutException: Read timed out, while executing request: class com.dynatrace.diagnostics.server.frontend.clientrequest.requests.LongTermRequestForwarder$ForwardingLongTermRequest (connection=Fetched-WS002971:65217:Inbound:Encrypted:Deflate:ea140136-87f6-48f6-9429-6166e1aa382e): com.dynatrace.diagnostics.communication.tcp.connection.BaseRequestListener notifyRequest:74

com.dynatrace.diagnostics.communication.tcp.exception.CommunicationException: class java.net.SocketTimeoutException: Read timed out, while executing request: class com.dynatrace.diagnostics.server.frontend.clientrequest.requests.LongTermRequestForwarder$ForwardingLongTermRequest

                at com.dynatrace.diagnostics.communication.tcp.core.DefaultRequest.handleException(DefaultRequest.java:15)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession$SessionRequest.handleException(DefaultSession.java:756)

                at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.handleException(ManagedSocketConnection.java:55)

                at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.access$000(ManagedSocketConnection.java:26)

                at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection$RequestWrapper.handleException(ManagedSocketConnection.java:74)

                at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.handleException(SocketConnection.java:153)

                at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.executeRequest(SocketConnection.java:143)

                at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.executeRequest(ManagedSocketConnection.java:97)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession.executeRequestOnConnection(DefaultSession.java:630)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession.executeRequest(DefaultSession.java:602)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultService.executeRequest(DefaultService.java:121)

                at com.dynatrace.diagnostics.server.frontend.clientrequest.requests.BaseGenericRequestForwarder.forward(SourceFile:20)

                at com.dynatrace.diagnostics.server.frontend.clientrequest.FrontendServerForwardingClientRequestListener.handleUnhandledRequest(SourceFile:86)

                at com.dynatrace.diagnostics.sdk.communication.SessionRequestListener.handleRequest(SourceFile:103)

                at com.dynatrace.diagnostics.communication.tcp.connection.BaseRequestListener.notifyRequest(BaseRequestListener.java:57)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultService.notifyRequest(DefaultService.java:97)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession.notifyRequest(DefaultSession.java:468)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultSessionManager.handleSessionServiceRequest(DefaultSessionManager.java:702)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultSessionManager.notifyRequest(DefaultSessionManager.java:641)

                at com.dynatrace.diagnostics.communication.tcp.connection.DefaultSocketListener.notifySocketActive(DefaultSocketListener.java:94)

                at com.dynatrace.diagnostics.communication.tcp.connection.DefaultSocketListener.notifySocketActive(DefaultSocketListener.java:29)

                at com.dynatrace.diagnostics.communication.tcp.socket.DefaultSocketHandlerRunnable.notifySocket(DefaultSocketHandlerRunnable.java:116)

                at com.dynatrace.diagnostics.communication.tcp.socket.DefaultSocketHandlerRunnable.handleRequest(DefaultSocketHandlerRunnable.java:102)

                at com.dynatrace.diagnostics.communication.tcp.socket.DefaultSocketHandlerRunnable.execute(DefaultSocketHandlerRunnable.java:55)

                at com.dynatrace.diagnostics.util.threads.StatusRunnable.run(StatusRunnable.java:46)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

                at java.lang.Thread.run(Thread.java:744)

Caused by: java.net.SocketTimeoutException: Read timed out

                at java.net.SocketInputStream.socketRead0(Native Method)

                at java.net.SocketInputStream.read(SocketInputStream.java:152)

                at java.net.SocketInputStream.read(SocketInputStream.java:122)

                at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)

                at sun.security.ssl.InputRecord.read(InputRecord.java:480)

                at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)

                at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)

                at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)

                at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)

                at java.io.BufferedInputStream.read(BufferedInputStream.java:254)

                at com.dynatrace.diagnostics.sdk.io.IoStatsAwareInputStream.read(IoStatsAwareInputStream.java:19)

                at com.dynatrace.diagnostics.sdk.io.DataIOUtil.readInt(DataIOUtil.java:407)

                at com.dynatrace.diagnostics.sdk.communication.SizeAwareChunkingInputStream.c(SourceFile:105)

                at com.dynatrace.diagnostics.sdk.communication.SizeAwareChunkingInputStream.b(SourceFile:81)

                at com.dynatrace.diagnostics.sdk.communication.SizeAwareChunkingInputStream.reInit(SourceFile:36)

                at com.dynatrace.diagnostics.sdk.communication.SizeAwareChunkingInputStream.create(SourceFile:29)

                at com.dynatrace.diagnostics.sdk.communication.ResponseReaderImpl.startReading(SourceFile:36)

                at com.dynatrace.diagnostics.server.frontend.clientrequest.requests.LongTermRequestForwarder$ForwardingLongTermRequest.executeRequestResponse(SourceFile:45)

                at com.dynatrace.diagnostics.server.frontend.clientrequest.requests.LongTermRequestForwarder$ForwardingLongTermRequest.executeRequestResponse(SourceFile:22)

                at com.dynatrace.diagnostics.sdk.communication.ManagedRequest.executeRequestResponse(SourceFile:58)

                at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession$SessionRequest.executeRequestResponse(DefaultSession.java:727)

                at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection$RequestWrapper.executeRequestResponse(ManagedSocketConnection.java:69)

                at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.executeRequest(SocketConnection.java:135)

                ... 21 more

  

Can anyone tell us a reason for skipped events and socket timed out exception?

Thanks,

Kedar Saravate

 

9 REPLIES 9

andreas_grabner
Dynatrace Leader
Dynatrace Leader

Hi

First question that I have for you: are you part of the Early Access Program for dynaTrace 6.0? If so - have you also sent this feedback to eap-support@compuware.com?

As for your questions: the skipped events are probably caused by PurePaths that exceeded the maximum number of PurePath Nodes. If PurePaths run very long and produce a lot of nodes (>10000 is the new default in 6.0) the dT Server will truncate that PurePath. The agents will still keep sending events but these events will then be skipped because the dT Server already "ended/truncated" that PurePath. If that is the case in your scenario you should see PurePaths that run into the 10000 Node Limit. Simply open your PP Dashlet and look for those with the largest Size. If that is the case you want to check what these PurePaths are: are they batch jobs that run forever? Have you instrumented too many methods which produce too many nodes? Did you pick the wrong entry point method?

The SocketTimeout error is something that our engineers that are working on dT 6.0 would definitely be interested in. Thats why I asked you in the beginning on whether you are part of the Early Access Program. Please send your support archive to the above mentioned email and also reference this forum post in your email so that our engineers know that we have talked about this

Andi

kristof_renders
Dynatrace Pro
Dynatrace Pro

Hi Andi,

Thanks for your feedback. This customer is indeed on 6.0 EAP, I have been working with them.

From my recollection, they do not have a lot of very large PP. Some PP have a somewhat larger size (in the thousands) but should not reach to 10k limit. I will forward this to eap-support.

Thanks!

manfred_heching
Participant

We've identified the SocketTimeoutException's as a bug, will be fixed in GA. The skipped paths are ongoing work. Thx for reporting.

sean_mcconville
Organizer

Hello, did you get any further with the purepaths? 

I have started seeing the same issue in our new 6.0 environment (max length purepath i can find in the timeframe is 4k with an average of 200), but see several hundred skipped events and purepaths in the overview dashboard. 

Hi Sean. What type of agents do you have in your environment? Java, .NET, Apache, Mobile, ...?

sean_mcconville
Organizer

Hi Andreas...Just java/apache (and host) for us. 

 

All at 6.0.0.7000

andreas_grabner
Dynatrace Leader
Dynatrace Leader

Do you see any corrupted PurePaths? If so - it would be interesting to see where these PurePaths get corrupted. Maybe there is a special framework/appserver you are using where our sensors have an issue correctly sending all event. In the past I remember we ran into issues of special threading or webservice frameworks. So - please check if you have some corrupted PurePaths.

Other than that I assume your dynaTrace server is healthy? No problems with CPU and Memory - no issues with network bandwidth?

If that all doesn't get you further you can always create a support ticket so that our engineers can have a look at it

sean_mcconville
Organizer

Hi Andreas, 

No corrupt purepaths, no unfinished ones, nothing obvious. 

CPU is sitting quite happy around 40% memory consumed is less than 3GB.

I see no evidence of the infrastructure having any problem, the only evidence of the problem is analyser queue spikes / buffer spikes / skipped events / skipped purepaths and the occasionally loss of data from my graphs (what looks like dips in the count of measures on graphs)

I wonder if it is anything to do with the new sizing option? It seems to have automatically set us to "medium" based on the server stats. When we had been scoped as large before.

Would this affect the way dT actually performs...even if there are resources there available to do more?

I'll open a support case.

Thanks

 

Sean

smith
Newcomer

Sean/Andreas -

I am opening a support ticket for a similar issue.  Was there ever a resolution to your first issue of 1) Many skipped events found in Corelation.0.0.log file in log files of system information ??

I also see no evidence of the server having any problems, the only evidence of the problem is skipped total events / skipped events / skipped pure-paths (only for some of our system profiles) and the occasionally loss of data from graphs.

Thanks,  Sherry