
Datacenter consolidations - need help fine-tuning/predicting application performance with DNA

Tomasz_Szreder
Advisor


@Shakti Sareen from Stryker posed a number of serious questions yesterday about using DNA to profile an on-premises versus a web-based application and to fine-tune their environment for better performance (SMB). They are now in the process of planning datacenter migrations/consolidations and are looking for performance considerations. Sounds like a perfect use case for DNA!


I felt it was time to reach for the collective knowledge of our Community.
DNA Power Users - can I ask you for your insights or tips? Thanks!



I have some doubts and queries related to the traces I captured today; I'm attaching them.



Just to give you some background: we are consolidating many datacenters into a few, so a lot of application migration activity is going on, and many application teams want us to run a latency prediction exercise before migration.



One of the applications I have a query about is accessed in two ways: over the web and through an .exe on the desktop. Until now, all users have accessed the application locally, but after the migration roughly 50 ms of latency will be added.



For the web access, the client and server traces I captured are almost identical, with no difference. But the traces captured when the application is accessed through the desktop .exe show very different sending-node processing times when the client and server traces are compared. I have attached the traces for your review.



1) Let me know which trace, client or server side, should be used for further analysis and for showcasing to the application team. The client trace is “EM00027827L_LaunchAdmin” and the server trace “DUITTS01_LaunchAdmin”.



2) I saw a vast difference between accessing the application over the web and through the .exe. Over the web it was fast, hardly 2 seconds to open the app page, but through the .exe it took 32 seconds. Note that only admins have rights to the .exe access.



3) I can also see errors in both the server-side and client-side traces; is there anything interesting to recommend here?



4) Which trace should be selected for the prediction exercise? With 50 ms latency, 4 Mbps bandwidth, and 50% load defined on the prediction page, the predicted response time increased to 545 s for the client trace and 507 s for the server trace.



5) Is the SMB protocol on the best version, keeping in mind the server is hosted on Windows Server 2008 R2 Enterprise?



6) I can also see a few SMB threads that took too much time, almost 32, 28, 15 seconds and so on; what can we recommend to tune this? Also, many threads with a duration of around 1 second could be affected by the added latency, so what suggestion would be best suited here?



7) The SMB protocol read block size looks like 65536 bytes; kindly confirm whether I'm interpreting it right.


After reviewing the traces, if you find anything else interesting to recommend, that would be good.

10 REPLIES

gary_kaiser
Dynatracer

I would use the client trace to analyze and predict the .exe access. The server trace includes packets captured with TCP offloading enabled, which makes it more difficult to interpret. The most important characteristics for prediction are app turns and bytes. Both traces have similar values for these metrics: over 8,500 app turns and about 14 MB.

For prediction, each app turn incurs the latency of the link. So 8,500 x 0.050 = 425 seconds. Add to this the impact of bandwidth and processing delay, and the response time will likely be greater than 500 seconds.
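To make the arithmetic explicit (a rough back-of-the-envelope estimate on my part, not DNA's exact prediction model):

    app turn delay:   8,500 turns x 0.050 s   = 425 s
    bandwidth delay:  14 MB x 8 / 4 Mbps      = ~28 s  (roughly double if 50% of the link is already loaded)
    processing delay: carried over from the trace
    total:            likely well over 500 s, in line with the 507-545 s predictions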

For the errors, I don't think they're adding any significant delay to the process, although with the introduction of network latency they may become more problematic. You can check the service packs on your Windows Server machine; here's a link that mentions the object-not-found errors; it may or may not apply.

https://support.microsoft.com/en-ca/kb/2628582

I also wouldn't worry about the long-running threads; they don't block subsequent threads, and so are not a problem.

One of the constraints on performance is the read block size in use. This is 32 KB; you can identify it by looking at the threads that transfer a lot of data, using the Server Payload Bytes column: the maximum is 32 KB. Look at the thread that reads tw32.exe, a 4.4 MB file. There are almost 200 app turns to read this file, 32 KB (or less) at a time. Once you add 50 ms of latency to the network, this will take an additional 10 seconds to read from the remote server. Increasing the block size will help a lot, as long as there is enough bandwidth to carry the additional load. Or ask a different question: if this is an executable, does it need to be downloaded from the server each time? Or can it be stored at the client PC, with a quick check for version currency at the start?
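Spelling out that block-size math (a simple serial-transfer approximation of my own, not a DNA calculation):

    ~200 read turns x 0.050 s = ~10 s of added latency per launch
    with 64 KB blocks: roughly half the read turns, so roughly half the added delay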

Thanks for the info, Gary. It really helped me move forward.
To answer the last section: users have a link on the desktop, "\\xxxxxx\tisoware$\Bin32\tw32.exe"; the .exe file is on the server xxxxxx.

Let me know if there is any way we can recommend a different way of accessing the .exe so that a performance improvement can be seen.

I think Gary recommended that tw32.exe be run locally instead of being opened from a network share. Does that answer your question?

Actually, we had a similar problem in the lab: centrally maintaining a significant number of developer tools/helper applications so that they can be quickly accessed remotely on any of our virtual machines. The solution for some of the larger apps was replacing shortcuts to remotely stored .exes with similarly named scripts responsible for fetching the application to a local hard drive and running it locally. This way, the first run takes a little longer due to the copy time, and each subsequent run is fast. What do you think?
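For illustration, a minimal launcher sketch along those lines, as a batch file (the local cache path is my own hypothetical choice; adjust to your environment). xcopy /d copies the file only when the server copy is newer than the local one:

@echo off
rem Hypothetical cache-and-run launcher for tw32.exe (a sketch, not a tested deployment script)
set SRC=\\xxxxxx\tisoware$\Bin32
set DST=%LOCALAPPDATA%\tisoware\Bin32
if not exist "%DST%" mkdir "%DST%"
rem /d: copy only if the source is newer than the local copy; /y: overwrite without prompting
xcopy "%SRC%\tw32.exe" "%DST%\" /d /y >nul
start "" "%DST%\tw32.exe"

If tw32.exe depends on other files in Bin32, the xcopy line would need to fetch the whole folder (e.g. "%SRC%\*") rather than the single .exe.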

Every time a user clicks on the link \\xxxxxx\tisoware$\Bin32\tw32.exe, the server xxxx is contacted and it takes the same amount of time. There is no .exe on the user's machine; the link always contacts the server and opens the .exe from there. Will this impact performance when the system moves from LAN to WAN?

Yes, it is inevitable after moving the server from LAN to WAN. SMB is a very chatty protocol, and the increased latency greatly affects file copy/load times.

There are also tools like WAN optimizers that attempt to mitigate the problem transparently, though I'm not an expert on this.

I'm quite sure the newer SMB versions shipped with newer Windows releases offer improved performance (Windows 7/2008 R2: SMB 2.1; Windows 8/2012: SMB 3; …), but to benefit from them you must make sure both the server and the client support a given SMB version. So even if you host the file on Windows Server 2012 R2, accessing it from a Windows Vista machine won't give you the performance boost of the new SMB 3 protocol.
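If both ends are new enough (Windows 8/Server 2012 or later), one way to see which dialect was actually negotiated is the Get-SmbConnection PowerShell cmdlet, run on the client while the share is in use; as far as I know it is not available on the older systems discussed here:

powershell -command "Get-SmbConnection | Select-Object ServerName,ShareName,Dialect"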

In fact, RTP/RTP Sweep is the tool for estimating the impact of various network characteristics in a different environment.

Quick tips before using RTP:

  • Use a trace captured in the local network
  • Use the bandwidth estimator and latency finder to determine the new network parameters
  • Use RTP Sweep to find out which network parameters affect performance the most

Bear in mind that RTP estimations are valid for a given connection only (client A to server S); other clients B, C, or D may have different characteristics, so you have to capture another trace and run a new set of estimations. Gary's an expert on RTP, if you feel like learning more.

Tomasz_Szreder
Advisor


Just wanted to comment on the jumbo frames in one of the traces, DUITTS01_LaunchAdmin.opx. As Gary mentioned, it is not very useful for performance analysis in DNA.


I notice that every now and then many DNA users ignore the warning about network task offloading. This is a Microsoft enhancement aimed at improving network performance in modern Windows environments, but it comes at the cost of losing the actual picture of network packet flow on the wire.


We discourage capturing traces on machines with task offloading enabled, because it can blur the analysis and calculations, e.g. the network time effect in CNS. This is because what DNA sees in the trace (frames with large payloads, “jumbo” frames) can be very different from what travels on the wire (multiple small frames for each large one).

How to identify that network task offloading was enabled at the time of capture




  • If you open Error Analysis for the trace DUITTS01_LaunchAdmin.opx, you'll notice multiple mentions of the warning Network: Frames larger than MTU;

  • In Thread Analysis, the thread group SMB2: Read File=Bin32\tw32.exe FileId=190000001f5:ffffffff00000025 is represented by a violet bar, which indicates the same problem (jumbo frames). In the column Avg. Server Frame Size there are values of 16522 or larger, i.e. more than ten wire-sized frames coalesced into one on a standard 1500-byte-MTU link;

  • In Packet Trace there are packets with large payloads (for example packet #156).

Checking and disabling task offloading


If possible, please make disabling it a rule when installing agents on new machines, unless you are not planning to do performance tuning based on traces captured there.


To find out whether task offloading is enabled, run


netsh int ip show global | findstr /i /c:"Task Offload"


To disable it:


netsh int ip set global taskoffload=disabled
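
Note that the set command needs an elevated (administrator) prompt. You can re-run the check afterwards to confirm the change took effect:

netsh int ip show global | findstr /i /c:"Task Offload"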


More details here:
https://community.dynatrace.com/community/display/DNA124/Disabling+network+task+offload


Regards


Tomasz

But in this case the client and server reside locally, and traces were captured from both sides, i.e. client and server. Hence for analysis we can take the client trace, since it does not contain the warning "Network: Frames larger than MTU" and has nothing to do with task offloading.

Also, today I captured the trace again after increasing the block size to 64 KB on the server, but I was not able to capture a client-side trace due to the issue mentioned in the ticket raised today, which is currently under your supervision.

But can I verify from the server trace whether the threads have increased their block size from 32 KB to 64 KB?

I can see in the server trace that the block size is still the same. Is that OK? Since the thread-level info on server payload bytes was similar in the client and server traces captured earlier, I'm thinking that verifying the block size with the server trace should be OK.

Also, since this application is accessed through a link, could the block size not increasing also depend on settings on the client machine?

shakti_sareen1
Contributor

In my previous note I mentioned how users currently access the application; any views on that? Also, if the server payload bytes for a thread are zero, what does that mean? In one more capture taken after launch, I can see zero server payload bytes for all threads linked to that capture, but the threads have some bytes linked to them and one app turn each, making a total of 190 app turns. What does this mean? Let me know if I need to share the trace.

Let's follow up on the zero payload bytes problem through JIRA.

shakti_sareen1
Contributor

@Gary Kaiser @Tomasz Szreder As Gary recommended, we tuned the read block size by increasing it to 64 KB; the configuration was done on the server and a reboot was initiated. A trace was captured again, but there was no change: everything remained the same in terms of the read block size and other attributes. As already mentioned, this app is accessed through a link, so on opening the link the application launches on the client PC. Do any changes need to be made on the client PC as well, and does anything else need to be checked to make this block size increase take effect? Any recommendations, please?