
DCRUM on Virtual

carl_kyonka
Newcomer

We have strong motivation to move as much activity as possible off physical servers and onto virtual (VMware). Actually, we would have to persuade our virtualization specialist that we cannot go P-to-V. This includes our CAS, ADS and AMD servers. Looking at CPU and memory stats (we use SV), the CAS and ADS might fit in 4-engine, 24 GB servers. Our AMD would probably fit in a similarly sized guest. We have a tap aggregator (my phrase), and we only need to connect the AMD to that aggregator.


I think the CAS and ADS are pretty good candidates for virtualization. I am not so sure about the AMD. There are lots of good reasons to virtualize, such as complying with company direction, recovery, the ability to upgrade in parallel, the fact that our ESX servers can handle 10G while our current AMD hardware has 1G, etc.


Has anyone done this? What CPU and memory specs?


Are there ways to size the guest systems? Mbps processed?


Are there reasons that some or all of these systems should NOT be virtualized?


(I did do a search on this forum and looked at the 12.1 hardware manual.)


 

11 REPLIES

ulf_thorn222
Inactive

 

Hi Carl

There's pretty extensive coverage in the PDFs, particularly in the "Virtual Environment Administration Guide". It covers both what to consider when deploying DC RUM in a VM and how to monitor something that is itself deployed in a VM.

What I tend to recommend is never to have two of the pieces on the same VM host, as we can chew up I/O (potentially up to 10k IOPS) when we get data from the AMD. Note that this isn't sustained but a peak or burst when the data comes over from the AMD. This is something you need to make clear to the VM admins. Further (depending a bit on the version), I tend to recommend keeping the AMD external. This typically needs to be assessed against the available bandwidth/ports.

Hi Carl,


I believe you will find all Virtual Environment related information here:

Virtual Environment Administration Guide

Regards,

 

Grzegorz

pawel_brzoska
Inactive

Carl,

here're some considerations you may want to take into account:

  • both probes and report servers are not perfect candidates for virtualization, because they generate constant load and have constant, high resource demand
  • because of the above (except for very lightly loaded systems), to ensure proper operation the VM admin would have to allocate resource reservations, which usually negates or diminishes the cost savings on physical hardware
  • in addition, the VM admin needs to expect and account for the performance impact that guests running our tools will have on the co-hosted guests
  • there are several limitations to virtualized AMD functionality: the custom driver cannot be used, and SSL keys can only be kept on the disk (no FIPS-compliant solution is available for VMs today)
  • last but not least: when sizing the resource allocation for report servers, the VM admin needs to remember that this is a warehouse database system, so assigning a certain amount of RAM and CPU is not enough; the main bottleneck typically lies in storage I/O bandwidth. This parameter shapes the performance of the whole reporting engine, and if it is insufficient you will have massive performance problems
  • note that there is no good way of assigning HDD I/O resources in VMware, and by definition these resources are scarce, as a single server-grade SCSI disk system typically needs to be shared among several guest machines
  • there's a free CAS benchmark tool available that helps you assess the current speed of the whole SQL subsystem (either local or remote) and compare it to one running on recommended physical hardware; for highly loaded systems the VM admin needs to ensure the result is close to the benchmark, otherwise the capacity of such a report server will be much smaller than if it were running on dedicated physical hardware (a rough do-it-yourself check is sketched below)
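
To get an early feel for a candidate datastore before running the real CAS benchmark tool mentioned above, a crude sequential-write probe along these lines can give a ballpark figure. This is only a sketch: the target path is a placeholder, and it does not exercise the SQL subsystem at all.

```python
# Crude sequential-write probe for a candidate report-server volume.
# NOT the CAS benchmark tool -- it only gives a ballpark MB/s figure
# and says nothing about random I/O or the SQL subsystem itself.
import os
import time

TARGET_DIR = r"D:\cas_data"    # placeholder: the datastore under test
FILE_SIZE_MB = 2048            # write 2 GB so caches matter less
CHUNK = b"\0" * (1024 * 1024)  # 1 MiB per write

def sequential_write_mb_per_s(target_dir: str, size_mb: int) -> float:
    path = os.path.join(target_dir, "write_probe.tmp")
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(size_mb):
            f.write(CHUNK)
        os.fsync(f.fileno())   # force the data out of the OS cache
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size_mb / elapsed

if __name__ == "__main__":
    print(f"Sequential write: {sequential_write_mb_per_s(TARGET_DIR, FILE_SIZE_MB):.0f} MB/s")
```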

ted_mintus
Newcomer

CAS and ADS could be candidates for virtualization; however, I would take careful note of the cautionary comments from Pawel and Ulf.

The decision to use a virtual AMD or a physical AMD in a customer's environment is a complicated one involving a number of factors, such as the supporting network architecture, whether the virtual environment is built on blades or standalone servers, the number of nodes in the cluster, the number of available NICs on the host, and the version and license level of vSphere deployed.  I would highly recommend using a physical AMD first, using RSPAN or ERSPAN techniques to present packets of interest to it.  These techniques require the support and use of other network components.  If your network architecture does not support them, you may be forced to use a third-party solution such as Gigamon to bring intra-host packets to a physical AMD, or to use multiple virtual AMDs in order to acquire intra-host packets.  Due to the nature of simple port mirroring or internal RSPAN within a VMware environment, source and destination ports must reside on the same host.  This means you would need one virtual AMD per host, not subject to migrations.  You would probably find that this is not as cost-effective as using a physical AMD.

 

carl_kyonka
Newcomer

Thank you all so far. These comments are on point. I did find the manual mentioned, and it does seem to allow for virtualization. I believe it is possible to configure VMware guests so they do not land on the same ESX host (or, conversely, so that they do co-locate). I am downloading the install files for DCRUM 12.1, hoping that the CAS database benchmark comes with it, since I did not find it independently. (Or should I have downloaded 12.2?)

The comments suggest to me that virtualizing the CAS and the ADS is reasonable. We are a smaller installation. We have separate physical servers for the CAS and ADS, both running most of the time at about 1% CPU on 24-engine servers. Disk reads and writes are usually below 3 MB/sec and I/Os below 50/sec (but there are spikes at night up to 42 MB/sec and 1,000 I/Os/sec). Our one AMD peaks at 600 Mbps, typically running at 400. We do use Gigamon, which delegates a lot of complexity to the network team. It also reduces the number of inputs to the AMD.
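
As a back-of-the-envelope exercise, here is roughly how those observed peaks could be turned into a resource request for the VM team. The 1.5x headroom factor is just an arbitrary safety margin, not a vendor figure.

```python
# Back-of-the-envelope resource request from the observed peaks quoted above.
# The 1.5x headroom factor is an arbitrary safety margin, not a vendor number.
HEADROOM = 1.5

observed_peak = {
    "disk write (MB/s)": 42,    # nightly spike
    "disk I/O (IOPS)":   1000,  # nightly spike
    "AMD input (Mbps)":  600,   # peak capture rate
}

for metric, peak in observed_peak.items():
    print(f"{metric:<20} peak {peak:6.0f}  ->  request {peak * HEADROOM:6.0f}")
```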

I would like to virtualize the AMD because it would avoid a physical upgrade that costs enough to be a worry. Also, a virtual environment would make it easy to install a new version in parallel, using the Gigamon to pipe the traffic of interest to both AMDs. (I know this would be true of physical AMDs too.) We are using 3 connections between the Gigamon and the AMD, but that could be collapsed if needed.


Given the smaller size of my installation, would it be feasible to attempt to run the AMD virtual?

Hi Carl

Virtualizing your AMD could mean giving up 600 Mbps of the VM chassis' port capacity to the AMD input traffic. I'm not clear on how many ports you have in the chassis or what their capabilities are. But also realize that a virtual AMD has less "chewing" power than a physical one, so you could potentially need two.
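
To make that concrete, the count comes down to a trivial calculation like the one below, though the per-vAMD capacity used here is purely an assumed placeholder; the real figure depends on the decodes in use and the sizing tables in the Virtual Environment Administration Guide.

```python
import math

# How many virtual AMDs for a given peak capture rate?
# ASSUMPTION: one virtual AMD comfortably handles ~300 Mbps here; the real
# figure depends on decodes, host hardware and the official sizing tables.
VAMD_CAPACITY_MBPS = 300

def vamds_needed(peak_mbps: float) -> int:
    return math.ceil(peak_mbps / VAMD_CAPACITY_MBPS)

print(vamds_needed(600))  # Carl's 600 Mbps peak -> 2 under this assumption
```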

robert_cotter
Newcomer

 

From my experience testing AMD 11.7 and AMD 12.1 in a lab environment, a physical AMD is more reliable when things go wrong than one hosted in a VM, due to the resourcing issues identified by earlier posters. The only time I would recommend, from experience and testing, using a virtual AMD is when you are looking at inter-VM traffic on the same physical host, but ensuring that the virtual switch connecting the VMs is up to spec is hard and very important.

The virtual AMD also comes at a price, and it can sometimes be very high, depending on the decode(s) used, the volume of traffic it is processing, and the amount of network noise traversing the virtual switch it gets its data from.

Good luck, and my recommendation is to test the setup well before committing to your management.

 

praveen_begur
Dynatrace Organizer

Hi,

I have a prospect customer in India who has just decided to purchase DC RUM. The client has asked me to recommend hardware/software for the deployment, hence I am seeking your help.

The client will have 2 AMDs (one per data center) and 1 report server.

Both AMDs will be getting traffic before the firewall (after-firewall deployment is not in scope for now).

Expected traffic is about 500 Mbps.

 

For the AMD: I have recommended a physical AMD, and the client has decided to purchase an HP DL380 G9 - Full Banded, which matches the recommended specs (as in the AMD install guide).

 

For the Report Server: The client is pushing for the report server and the SQL Server DB to be in a VM (VMware ESX). Storage will be on a SAN, and the client tells me their SAN speed is quite good (I don't have the numbers).

I have recommended the specs below for the Report Server VM:

1. ONE server-grade VM for the Reporting Server (runs CAS, ADS, RUM Console, and the SQL Server DB):
a. CPU, RAM, and disk: 64 GB RAM, 12 CPU cores, 800 GB (allocate more space if required, since we are monitoring enterprise-wide traffic)
b. Operating system: Windows Server 2012 Standard, 64-bit
c. Database: Microsoft SQL Server 2012 Standard, 64-bit

 

My Queries:

  1. For this VM, how much disk I/O speed should I recommend?
  2. Will this DC RUM report server be performant?
  3. Should I accept a VM for the report server, or should I recommend that the client adopt physical hardware (I need to justify it properly)?

 

I have seen the above discussions and responses (from Pawel B as well).

 

I have also seen 'DCRUM_VirtualEnvironmentAdmin.pdf' and found that page 19, 'Report Server in VMware Environment', is very basic and is missing the important information that I need.

 

 

Erik_Soderquist
Dynatrace Pro

We have many customers who have the report servers and the SQL servers in virtual environments.  I would make clear that the virtual specs are starting points and may need to increase based on the actual data complexity in their environment, and I would actually scale back the number of cores recommended.

 

VMware ESX (as I understand it) has the virtual machine wait until the specified number of cores is actually available, even though the virtual machine might be able to run everything it needs with half of them, and the CAS and SQL hosts in DC-RUM environments are usually memory bound rather than CPU bound.  I would recommend 2-4 cores, but specifically note that the MHz available to the CAS and SQL hosts should not be capped.  With that configuration, the ESX host will happily run runnable threads on additional cores for the processing clock cycles without specifically waiting for a higher core count to be available.

 

As to the disk I/O speed... I don't have a recommendation for that; it varies too much.  With DC-RUM being geared so much around the 5-minute intervals, the SQL host typically needs very high burst write speed, and the bursts are difficult to measure, especially since most SANs are rated against sustained read/write speeds.  The ESX host's and the SAN's read/write buffers will come into play, and it usually comes down to whether or not the buffers are overrun, and whether or not the SAN is capable of writing the buffers out between bursts.
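
A minimal sketch of that buffer arithmetic, assuming you can get burst and cache figures from the storage team; every number below is a made-up placeholder.

```python
# Minimal model of the burst-vs-buffer question: does the 5-minute burst fit
# in the write cache plus whatever the SAN drains while the burst arrives,
# and does the backlog clear before the next burst? All figures are placeholders.
INTERVAL_S       = 300    # DC-RUM processing interval
BURST_MB         = 6000   # data written per interval (placeholder)
BURST_DURATION_S = 60     # how long the burst lasts (placeholder)
BUFFER_MB        = 2048   # ESX host + SAN write cache (placeholder)
SUSTAINED_MB_S   = 150    # SAN sustained write rate (placeholder)

drained_during_burst = SUSTAINED_MB_S * BURST_DURATION_S
overrun = BURST_MB > BUFFER_MB + drained_during_burst

backlog = max(0, BURST_MB - drained_during_burst)
drain_time_s = backlog / SUSTAINED_MB_S
clears_in_time = drain_time_s <= (INTERVAL_S - BURST_DURATION_S)

print(f"buffer overrun during burst: {overrun}")
print(f"backlog after burst: {backlog:.0f} MB, drains in {drain_time_s:.0f} s, "
      f"clears before next burst: {clears_in_time}")
```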

 

-- Erik

matthew_eisengr
Inactive

Praveen,

Lots of experts on this thread and I don't claim to be one, but I do have physical AMDs connected to virtualized CASes, and the reporting performance is just fine as long as your VM host isn't slammed. Originally we started out with 64 GB of RAM per VM and 2 cores assigned to each CAS, with all disks located on a SAN. When the data came in from the AMDs, the CAS's CPU would spike to 100% on both cores for quite a while as it churned through the processing. Clearly that was a bottleneck, so we increased it to 4 cores; that didn't leave it pegged like before, and we left it at that. I think more than 4 cores actually works against you, as it increases the complexity of core sharing on the host, so the VMs would constantly be contending for the same cores (or so I've read).

As Erik stated, the memory is where we always ran into an issue.

I normally start with these settings and then tweak until I'm happy:

64 GB box:

Always leave 4 GB for the OS (gives Windows Server room to breathe).

Always leave 4 GB for the RUM Console.

Then I split the rest of the memory 50/50 between the CAS JVM (28 GB) and SQL (28 GB).

I let these run for a few days, then check my CAS memory settings in the diagnostics reports. I take the highest peak during those days (say it was 18 GB), add roughly 20% headroom (call it an even 20 GB), take whatever I freed from the CAS (-8 GB) and add it to the SQL instance (28 + 8 = 36 GB)... or vice versa if you have a hungry CAS.
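
Written out as a small calculation, this rule of thumb looks roughly like the sketch below. The function just encodes the split described above; the 20% headroom and the 18 GB observed peak come from the example (which rounds to an even 20 GB, giving 36 GB for SQL).

```python
# Matthew's starting split for a 64 GB report-server VM, plus the rebalance
# after observing the real CAS memory peak in the diagnostics reports.
OS_GB = 4        # leave for Windows Server
CONSOLE_GB = 4   # leave for RUM Console

def initial_split(total_gb: int = 64) -> tuple[int, int]:
    remaining = total_gb - OS_GB - CONSOLE_GB
    return remaining // 2, remaining // 2      # (CAS JVM, SQL Server)

def rebalance(total_gb: int, cas_peak_gb: float, headroom: float = 0.2) -> tuple[int, int]:
    remaining = total_gb - OS_GB - CONSOLE_GB
    cas = round(cas_peak_gb * (1 + headroom))  # observed peak + headroom
    return cas, remaining - cas                # freed memory goes to SQL

print(initial_split())     # (28, 28)
print(rebalance(64, 18))   # ~(22, 34); the post rounds down to 20/36
```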

Good luck!

praveen_begur
Dynatrace Organizer

Many thanks to Matthew and Erik. 

I have a further query regarding SQLServer licensing and vCPU for this customer that I will ask in a separate posting.