
Server Cache Limit Exceeded - Revisited

genesius_jarom1
Organizer

I know this question has been asked before here:

https://answers.dynatrace.com/questions/118885/ser...

However, I am new to this product and need a little more direction.

Here are the points from the answers to the previous post, with my questions/comments.

@Ulf Thornander – “You should also check your "Server IP Range". This is done by going to the advanced properties (HTTP://YOURSERVER.COM/ATSCON), selecting "Advanced Properties Editor", and then specifying the "Accepted Server IP address range".”

GJ – There is no "Server IP Range" under "Advanced Properties Editor". Besides, according to the test reports I was building, there have been no more than 11,000 rows.

============

@Pawel Brzoska – “The server limit also includes distinct URLs, so that explains why it happened after the URL parameter was reconfigured. Most likely there are too many values of this parameter, resulting in thousands of different URLs populating the database. 50k is a lot, so I don't advise increasing this limit; rather, look at how dynamic the parameter you configured is and tweak its definition so that parameter values are aggregated more aggressively into single operations.”
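To illustrate the point about parameter values, here is a minimal Python sketch (not DC RUM code). It assumes a hypothetical page whose `sid` parameter carries a per-session value; the URLs and the `operation_name` helper are made up for illustration. Including `sid` in the operation definition yields one operation per session, while keeping only the low-cardinality `action` parameter collapses everything into a handful of operations.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical monitored URLs: "sid" changes per session, "action" does not.
urls = [f"/shop/cart.jsp?sid={n}&action=view" for n in range(1000)]
urls += [f"/shop/cart.jsp?sid={n}&action=add" for n in range(1000)]


def operation_name(url, keep_params):
    """Build an operation name from the URL path plus only the listed parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in keep_params]
    return parts.path + ("?" + urlencode(kept) if kept else "")


# Operation defined with the dynamic parameter: every session is a new operation.
too_dynamic = {operation_name(u, {"sid", "action"}) for u in urls}
# Operation defined without it: values aggregate into a few operations.
aggregated = {operation_name(u, {"action"}) for u in urls}

print(len(too_dynamic), len(aggregated))
```

In this sketch, the first definition produces 2,000 distinct operations and the second produces 2; the same effect, multiplied across real traffic, is how a single overly dynamic parameter can fill the server cache.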

GJ – Where do I find this parameter you mention?

============

@Adam Piotrowicz – “Please go to the http://<CAS>/modulestatus?advanced=1 page of your CAS and copy/paste here the rows whose Module name column is set to "Advanced DB Statistics".”

GJ

@Adam Piotrowicz – “Indeed, this script does not answer which URLs are filling the server cache (that is the most common reason); it only gives a direction: which Software Services have the most sessions (usually related to the number of URLs), so we know which one should be investigated in DMI by making a very simple report with the Software Service and Operation dimensions and the Operations metric.”

GJ - I added Server name and Server IP address to the report, hoping to increase the number of servers discovered.

Report: Software Service, Operation, Server name and Server IP address dimensions and Operations metric = 9,181 rows

"Advanced DB Statistics" URLs on servers = 100,372

============

What am I not understanding here?

Thanks in advance for any guidance you can provide.

God Bless,

Genesius


adam_piotrowicz
Dynatrace Pro

Genesius,

The first question is: what version of CAS are you running?

genesius_jarom1
Organizer

@Adam Piotrowicz

Product Name: Central Analysis Server


Version: 12.3.3.29

Thanks and God bless,

Genesius

adam_piotrowicz
Dynatrace Pro

OK, being on SP > 2 spares you many server cache problems. Let me sum up our experience with the server cache.

In general, the server cache keeps all the servers and URLs that the CAS saw during the last 10 days (by default). So the cache grows either because monitoring is too "wide" (you monitor everything) or because you monitor servers that produce many unique URLs, e.g. URLs with a session number in the operation name, so that each operation is unique and never re-appears.
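A toy Python model may help show why unique operation names inflate the cache even though old entries age out. It assumes, purely for illustration, that the cache behaves like a map from (server, URL) to last-seen time that is purged after the default 10-day window; `ServerCacheModel`, the server address, and the sample URLs are hypothetical and are not CAS internals.

```python
from datetime import datetime, timedelta


class ServerCacheModel:
    """Toy model: remembers (server, url) keys seen within a retention window."""

    def __init__(self, retention=timedelta(days=10)):  # default window from the post
        self.retention = retention
        self.last_seen = {}

    def observe(self, server, url, when):
        # Re-seeing a known key only refreshes its timestamp; it adds no entry.
        self.last_seen[(server, url)] = when

    def age_out(self, now):
        # Keep only keys seen strictly within the last `retention` period.
        cutoff = now - self.retention
        self.last_seen = {k: t for k, t in self.last_seen.items() if t > cutoff}

    def size(self):
        return len(self.last_seen)


start = datetime(2024, 1, 1)
stable = ServerCacheModel()   # the same 100 URLs every day
unique = ServerCacheModel()   # 100 session-numbered URLs per day, never repeated
for day in range(30):
    now = start + timedelta(days=day)
    for i in range(100):
        stable.observe("10.0.0.1", f"/app/page{i}", now)
        unique.observe("10.0.0.1", f"/app/page?session={day * 100 + i}", now)
    stable.age_out(now)
    unique.age_out(now)

print(stable.size(), unique.size())
```

With the same 100 requests per day, the stable URLs keep this model at 100 entries, while the session-numbered URLs keep ten days' worth of unique entries (1,000 here). At real traffic volumes, that kind of difference can push a cache past its limit.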

Usually we start troubleshooting by identifying the top Software Services that generate the most URLs and servers. We use this SQL script to identify those Software Services and then troubleshoot them in DMI: whether the operations are highly unique, how many servers are reported, etc. Then we revisit the configuration to find a way to aggregate the data. If that is not possible, we ask customers to limit the traffic or introduce load balancing on the CAS.
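The SQL script itself is not reproduced in this thread. As a rough, hypothetical illustration of the same ranking idea, the Python sketch below works on an in-memory list of (software service, server, URL) rows, such as one might export from a DMI report. The rows, the `rank_services` helper, and the choice to count each URL per server are assumptions made for illustration, not the script's actual logic or the CAS schema.

```python
from collections import defaultdict

# Hypothetical export: one row per (software service, server, URL) observation.
rows = [
    ("Webshop", "10.0.0.1", "/cart?session=1"),
    ("Webshop", "10.0.0.1", "/cart?session=2"),
    ("Webshop", "10.0.0.2", "/cart?session=3"),
    ("Intranet", "10.0.1.1", "/home"),
    ("Intranet", "10.0.1.1", "/home"),
]


def rank_services(rows):
    """Return (service, distinct URLs, distinct servers), most URLs first."""
    urls = defaultdict(set)
    servers = defaultdict(set)
    for service, server, url in rows:
        urls[service].add((server, url))  # assumption: a URL counts once per server
        servers[service].add(server)
    return sorted(
        ((svc, len(urls[svc]), len(servers[svc])) for svc in urls),
        key=lambda r: r[1],
        reverse=True,
    )


for service, n_urls, n_servers in rank_services(rows):
    print(f"{service}: {n_urls} distinct URLs on {n_servers} servers")
```

The services at the top of such a list are the ones worth opening in DMI first, to check whether their operation names carry session-specific values.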

When the server cache limit is exceeded, several things happen:


  • no new servers or URLs are added to the CAS database, so data loss is possible (you won't see new URLs or servers that have been configured for monitoring)
  • URLs and servers that the CAS already knows are processed as usual (their measurement data is recorded by the CAS)

If the cache limit is exceeded in the middle of processing a new data file (a 5-minute zdata package) that contains new servers/URLs, servers are processed first and URLs second, so chances are that the servers will be recorded while some (randomly chosen) URLs are ignored.

To summarize: if a server is already known to the CAS and you expand the number of monitored URLs to the point that the cache limit is exceeded, you should still get complete server-level measurements and complete measurements for some URLs (those already known to the CAS plus, perhaps, some of the newly added ones). Data for the remaining URLs will be rolled up into All Other Operations.
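The behavior described above can be sketched as a toy Python model of one package arriving at a full cache. This is an illustration only, not CAS code. It assumes that servers and URLs count against one combined limit (in line with the earlier remark that the server limit also includes distinct URLs) and that URLs are admitted in package order, whereas the real CAS drops URLs in a way described as random. `process_package` and the sample data are hypothetical.

```python
ALL_OTHER = "All Other Operations"


def process_package(known_servers, known_urls, package, limit):
    """Toy model of one 5-minute package processed against a full cache.

    package: list of (server, url, value) measurements.
    Servers are admitted first, then URLs, while the combined cache size
    stays under `limit`. Data for URLs that could not be admitted is rolled
    up into ALL_OTHER; data for servers that could not be admitted is lost.
    """
    def room():
        return len(known_servers) + len(known_urls) < limit

    # Pass 1: new servers.
    for server, _, _ in package:
        if server not in known_servers and room():
            known_servers.add(server)
    # Pass 2: new URLs, only on servers the cache knows.
    for server, url, _ in package:
        if server in known_servers and (server, url) not in known_urls and room():
            known_urls.add((server, url))

    totals = {}
    for server, url, value in package:
        if server not in known_servers:
            continue  # new server that did not fit: its data is not recorded
        key = (server, url) if (server, url) in known_urls else (server, ALL_OTHER)
        totals[key] = totals.get(key, 0) + value
    return totals


package = [
    ("10.0.0.1", "/login", 5),   # already known
    ("10.0.0.1", "/new-a", 3),   # new URL
    ("10.0.0.1", "/new-b", 2),   # new URL
    ("10.0.0.9", "/x", 7),       # new server
]
print(process_package({"10.0.0.1"}, {("10.0.0.1", "/login")}, package, limit=4))
```

With `limit=4`, the new server and `/new-a` still fit, while `/new-b` and `/x` end up in All Other Operations; the server-level total for 10.0.0.1 stays complete (10), matching the summary above.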

Again, if you hit the server cache limit on a version earlier than 12.3.3, it's very likely that you are not monitoring too much data but are instead running into a bug in DC RUM that does not age data correctly.

Please let us know if any aspect of this subject is unclear and we will be happy to reply.

ulf_thornander3
Inactive

Hi Genesius.

I'm wondering about your statement:

GJ - I added Server name and Server IP address to the report, hoping to increase the number of servers discovered.

Do you want to see more servers?

Are you using DCRUM as a Discovery tool to find out what is communicating on your network?

It "can" work in such a manner, but that is not its primary function. As Adam is alluding to, more than just the server IP address is at play when the "Server Cache Limit" is reached. With that in mind, please understand that having the warning is NOT a desired state, as it can create incoherent data, and the problem should be solved as quickly as possible.

However, if you still want to examine and analyze the network and you bump into the "Server Cache Limit" every so often, I'd consider expanding into a cluster and using a second and/or third CAS, depending on the nature of your traffic.

Hi Ulf,

I want to "increase the number of servers discovered" so I can determine why we have to keep increasing the cache. My understanding is that we were at 20,000 a month or so ago (before I arrived). Since I arrived, we went from 80,000 to 200,000 just to make the error disappear. I don't want to put a bandage on the problem; I want to solve it. Therefore, I want to see ALL the servers that Advanced DB Statistics indicates exist.

Thanks and God bless,

Genesius

genesius_jarom1
Organizer

@Adam Piotrowicz,

I ran the script you provided and received the following error.

Msg 208, Level 16, State 1, Line 1
Invalid object name 'rtmsession'.

When I search the CAS database for "rtmsession", the table is present.

Note: I am running it from Microsoft SQL Server Management Studio.

I also tried running "parts" of the script and received the same error.

Thanks and God bless,

Genesius

PS: I will be out on Monday and won't be able to reply.

Genesius,

It sounds like you're connected to SQL Server Management Studio as a user other than delta, and I admit the script is not prepared for that.

Please find cache-troubleshoot-fixed.txt, which can be run as any user.

Try it and let us know.