Solved: Oracle Extension version 3.2.0 configuration variables - heavy-query-interval

mn_24 · 07 Jan 2025

Hi, I wanted to ask whether I understand this part of the new version 3.2.0 correctly. Until now I thought that setting it to 900 sec (15 min) would help me when, for some reason, a query (for example the tablespace query) needs more than 10 sec to fetch the data from the database. Could you please clarify whether this field is in seconds or in minutes, and what the minimum and maximum values are?

Mizső · 08 Jan 2025

Hi @mn_24

I think it is in seconds. I remember the tablespace query timeout was a problem for many clients. @Mike_L from the extension team can confirm the min-max range.

Best regards,

Mizső

Dynatrace Community RockStar 2024, Certified Dynatrace Professional

Mike_L · 08 Jan 2025

It is indeed in seconds. The number needs to be at least 1, and it cannot be larger than the interval. Ergo, if a query runs every 5 minutes, the timeout needs to be smaller than 300.

For this specific extension, it needs to be smaller than the two query interval variables.
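A minimal Python sketch of the rule as stated above. The function names are hypothetical and this is not the extension's code. It assumes, following the 300-second example, that the timeout must be strictly smaller than the interval in seconds (whether a value exactly equal to the interval is accepted is not confirmed in this thread). The error text mirrors the ActiveGate log lines quoted later in the thread.

```python
def validate_timeout(timeout_s: int, interval_min: int) -> None:
    """Raise ValueError if the timeout breaks the rule described above.

    Assumption: the timeout must be strictly smaller than the interval,
    e.g. at most 299 s for a query that runs every 5 minutes.
    """
    if timeout_s < 1:
        raise ValueError("Timeout must be at least 1 second.")
    if timeout_s >= interval_min * 60:
        # Wording mirrors the ActiveGate log messages quoted in this thread.
        raise ValueError(
            f"Timeout {timeout_s} seconds cannot be longer than "
            f"interval {interval_min} minutes."
        )


def max_valid_timeout(query_interval_min: int, heavy_query_interval_min: int) -> int:
    """Largest timeout (seconds) that fits under both interval variables."""
    return min(query_interval_min, heavy_query_interval_min) * 60 - 1


if __name__ == "__main__":
    print(max_valid_timeout(1, 5))  # 59 with the default intervals
    validate_timeout(295, 5)        # accepted for a 5-minute query
    try:
        validate_timeout(900, 1)
    except ValueError as err:
        print(err)
```

With the default intervals of 1 and 5 minutes, a single timeout that has to fit under both variables is capped at 59 seconds.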

Mike

TomásSeroteRoos · 08 Jan 2025

As @Mike_L mentioned, the field is in seconds and should be smaller than the interval of each query. Note as well that not all queries are affected by this variable: at the moment, only the queries under asm (detailed), tablespaces, tablespaces (detailed) and topn are affected.

We will be adding some more information on the Hub tile and the UI to clarify these points.

As for the tablespaces query, the newest version moved the old query into the tablespaces (detailed) feature set, and the tablespaces feature set now has a simplified, faster query (which pre-filters some data and avoids an expensive subquery) that may still address your use case. Please have a look at the Hub tile's FAQ section for more details on this.

mn_24 · 08 Jan 2025

Hi all,

But in this case, I see these messages in the ActiveGate logs:

[2025-01-03 | 09:15:45.049 | zScheduler_Worker-13 | ERROR | c.d.s.p.p.AbstractPoller | While polling from endpoint ************************ querying group 'ASM Disks' failed
[out]java.lang.IllegalArgumentException: Timeout 900 seconds cannot be longer than interval 1 minutes.

2025-01-03 | 09:15:45.195 | zScheduler_Worker-24 | ERROR | c.d.s.p.p.AbstractPoller | While polling from endpoint ***********************, querying group 'Tablespaces' failed
[out]java.lang.IllegalArgumentException: Timeout 900 seconds cannot be longer than interval 5 minutes.

So the ASM disk data is fetched every 1 min, but the tablespace data every 5 min. If I set "heavy-query-timeout" to, let's say, 295 sec, that would be a good approach for the tablespace query, but what about ASM?

TomásSeroteRoos · 08 Jan 2025

What's happening here is that the ASM Disks group, in particular the query under the asm (detailed) feature set, is set to run with the frequency of the query-interval variable (which defaults to one minute), while the tablespaces query is using the heavy-query-interval variable (which defaults to 5 minutes).

When interval variables were added, this was implemented like this for backwards compatibility reasons, as previously the asm (detailed) feature set was running every minute.

Nonetheless, we will look into this and check whether it can be adjusted in the next release. Honestly, for metrics like disk space, a 5-minute resolution should be enough.

mn_24 · 09 Jan 2025

Hi @TomásSeroteRoos, 5 minutes is fine for the disk space too. However, you should at least add it as a field we can adjust, because otherwise the two metrics work against each other. If I set "Heavy-query timeout" to 55 sec, the +ASM metrics work fine but the Tablespace ones fail, and the opposite happens when I set it to 295 sec (around 5 min): the Tablespaces are fine but +ASM fails. When it is left blank, the default of 10 sec kicks in, but then the Tablespaces fail again because they need more time to fetch the data.
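The conflict described here can be framed as a window: the timeout must be at least as long as the slowest timeout-bound query needs, and below the shortest interval among the queries that honour it. A small Python sketch, where `required_runtime_s` is a hypothetical measured runtime (the thread only says 55 seconds was not enough for Tablespaces) and the strict "below the interval" bound is an assumption:

```python
from typing import Optional, Tuple


def feasible_timeout_window(
    required_runtime_s: int, shortest_interval_min: int
) -> Optional[Tuple[int, int]]:
    """Return (lowest, highest) usable timeout in seconds, or None if none exists.

    required_runtime_s: hypothetical time the slowest timeout-bound query needs.
    shortest_interval_min: interval of the most frequent query that uses the timeout.
    """
    lowest = max(1, required_runtime_s)
    # Assumption: the timeout must stay strictly below the interval.
    highest = shortest_interval_min * 60 - 1
    if lowest > highest:
        return None
    return lowest, highest


if __name__ == "__main__":
    print(feasible_timeout_window(70, 1))  # a 1-minute query rules out 70 s
    print(feasible_timeout_window(70, 5))  # only 5-minute queries use the timeout
```

With a query running every minute and a hypothetical Tablespaces runtime of 70 seconds, the window is empty (`None`); it only opens, e.g. to `(70, 299)`, once no 1-minute query uses the timeout.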

One more thing I've observed: even though those changes were introduced in version 3.2.0, I see different behavior in 3.1.3. Please check whether those particular changes have also been applied to 3.1.3.

mn_24 · 09 Jan 2025

The Oracle Extension version here is 3.1.3!!! Still not 3.2.0.

1/ First I saw these errors, with the "heavy-query timeout" value set to 900 sec (15 min):

09:15:45.049 | | While polling from endpoint *****************, querying group 'ASM Disks' failed
lArgumentException: Timeout 900 seconds cannot be longer than interval 1 minutes.

09:15:45.195 | | ERROR | | While polling from endpoint ************, querying group 'Tablespaces' failed
ArgumentException: Timeout 900 seconds cannot be longer than interval 5 minutes.

2/ I changed the value to 295 sec (~5 min).

The errors for the tablespaces stopped but continued for ASM.

3/ I changed the value to 0.

Then this message appeared in the logs: Variable long-running-query-timeout is not numeric. Default timeout will be used.

I know that the default is 10 sec.

After this change there were no more errors for ASM, but this one for the tablespaces started to appear:

While polling from endpoint *********************, querying group 'Tablespaces' failed
ava.sql.SQLTimeoutException: ORA-01013: user requested cancel of current operation

So 10 sec is not enough to extract the tablespace data, and 55 sec is not enough either, but if I set a value higher than 1 min (55-60 sec), I break the 1-minute ASM interval rule...
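The three experiments above can be replayed against the two intervals seen in the logs (ASM Disks every minute, Tablespaces every 5 minutes). This Python sketch is hypothetical: the fallback rule (anything that is not a positive whole number falls back to the 10-second default) is inferred from the "not numeric" message logged for 0, not taken from the extension's source, and it only models the interval validation, not whether the query then finishes in time.

```python
from typing import Dict, List

DEFAULT_TIMEOUT_S = 10  # default mentioned in this thread

# Group intervals (minutes) as reported in the log messages above.
GROUP_INTERVALS_MIN: Dict[str, int] = {"ASM Disks": 1, "Tablespaces": 5}


def resolve_timeout(raw: str) -> int:
    """Hypothetical fallback: blank, non-numeric or non-positive values use the default."""
    value = raw.strip()
    if value.isdecimal() and int(value) >= 1:
        return int(value)
    return DEFAULT_TIMEOUT_S


def rejected_groups(raw: str) -> List[str]:
    """Groups whose interval check would fail for this setting (assumes timeout < interval)."""
    timeout_s = resolve_timeout(raw)
    return [
        group
        for group, interval_min in GROUP_INTERVALS_MIN.items()
        if timeout_s >= interval_min * 60
    ]


if __name__ == "__main__":
    for setting in ("900", "295", "0"):
        print(setting, resolve_timeout(setting), rejected_groups(setting))
```

Under these assumptions, "900" is rejected for both groups, "295" only for ASM Disks, and "0" for neither because the 10-second default is used instead; that default then being too short for Tablespaces at runtime matches the ORA-01013 cancel reported above.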

Is it possible somehow that the change is applied already to the lower versions as well?

TomásSeroteRoos · 10 Jan 2025

@mn_24 wrote:
Is it possible somehow that the change is applied already to the lower versions as well?

No. Version 3.2.0 added the ability to configure query intervals, but for lower versions the following intervals apply, as per the Hub tile's FAQ (see the second paragraph):

From version 3.2.0 onwards, query execution frequency is controlled by the configuration variables query-interval and heavy-query-interval. Most of the queries executed by the extension will run every query-interval minutes (with a default of 1 minute), while the queries under tablespaces and tablespaces (detailed), Blocked sessions and TopN will run every heavy-query-interval minutes (with a default of 5 minutes).
For older versions, most queries run every minute, with exceptions for the heavy queries mentioned above, which run every 5 minutes.
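A minimal Python sketch of the interval selection described in the FAQ quote. The function and argument names are hypothetical; the feature-set names come from the quote, and the defaults (1 and 5 minutes) and the fixed behavior before 3.2.0 follow the text above.

```python
from typing import Optional, Tuple

# Feature sets that run on the heavy interval, per the FAQ quote (lower-cased).
HEAVY_FEATURE_SETS = {"tablespaces", "tablespaces (detailed)", "blocked sessions", "topn"}


def parse_version(version: str) -> Tuple[int, ...]:
    """'3.1.3' -> (3, 1, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def interval_minutes(
    feature_set: str,
    version: str,
    query_interval: Optional[int] = None,
    heavy_query_interval: Optional[int] = None,
) -> int:
    """Minutes between runs of a feature set's query (hypothetical helper)."""
    heavy = feature_set.lower() in HEAVY_FEATURE_SETS
    if parse_version(version) < (3, 2, 0):
        # Before 3.2.0 the intervals are fixed; the variables do not exist.
        return 5 if heavy else 1
    if heavy:
        return heavy_query_interval if heavy_query_interval is not None else 5
    return query_interval if query_interval is not None else 1
```

In this sketch, a `heavy_query_interval` of 10 has no effect for Tablespaces on 3.1.3 (it stays at 5 minutes), while on 3.2.0 it takes effect; with no overrides, both versions behave identically.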

This means that for versions 3.1.3 and below, the Tablespaces query is running every 5 minutes, while the ASM Disks query is running every minute, so it is expected you would run into the behavior you are describing above.

This is definitely an oversight on our part since, as you demonstrate, this effectively blocks you from setting the timeout to more than 60 seconds. I'm surprised this is the first time someone has brought this up, as it's been like this for quite a while...

In any case, as mentioned before, we will look into the best possible solution and try to address it in the latest release!

mn_24 · 10 Jan 2025

Hi @TomásSeroteRoos,

Now I realize why the ASM metric was failing: I had checked both boxes (feature sets), "asm" and "asm(detailed)". When I unchecked the detailed one, everything seems to be as it should be. If you ask me, "asm(detailed)" definitely needs more than 1 min, as it checks all disks...

So yes, 1 min to check every disk in +ASM is not enough, but in my opinion, from a database perspective, I am more interested in the disk GROUP usage than in every disk in the group.

Fine, but based on the FAQ section and the description of the new version 3.2.0, I still don't get it: what is the new thing here?

To me it seems that both before and after 3.2.0, the data for the so-called "Heavy-query timeout" metrics (Tablespaces, TopN and Blocked Sessions) is extracted every 5 minutes, and everything else every 1 minute.

TomásSeroteRoos · 10 Jan 2025

Now I realize why the ASM metric was failing: I had checked both boxes (feature sets), "asm" and "asm(detailed)". When I unchecked the detailed one, everything seems to be as it should be. If you ask me, "asm(detailed)" definitely needs more than 1 min, as it checks all disks...

Exactly. The asm feature set corresponds to the ASM disk group query which does not use the timeout variable, while the asm (detailed) does use that variable. Both are set to run every minute, but since the latter tries to use the variable, setting the timeout to more than 60 seconds breaks that query.
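Only feature sets that honour the timeout constrain it, so the effective upper bound depends on which of them are enabled. A Python sketch using the default intervals mentioned in this thread (asm and asm (detailed) every minute; tablespaces, tablespaces (detailed) and topn every 5 minutes). The function name is hypothetical, and the strict "below the interval" bound is again an assumption.

```python
from typing import Iterable, Optional

# Feature sets whose queries honour the heavy-query timeout, per this thread.
TIMEOUT_FEATURE_SETS = {"asm (detailed)", "tablespaces", "tablespaces (detailed)", "topn"}

# Default run interval in minutes for each feature set mentioned in the thread.
DEFAULT_INTERVAL_MIN = {
    "asm": 1,
    "asm (detailed)": 1,
    "tablespaces": 5,
    "tablespaces (detailed)": 5,
    "topn": 5,
}


def max_heavy_query_timeout(enabled: Iterable[str]) -> Optional[int]:
    """Largest usable timeout (s) for the enabled feature sets, or None if none use it.

    Assumption: the timeout must stay strictly below each affected query's interval.
    """
    limits = [
        DEFAULT_INTERVAL_MIN[fs] * 60 - 1
        for fs in enabled
        if fs in TIMEOUT_FEATURE_SETS
    ]
    return min(limits) if limits else None
```

Enabling asm together with asm (detailed) and tablespaces caps the timeout at 59 seconds; dropping asm (detailed), as was done in this thread, raises the cap to 299 seconds, because the plain asm query does not use the timeout.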

Fine, but based on the FAQ section and the description of the new version 3.2.0, I still don't get it: what is the new thing here?
To me it seems that both before and after 3.2.0, the data for the so-called "Heavy-query timeout" metrics (Tablespaces, TopN and Blocked Sessions) is extracted every 5 minutes, and everything else every 1 minute.

The "new thing" with 3.2.0 was the addition of configurable query intervals. Two new variables were introduced, query-interval and heavy-query-interval, the former to control the frequency of all queries which previously ran every minute and the latter to control the frequency of the queries previously running every 5 minutes.

These variables default to 1 and 5 minutes respectively, meaning that if you upgrade and don't change those variables, you will indeed see no difference in behavior.

mn_24 · 12 Jan 2025

Hi @TomásSeroteRoos,

Could you also please clarify what causes these kinds of error messages, and where those 30 sec come from?

| ERROR | c.d.s.p.p.AbstractPoller | While polling from endpoint ************, querying group 'Instance status' failed
Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:4; busy:4; idle:0; lastwait:30000].

WARN | c.d.s.s.r.RuntimeService | Dead module found: ***********************

WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Databases wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Cluster wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup CPU Usage wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Library Cache Hit Ratio wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup User calls and deadlocks wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Sessions wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Wait Events Detailed wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup IO Physical bytes wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Limits wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Database status wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup PGA Size wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup IO Wait time wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Instance status wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup PGA Usage wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Cores wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Shared Pool Free wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Redo wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup ASM Disk Groups wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Fast Recovery Area wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Tablespaces wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Datafiles wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Backup Jobs wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Time since last backup wont be done.
WARN | c.d.s.p.p.SubgroupPoller | Subgroup parent status is not OK, polling of subgroup Query performance wont be done.

TomásSeroteRoos · 13 Jan 2025

@mn_24 that seems to be backend stuff related to the data source on which the extension runs, not to the extension itself, so I'm not entirely sure...

Your best bet is probably to open a support ticket for that particular issue.