18 Apr 2025 07:20 PM - last edited on 22 Apr 2025 08:27 AM by MaciejNeumann
I am implementing a version of a disk capacity Workflow, using the one from the Docs as a starting point. In my environment there are 6k Hosts and 60k Disks, so if I try to run across all Disks in one go I get the following error from the Davis Analyzer:
[ERROR] Supported number of dimensions exceeded. Analyzer supports at most 1000 dimensions, but got 10000.
Is there a straightforward way to chunk up or paginate the input to the Analyzer?
I am currently trying to divide up the Analyzer input by Disks whose Host matches a specific tag value, but I can't get that query to convert to a timeseries, so I am exploring alternatives. Any ideas?
Thanks!
18 Apr 2025 11:15 PM
Getting the divided-up query into a timeseries is probably the most straightforward way of doing this. Another way would be to separate this flow into multiple steps in the workflow, where one step executes the DQL query to fetch the data and the Davis analyzer step then loops over the prior step's results one (or more?) at a time.
Both options are kind of clunky in my opinion though, and it might be worth submitting a product idea that would see the Analyzer enhanced to natively handle larger-scale use cases such as yours.
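A minimal sketch of the second option, assuming a first task named fetch_hosts and an analyzer task with looping enabled (the names and loop wiring here are placeholders, not a tested setup):

// Task 1 "fetch_hosts": plain DQL that narrows down the entity list
fetch dt.entity.host
| fields hostId = id

// Task 2: Davis analyzer task with looping enabled, e.g. item variable
// "host" over the list {{ result("fetch_hosts").records }}; each
// iteration then submits only one host's disks to the analyzer:
timeseries avg(dt.host.disk.free),
by:{dt.entity.host, dt.entity.disk},
filter: {in(dt.entity.host, "{{ _.host.hostId }}")},
from:now()-7d, to:now()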
21 Apr 2025 04:18 PM
Right now I am trying to chunk up the Host list and feed that into the timeseries Davis Task, but can't get it working.
General Workflow sequence: 'filter_hosts' (DQL) followed by 'predict_disk' (Davis).
filter_hosts DQL is:
fetch dt.entity.host
| fieldsAdd tags
| expand tags // one record per tag
| filter contains(tags, "[foo]bar:") // keep only hosts carrying the target tag
| parse tags, """((LD:tag (!<<'\\' ':') LD:value)|LD:tag)""" // split tag into key and optional value
| fieldsAdd tagvalue = if(isNotNull(value), value)
| fields hostId = id // output just the host entity IDs
predict_disk timeseries is:
timeseries avg(dt.host.disk.free),
by:{dt.entity.host, dt.entity.disk},
filter: {in( dt.entity.host, {{result("filter_hosts.hostId")}} )},
bins: 120, from:now()-7d, to:now()
...with the key bit being {{result("filter_hosts.hostId")}}, which I've tried different variations of, with no success.
When I run the Workflow, 'filter_hosts' looks fine and produces records:
[
  {
    "hostId": "HOST-028F0770BB74AB2B"
  },
  {
    "hostId": "HOST-3CAFBC0439DCF037"
  }
]
The predict_disk task fails with this error:
Error evaluating 'body' in input. 'timeSeriesData': Undefined variables: hostId.
Use the expression preview in edit mode to review the input.
How can the timeseries access the list of hostIds from the previous task? Any help is appreciated.
29 Apr 2025 07:14 PM
Update: I made some progress on this but am still encountering an error when trying to pass the list of Host IDs to the Davis Forecast timeseries.
Workflow steps: 'filter_hosts' followed by a looped 'predict_disk'.
The 'filter_hosts' task for now just outputs a short list of Host IDs:
[
  {
    "hostId": "HOST-96BE0040C31AD413"
  },
  {
    "hostId": "HOST-69D86F3D08FE1092"
  },
  … <I snipped 8 records>
]
'predict_disk' is configured with a Loop task that iterates over filter_hosts.records and stores each item in a variable 'theHost'. This Davis Forecast task is configured as:
timeseries avg(dt.host.disk.free),
by:{dt.entity.host, dt.entity.disk},
filter: {in(dt.entity.host,{{_.theHost.hostId}} )},
bins: 120, from:now()-7d, to:now()
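For reference, the Loop settings on 'predict_disk' are along these lines (the list expression is my best rendering of the setup described above):

Item variable name: theHost
List: {{ result("filter_hosts").records }}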
When I run the Workflow, the 'predict_disk' task stops with Error: “400: Input is invalid.. constraintViolations: [{"level":"SEVERE","message":"Error parsing parameter 'timeSeriesData'. Invalid DQL query. A duration like `0040C` isn't allowed here. Please check the autocomplete suggestions before the error for alternative options.","path":"$.timeSeriesData"}]”
You can see in the error message that it is complaining about a duration `0040C`, which matches a substring of the first hostId value. Weird, right? What is going on? Any ideas how to fix it?
02 May 2025 03:56 PM
After inspecting the 'predict_disk' logs I could see it was passing an unquoted Host ID string as input to the Davis Forecaster. So I quoted the Jinja variable reference in the timeseries definition, and it worked:
timeseries avg(dt.host.disk.free),
by:{dt.entity.host, dt.entity.disk},
filter: {in(dt.entity.host,"{{_.theHost.hostId}}")},
bins: 120, from:now()-7d, to:now()
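To illustrate, with the first Host ID from the example output above, the unquoted template rendered the filter as

filter: {in(dt.entity.host,HOST-96BE0040C31AD413)}

where the DQL parser reads the `0040C` fragment as a duration literal, while the quoted version renders as

filter: {in(dt.entity.host,"HOST-96BE0040C31AD413")}

which passes the ID as a string, as intended.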
Now I can proceed with looping over a list of Host IDs for the Davis Forecaster to sidestep the max dimensions error.