DQL too much data read message

sivart_89
Advisor

Is there any way to get around the message below without limiting my results and making multiple queries? My DQL is below; we have 2.75k hosts in that management zone.

The lookup command's subquery read too much data. Please continue to filter the lookup table or narrow the query time range

timeseries cpuUsage = avg(dt.process.cpu.usage), by:{host.name, dt.entity.process_group_instance}, interval:1h, from:-2h, to:now()
| sort arrayAvg(cpuUsage) desc
| limit 5
| lookup [fetch dt.entity.process_group_instance], sourceField:dt.entity.process_group_instance, lookupField:id
| filter matchesValue(lookup.managementZones, "UNIX")
| fieldsRename process_group_instance.name = lookup.entity.name
| fieldsRemove lookup.id, dt.entity.process_group_instance
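
For reference, the message's suggestion to "filter the lookup table" would roughly mean moving the management zone filter into the lookup's subquery, something like the rough sketch below (using the same managementZones field the lookup. prefix refers to above):

timeseries cpuUsage = avg(dt.process.cpu.usage), by:{host.name, dt.entity.process_group_instance}, interval:1h, from:-2h, to:now()
| sort arrayAvg(cpuUsage) desc
| limit 5
| lookup [
    // filter the entities inside the subquery instead of after the join
    fetch dt.entity.process_group_instance
    | filter matchesValue(managementZones, "UNIX")
  ], sourceField:dt.entity.process_group_instance, lookupField:id
| fieldsRename process_group_instance.name = lookup.entity.name
| fieldsRemove lookup.id, dt.entity.process_group_instance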

1 REPLY

David_Hauger
Dynatrace Promoter

The limitation comes from the implementation of the lookup command, which collects the data from the nested execution block. Keeping the full process group instance (PGI) information in memory is a bit too much.

As the timeseries query is the shorter/less expensive part of your query, you could also turn the query around.

fetch dt.entity.process_group_instance
| lookup [
    timeseries cpuUsage = avg(dt.process.cpu.usage), by:{host.name, dt.entity.process_group_instance}, interval:1h
    | sort arrayAvg(cpuUsage), direction:"descending"
    | limit 5
], sourceField:id, lookupField:dt.entity.process_group_instance
| ....

You can also specify the fields/prefix parameters of the lookup command to add only the fields that you want in your result (sketched below).
But I guess this example might be a good starting point.
As the inner execution block now contains only 5 elements, the query should be cheaper and faster, and the limitation should no longer apply.
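
As a rough end-to-end sketch of that idea, reusing the management zone filter and field names from the original query (the fields:/prefix: parameters and the isNotNull check are illustrative assumptions, not a verified final query):

fetch dt.entity.process_group_instance
| filter matchesValue(managementZones, "UNIX")   // outer query, so the subquery read limit does not apply here
| lookup [
    timeseries cpuUsage = avg(dt.process.cpu.usage), by:{host.name, dt.entity.process_group_instance}, interval:1h
    | sort arrayAvg(cpuUsage), direction:"descending"
    | limit 5
  ], sourceField:id, lookupField:dt.entity.process_group_instance, prefix:"top.", fields:{cpuUsage, host.name}
| filter isNotNull(top.cpuUsage)                 // lookup leaves non-matching rows null, so this keeps only the top 5
| fields entity.name, top.host.name, top.cpuUsage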
