02 Jun 2023 12:09 PM - last edited on 05 Jun 2023 07:49 AM by MaciejNeumann
Is there any way to get around the message below without limiting my results and making multiple queries? My DQL is below; we have 2.75k hosts in that management zone.
The lookup command's subquery read too much data. Please continue to filter the lookup table or narrow the query time range
timeseries cpuUsage = avg(dt.process.cpu.usage), by:{host.name, dt.entity.process_group_instance}, interval:1h, from:-2h, to:now()
| sort arrayAvg(cpuUsage) desc
| limit 5
| lookup [fetch dt.entity.process_group_instance], sourceField:dt.entity.process_group_instance, lookupField:id
| filter matchesValue(lookup.managementZones, "UNIX")
| fieldsRename process_group_instance.name = lookup.entity.name
| fieldsRemove lookup.id, dt.entity.process_group_instance
05 Jun 2023 09:06 AM - edited 05 Jun 2023 09:11 AM
The limitation is based on the implementation of the lookup command, which collects the data from the nested execution block into memory. Keeping the full set of process group instance (PGI) records in memory is simply too much.
As the timeseries query is the shorter, less expensive part of your query, you can also turn the query around:
fetch dt.entity.process_group_instance
| lookup [
    timeseries cpuUsage = avg(dt.process.cpu.usage), by:{host.name, dt.entity.process_group_instance}, interval:1h
    | sort arrayAvg(cpuUsage), direction:"descending"
    | limit 5
  ], sourceField:id, lookupField:dt.entity.process_group_instance
| ....
You can also use the fields: and prefix: parameters of the lookup command to pick which fields from the lookup subquery are added to your result, and under which names.
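For example (a sketch; the selected fields and the top. prefix are illustrative choices, not requirements):
fetch dt.entity.process_group_instance
| lookup [
    timeseries cpuUsage = avg(dt.process.cpu.usage), by:{host.name, dt.entity.process_group_instance}, interval:1h
    | sort arrayAvg(cpuUsage), direction:"descending"
    | limit 5
  ], sourceField:id, lookupField:dt.entity.process_group_instance,
  fields:{cpuUsage, host.name},  // join only these fields into the result
  prefix:"top."                  // they arrive as top.cpuUsage and top.host.name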
But I guess the inverted query above should be a good starting point.
As the inner execution block now contains only 5 records, the query should be cheaper and faster, and the lookup limitation no longer applies.
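Putting it together with the filter and rename from your original query, the full inverted version could look roughly like this (an untested sketch; the management zone filter now runs directly on the entity fetch, and the isNotNull filter drops PGIs that did not make the top 5):
fetch dt.entity.process_group_instance
| filter matchesValue(managementZones, "UNIX")  // filter the lookup table up front
| lookup [
    timeseries cpuUsage = avg(dt.process.cpu.usage), by:{host.name, dt.entity.process_group_instance}, interval:1h
    | sort arrayAvg(cpuUsage), direction:"descending"
    | limit 5
  ], sourceField:id, lookupField:dt.entity.process_group_instance
| filter isNotNull(lookup.cpuUsage)             // keep only the matched top-5 PGIs
| fieldsRename process_group_instance.name = entity.name
| fields process_group_instance.name, lookup.host.name, lookup.cpuUsage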