10 Mar 2025
10:44 AM
- last edited on
11 Mar 2025
07:28 AM
by
MaciejNeumann
I have a dataset of 1000+ hosts, each with a fixed-sized array of 120+ numeric values. Most hosts have only one entry, but a few can have multlple rows. I want to summarize those, per host, with the maximum value of each element within the array. Which is the best (most economic) way to do this in DQL?
Example extract:
host | data |
host_A | (3,8,6) |
host_A | (5,0,7) |
host_B | (1,7,3) |
host_B | (4,6,2) |
host_B | (8,5,2) |
host_C | (5,3,9) |
host_D | (4,5,6) |
Desired outcome:
host | result |
host_A | (5,8,7) |
host_B | (8,7,3) |
host_C | (5,3,9) |
host_D | (4,5,6) |
I was looking into 'dedup', summarize, and max() , takeMax, functions but didn't get any further than filtering on the entry with highest first element of the array, per host
10 Mar 2025 12:11 PM
Hi @joostm
One way how you could achieve this is with summarize, please have a look:
data record(content = """host_A (3,8,6)
host_A (5,0,7)
host_B (1,7,3)
host_B (4,6,2)
host_B (8,5,2)
host_C (5,3,9)
host_D (4,5,6)""")
| parse content, """ARRAY{STRUCTURE{LD:host ' ' LD:arr }:i (LF|EOS)}{1,}:arr
"""
| expand elem = arr
| fields host = elem[host], arr = elem[arr]
| parse arr, """'(' ARRAY{LONG:i (','|')')}{1,}:data"""
| summarize result = arrayDistinct(arrayFlatten((collectArray(data)))), by: { host }
Best,
Sini