12 Nov 2024 10:23 AM - last edited on 13 Nov 2024 07:49 AM by MaciejNeumann
Hi,
I'd like to build a complex regex in DQL. I can't use the same syntax as OneAgent's masking rule, so I tested the parse command:
fetch logs
| parse content, "DATA ([:space:]|[:punct:])([12][0-9]{2}[0][1-9][0-9]{2}[0-9]{3}[0-9]{3}[0-9]{2}):myfield1([:space:]|[:punct:])"
| parse content, "DATA ([:space:]|[:punct:])([12][0-9]{2}[1][0-2][0-9]{2}[0-9]{3}[0-9]{3}[0-9]{2}):myfield2([:space:]|[:punct:])"
| parse content, "DATA ([:space:]|[:punct:])([12][0-9]{2}[0][1-9][2][A-B][0-9]{3}[0-9]{3}[0-9]{2}):myfield3([:space:]|[:punct:])"
| parse content, "DATA ([:space:]|[:punct:])([12][0-9]{2}[1][0-2][2][A-B][0-9]{3}[0-9]{3}[0-9]{2}):myfield4([:space:]|[:punct:])"
| fields myfield=coalesce(myfield1, myfield2, myfield3, myfield4), content, log.source, k8s.namespace.name
| filterOut isNull(myfield)
or
fetch logs
| parse content, "DATA ([:space:]|[:punct:])(([12][0-9]{2}[0][1-9][0-9]{2}[0-9]{3}[0-9]{3}[0-9]{2}):myfield|([12][0-9]{2}[1][0-2][0-9]{2}[0-9]{3}[0-9]{3}[0-9]{2}):myfield|([12][0-9]{2}[0][1-9][2][A-B][0-9]{3}[0-9]{3}[0-9]{2}):myfield|([12][0-9]{2}[1][0-2][2][A-B][0-9]{3}[0-9]{3}[0-9]{2}):myfield)([:space:]|[:punct:])"
| fields myfield, content, log.source, k8s.namespace.name
| filterOut isNull(myfield)
But this costs "four DQL queries" (one for each alternative), which is not acceptable to the customer because of the high volume ...
What's the best practice for complex regexes? Is there a roadmap for improving the use of regexes with DQL? Or to harmonize the syntax used in the platform?
Thanks for your help 🙂
13 Nov 2024 12:51 PM
Applying multiple parsing rules to the same data does not increase query cost. The cost depends only on how much data you need to access.
Of course, heavy parsing will affect query performance, but using a regex will not improve anything here.
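As an illustration only: the four alternatives in the question differ in just two positions, the two-digit month (01-09 or 10-12) and the following two characters (two digits, or 2A/2B), so they can be folded into one expression. The sketch below uses Python's `re` module, not DPL, so the syntax is an assumption and will not carry over to the `parse` command as written. `extract_myfield` is a hypothetical helper name, and the delimiter class is an approximation of `[:space:]|[:punct:]` built from ASCII punctuation.

```python
import re
import string

# Approximation of ([:space:]|[:punct:]): whitespace or ASCII punctuation.
DELIM = r"[\s" + re.escape(string.punctuation) + r"]"

# One pattern covering all four alternatives from the question:
#   [12]              first digit
#   [0-9]{2}          two digits
#   0[1-9] | 1[0-2]   month 01-12
#   [0-9]{2} | 2[AB]  two digits, or 2A / 2B
#   [0-9]{8}          remaining 3 + 3 + 2 digits
PATTERN = re.compile(
    DELIM
    + r"([12][0-9]{2}(?:0[1-9]|1[0-2])(?:[0-9]{2}|2[AB])[0-9]{8})"
    + DELIM
)


def extract_myfield(content):
    """Return the first delimited match in content, or None if there is none."""
    match = PATTERN.search(content)
    return match.group(1) if match else None
```

For example, `extract_myfield("id=185017512345678;")` returns the 15-character value, while a value with month `13` yields `None`. This only shows that the matching logic fits in a single pass over `content`; it says nothing about how DQL bills or executes the query.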