12 Mar 2024 04:34 PM
Hey,
I'm attempting to optimize DQL queries for some SRG objectives as I have hit "execution timeouts" on the validation when triggered from workflows.
Though the query goal is not SRG related itself but could also be used for other purposes.
What I want to do
I want to use entry point spans (traces) and identify e.g. how much time is being spent on specific services down the path. e.g. how much time is spent on calls to a DB, an external service or any other service. For that I created a query that:
a) filters on the entry service by some criteria
b) performs various lookups using the trace.id and filters on other criteria for the called service
c) summarize duration of sub-calls
To do so I've created this DQL:
fetch spans
| filter (istio.canonical_service == "istio-egressgateway"
and startsWith(server.address,"api.p.eco")
and contains(http.url,"products/suggest") and code.namespace == "Envoy")
// get span info for hybris call
| lookup [fetch spans
| filter (dt.host_group.id == "lzapp"
and contains(endpoint.name,"getSearchResults"))
],
sourceField: trace.id, lookupField: trace.id, prefix: "hybris."
// get span info of proxy call
| lookup [fetch spans
| filter (contains(host.name,"nclutz")
and dt.host_group.id == "proxy")
and contains(endpoint.name,"products/suggest")
and contains(server.address,"api.p.eco")
],
sourceField: trace.id, lookupField: trace.id, prefix: "proxy."
// get span info of DB calls
| lookup [fetch spans
| filter contains(code.namespace,"de.hybris.platform.jdbcwrapper.")
and dt.host_group.id == "lzapp"
and contains(code.function,"execute")
| summarize duration=sum(duration), by: {trace.id}
],
sourceField: trace.id, lookupField: trace.id, prefix: "db."
| fields trace.id, duration, hybris.duration, proxy.duration, db.duration
This works (mostly), however it could happen that the query fails because the "lookup. table is too big", my other concern is that it is not performant? (thus the mentioned timeout on the guardian verification?
I think the usecase isn't that special.
How would one optimize this query? I tried with only one lookup statement and multiple filter parameters but that is even worse...
Any ideas?