27 Oct 2025 08:47 PM - edited 28 Oct 2025 12:27 AM
FUN!
"The original order of the records is not preserved. Therefore, by default the sequence of records that are chosen during deduplication is random. If you want to pick a particular record out of the duplicates, you can use the sort parameter." (docs)
- sorting first fixes the order in which records enter the deduplication algorithm, which should make the outcome more predictable
- deduplicating first, on a randomized order, seems less predictable to me
How does this compare internally to using the sort option included in the dedup function?
| dedup {fieldA, fieldB}, sort: {fieldA asc, fieldB desc}
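For comparison, here are the two variants side by side (a sketch, not a definitive answer; fieldA/fieldB and the logs source are placeholders):

// variant 1: explicit sort step, then dedup
fetch logs
| sort fieldA asc, fieldB desc
| dedup {fieldA, fieldB}

// variant 2: sort handled inside dedup; per the docs quoted above, the sort:
// parameter controls which record is kept within each duplicate group
fetch logs
| dedup {fieldA, fieldB}, sort: {fieldA asc, fieldB desc}

Whether variant 1 guarantees the same record selection as variant 2 is exactly the open question here.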
28 Oct 2025 02:25 AM
@henk_stobbe I think the count returned should be the same for both queries, unless they are executed at different times. Since these are logs that are continuously ingested, even a difference of less than a second can give you a different count.
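One way to rule out ingestion timing would be to pin both queries to the same closed timeframe (a sketch; the interval value is a placeholder, assuming the timeframe: parameter of fetch):

// scan exactly the same window in both query variants
fetch logs, timeframe: "2025-10-27T00:00:00Z/2025-10-27T12:00:00Z"
| dedup {fieldA, fieldB}
| summarize cnt = count()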
29 Oct 2025 01:17 PM
Hello,
The problem (at my end) is that every part of the "pipeline" seems to have its own limits (-; so you can lose data at every step. I am not sure how to prevent this when using multiple steps.
So starting with 9999 log lines, after the sort you can end up with 8888 (as an example).
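One way to see where the records drop would be to count at each stage (a sketch; the sort field is a placeholder, and both queries should run over the same timeframe):

// count what the fetch returns on its own ...
fetch logs
| summarize totalFetched = count()

// ... then add the suspect step and compare the counts
fetch logs
| sort timestamp desc
| summarize totalAfterSort = count()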
KR Henk
29 Oct 2025 02:22 PM
This is true in multiple languages, not just DQL.
The limits on each step in the query are applied from the configuration in the tile/segment.
This would be different from a "| limit {n}" applied at the end of the processing.
For your riddle, however, I think it's good to remember how the dedup command works in DQL: its input is not sorted by default, and on a large query, sorting before dedup can improve performance quite a lot.
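Putting both points together, an explicit cut-off at the end of the pipeline might look like this (a sketch; 5000 and the field names are arbitrary placeholders):

fetch logs
// sorting first, as noted above, can help dedup on large inputs
| sort timestamp desc
| dedup {fieldA, fieldB}
// explicit limit at the end of processing, instead of relying on tile defaults
| limit 5000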