cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

DQL alternative to dedup

CecilieFosbo
Visitor

I hope someone can help with this. 

 

We have a field sak, which represent the caseId, which is uniqe. We have many traces/records with the same case ID representing each step the uniqe case is beeing processed in a business flow. 

In this case, we want to create a dashboard with a table that presents the total number of cases started, and how many of these were started manual or digital. 

We are first doing a dedup of the caseId, so we ensure that we do not count the same caseId twice. However, we are using a lot of fields from the record to filter/determine whether a case has been started manual and digital (channel, st_felt, etc.), and not all the records will hold these values. This means, that when we do the dedup, it randomly chooses the record which may or may not contain the value of the field we are using to filter on manual/digital started cases. This means, if you choose a timeline back in time, the numbers will change everytime you click refresh depending on which records ir chooses.

Another issue is when verifying the data with a distictCount of the caseID, these numbers also change when refreshing. This should not be possible, since the number of uniqe caseId's should be consistent nomatter which record of the caseId it chooses. 

Does anyone have a suggestion on how to solve this with another approach, maybe without dedup?


Here is the query: 

 

fetch bizevents

| filter event.category == "PMKREDITT"

// WS_TO determines the step the case is currently in, in this case the first step

| filter  WS_TO == "001" or WS_FROM == "24"
| filter pi_tilstand  >=  "2"
| dedup sak

 

| fieldsAdd date = formatTimestamp(timestamp, format:"MM-dd-yyyy"),
            dayOfWeek = formatTimestamp(timestamp, format: "E")

| summarize  {countDistinct(sak),StartedAvKundeDigitalCount = countIf(channel == "NETTBANK" and st_felt == "MODUS" and st_verdi == "REFIN_INCRLOAN") , StartedAvRadgiverDigitalCount = countIf(channel =="KONTORNETT" and CONTEXT == "CUSTOMER"), StartedAvRadgiverManualCount = countIf(channel =="KONTORNETT" and (CONTEXT == "MANUAL") or matchesPhrase(sak_sist_endret_av, "AB*") or matchesPhrase(sak_sist_endret_av, "AD*")) }, by: {date, dayOfWeek}

 

| fieldsAdd `Total cases` = StartedAvKundeDigitalCount + StartedAvRadgiverDigitalCount + StartedAvRadgiverManualCount
| fieldsAdd `Antall saker med digital inngang` = StartedAvKundeDigitalCount + StartedAvRadgiverDigitalCount 
| fieldsAdd `Snitt andel saker med digital inngang` = toDouble(`Antall saker med digital inngang`) / `Total cases` * 100

 

| sort date asc
| fieldsRename `Date`=date, `Weekday`=dayOfWeek


 

3 REPLIES 3

Hi Cecilie, I hope I got your question right. For each record, you could add two calculated fields:

- is_digital: 1 if the record/trace contains one of the fields that means that the process is digital. 0 otherwise

- is_manual: 1 if contains one of the fields that means it is manual. 0 otherwise

Then you summarize:

| summarize {sum(is_digital),sum(is_manual)}, by:{sak}

A sak is digital when is_digital > 0, manual if is_manual > 0.

Of course I assume it can't be both 🙂

Hope this helps

Paolo

CecilieFosbo
Visitor

Hi Paolo!

Thanks for your suggested approach!
When I do this, I will still be counting "duplicates". If a caseID has several records, then when summarizing, I end up with either is_manual or is_digital beeing more than 1. 

Mmm, then it means that a case can have records for which it is digital but also other records for which it is manual? I probably didn't catch it 100% right. 

Happy to help further if I can

Featured Posts