How to remove fields if they are null?

36Krazyfists
Helper

I need to be able to check a field and, if it's null, remove it.

Yes, I can do that easily for a single field by just saying something like:

fetch logs
| filter isNull(someField)      // keep only records where the field is null
| fieldsRemove someField        // then drop that column

But what if I have a bunch of fields and I want to remove only the ones that are null?

The if() function in DQL doesn't allow using the fieldsRemove command in its "then" parameter, so that doesn't work.
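As far as I can tell, if() is a value expression rather than a control-flow statement: it can compute a replacement value per record, but it can't execute a pipeline command. A sketch of the kind of thing it can do (field name hypothetical):

fetch logs
// if() can only return a value for each record...
| fieldsAdd someField = if(isNull(someField), "n/a", else: someField)
// ...it can't execute a command like fieldsRemove in the "then" branch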

 

I ask because I want to clean up fields in OpenPipeline and I have a LOT of columns to clean up. If I build a separate processor for each one in OpenPipeline, I'll have to build a couple dozen processors.
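From what I can see, each of those processors would pair a matching condition with a one-line transformation, roughly like this sketch (field name hypothetical), repeated once per column:

// Matching condition of one processor:
isNull(someFieldA)

// Processor transformation (DQL):
fieldsRemove someFieldA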

  

There has to be a better way, right?

4 REPLIES

marco_irmer
Champion

I am not aware of an easy way to implement conditional field removal in DQL with existing functionality. Even so, I am curious how your log events are ending up with so many null fields. Are these being parsed on ingest, and if so, is the parsing logic treating the values as optional or mandatory?

Yeah, so Lambda@Edge logs from CloudWatch have a JSON blob with a LOT of fields in them, which we dynamically flatten by using DPL to parse the JSON into an array and the fieldsFlatten DQL command to turn the fields in that array into columns. And in that JSON, even when the data is empty, the keys still show up with null values, which then get flattened out into empty columns.
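For context, the parse-and-flatten step looks roughly like this; content and the lambda. prefix are placeholders here, not our exact pipeline:

fetch logs
// DPL JSON matcher parses the blob into a structured field
| parse content, "JSON:parsed"
// every key in that structure becomes its own column, null values included
| fieldsFlatten parsed, prefix: "lambda."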

[screenshot showing a sample of the null columns]

Those are just a few of them.  There are a TON more like them.

There's no way we could do it manually, as there are dozens and dozens of fields, so it'd be the same problem as removing them. We'd need some crazy DPL or DQL logic that looks at each field and tries to determine whether it's null before deciding to create a column out of it.

 

Either way, I'm just looking for a way to dynamically remove null columns, to make the storage smaller and queries against those records more efficient.

Would definitely appreciate an easy way to do that, preferably with OpenPipeline.

Like maybe a processor that just says "Remove all columns whose values are NULL or match a certain value". Having to specify each column individually is just not practical for some of these logs when they have dozens and dozens of possible columns.

Looking through the available functionality for OpenPipeline, I am not seeing anything that would accomplish this. That said, my recommendation is to look into whether the initial parsing with DPL can be adjusted to skip those fields in the first place, as opposed to parsing empty fields only to remove them after the fact.

Yeah, it can't be. Same deal: you'd have to manually specify all the fields, which is extremely cumbersome for some of these Lambda logs as well as for a lot of other AWS logs. Unfortunately, the bundled AWS log processors that come with OpenPipeline don't do a very good job of addressing this kind of thing either.

Definitely need a processor that just says "Remove all Null/Empty Fields".

That way, any field with a value of null, "NONE" (case-insensitive), or "" would be deleted on ingest, saving customers money and making browsing the logs much better.
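Per field, that check would be something like the DQL below (lower() and the field name are just for illustration), which is exactly the part that currently has to be repeated for every column:

isNull(someField) or lower(someField) == "none" or someField == ""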
