How to remove fields if they are null?

36Krazyfists
Helper

I need to be able to check a field and, if it's null, remove it.

Yes, I can do that easily for a single field by just saying something like:

fetch logs
| filter isNull(someField)      // keep only records where the field is null
| fieldsRemove someField        // then drop that column

But what if I have a bunch of fields and I want to remove only the ones that are null?

The if() function in DQL doesn't allow using the fieldsRemove command in its "then" parameter, so that doesn't work.
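As far as I can tell, if() is a value expression rather than a control-flow statement: it can compute a replacement value per record, but it can't execute a pipeline command. A sketch of the kind of thing it can do (field name hypothetical):

fetch logs
// if() can only return a value for each record...
| fieldsAdd someField = if(isNull(someField), "n/a", else: someField)
// ...it can't execute a command like fieldsRemove in the "then" branch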

 

I ask because I want to clean up fields in OpenPipeline and I have a LOT of columns to clean up. If I build a separate processor for each one in OpenPipeline, I'll have to build a couple dozen processors.
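From what I can see, each of those processors would pair a matching condition with a one-line transformation, roughly like this sketch (field name hypothetical), repeated once per column:

// Matching condition of one processor:
isNull(someFieldA)

// Processor transformation (DQL):
fieldsRemove someFieldA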

  

There has to be a better way, right?

4 REPLIES

marco_irmer
Champion

I am not aware of an easy way to implement conditional field removal in DQL with existing functionality. Even so, I am curious how your log events are ending up with so many null fields. Are these being parsed on ingest, and if so, is the parsing logic treating the values as optional or mandatory?

Yeah, so Lambda@Edge logs from CloudWatch have a JSON blob with a LOT of fields in them, which we dynamically flatten by using DPL to parse the JSON into an array and the fieldsFlatten DQL command to turn the fields in that array into columns. And in that JSON, even when the data is empty, the keys still show up with null values, which then get flattened out into empty columns.
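For context, the parse-and-flatten step looks roughly like this; content and the lambda. prefix are placeholders here, not our exact pipeline:

fetch logs
// DPL JSON matcher parses the blob into a structured field
| parse content, "JSON:parsed"
// every key in that structure becomes its own column, null values included
| fieldsFlatten parsed, prefix: "lambda."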

[screenshot showing a sample of the null columns]

Those are just a few of them.  There are a TON more like them.

There's no way we could do it manually, as there are dozens and dozens of fields, so it'd be the same problem as removing them. We'd need some crazy DPL or DQL logic that looks at each field and tries to determine whether it's null before deciding to create a column out of it.

 

Either way, I'm just looking for a way to dynamically remove null columns, to make the storage smaller and queries against those records more efficient.

Would definitely appreciate an easy way to do that, preferably with OpenPipeline.

Like maybe a processor that just says "Remove all columns whose values are NULL or match a certain value". Having to specify each column individually is just not practical for some of these logs when they have dozens and dozens of possible columns.

Looking through the available functionality for OpenPipeline, I am not seeing anything that would accomplish this. That said, my recommendation is to look into whether the initial parsing with DPL can be adjusted to skip those fields in the first place, as opposed to parsing empty fields only to remove them after the fact.

Yeah, it can't be. Same deal: you'd have to manually specify all the fields, which is extremely cumbersome for some of these Lambda logs as well as for a lot of other AWS logs. Unfortunately, the bundled AWS log processors that come with OpenPipeline don't do a very good job of addressing this kind of thing either.

Definitely need a processor that just says "Remove all Null/Empty Fields".

That way, any field with a value of null, "NONE" (case-insensitive), or "" would be deleted on ingest, saving customers money and making browsing the logs much better.
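Per field, that check would be something like the DQL below (lower() and the field name are just for illustration), which is exactly the part that currently has to be repeated for every column:

isNull(someField) or lower(someField) == "none" or someField == ""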
