01 Dec 2025 11:01 PM
Guys, does anyone know of a way to mask sensitive data within the log content?
The scenario is this: we have log ingestion, but the environment doesn't use OneAgent; it's all OpenTelemetry. There are several log formats, so creating a rule for each one would be impractical. I'd like to understand whether, during processing, I can identify anything containing the key 'cpf' or 'cnpj' and, if so, remove and/or mask it. All of this is within the log content.
I used the DPL below, but it only worked for one of the patterns. There are several others. So if there's a way to do something more generic, that would be great.
Does anyone know of, or has anyone done, something similar?
USING(INOUT content)
//| FIELDS_ADD(content: REPLACE_PATTERN(content, "'numberDocument' LD LD LD LD DATA:numberDocument'\\\"' ", "cpf-masked"))
08 Dec 2025 07:58 PM - edited 08 Dec 2025 07:58 PM
Hi @RPbiaggio,
I reviewed your question and I believe the following DPL pattern might work for you:
DATA <<('cpf-'|'cnpj-') ALNUM:numberDocumentToMask
This pattern captures the alphanumeric values that come after "cpf-" or "cnpj-", which you can then mask with just "masked". As a result, you would get "cpf-masked" or "cnpj-masked". Please note that this only works if the value you want to mask starts with "cpf-" or "cnpj-".
I’m also attaching some images that show the pattern applied to a sample text for better clarity.
I’m not sure if this fully answers your question, but if you’d like to explore it further, feel free to reply to this message and we can look into it together.
Thanks!
09 Dec 2025 12:06 AM
Thanks for sharing that approach!
Your pattern should work when the sensitive value consistently starts with cpf- or cnpj-. However, if the log formats vary and these keys appear in different contexts (like "cpf": "12345678901" or "cnpj": "12345678000199"), then we might need a more generic regex-based solution, such as the one below (this is just an idea):
USING(INOUT content)
| FIELDS_ADD(
    content: REPLACE_PATTERN(
        content,
        "(cpf|cnpj)[^0-9]*[0-9]{11,14}",
        "masked"
    )
)
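To see what a regex like the one above would match, here is a minimal sketch that runs the same expression through Python's re module. Treat it as an illustration only: Python regex semantics are an assumption here and are not the same thing as DPL, and the function names and sample lines are made up for the example.

```python
import re

# Same expression as in the idea above, evaluated with Python's re module.
# Assumption: Python regex semantics; a DPL matcher may behave differently.
DROP_KEY = re.compile(r"(cpf|cnpj)[^0-9]*[0-9]{11,14}")

# Variant that captures the key and separator so they survive the replacement.
KEEP_KEY = re.compile(r"((?:cpf|cnpj)[^0-9]*)[0-9]{11,14}")


def mask_drop_key(line: str) -> str:
    """Replace the key, the separator and the digits with 'masked'."""
    return DROP_KEY.sub("masked", line)


def mask_keep_key(line: str) -> str:
    """Keep the key and separator, replace only the 11 to 14 digits."""
    return KEEP_KEY.sub(r"\1masked", line)


if __name__ == "__main__":
    samples = [
        '{"cpf": "12345678901"}',
        "cnpj=12345678000199 status=ok",
        "cpf: 123.456.789-01",  # formatted CPF, digits are not contiguous
    ]
    for line in samples:
        print(mask_drop_key(line), "|", mask_keep_key(line))
```

Two things stand out when running it. The first variant swallows the key together with the value, so '{"cpf": "12345678901"}' becomes '{"masked"}', while the capturing variant gives '{"cpf": "masked"}'. And a formatted value such as 123.456.789-01 is left untouched by both, because the digits are not one contiguous run of 11 to 14 characters.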
09 Dec 2025 12:35 AM - edited 09 Dec 2025 12:38 AM
Thank you very much for your contribution, and you’re absolutely right.
I just realized that the current approach is a bit limited because it only works with numbers and a specific number of characters. As a complement to your idea, I believe a more flexible version could be the following, where it allows alphanumeric characters of any length and still works even if there is a "-" or ":" after "cpf" or "cnpj":
USING(INOUT content)
//| FIELDS_ADD(content: REPLACE_PATTERN(content, "DATA <<('cnpj'|'cpf')[^0-9A-Za-z]*[0-9A-Za-z]+:dataToMask", "masked"))
If you’d like me to explain this in more detail, I’d be happy to do so.
09 Dec 2025 03:24 AM
This approach definitely broadens the scope of masking and makes it more flexible. Thanks a lot for extending the solution and adding more coverage to it!
09 Dec 2025 11:50 AM
Guys, thanks for the help.
The rule that worked for us was the following:
USING(INOUT content)
| FIELDS_ADD(content: REPLACE_PATTERN(content, "(DATA 'cpf' DATA):p1 LONG:cpf", "${p1}\"sensitive-data-masked${p2}\""))
With this approach, I need to add a separate line for each parameter, such as number, CPF, CNPJ, client_number, etc.
I'm asking the teams that own the applications to adjust this at the source; it's easier that way given the number of different possibilities and parameters.
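For anyone who ends up doing the masking at the source, here is a rough sketch of how a single rule could be generated from a list of key names instead of writing one line per parameter. It is plain Python, not DPL, and everything in it is an assumption for illustration: the key list, the helper name build_masker, and the guess that values follow the key after ':' or '=' and end at a quote, whitespace, comma, '}' or '&'.

```python
import re

# Hypothetical list of sensitive key names; extend it as new parameters appear.
SENSITIVE_KEYS = ["cpf", "cnpj", "numberDocument", "client_number"]


def build_masker(keys, placeholder="sensitive-data-masked"):
    """Return a function that masks the value following any of the given keys."""
    # Longest keys first so that a longer key wins over a shorter prefix of it.
    alternation = "|".join(
        re.escape(k) for k in sorted(keys, key=len, reverse=True)
    )
    # Group 1: whole key, optional closing quote, ':' or '=', optional opening quote.
    # After group 1: the value, up to the next quote, whitespace, ',', '}' or '&'.
    pattern = re.compile(
        r"""(\b(?:%s)\b["']?\s*[:=]\s*["']?)[^"'\s,}&]+""" % alternation,
        re.IGNORECASE,
    )

    def mask(text: str) -> str:
        # A function replacement avoids backslash handling in the placeholder.
        return pattern.sub(lambda m: m.group(1) + placeholder, text)

    return mask


mask = build_masker(SENSITIVE_KEYS)

if __name__ == "__main__":
    print(mask('{"cpf": "12345678901", "name": "Ana"}'))
    print(mask("client_number=ABC-123 cnpj=12345678000199"))
```

The word boundaries mean a key such as cpf_number is not treated as cpf, and matching is case-insensitive, so CPF and cpf are handled alike. Values that contain spaces or quotes would only be partially masked, so this would need adjusting to the real formats.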
Thank you very much for your help.