DQL
Questions about Dynatrace Query Language

Masking sensitive data | OTEL

RPbiaggio
Helper

Guys, does anyone know of a way to mask sensitive data within the log content?

The scenario is this: we have log ingestion, but the environment doesn't use OneAgent; it's all OpenTelemetry. There are several log format standards, so creating a rule for each one would be impractical. I'd like to understand whether, during processing, I can identify anything containing the key 'cpf' or 'cnpj' and, if so, remove and/or mask its value. All of this is within the log content.

I used the DPL below, but it only worked for one of the patterns. There are several others. So if there's a way to do something more generic, that would be great.

Does anyone know of or have done something similar?

USING(INOUT content)
| FIELDS_ADD(content: REPLACE_PATTERN(content, "'numberDocument' LD LD LD LD DATA:numberDocument'\\\"' ", "cpf-masked"))

 

5 REPLIES

luis_alcantara
Dynatrace Advocate

Hi @RPbiaggio,

I reviewed your question and I believe the following DPL pattern might work for you:

DATA <<('cpf-'|'cnpj-') ALNUM:numberDocumentToMask

This pattern captures the alphanumeric values that come after "cpf-" or "cnpj-", which you can then mask with just "masked". As a result, you would get "cpf-masked" or "cnpj-masked". Please note that this only works if the value you want to mask starts with "cpf-" or "cnpj-".

I’m also attaching some images that show the pattern applied to a sample text for better clarity.

I’m not sure if this fully answers your question, but if you’d like to explore it further, feel free to reply to this message and we can look into it together.

Thanks!

[Attached images: luis_alcantara_0-1765223815012.png, luis_alcantara_1-1765223876040.png]

 

@luis_alcantara 

Thanks for sharing that approach!
Your pattern should work when the sensitive value consistently starts with cpf- or cnpj-. However, if the log formats vary and these keys appear in different contexts (like "cpf": "12345678901" or "cnpj": "12345678000199"), we might need a more generic regex-based solution, like the one below (this is just an idea):
USING(INOUT content)
FIELDS_ADD(
  content: REPLACE_PATTERN(
    content,
    "(cpf|cnpj)[^0-9]*[0-9]{11,14}",
    "masked"
  )
)
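
To try the expression above against sample lines before turning it into a processing rule, a minimal Python sketch could look like this. It uses Python's `re` module, not DPL, so it only shows what the regex itself would match. Capturing the key and separator so they survive in the output, and matching case-insensitively, are assumptions added here; the idea as posted replaces the whole match with "masked".

```python
import re

# Same expression as in the idea above. Assumptions added for this sketch:
# the key and the separator are captured so they can be kept in the output,
# and matching is case-insensitive so "CPF" is covered too.
PATTERN = re.compile(r"(cpf|cnpj)([^0-9]*)[0-9]{11,14}", re.IGNORECASE)


def mask(content: str) -> str:
    """Replace an 11-14 digit value that follows a cpf/cnpj key with 'masked'."""
    return PATTERN.sub(r"\1\2masked", content)


samples = [
    '{"cpf": "12345678901"}',
    "cnpj=12345678000199 status=ok",
    "no sensitive keys here",
]
for line in samples:
    print(mask(line))
```

One thing to watch: `[^0-9]*` accepts any amount of non-digit text between the key and the number, so a line like "cpf missing for order 12345678901" would have the order number masked as well.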

Dynatrace Professional Certified

@sujit_k_singh 

Thank you very much for your contribution, and you’re absolutely right.

I just realized that the current approach is a bit limited because it only works with numbers and a specific number of characters. As a complement to your idea, I believe a more flexible version could be the following, where it allows alphanumeric characters of any length and still works even if there is a "-" or ":" after "cpf" or "cnpj":

USING(INOUT content)
| FIELDS_ADD(content: REPLACE_PATTERN(content, "DATA <<('cnpj'|'cpf')[^0-9A-Za-z]*[0-9A-Za-z]+:dataToMask", "masked"))

If you’d like me to explain this in more detail, I’d be happy to do so.
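
For anyone who wants to see how the looser character classes behave on different inputs, here is a rough Python approximation. It is a plain regex, not DPL, and it only mirrors the `[^0-9A-Za-z]*[0-9A-Za-z]+` part of the rule above; the function name and sample lines are made up for illustration.

```python
import re

# Python regex approximation of the character classes in the rule above.
# Key and separator are captured so they can be kept in the output.
LOOSE = re.compile(r"(cnpj|cpf)([^0-9A-Za-z]*)[0-9A-Za-z]+")


def mask_loose(content: str) -> str:
    """Mask the first alphanumeric run that follows a cpf/cnpj key."""
    return LOOSE.sub(r"\1\2masked", content)


print(mask_loose('"cnpj":"AB12345678000199"'))    # "cnpj":"masked"
print(mask_loose("cpf: 123.456.789-01"))          # cpf: masked.456.789-01
print(mask_loose('"cpf_number": "12345678901"'))  # "cpf_masked": "12345678901"
```

The last two lines show the trade-off of going this generic: a formatted value is only partially masked, and a key such as cpf_number loses part of its name while the actual value stays in the log.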

@luis_alcantara 

This approach definitely broadens the scope of masking and makes it more flexible. Thanks a lot for extending the solution and adding more coverage to it!


Guys, thanks for the help.

The rule that worked for us was the following:

USING(INOUT content)
| FIELDS_ADD(content: REPLACE_PATTERN(content, "(DATA 'cpf' DATA):p1 LONG:cpf", "${p1}\"sensitive-data-masked${p2}\""))

 

In this case, I need to add an extra line for each parameter, such as number, CPF, CNPJ, client_number, etc.

I'm asking the teams that own the applications to fix this at the source; that's easier given the number of different possibilities and parameters.
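
As a rough idea of what masking at the source could look like, here is a minimal sketch for a Python application using the standard `logging` module. The key list, the `RedactingFilter` name and the `key=value` / `"key": "value"` shapes it handles are assumptions for illustration; each team would adapt this to its own language and log format.

```python
import logging
import re

# Hypothetical list of sensitive keys; extend it as needed.
SENSITIVE_KEYS = ("cpf", "cnpj", "client_number")
_KEYS = "|".join(re.escape(k) for k in SENSITIVE_KEYS)

# key, optional quote, ':' or '=', optional whitespace and quote, then the value
_PATTERN = re.compile(
    r"\b(" + _KEYS + r")\b([\"']?\s*[:=]\s*[\"']?)[^\"',\s}]+",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value after each sensitive key, keeping key and separator."""
    return _PATTERN.sub(r"\1\2sensitive-data-masked", text)


class RedactingFilter(logging.Filter):
    """Redacts the formatted message before any handler or exporter sees it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True


handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
log = logging.getLogger("payments")  # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("payment accepted cpf=%s", "12345678901")
```

Because the filter runs inside the application, the value never leaves the process, so no per-format rule is needed on the ingestion side for the keys it covers.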

Thank you very much for your help.
