simple csv event log parse

S_Hadley1138
Contributor

Hi there, I'm having a problem with a simple parser acting on a CSV file. The problem appears when there is an empty field, i.e. a ",," in the source log.
I've simplified the log below to demonstrate the issue.

2024-01-11T13:23:02.578Z,192.168.0.100,Server1,Test,Test,CatContentConversion
2024-01-11T13:24:06.345Z,192.168.0.103,Server2,Test,Test,Alert
2024-01-11T13:27:04.543Z,192.168.0.103,Server2,,Test,Alert
2024-01-11T13:28:03.345Z,192.168.0.178,Server5,,Tree,Bannana

This is my parse pattern in DPL Architect:

JSONTIMESTAMP:timestamp
','
IPV4ADDR:field1
','
LD:field2
','
LD:field3
','
LD:field4
','
LD:field5
EOF

The 3rd and 4th lines in the log should have empty values in field3, as the empty field is represented by ",," in the source.

However, what I am finding is that this double comma seems to break the parse, and those records are skipped altogether.

Very odd behaviour. If you insert a character between the commas, it works fine.

Does anyone else parse CSV log files, and have you seen this issue before?

Any pointers would be appreciated. Thanks in advance.

5 REPLIES

sinisa_zubic
Dynatrace Champion

Hi @S_Hadley1138,

You can add an asterisk to the fields that might not be populated.
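
As a rough sketch of what that looks like, here is the original pattern with the asterisk added. Which fields get the asterisk is an assumption taken from the patterns posted further down this thread, where field2 and field3 are written as LD*; the rest is unchanged from the question.

```
JSONTIMESTAMP:timestamp
','
IPV4ADDR:field1
','
LD*:field2
','
LD*:field3
','
LD:field4
','
LD:field5
EOF
```

With this change, the records containing ",," should no longer be skipped, and field3 should simply come back empty for them.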

Best,
Sini

Brilliant, thanks. There is a follow-up question, as you clearly know your parsing: say field4 potentially contains multiple entries, separated by a ";". How do I extract these, given that not all lines will have them?

2024-01-11T13:23:02.578Z,192.168.0.100,Server1,Test,Test;Cat;Dog,CatContentConversion
2024-01-11T13:24:06.345Z,192.168.0.103,Server2,Test,Test,Alert
2024-01-11T13:27:04.543Z,192.168.0.103,Server2,,Test;elephant,Alert
2024-01-11T13:28:03.345Z,192.168.0.178,Server5,,Tree,Bannana

You can do it with alternative groups:

JSONTIMESTAMP:timestamp
','
IPV4ADDR:field1
','
LD*:field2
','
LD*:field3
','
((LD:subfield1 ';' LD:subfield2 ';' LD:subfield3)|(LD:field4))
','
LD:field5

S_Hadley1138
Contributor

Thanks, I'm not 100% sure I have understood it. field4 needs to be split into multiple sub fields, and I'm not seeing that in the output. What if I don't know how many sub fields there might be?

(I owe you a beer/coffee/milkshake.)

If you don't know how many sub fields there are, then the best approach is to match it as an array:

JSONTIMESTAMP:timestamp
','
IPV4ADDR:field1
','
LD*:field2
','
LD*:field3
',' (Array{LD:i (';' | >>',')}{1,}:field4)?
','
LD:field5
