11 Jan 2024 02:47 PM - last edited on 15 Jan 2024 08:28 AM by MaciejNeumann
Hi there, I'm having a problem with a simple parser acting on a CSV file. The problem appears when there is an empty field, i.e. a ",," in the source log.
I've simplified the log below to demonstrate the issue.
2024-01-11T13:23:02.578Z,192.168.0.100,Server1,Test,Test,CatContentConversion
2024-01-11T13:24:06.345Z,192.168.0.103,Server2,Test,Test,Alert
2024-01-11T13:27:04.543Z,192.168.0.103,Server2,,Test,Alert
2024-01-11T13:28:03.345Z,192.168.0.178,Server5,,Tree,Bannana
This is my pattern in DPL Architect:
JSONTIMESTAMP:timestamp
','
IPV4ADDR:field1
','
LD:field2
','
LD:field3
','
LD:field4
','
LD:field5
EOF
The 3rd and 4th lines in the log should have an empty value in field3, as it is represented by ",," in the source.
However, what I am finding is that the double comma seems to break the parse, and those records are skipped altogether.
Very odd behaviour. If you insert a character between the commas, it works fine.
Has anyone else who parses CSV log files seen this issue before?
Any pointers would be appreciated. Thanks in advance.
11 Jan 2024 05:00 PM - edited 11 Jan 2024 05:00 PM
12 Jan 2024 03:41 PM
Brilliant, thanks. I have a follow-up question, as you clearly know your parsing: if field4 can contain multiple entries separated by a ";", how do I extract them, given that not all lines will have this?
2024-01-11T13:23:02.578Z,192.168.0.100,Server1,Test,Test;Cat;Dog,CatContentConversion
2024-01-11T13:24:06.345Z,192.168.0.103,Server2,Test,Test,Alert
2024-01-11T13:27:04.543Z,192.168.0.103,Server2,,Test;elephant,Alert
2024-01-11T13:28:03.345Z,192.168.0.178,Server5,,Tree,Bannana
12 Jan 2024 04:00 PM
You can do it with alternative groups:
JSONTIMESTAMP:timestamp
','
IPV4ADDR:field1
','
LD*:field2
','
LD*:field3
','
((LD:subfield1 ';' LD:subfield2 ';' LD:subfield3)|(LD:field4))
','
LD:field5
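For anyone who reads regular expressions more easily than DPL, here is a rough Python analogy of the alternative-group idea. It is only a sketch under my own assumptions: the regex and the function name `parse_with_alternatives` are hypothetical, not DPL and not anything Dynatrace provides. field2 and field3 use `*` so they may be empty, and field4 is first tried as exactly three ';'-separated parts and otherwise captured whole.

```python
import re

# Hypothetical regex analogue of the alternative-group pattern (not DPL itself).
LINE_RE = re.compile(
    r"^(?P<timestamp>[^,]+),"
    r"(?P<field1>\d{1,3}(?:\.\d{1,3}){3}),"
    r"(?P<field2>[^,]*),"   # '*' lets the field be empty
    r"(?P<field3>[^,]*),"   # '*' lets the field be empty
    # Alternative 1: exactly three ';'-separated entries.
    r"(?:(?P<subfield1>[^,;]+);(?P<subfield2>[^,;]+);(?P<subfield3>[^,;]+)"
    # Alternative 2: anything else is kept whole as field4.
    r"|(?P<field4>[^,]+)),"
    r"(?P<field5>[^,]+)$"
)


def parse_with_alternatives(line):
    """Return the named groups that took part in the match, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # Groups from the alternative that did not match are None; drop them.
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

In this analogy only a line with exactly three entries yields subfield1 to subfield3. A value such as "Test;elephant" falls through to the second alternative and stays unsplit in field4, so a fixed set of alternatives does not help when the number of entries varies.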
12 Jan 2024 04:13 PM - edited 12 Jan 2024 04:21 PM
Thanks, but I'm not 100% sure I have understood it. field4 needs to be split into multiple subfields, and I'm not seeing that in the output. What if I don't know how many subfields there might be?
(I owe you a beer/coffee/milkshake.)
15 Jan 2024 08:55 AM
If you don't know how many subfields there are, then the best approach is to match field4 as an array:
JSONTIMESTAMP:timestamp
','
IPV4ADDR:field1
','
LD*:field2
','
LD*:field3
',' (Array{LD:i (';' | >>',')}{1,}:field4)?
','
LD:field5
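To picture the result you should expect from the array approach, here is a small Python analogy. It is a sketch, not DPL: the function name `parse_with_array` is hypothetical, it assumes every line has exactly six comma-separated columns with no quoted commas, and it assumes an empty field4 should become an empty list.

```python
def parse_with_array(line):
    """Split one CSV log line; field4 becomes a list of any length."""
    parts = line.rstrip("\n").split(",")
    if len(parts) != 6:
        # Assumption: lines with a different column count are skipped.
        return None
    timestamp, field1, field2, field3, raw_field4, field5 = parts
    return {
        "timestamp": timestamp,
        "field1": field1,
        "field2": field2,
        "field3": field3,  # stays "" when the source has ",,"
        # One entry per ';'-separated value, however many there are.
        "field4": raw_field4.split(";") if raw_field4 else [],
        "field5": field5,
    }
```

A line with "Test;Cat;Dog" gives a three-element field4, "Test;elephant" gives two elements, and a plain "Tree" gives a single-element list, so later processing can treat field4 the same way on every record.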