I'm trying to set up user tagging on an application. The tag identifier rule matches the correct element, but the cleanup rule appears to be behaving erratically.
The 'user' matched is "Hello <first name> <last name>."
So I added the cleanup rule "Hello (.*)\.", but this still doesn't clean things up: I keep getting the complete string. Looking at the HTML source, I found that there are no real spaces, only non-breaking spaces (nbsp), like this:
Hello&nbsp;Firstname&nbsp;Lastname.
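To illustrate why the cleanup rule with a literal space leaves the string untouched, here is a minimal Python sketch. It is not the tagging tool's own engine. It assumes the page text reaches the rule with `&nbsp;` decoded to U+00A0, and that a rule which does not match leaves the full string in place (which is what I observe). The names `raw`, `rule` and `cleaned` are made up for the example.

```python
import html
import re

# Page source as shown above; &nbsp; decodes to U+00A0, not to U+0020.
raw = "Hello&nbsp;Firstname&nbsp;Lastname."
text = html.unescape(raw)

# The original cleanup rule, with a plain ASCII space after "Hello".
rule = re.compile(r"Hello (.*)\.")
match = rule.search(text)

# Assumption: when the rule does not match, the full string is kept.
cleaned = match.group(1) if match else text

print(repr(text))
print(match)
print(repr(cleaned))
```

The plain space in the rule never lines up with the non-breaking space in the text, so nothing is captured.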
However, when I change the regex to accommodate this, I get the 'Anonymous' users again.
The regex looks fine to me and regexpal seems to match it.
Also, I found the following presentation about this:
I have no idea how the user "Maria 'O Donnel" gets turned into 'maria' (all lowercase) here, nor why the whole "'O Donnel" part would fall off.
Any help is greatly appreciated.
Edit: fixed html code shown.
I did some further testing: the 'Anonymous' users were caused by the 'do-not-track' feature I had enabled in private browsing.
Using the following regex sort of works:
This causes the resulting user tag to be captured as:
So it has a leading space. The same goes for:
So the \s doesn't match the nbsp.
Adding the nbsp to the regex doesn't work either:
So the best I can do now is a regex that includes the nbsp in the group (the first regex in this post).
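For anyone who wants to reproduce this outside the tool, here is a rough Python sketch of the three cases. It is only an approximation: `re.ASCII` is used to stand in for an engine whose `\s` covers ASCII whitespace only, and I don't know whether the tool's engine accepts a `\u00a0` escape at all. The variable names are made up for the example.

```python
import re

# U+00A0 is the character that &nbsp; decodes to.
text = "Hello\u00a0Firstname\u00a0Lastname."

# Case 1: \s restricted to ASCII whitespace does not match U+00A0.
ascii_ws = re.search(r"Hello\s(.*)\.", text, flags=re.ASCII)

# Case 2: naming U+00A0 explicitly next to \s does match here.
explicit = re.search(r"Hello[\s\u00a0](.*)\.", text, flags=re.ASCII)

# Case 3: let the group swallow the separator, then trim it afterwards.
loose = re.search(r"Hello(.*)\.", text)
user = loose.group(1).strip(" \u00a0")

print(ascii_ws)
print(repr(explicit.group(1)))
print(repr(loose.group(1)))
print(repr(user))
```

Case 3 is the workaround described above: the capture starts with the non-breaking space, so the value needs a trim step afterwards if the tool offers one.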
Edit: removed the brackets around the first whitespace