
This product reached the end of support date on March 31, 2021.

Monitored URL Regex - .* or (value|value)?

benjamin_johnso
Organizer

I have a number of hostnames that serve the same URL path. It would be easiest for me to wildcard the hostname in each monitored URL, followed by the explicit URL path.

Is wildcarding with .* really going to be much less efficient in my example below?

URLs:

http://www.host1-online.co.uk/logon

http://online.host2.co.uk/logon

http://online.host3.co.uk/logon

http://online.host4.co.uk/logon

Wildcard rule:

(http://.*.co.uk/logon)

Pattern rule:

(http://)(www.host1-online.co.uk|online.host2.co.uk|online.host3.co.uk|online.host4.co.uk)(/logon)

I'd prefer to use (http://.*.co.uk/logon), not least because it looks neater 🙂
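For anyone who wants to compare the two rules side by side, here is a minimal sketch using Python's `re` module. It assumes the rule is anchored at the start of the URL, which may not be how the AMD evaluates it, and `STRAY_URL` is a hypothetical address invented purely for illustration.

```python
import re

# The four URLs from the question.
URLS = [
    "http://www.host1-online.co.uk/logon",
    "http://online.host2.co.uk/logon",
    "http://online.host3.co.uk/logon",
    "http://online.host4.co.uk/logon",
]

# The two candidate rules, copied as posted (dots left unescaped).
WILDCARD_RULE = re.compile(r"(http://.*.co.uk/logon)")
PATTERN_RULE = re.compile(
    r"(http://)(www.host1-online.co.uk|online.host2.co.uk|"
    r"online.host3.co.uk|online.host4.co.uk)(/logon)"
)

# Hypothetical URL (not from the thread): the text ".co.uk/logon"
# appears in the query string rather than in the hostname.
STRAY_URL = "http://other.example.com/redirect?to=x.co.uk/logon"


def matches(rule, url):
    """Return True if the rule matches from the start of the URL."""
    return rule.match(url) is not None


if __name__ == "__main__":
    for url in URLS + [STRAY_URL]:
        print(url, matches(WILDCARD_RULE, url), matches(PATTERN_RULE, url))
```

Both rules accept the four intended URLs, but in this sketch the wildcard rule also accepts the stray one, because .* is free to run past the hostname and across slashes.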

3 REPLIES

john_leight
Dynatrace Pro

Using .* makes for a pretty greedy regular expression. Will using it in one regex be inefficient enough to really limit your AMD? Probably not. If you use that type of regex often enough, then it might. I try to be as "kind" to my AMD as possible and as efficient as I can, to get the most mileage out of the AMD. Every little bit helps, in my opinion.

When the AMD needs to perform the operation thousands (if not millions) of times, I try to make the regex as efficient as possible.

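http://[^/]*/logon is a pretty efficient way to go for your examples, because [^/]* stops at the first slash instead of running to the end of the URL and backtracking. When I tested this once, the [^/]* version could run twice as many iterations as the .* version over the same time period.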
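A rough way to repeat that kind of comparison is sketched below with Python's `re` and `timeit` modules. Python's regex engine is not the AMD's, so the timings are only indicative; `time_pattern` is a helper made up for this sketch, `LONG_URL` is an invented URL, and the /logon path is taken from the original question.

```python
import re
import timeit

# The four URLs from the question.
URLS = [
    "http://www.host1-online.co.uk/logon",
    "http://online.host2.co.uk/logon",
    "http://online.host3.co.uk/logon",
    "http://online.host4.co.uk/logon",
]

# Greedy wildcard (dots escaped) versus the negated character class.
GREEDY = re.compile(r"http://.*\.co\.uk/logon")
NEGATED = re.compile(r"http://[^/]*/logon")

# Hypothetical non-matching URL with a long path. The greedy version
# has to consume the whole string and backtrack before giving up,
# while the negated class fails as soon as the path is not /logon.
LONG_URL = "http://online.host2.co.uk/" + "a/" * 200


def time_pattern(pattern, urls, repeats=100_000):
    """Return the seconds taken to match every URL `repeats` times."""
    return timeit.timeit(
        lambda: [pattern.match(u) for u in urls], number=repeats
    )


if __name__ == "__main__":
    sample = URLS + [LONG_URL]
    print("greedy  .*    :", time_pattern(GREEDY, sample))
    print("negated [^/]* :", time_pattern(NEGATED, sample))
```

The absolute numbers will vary by machine and engine, so it is worth running this against a sample of your own traffic before drawing conclusions.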

Thanks for the recommendation. Based on your experience, it seems that [^/]* is the simplest way to achieve this, and that the performance gain over .* would be considerable.

chris_v
Dynatrace Pro

Just FYI (Janusz noted it in the example but didn't explain it): in a regex, a . is always a wildcard. On its own it matches any single character; adding a * or + makes it match multiple characters.

I see it all the time: people writing URL regexes don't escape the ., which can lead to unexpected behaviour.

So for example the reg-ex:

(http://www.website.com).*

(Fairly typical of what I see.)

The '.' between 'www' and 'website' and the one between 'website' and 'com' are each interpreted as a single-character wildcard, which may of course match things you're not expecting.

http://www.website.com/page would match as you'd expect.

but it could also legitimately match

http://www.websitescommunications.com, which would be captured as

http://www.websitescom

Leaving your users scratching their heads.

Escape your .'s with a backslash.

\.

The backslash says the next character isn't special and should be treated as a literal to match: a '.' in this case.

Reducing unneeded wild cards also improves performance (as well as ensuring no accidental matches are made).
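The accidental match described above can be reproduced with a short sketch in Python's `re` module. This is only an illustration under the assumption that the rule is applied from the start of the URL; the AMD's own engine may differ in details.

```python
import re

# Unescaped rule: every '.' is a single-character wildcard.
UNESCAPED = re.compile(r"(http://www.website.com).*")

# Escaped rule: '\.' only matches a literal dot.
ESCAPED = re.compile(r"(http://www\.website\.com).*")

INTENDED_URL = "http://www.website.com/page"
SURPRISE_URL = "http://www.websitescommunications.com"

if __name__ == "__main__":
    for rule in (UNESCAPED, ESCAPED):
        for url in (INTENDED_URL, SURPRISE_URL):
            m = rule.match(url)
            # Show what the first group captured, or None if no match.
            print(rule.pattern, url, m.group(1) if m else None)
```

With the unescaped rule, the second '.' matches the 's' in 'websites' and 'com' matches the start of 'communications', which is where the captured value http://www.websitescom comes from. The escaped rule simply does not match that URL.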

I really like www.regexr.com for an online reg-ex development/testing utility. It behaves a little differently to how the AMD does (for example you have to escape forward slashes here, but not for the AMD).

e.g. my original example.

e.g. escaping the '.'.