In trying to elaborate an answer to this question, I am now trying to come to terms with the behavior/meaning of Zero-Length regular expressions.
I often use www.regexr.com as a playground to test/debug/understand what's going on in regular expressions.
So we have this most banal scenario:
The regex is a*
The input string is dgwawa
(As a matter of fact, the string here is irrelevant)
Why does the tester report that this regex matches infinitely, given that it can match zero occurrences of the preceding character?
Why can't the result be 6 matches, one for each character position (since at every character, regardless of whether it is an a or not, there is a match, since zero matches is a match)?
How does it end up matching infinitely? Does it not check/progress one character at a time?
I wonder how/where does it get itself into an infinite loop.
You selected the JavaScript regex flavor in the regexr.com online regex tester. With a global regex, JavaScript's RegExp#exec does not advance lastIndex automatically after an empty match, so the next call starts at the same position and finds the same empty match again.
That is why, when you need to emulate the behavior observed in .NET Regex.Matches, PHP preg_match_all, Python re.finditer, etc., you need to manually advance the index to test each position.
See regex101.com test:
var re = /a*/g;
var str = 'dgwawa';
var m;
while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) { // <- this part
        re.lastIndex++;             // <- here
    }                               // <- is important
    document.body.innerHTML += "'" + m[0] + "'<br/>";
}
If you remove that if block, you will get an infinite loop.
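To see which matches the guarded loop actually produces, here is a minimal sketch that runs outside a browser. The helper name collectMatches is hypothetical (it is not part of the original answer), and the comparison with String.prototype.matchAll assumes an ES2020-capable engine, where that built-in steps past empty matches on its own.

```javascript
// Hypothetical helper (an illustration, not from the original answer):
// collect every match of a regex, stepping past empty matches so the
// exec loop always terminates.
function collectMatches(pattern, str) {
  // Work on a copy so the caller's lastIndex is untouched; make sure g is set.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const results = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    results.push({ index: m.index, text: m[0] });
    // An empty match leaves lastIndex equal to m.index; bump it by hand.
    if (m.index === re.lastIndex) {
      re.lastIndex++;
    }
  }
  return results;
}

const found = collectMatches(/a*/, 'dgwawa');
console.log(found.map(r => r.index + ":'" + r.text + "'").join(' '));
// prints: 0:'' 1:'' 2:'' 3:'a' 4:'' 5:'a' 6:''

// Assumption: ES2020 engine. matchAll advances past empty matches itself.
const builtIn = Array.from('dgwawa'.matchAll(/a*/g), m => m[0]);
console.log(builtIn.length); // prints: 7
```

Note that the result is 7 matches rather than 6: one attempt per character position plus one empty match at the very end of the string.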
There are two very important things to mention in this regard:
Nice. I was told before regexr follows a specific regex flavor. I should take this into account more seriously.
See Online sandboxes (for testing and publishing regexes online) section to select the one that you need.
Thanks for the complete answer and insights! Excellent job.
Just for the record, there is only one sandbox for .NET regex testing, and, compared to regexr, it is incomparably worse. So I'd rather stick with regexr, keeping its flavor firmly in mind :)
For testing .NET apps, you can use regexhero.net and regexstorm.net. And if you need a functionality like regex101.com for .NET, use a very nice free app called Expresso.