Note: This article is reproduced from stackoverflow.com.

Perl regex: A(*ACCEPT)??B

Published on 2020-04-11 22:04:57

I want to match "AB", but if "A" is not followed by "B", I want to match only "A".

I used the Perl regex A(*ACCEPT)??B.

The string "AB" matches as expected, but for "AC" the regex does not return "A". Why?

I know there are alternatives, but I want to understand how (*ACCEPT) behaves with a quantifier.

Am I misunderstanding something? Thanks for your help!

Questioner: Steven
Viewed: 56
Wiktor Stribiżew 2020-02-04 19:26

You pointed to the docs that say:

(*ACCEPT) is the only backtracking verb that is allowed to be quantified because an ungreedy quantification with a minimum of zero acts only when a backtrack happens. Consider, for example,

(A(*ACCEPT)??B)C

where A, B, and C may be complex expressions. After matching "A", the matcher processes "BC"; if that fails, causing a backtrack, (*ACCEPT) is triggered and the match succeeds. In both cases, all but C is captured. Whereas (*COMMIT) (see below) means "fail on backtrack", a repeated (*ACCEPT) of this type means "succeed on backtrack".

However, in practice (*ACCEPT) does not seem to be tied to backtracking, as your example shows. AC cannot be matched with A(*ACCEPT)??B because:

  • A in the pattern matches A in the string.
  • (*ACCEPT)?? is skipped at first because it is lazily quantified.
  • B cannot match C in the string, so the match fails at that point.
  • You expected the engine to backtrack into (*ACCEPT)?? after that failure, but (*ACCEPT)?? does not trigger backtracking.
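
For comparison, here is a minimal Perl sketch of two patterns that should give the behaviour the question asks for without quantifying (*ACCEPT). Both patterns are assumptions added for illustration (the question mentions alternatives without naming them), and match_text is a hypothetical helper, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text, or undef if there is no match.
sub match_text {
    my ($re, $str) = @_;
    return $str =~ $re ? $& : undef;
}

# Plain optional B: match "AB" when possible, otherwise just "A".
my $optional_b = qr/AB?/;

# Unquantified (*ACCEPT) in the second branch: try B first, and if that
# fails, end the match successfully right after A.
my $accept_branch = qr/A(?:B|(*ACCEPT))/;

for my $str (qw(AB AC)) {
    printf "%s: %s / %s\n", $str,
        match_text($optional_b, $str),
        match_text($accept_branch, $str);
}
```

With either pattern, "AB" should come back as "AB" and "AC" as "A". The second pattern relies on ordinary alternation order rather than on a lazy quantifier, so it does not depend on how the engine treats a quantified verb.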

A more helpful (*ACCEPT) usage example:

The only use case for (*ACCEPT) that I'm aware of is when the branches of an alternation are distributed into a later expression that is not required for all of the branches. For instance, suppose you want to match any of these patterns: BAZ, BIZ, BO.

You could simply write BAZ|BIZ|BO, but if B and Z stand for complicated sub-patterns, you'll probably look for ways to factor the B and Z patterns. A first pass might give you B(?:AZ|IZ|O), but that solution doesn't factor the Z. Another option would be B(?:A|I)Z|BO, but it forces you to repeat the B. This pattern allows you to factor both the B and the Z:

B(?:A|I|O(*ACCEPT))Z

If the engine follows the O branch, it never matches BOZ because it returns BO as soon as (*ACCEPT) is encountered, which is what we wanted.
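
The factored pattern can be tried out with a short Perl script. This is only a sketch: the single letters stand in for the "complicated sub-patterns" of the quoted text, the test strings are made up for illustration, and matched is a hypothetical helper.

```perl
use strict;
use warnings;

# B, A, I, O and Z stand in for arbitrarily complex sub-patterns.
my $factored = qr/B(?:A|I|O(*ACCEPT))Z/;

# Hypothetical helper: return the matched text, or undef if there is no match.
sub matched {
    my ($str) = @_;
    return $str =~ $factored ? $& : undef;
}

for my $str (qw(BAZ BIZ BO BOZ BA)) {
    my $m = matched($str);
    printf "%-3s => %s\n", $str, defined $m ? $m : '(no match)';
}
```

BAZ and BIZ should match in full, BO should match even though the pattern ends in Z, BOZ should yield only BO, and BA should not match at all because the A branch still requires the trailing Z.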