This article is reproduced from serverfault.com.

GREP Regex not working properly, but my regex is correct

Posted on 2020-02-18 22:38:00

Hopefully this is a simple mistake on my part; I am fairly new to regex in general. Basically, I am trying to extract the name of a website from a text file.

Example contents of myfile.txt:

Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!

I am trying to extract only the word bananas from this. My regex is as follows:

/(?<=m%s)(.*?)(?=\.com)/

In the online tester regexr it works just fine, but with grep I can't figure out how to get it to work: it doesn't return any results. I have tried several variants of the following:

grep "/(?<=m%s)(.*?)(?=\.com)/" myfile.txt
grep -E "/(?<=m%s)(.*?)(?=\.com)/" myfile.txt
grep '/(?<=m%s)(.*?)(?=\.com)/' myfile.txt
grep "(?<=m%s)(.*?)(?=\.com)" myfile.txt
grep '(?<=m%s)(.*?)(?=\.com)' myfile.txt

Nothing seems to work. I would love it if someone could point me in the right direction.

Questioner: Jason Waltz

Answer by steffen, 2020-02-19 07:46:49

The problem with regular expressions in grep and other Unix tools is that each tool supports one, two, or three different regular expression flavors, each with its own syntax:

  • Basic regular expressions (BRE)
  • Extended regular expressions (ERE or EREG)
  • Perl compatible regular expressions (PCRE or PREG)
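To make the difference between the three flavors concrete, here is a minimal sketch that matches the same "ba, then na twice, then s" idea in each syntax against the sample line from the question. It assumes GNU grep built with PCRE support (-P is a GNU extension and may be missing from other grep implementations); the variable names are only for illustration.

```shell
#!/bin/sh
# Sketch assuming GNU grep with PCRE support (-P).
line='Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!'

# BRE (grep's default): grouping and intervals must be escaped as \( \) and \{ \}
bre=$(printf '%s\n' "$line" | grep -o 'ba\(na\)\{2\}s')

# ERE (-E): the same constructs, written without backslashes
ere=$(printf '%s\n' "$line" | grep -oE 'ba(na){2}s')

# PCRE (-P): adds Perl extensions such as the non-capturing group (?:...)
pcre=$(printf '%s\n' "$line" | grep -oP 'ba(?:na){2}s')

# In BRE, unescaped ( ) { } are literal characters, so the ERE spelling finds nothing
bre_literal=$(printf '%s\n' "$line" | grep -o 'ba(na){2}s' || true)

echo "BRE:  $bre"
echo "ERE:  $ere"
echo "PCRE: $pcre"
echo "BRE given ERE syntax: [$bre_literal]"
```

The first three commands each print bananas; the last one prints nothing because BRE reads the parentheses and braces literally.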

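Your pattern is written in PCRE syntax: the lookbehind (?<=...), the lookahead (?=...) and the lazy quantifier *? are Perl extensions that neither BRE nor ERE (-E) support. You therefore need to tell grep to interpret the pattern as PCRE, using -P. Note that I also dropped the surrounding slashes, which are only regexr's delimiter notation (grep would look for literal / characters), and removed the m between = and % (I don't know what that was supposed to do; in your sample line, m%s only appears after .com, so the lookbehind can never sit right before bananas).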
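A quick way to see why the question's attempts print nothing even with -P is to run the variants side by side on the sample line. This is a minimal sketch assuming GNU grep with PCRE support; the variable names are illustrative only.

```shell
#!/bin/sh
# Sketch assuming GNU grep with PCRE support (-P).
line='Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!'

# Lookbehind on "m%s": that text only occurs after ".com", and no ".com" follows it
with_m=$(printf '%s\n' "$line" | grep -oP '(?<=m%s)(.*?)(?=\.com)' || true)

# regexr's /.../ delimiters become literal slashes for grep, and the line has none
with_slashes=$(printf '%s\n' "$line" | grep -oP '/(?<=%s)(.*?)(?=\.com)/' || true)

# Lookbehind on "%s", lazy match, lookahead on ".com"
fixed=$(printf '%s\n' "$line" | grep -oP '(?<=%s)(.*?)(?=\.com)')

printf 'with m:       [%s]\n' "$with_m"
printf 'with slashes: [%s]\n' "$with_slashes"
printf 'fixed:        [%s]\n' "$fixed"
```

Only the last variant matches, printing bananas; the first two leave their variables empty.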

grep -Po "(?<=%s)(.*?)(?=\.com)" myfile.txt

With -o, you tell grep to print only the matching part of each line rather than the whole line. My grep man page describes PCRE support in grep as experimental, so there may be cases where you get a segmentation fault or where evaluation takes unusually long.
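The effect of -o is easiest to see on a file with more than one matching line. The sketch below writes the question's sample line plus a second, made-up line to a temporary file (so it does not touch a real myfile.txt) and runs the answer's pattern with and without -o. It assumes GNU grep with PCRE support and a mktemp command.

```shell
#!/bin/sh
# Sketch assuming GNU grep with PCRE support (-P) and mktemp.
# The second line is a hypothetical example, added only to show one match per line.
tmpfile=$(mktemp)
printf '%s\n' \
  'Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!' \
  'Thanks for visiting%sapples.com%stoday.' > "$tmpfile"

# Without -o: every line containing a match is printed in full
full_lines=$(grep -P '(?<=%s)(.*?)(?=\.com)' "$tmpfile")

# With -o: only the matched text is printed, one match per output line
matches=$(grep -oP '(?<=%s)(.*?)(?=\.com)' "$tmpfile")

rm -f "$tmpfile"

printf '%s\n' '--- without -o ---' "$full_lines" '--- with -o ---' "$matches"
```

Without -o both input lines come back unchanged; with -o the output is just bananas and apples, each on its own line.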