Hopefully this is a simple mistake on my part; I am fairly new to regex in general. Basically, I am trying to extract the name of a website from a text file.
myfile.txt example:
Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!
I am trying to extract only the word bananas from this. My regex is as follows:
/(?<=m%s)(.*?)(?=\.com)/
Using regexr online it works just fine, but with grep on the command line I just can't figure out how to get it to work properly. It doesn't return any results. I have tried several variants of the following:
grep "/(?<=m%s)(.*?)(?=\.com)/" myfile.txt
grep -E "/(?<=m%s)(.*?)(?=\.com)/" myfile.txt
grep '/(?<=m%s)(.*?)(?=\.com)/' myfile.txt
grep "(?<=m%s)(.*?)(?=\.com)" myfile.txt
grep '(?<=m%s)(.*?)(?=\.com)' myfile.txt
Nothing seems to work. I would love it if someone could point me in the right direction.
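The problem with regular expressions in grep and other Unix tools is that they usually support one, two or three different kinds of regular expressions. These are:
Basic Regular Expressions (BRE), the default in grep
Extended Regular Expressions (ERE), selected with -E
Perl-Compatible Regular Expressions (PCRE), selected with -P in GNU grep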
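As a rough illustration of how the regex flavours grep supports (basic, extended, Perl-compatible) differ on the sample line from the question, here is a minimal sketch. It assumes GNU grep built with PCRE support; the temporary file and the variable names are made up for this example.

```shell
#!/bin/sh
# Write the sample line from the question to a throwaway file (hypothetical name).
sample=$(mktemp)
printf '%s\n' 'Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!' > "$sample"

# BRE (grep's default): no lookarounds, so the delimiters are part of the match.
bre=$(grep -o '%s[^%]*\.com' "$sample")

# ERE (-E): same limitation; operators such as + need no backslash.
ere=$(grep -oE '%s[^%]+\.com' "$sample")

# PCRE (-P, GNU grep only): lookarounds keep the delimiters out of the match.
pcre=$(grep -oP '(?<=%s).*?(?=\.com)' "$sample")

echo "BRE:  $bre"
echo "ERE:  $ere"
echo "PCRE: $pcre"

rm -f "$sample"
```

Only the PCRE variant isolates the bare name; the other two return the text together with the surrounding %s and .com.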
Your pattern is in PCRE syntax, therefore you need to identify your pattern as one (using -P). Note that I also removed the m between = and % (I don't know what that was supposed to do).
grep -Po "(?<=%s)(.*?)(?=\.com)" myfile.txt
With -o, you say you only want to print the matching part. My grep man page declares PCRE support in grep as experimental, so there may be cases where you get a segmentation fault or where the evaluation takes an unusually long time.
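To see what -o changes, and what one could fall back on where -P is missing or misbehaves, here is a small sketch. It assumes GNU grep with PCRE support for the first two commands. The sed command is a swapped-in alternative that is not part of the answer above, and the file and variable names are invented for illustration.

```shell
#!/bin/sh
# Recreate the sample file from the question (temporary, hypothetical name).
sample=$(mktemp)
printf '%s\n' 'Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!' > "$sample"

# Without -o, grep prints the whole line that contains a match.
whole_line=$(grep -P '(?<=%s).*?(?=\.com)' "$sample")

# With -o, grep prints only the text matched by the pattern.
only_match=$(grep -Po '(?<=%s).*?(?=\.com)' "$sample")

# Fallback without PCRE: capture the text between %s and .com with sed (BRE).
fallback=$(sed -n 's/.*%s\(.*\)\.com.*/\1/p' "$sample")

echo "without -o: $whole_line"
echo "with -o:    $only_match"
echo "sed:        $fallback"

rm -f "$sample"
```

The sed variant relies on there being a single .com on the line; with several, its greedy capture would need adjusting.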
And the difference of the deleted m in the lookbehind condition ...? (?<=m%s) != (?<=%s)
I actually don't know what that m was supposed to do.
But you deleted it. Note that I totally agree with deleting it, because I do not see how a regex including it could ever match. But the fact that it is there is enough explanation for it not working and makes the whole question mostly off-topic as "not reproducible/typo". Just mention it as necessarily deleted, even if it is only a typo.
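The effect of that extra m can be checked directly. The sketch below assumes GNU grep with PCRE support and uses made-up file and variable names: even with -P, the lookbehind (?<=m%s) requires an m immediately before %s, and the sample line has "at%s" there, so nothing matches.

```shell
#!/bin/sh
# Recreate the sample file from the question (temporary, hypothetical name).
sample=$(mktemp)
printf '%s\n' 'Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!' > "$sample"

# Original lookbehind: needs the literal text "m%s" right before the match.
if with_m=$(grep -Po '(?<=m%s).*?(?=\.com)' "$sample"); then
    status_with_m="match"
else
    status_with_m="no match"
fi

# Corrected lookbehind: only "%s" has to precede the match.
without_m=$(grep -Po '(?<=%s).*?(?=\.com)' "$sample")

echo "with m:    $status_with_m"
echo "without m: $without_m"

rm -f "$sample"
```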
Note, there is an alternative, using \K. I thought that would allow writing the regex without PCRE support, but it seems I was wrong. It does work (tested on regex101.com), but is also PCRE, according to riptutorial.com/regex/topic/1338/match-reset---k I am not going to make a separate answer for this. Feel free to add this info to your answer, to make it more complete.
@Yunnosch Yes, I know (see stackoverflow.com/a/52796549/845034). But I don't get the point here. There are always more solutions to problems. I guess the OP just wanted to know why his regex did not work.
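For completeness, the \K variant mentioned in the comments could look roughly like the sketch below. It still needs -P, so it assumes GNU grep with PCRE support; the file and variable names are hypothetical.

```shell
#!/bin/sh
# Recreate the sample file from the question (temporary, hypothetical name).
sample=$(mktemp)
printf '%s\n' 'Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!' > "$sample"

# Lookbehind version from the answer.
lookbehind=$(grep -Po '(?<=%s).*?(?=\.com)' "$sample")

# \K version: match "%s" first, then discard everything matched so far.
match_reset=$(grep -Po '%s\K.*?(?=\.com)' "$sample")

echo "lookbehind: $lookbehind"
echo "\\K:         $match_reset"

rm -f "$sample"
```

Both commands extract the same text here; \K is mainly useful when the prefix has variable length, which a lookbehind in PCRE does not allow.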