This article is reproduced from stackoverflow.com.
python-3.x

Remove text between [quote= and [/quote] in Python

Posted on 2020-03-27 10:30:02

I am reading a CSV file to apply NLP, and I am trying to pre-process the data. The data comes from an online forum, so it contains quoted posts. How can I remove them? For example:

a='''[b]Re:[/b]
[quote="xxx"] How can I do that blah blah xxx [/quote]
 Hello xxx, I will tell you how you can do it blah blah blah.'''

I want the result to look like this:

a='Hello xxx, I will tell you how you can do it blah blah blah.'

I want a regex that detects [quote=" and deletes everything until it reaches [/quote]. Is this possible?
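A minimal sketch of the regex described above, assuming quote tags always look like `[quote]` or `[quote=...]` and are never nested. `QUOTE_RE` and `remove_quotes` are hypothetical names. The `.*?` is non-greedy so each match stops at the first `[/quote]`, and `re.DOTALL` lets it run across line breaks. Note that it only removes the quote, so the `[b]Re:[/b]` header is still there afterwards.

```python
import re

# "[quote" plus optional "=..." attributes up to "]", then the shortest run of
# characters (newlines included, via re.DOTALL) up to the next "[/quote]".
QUOTE_RE = re.compile(r'\[quote(?:=[^\]]*)?\].*?\[/quote\]', re.DOTALL)


def remove_quotes(text):
    """Delete every [quote...]...[/quote] block from a forum message."""
    return QUOTE_RE.sub('', text)


a = '''[b]Re:[/b]
[quote="xxx"] How can I do that blah blah xxx [/quote]
 Hello xxx, I will tell you how you can do it blah blah blah.'''

print(repr(remove_quotes(a)))
# '[b]Re:[/b]\n\n Hello xxx, I will tell you how you can do it blah blah blah.'
```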

I tried the following, but it did not work:

  import re
  def quotes(text):
      return re.sub(r'\[([^\]=]+)(?:=[^\]]+)?\].*?\[/\1\]', '', text)

  data['message'] = data['message'].apply(quotes)
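One likely reason the attempt above fails on real forum posts is that `.` does not match a newline unless `re.DOTALL` is passed, so a quote that spans several lines is never matched. This is an assumption, since the question does not show the failing message. The sketch below reuses the attempt's pattern as a raw string (so `\1` reaches the regex engine as a backreference). `TAG_BLOCK` and the sample `message` are made up for illustration. Also note that this pattern removes any `[tag]...[/tag]` pair, such as bold text, not only quotes.

```python
import re

# The attempt's pattern as a raw string. Group 1 captures the tag name
# (e.g. "b" or "quote") and \1 requires the closing tag to use the same name.
TAG_BLOCK = r'\[([^\]=]+)(?:=[^\]]+)?\].*?\[/\1\]'

message = '[quote="xxx"] first line\nsecond line [/quote] Hello xxx'

# Without re.DOTALL, "." stops at "\n", so this multi-line quote is not matched.
without_dotall = re.sub(TAG_BLOCK, '', message)

# With re.DOTALL, "." also matches newlines and the whole block is removed.
with_dotall = re.sub(TAG_BLOCK, '', message, flags=re.DOTALL)

print(repr(without_dotall))
print(repr(with_dotall))  # ' Hello xxx'
```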
Questioner: nurlubanu
nurlubanu 2019-07-04 23:43

The answer is actually quite simple:

import re
def quotes(text):
    # Delete from "[quote" through the last "quote]" on the same line
    return re.sub(r'\[quote.+quote\]', '', text)

data['message'] = data['message'].apply(quotes)
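One caveat with this pattern: `.+` is greedy, so if a single line contains two quotes, it deletes everything from the first `[quote` to the last `quote]`, including the reply text between them. The comparison below uses a made-up message. A non-greedy `.+?` that stops at the first `[/quote]` avoids that.

```python
import re

text = '[quote="a"] one [/quote] keep me [quote="b"] two [/quote] reply'

# Greedy ".+" runs to the LAST "quote]" on the line, so "keep me" is lost.
greedy = re.sub(r'\[quote.+quote\]', '', text)

# Non-greedy ".+?" stops at the FIRST "[/quote]" after each opening tag.
lazy = re.sub(r'\[quote.+?\[/quote\]', '', text)

print(repr(greedy))  # ' reply'
print(repr(lazy))    # ' keep me  reply'
```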

That's it.
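On the example from the question, `quotes` leaves the `[b]Re:[/b]` header and some extra whitespace, so it does not quite produce the target string. Below is a sketch of a follow-up cleanup step. It assumes every reply starts with that exact header, and `REPLY_HEADER` and `clean_message` are hypothetical names. Adjust the header pattern to whatever the real forum emits.

```python
import re


def quotes(text):
    # The answer's pattern: removes "[quote..." through "quote]" on one line.
    return re.sub(r'\[quote.+quote\]', '', text)


# Assumed reply prefix produced by the forum.
REPLY_HEADER = re.compile(r'^\s*\[b\]Re:\[/b\]')


def clean_message(text):
    """Remove quotes, then the "[b]Re:[/b]" header, then tidy whitespace."""
    text = REPLY_HEADER.sub('', quotes(text))
    # str.split() with no argument splits on any whitespace run and drops
    # leading/trailing whitespace, so this collapses the gaps left behind.
    return ' '.join(text.split())


a = '''[b]Re:[/b]
[quote="xxx"] How can I do that blah blah xxx [/quote]
 Hello xxx, I will tell you how you can do it blah blah blah.'''

print(clean_message(a))
# Hello xxx, I will tell you how you can do it blah blah blah.
```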