This article is reproduced from serverfault.com.

Matching a multiline Python expression string from a file using grep?

Posted on 2020-11-30 12:50:30

Please note that this is not a Python question. I have multiple directories (around 500, called modules), each of which includes a __manifest__.py file. This file holds the module's metadata and looks like the following:

{
    'name': 'Associations Management',
    'version': '0.1',
    'category': 'Marketing',
    'depends': [
        'base_setup', 
        'membership',
        'event'
    ],
    'data': ['views/views.xml'],
    'demo': [],
    'installable': True,
    'auto_install': False,
}

I'd like to match and extract (using the Linux shell only) a pattern that could look like either of the following:

'depends': ['base', 'web'],
// or multi-line as
"depends": [
    'base',
    'web',
]

I am specifically interested in extracting this information with Linux commands such as grep, sed, or awk; I'm not interested in evaluating each file with the Python interpreter. So I used the following command:

find . -iname __manifest__.py | xargs -I{} grep -H -E "('|\")depends('|\")(.?|\n)*\]\s*," {}

However, my regex doesn't give me a multi-line match. I am also worried about matching more lines than needed, as in the following:

'depends': [
        'base_setup', 
        'membership',
        'event'
    ],
    'data': ['views/views.xml'],

Thank you.

Questioner: mohamed ahmed

Answer by Sundeep, 2020-11-30 21:23:39

With GNU grep:

$ grep -zoE "'depends'"':\s*\[[^][]+]' ip.txt | tr '\0' '\n'
'depends': [
        'base_setup', 
        'membership',
        'event'
    ]
  • The -z option makes grep use the ASCII NUL character as the line separator. Assuming your input file doesn't contain this character, the whole file is effectively read as a single string
  • -o prints only the matching portion
  • "'depends'"':\s*\[[^][]+]' matches 'depends': followed by optional whitespace, then a [ character, then one or more characters other than [ and ], then ]
    • this means the solution won't work if the list contains nested [] sequences
  • tr '\0' '\n' converts NUL characters to newlines, since -z also makes grep use NUL as the separator in its output
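The question also shows the key written with double quotes ("depends"), which the pattern above does not cover. A minimal sketch of one way to accept either quote style, assuming GNU grep; the temporary directory, the sample manifests, and the extract_depends helper are made up for illustration:

```shell
# Throwaway sample tree so the sketch is self-contained (hypothetical modules).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/mod_a" "$tmp/mod_b"

cat > "$tmp/mod_a/__manifest__.py" <<'EOF'
{
    'name': 'A',
    'depends': ['base', 'web'],
    'data': ['views/views.xml'],
}
EOF

cat > "$tmp/mod_b/__manifest__.py" <<'EOF'
{
    "depends": [
        'base',
        'web',
    ],
    'demo': [],
}
EOF

# ['"] accepts a single or a double quote around the key.
# [^][]+ stops at the first closing bracket, so 'data' is never included.
extract_depends() {
    grep -zoE "['\"]depends['\"]:\s*\[[^][]+]" "$1" | tr '\0' '\n'
}

for f in "$tmp"/*/__manifest__.py; do
    echo "== ${f#"$tmp"/}"
    extract_depends "$f"
done
```

The same limitation applies: a nested [] inside the list would end the match early.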

With ripgrep:

$ rg -oUN "'depends'"':\s*\[[^\]\[]+]' ip.txt
'depends': [
        'base_setup', 
        'membership',
        'event'
    ]

The advantage is that this doesn't depend on the input being free of NUL characters. -U enables multiline matching (note that in this mode ripgrep may still need to hold each file in memory) and -N turns off the line number prefix (which is on by default for terminal output). Also, both GNU grep and rg support recursive searching.
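Since the question is about roughly 500 module directories, recursive searching can replace the find | xargs pipeline. A rough sketch with GNU grep, using -r together with --include to restrict the search to manifest files; the sample tree and the search_manifests helper are invented for the example. With rg, passing a directory recurses by default.

```shell
# Throwaway sample tree (hypothetical modules plus one decoy file).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/mod_a" "$tmp/mod_b/static"

cat > "$tmp/mod_a/__manifest__.py" <<'EOF'
{
    'depends': ['base', 'web'],
    'data': [],
}
EOF

cat > "$tmp/mod_b/__manifest__.py" <<'EOF'
{
    'depends': [
        'base_setup',
        'event'
    ],
}
EOF

# Decoy: contains a matching string but is not a manifest file.
echo "'depends': ['ignored']" > "$tmp/mod_b/static/notes.py"

# -r recurses into the directory; --include keeps only files named
# __manifest__.py. Each match is prefixed with the path of its file.
search_manifests() {
    grep -rzoE --include='__manifest__.py' "'depends':\s*\[[^][]+]" "$1" | tr '\0' '\n'
}

search_manifests "$tmp"
```

Each match is printed as path:match, so the output shows which module a dependency list belongs to. The order in which files are visited is not guaranteed.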


If the text to be matched always spans whole lines, with 'depends': [ on a single line, you could also use awk. See "How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?" for an explanation.

$ awk '/\047depends\047:[[:blank:]]*\[/{f=1} f; /]/{f=0}' ip.txt
    'depends': [
        'base_setup', 
        'membership',
        'event'
    ],
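The awk approach can be run over several manifests at once. The sketch below is one possible variation, not part of the original answer: it accepts either quote style around the key and prefixes each line with FILENAME. The sample files and the print_depends helper are hypothetical.

```shell
# Throwaway sample tree (hypothetical modules).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/mod_a" "$tmp/mod_b"

cat > "$tmp/mod_a/__manifest__.py" <<'EOF'
{
    'depends': ['base', 'web'],
    'data': ['views/views.xml'],
}
EOF

cat > "$tmp/mod_b/__manifest__.py" <<'EOF'
{
    "depends": [
        'base',
        'web',
    ],
    'demo': [],
}
EOF

# \047 is the octal escape for a single quote, which keeps the awk
# program safe inside a single-quoted shell string.
print_depends() {
    awk '
        /(\047|")depends(\047|"):[[:blank:]]*\[/ { f = 1 }  # opening line: start printing
        f { print FILENAME ": " $0 }                        # print while the flag is set
        /]/ { f = 0 }                                       # closing bracket: stop printing
    ' "$@"
}

print_depends "$tmp"/*/__manifest__.py
```

Because the flag is cleared on any line containing ], a single-line list prints exactly one line, and later lines such as 'data' are skipped.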