This article is reproduced from serverfault.com.

subprocess.Popen(): change stderr during child's execution

Posted on 2020-12-04 08:34:51

Goal: I'm trying to put together a Python script that captures the network traffic generated by the execution of a block of code. For simplicity, let's assume I want to log the network traffic resulting from a call to socket.gethostbyname('example.com'). Note: I can't simply terminate tcpdump when gethostbyname() returns, because the actual code block I want to measure triggers other external code, and I have no way to determine when that external code finishes. I therefore have to leave tcpdump running "long enough" to make it highly probable that I have logged all traffic generated by the external code.

Approach: I'm using subprocess to start tcpdump, telling tcpdump to terminate itself after duration seconds using its -G and -W options, e.g.:

import socket
import subprocess
import time

duration = 15
nif = 'en0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]
tcpdump_proc = subprocess.Popen(cmd)
socket.gethostbyname('example.com')
time.sleep(duration + 5) # sleep longer than tcpdump is running

The problem with this is that Popen() returns before tcpdump is fully up and running, so some or all of the traffic resulting from the call to gethostbyname() will not be captured. I could obviously add a time.sleep(x) before calling gethostbyname() to give tcpdump a bit of time to spin up, but that's not a portable solution: I can't just pick some arbitrary x < duration, since a powerful system would start capturing packets earlier than a less powerful one.

To deal with this, my idea is to parse tcpdump's output to look for when the following is written to its stderr as that appears to indicate that the capture is up and running fully:

tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes

Thus I need to attach to stderr, but the problem is that I don't want to commit to reading all of its output as I need my code to move on to actually execute the code block I want to measure (gethostbyname() in this example) instead of being stuck in a loop reading from stderr.

I could solve this by adding a semaphore that blocks the main thread from proceeding onto the gethostbyname() call, and have a background thread read from stderr and decrement the semaphore (to let the main thread move on) when it reads the string above from stderr, but I'd like to keep the code single-threaded if possible.
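For reference, the threaded variant described above could look roughly like the sketch below. It uses a threading.Event rather than a semaphore, which is a substitution on my part, and the helper name start_with_reader_thread and its parameters are made up for illustration. The demo at the bottom uses a small Python child as a stand-in for tcpdump so that it runs without root privileges.

```python
import subprocess
import sys
import threading


def start_with_reader_thread(cmd, marker, timeout=10.0):
    """Start cmd and block until `marker` appears on its stderr.

    A background thread keeps draining stderr afterwards, so the child
    never stalls on a full pipe. All names here are illustrative.
    """
    ready = threading.Event()
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)

    def drain():
        with proc.stderr:
            for line in proc.stderr:  # runs until the child closes stderr
                if marker in line:
                    ready.set()       # let the main thread move on

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    if not ready.wait(timeout):
        # Marker never showed up: stop the child instead of hanging.
        proc.kill()
        proc.wait()
        raise TimeoutError(f"{marker!r} not seen within {timeout} s")
    return proc, reader


if __name__ == "__main__":
    # Stand-in child so the sketch runs without tcpdump (assumption).
    fake = [sys.executable, "-c",
            "import sys; print('tcpdump: listening on en0', file=sys.stderr)"]
    proc, reader = start_with_reader_thread(fake, "listening on")
    # ... run the code to be measured here, e.g. socket.gethostbyname(...)
    proc.wait()
    reader.join()
```

The main thread only waits on the event, so the code being measured still runs on the main thread; the extra thread does nothing but read and discard stderr.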

From my understanding, it's a big no-no to use subprocess.PIPE for stderr or stdout without committing to reading all of the output, as the child will end up blocking when the pipe buffer fills up. But can you "detach" (destroy?) the pipe mid-execution if you're only interested in reading the first part of the output? Essentially I'd like to end up with something like this:

import socket
import subprocess
import time

duration = 15
nif = 'en0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]
tcpdump_proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
for line in tcpdump_proc.stderr:
    if 'tcpdump: listening on' in line:
        break
socket.gethostbyname('example.com')
time.sleep(duration) # sleep at least as long as tcpdump is running

What else do I need to add within the if block to "reassign" who's in charge of reading stderr? Can I just set stderr back to None (tcpdump_proc.stderr = None)? Or should I call tcpdump_proc.stderr.close() (and will tcpdump terminate early if I do so)?
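On the close() part of the question: on a POSIX system, once the parent closes the read end of the pipe, the child's next write to stderr fails with EPIPE. A C program with default signal handling would normally be sent SIGPIPE instead. The sketch below illustrates the mechanism with a hypothetical Python stand-in for tcpdump (Python ignores SIGPIPE at startup, so the child sees BrokenPipeError and reports it through exit status 3). Whether the real tcpdump is affected depends on whether it writes anything further to stderr, for example its end-of-capture packet counts, which I have not verified here.

```python
import subprocess
import sys

# Hypothetical stand-in for tcpdump: announces readiness on stderr, waits for
# a go-ahead on stdin, then tries to write to stderr once more.
CHILD = r"""
import os, sys
os.write(2, b"tcpdump: listening on en0\n")
sys.stdin.readline()             # wait until the parent has closed stderr
try:
    os.write(2, b"1 packet captured\n")
except BrokenPipeError:
    sys.exit(3)                  # report the failed write via the exit status
"""


def close_after_marker():
    """Read stderr up to the marker, close it, and return the child's exit status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
    )
    for line in proc.stderr:
        if "listening on" in line:
            break
    proc.stderr.close()          # the pipe now has no reader
    proc.stdin.write("go\n")     # tell the child to write again
    proc.stdin.close()
    return proc.wait()


if __name__ == "__main__":
    print("child exit status:", close_after_marker())
```

Setting tcpdump_proc.stderr = None only drops the Python-side reference; it does not give the child a different stderr, since the child's file descriptors are fixed when Popen() starts it.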

It could also very well be that I missed something obvious and that there is a much better approach to achieve what I want - if so, please enlighten me :).

Thanks in advance :)

Questioner
Janus Varmarken
Maurice Meyer 2020-12-04 21:07:57

You could use detach() or close() on stderr after receiving the "listening on" message:

import subprocess
import time

duration = 10
nif = 'eth0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]

proc = subprocess.Popen(
    cmd, shell=False, stderr=subprocess.PIPE, bufsize=1, text=True
)
for i, line in enumerate(proc.stderr):
    print('read %d lines from stderr' % i)
    if 'listening on' in line:
        print('detach stderr!')
        proc.stderr.detach()
        break

while proc.poll() is None:
    print("doing something else while tcpdump is runnning!")
    time.sleep(2)

print(proc.returncode)
print(proc.stderr.read())

Out:

read 0 lines from stderr
detach stderr!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
0
Traceback (most recent call last):
  File "x.py", line 24, in <module>
    print(proc.stderr.read())
ValueError: underlying buffer has been detached

Note:

I haven't checked what is really happening to the stderr data, but detaching stderr doesn't seem to have any impact on tcpdump.
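If discarding the pipe turns out to be a problem, one single-threaded alternative (a different technique from detach(), not something tested against tcpdump here) is to send stderr to a regular file and poll that file for the marker. A file cannot fill up the way a pipe does, and the child keeps a valid stderr for its whole lifetime. The helper name start_with_stderr_file, the log path, and the polling interval below are all illustrative assumptions, and the demo again uses a Python child as a stand-in for tcpdump.

```python
import os
import subprocess
import sys
import tempfile
import time


def start_with_stderr_file(cmd, marker, log_path, timeout=10.0, poll=0.05):
    """Start cmd with stderr redirected to log_path; return once marker is logged."""
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(cmd, stderr=log)  # child keeps its own copy of the fd
    needle = marker.encode()
    deadline = time.monotonic() + timeout
    while True:
        # Sample the exit state before reading, so a final line written
        # just before exit is still seen.
        exited = proc.poll() is not None
        with open(log_path, "rb") as reader:      # separate handle, separate offset
            if needle in reader.read():
                return proc
        if exited or time.monotonic() >= deadline:
            break
        time.sleep(poll)
    if not exited:
        proc.kill()
        proc.wait()
    raise RuntimeError(f"{marker!r} never appeared in {log_path}")


if __name__ == "__main__":
    # Stand-in child so the sketch runs without tcpdump (assumption).
    fake = [sys.executable, "-c",
            "import sys; print('tcpdump: listening on en0', file=sys.stderr)"]
    with tempfile.TemporaryDirectory() as tmp:
        proc = start_with_stderr_file(fake, "listening on",
                                      os.path.join(tmp, "tcpdump.err"))
        # ... run the code to be measured here, e.g. socket.gethostbyname(...)
        proc.wait()
```

The polling loop costs up to one poll interval of latency before the measured code starts, which should not matter when the capture itself runs for several seconds.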