run.py
import subprocess
from multiprocessing.pool import ThreadPool

def work(repo, cpuid):
    my_tool_subprocess = subprocess.Popen('./scan.py {} {}'.format(repo, cpuid),
                                          shell=True, stdout=subprocess.PIPE)
    line = True
    while line:
        myline = my_tool_subprocess.stdout.readline()
        if "scan is done" in myline:
            break

num = 10  # set to the number of workers you want (defaults to the CPU count of your machine)
tp = ThreadPool(num)
cpuid = 1
for repo in repos:
    tp.apply_async(work, (repo[0], "1-240"))
    print('Running {} at core {}'.format(repo[0], "1-240"))
tp.close()
tp.join()
scan.py
completed = subprocess.run(['git', 'clone', repo], env=my_env)
# ... a bunch of other subprocess.run() calls ...
# at the end:
print('returncode:', completed.returncode)
print('scan is done')
I expected the number of active processes to be 10 (10 threads), but somehow it isn't. It doesn't seem to wait for the last statement in scan.py ("scan is done"); instead it runs straight through the list of repositories (the for loop) and clones every one of them. To repeat: it does not wait until repos 1 through 10 have been cloned and processed (keeping a moving window of 10 processes) — it just keeps going, creating more processes and cloning more repositories.

Does anyone know what is wrong here?
Try restructuring your code like this:

In scan.py, move all the module-level code into a function, e.g.:
def run(repo, cpuid):
    # do whatever scan.py does given a repo path and cpuid
    # instead of printing to stdout, have this return a value
If you still want scan.py to have a command-line interface, add:
import argparse

def main(argv=None):
    parser = argparse.ArgumentParser()
    # ... implement command-line argument parsing here
    args = parser.parse_args(argv)
    value = run(args.repo, args.cpuid)
    print(value)

if __name__ == '__main__':
    main()
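For illustration, the elided parsing could be filled in along these lines — a hypothetical sketch, where the argument names simply mirror the run(args.repo, args.cpuid) call above and the sample values are made up:

```python
import argparse

# Hypothetical sketch of the elided argument parsing; the positional
# names mirror the run(args.repo, args.cpuid) call in the answer.
parser = argparse.ArgumentParser(description='Scan a repository.')
parser.add_argument('repo')    # repository path or URL
parser.add_argument('cpuid')   # core range string, e.g. "1-240"

# Parsing an explicit list here stands in for real sys.argv input.
args = parser.parse_args(['https://example.com/some-repo.git', '1-240'])
print(args.repo)
print(args.cpuid)
```

Passing a list to parse_args() instead of letting it read sys.argv is also what makes main(argv=None) easy to test.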
Now, in run.py, you can do:
import multiprocessing
import scan  # maybe give this a more specialized name

def work(args):
    repo, cpuid = args
    output = scan.run(repo, cpuid)
    for line in output.splitlines():
        ...  # Do whatever you want to do here

def main():
    repos = ...  # You didn't show us where this comes from
    pool = multiprocessing.Pool()  # Or pass however many processes you want
    pool.map(work, [(r[0], '1-240') for r in repos])

if __name__ == '__main__':
    main()
Something like that. The point I'm trying to make here is that if you factor your code sensibly, it makes the multiprocessing much simpler. Some of the details here are a bit improvised, though.
Thanks, I'll try it as soon as I can. Inside run(), I can call subprocess.run() sequentially, right?
Yes — if you are running several commands and collecting their stdout, you may want to append each command's stdout to a list that run() returns. Alternatively, you could make run() a generator function and yield each line of output as it runs, then consume it like for line in run(...): print(line).
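A minimal sketch of the generator variant — the echo commands are placeholders for whatever scan.py actually runs, not the real scan steps:

```python
import subprocess

def run(repo, cpuid):
    # Run each command in turn and yield its stdout line by line.
    # The echo commands are placeholders for the real git/scan steps.
    commands = [
        ['echo', 'cloning {}'.format(repo)],
        ['echo', 'scanning {} on cores {}'.format(repo, cpuid)],
    ]
    for cmd in commands:
        completed = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
        for line in completed.stdout.splitlines():
            yield line

for line in run('myrepo', '1-240'):
    print(line)
```

Because run() yields as it goes, the caller sees each command's output as soon as that command finishes, instead of waiting for the whole scan before getting anything back.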