This article is reproduced from stackoverflow.com.
Tags: multithreading, python, python-multiprocessing, python-multithreading, ping

Pinging ~100,000 servers: is multithreading or multiprocessing better?

Posted on 2020-04-05 23:39:28

I have created a simple script that iterates through a list of servers that I need to both ping and nslookup. The issue is that pinging can take some time, especially when pinging more servers than there are seconds in a day.

I'm fairly new to programming, and I understand that multiprocessing or multithreading could make my job run faster.

My plan is to take my server list and either:

  1. Break it into evenly sized lists, with the number of lists matching the number of threads/processes, or

  2. If one of these options supports it, loop through the single list, handing each thread or process a new server name as soon as it finishes its previous ping and nslookup.

Option 2 is preferable because it wastes the least time. With option 1, if list 1 has 200 offline servers and list 6 has 2,000, everything has to wait for the process handling list 6 to finish, even though all the others would be idle by then.
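Option 2 is usually called a work queue. As a minimal sketch (not the asker's code), it can be built from the standard library's queue and threading modules. The names run_work_queue and check are hypothetical; check stands in for whatever one ping plus one nslookup would do and is passed in so the pattern can be tried without touching the network.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_work_queue(servers: Iterable[str],
                   check: Callable[[str], int],
                   num_workers: int = 8) -> Dict[str, int]:
    """Feed servers one at a time to a fixed set of worker threads.

    `check` is a hypothetical per-server function (e.g. one ping plus one
    nslookup) that returns a status code.
    """
    work: "queue.Queue[str]" = queue.Queue()
    for name in servers:
        work.put(name)  # everything is queued before any worker starts

    results: Dict[str, int] = {}
    lock = threading.Lock()  # guards `results` across threads

    def worker() -> None:
        while True:
            try:
                name = work.get_nowait()  # claim the next unclaimed server
            except queue.Empty:
                return  # queue drained: this worker is finished
            status = check(name)
            with lock:
                results[name] = status

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has run out of work
    return results
```

A worker that keeps hitting slow, offline servers simply ends up claiming fewer servers overall, so no thread sits idle while another still has a backlog.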

  1. Which one is superior for this task and why?

  2. If possible, how would I make sure that each thread or process has essentially the same runtime?

Here is my code snippet, even though it's rather simple right now:

import subprocess
import time
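
initial_time = time.time()
with open(r"myfilepath", "r") as server_file:
    for _ in range(1000):
        server = server_file.readline().strip()  # read each line only once
        if not server:  # stop at a blank line or at end of file
            break
        # pass the arguments as a list rather than one command string
        result = subprocess.run(["ping", server])
        # print the server name and ping's return code
        print(server + ' ' + str(result.returncode))
print(time.time() - initial_time)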

The issue is that a failed ping takes over 3 seconds on average. I'm also aware that removing the print statement would make it faster, but I wanted to monitor it on a small test. I'm pinging something on the order of 100,000 servers; this will need to be done routinely, and the list will keep growing.
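For scale, assume the worst case where every server fails and each failure costs about 3 seconds: 100,000 × 3 s = 300,000 s, roughly 83 hours or about 3.5 days, against 86,400 seconds in a day. The hypothetical helper below redoes that arithmetic and shows how it shrinks if many checks overlap. The 200-worker figure is an arbitrary assumption, and the estimate ignores any limit on how many concurrent pings the machine or network can sustain.

```python
import math


def estimate_hours(num_servers: int, seconds_per_check: float,
                   workers: int = 1) -> float:
    """Rough wall-clock time in hours, assuming every check takes the same
    time and `workers` checks always run side by side."""
    rounds = math.ceil(num_servers / workers)  # checks each worker must do
    return rounds * seconds_per_check / 3600


print(estimate_hours(100_000, 3))       # one ping at a time
print(estimate_hours(100_000, 3, 200))  # hypothetical 200 concurrent workers
```

Under those assumptions the 200-worker case comes to 500 rounds × 3 s = 1,500 s, about 25 minutes.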

Questioner: AlbinoRhino
Viewed: 68
23.3k 2020-01-31 05:33

TL;DR: multithreading is the solution for you. The threading module uses threads; the multiprocessing module uses processes. The difference is that threads run in the same memory space, while each process has its own separate memory.

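As for question 1:

For I/O-bound tasks, like querying a database, loading a webpage or pinging a server, the CPU does nothing but wait for an answer, which wastes resources. Threads let many of those waits overlap, and since Python releases the GIL while a thread is blocked waiting (on a socket, or for a ping subprocess to exit), you don't need the extra memory and startup cost of separate processes. Thus multithreading is the answer (: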
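A small timing sketch of that point, assuming time.sleep is a fair stand-in for a ping waiting on the network (the thread just blocks in both cases). The names fake_io, serial and threaded are hypothetical.

```python
import threading
import time


def fake_io(delay: float) -> None:
    """Stand-in for a slow I/O wait, such as a ping that times out."""
    time.sleep(delay)  # like a blocking network call, sleep releases the GIL


def serial(n: int, delay: float) -> float:
    """Run n waits one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def threaded(n: int, delay: float) -> float:
    """Run n waits in n threads at once; return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


print(f"serial:   {serial(10, 0.2):.2f} s")
print(f"threaded: {threaded(10, 0.2):.2f} s")
```

The serial run takes about 10 × 0.2 s, while the threaded run finishes in little more than a single 0.2 s wait, because the ten waits overlap.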

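As for question 2:

You can just create a pool of threads and let it run them simultaneously without you needing to break your head: each idle thread picks up the next server as soon as it finishes its previous one, so the work balances itself and you never have to split the list by hand.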
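One standard-library way to get such a pool is concurrent.futures.ThreadPoolExecutor. The sketch below is an illustration under assumptions, not the answerer's code. It swaps socket.gethostbyname in for nslookup (it asks the OS resolver for an IPv4 address rather than running the nslookup tool), chooses max_workers=100 and a 5-second per-ping timeout arbitrarily, and uses the helper names ping_once, resolve, check_server and check_all, which are all hypothetical. Windows ping takes -n for the echo count, while Linux and macOS take -c.

```python
import platform
import socket
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, Tuple

Result = Tuple[str, Optional[int], Optional[str]]


def ping_once(host: str, timeout: float = 5.0) -> Optional[int]:
    """Send a single echo request; return ping's exit code, or None on timeout."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        completed = subprocess.run(["ping", count_flag, "1", host],
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL,
                                   timeout=timeout)  # run() kills ping on timeout
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


def resolve(host: str) -> Optional[str]:
    """Look up an IPv4 address with the OS resolver (stand-in for nslookup)."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def check_server(host: str) -> Result:
    """Ping and resolve one host (hypothetical helper)."""
    return host, ping_once(host), resolve(host)


def check_all(hosts: Iterable[str], max_workers: int = 100,
              check: Callable[[str], Result] = check_server) -> List[Result]:
    """Run `check` on every host using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map hands each host to whichever thread is free next and
        # yields the results in the same order as `hosts`
        return list(pool.map(check, hosts))
```

In use, you would read the server file into a list of stripped, non-empty lines and pass it to check_all(hosts), tuning max_workers to what your machine and network tolerate.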