I am new to Python multithreading/multiprocessing and RabbitMQ. I have a RabbitMQ consumer that feeds me real-time hospital data. Each message contains one patient's vitals. I need to store at least five such messages per patient in order to run my logic and decide whether to set off an alarm. Since the number of patients is unknown, I am thinking of using multithreading or multiprocessing to keep my alarms near real-time and to be able to scale up. My approach is to create a global dataframe for each patient and append the messages for that patient to it. But now I am having trouble creating the threads/processes and sending the data to the respective patient's dataframe. Here is my code:
import json
from multiprocessing import Process

import pandas as pd
from pandas import json_normalize

bed_list = []
procs = []
bed_df = {}
alarms = 0

def spo2(body, bed):
    body_data = body.decode()
    print(body_data)
    packet = json.loads(body_data)
    bed_id = packet['beds'][0]['bedId']
    if bed_id == bed:                                   # was "=": comparison, not assignment
        primary_attributes = json_normalize(packet)
        '''some logic'''
        global bed_df
        if bed_id not in bed_df:                        # create the per-bed dataframe on first use
            bed_df[bed_id] = pd.DataFrame()
        # append this message to the bed's dataframe (five messages are kept per bed)
        bed_df[bed_id] = pd.concat([bed_df[bed_id], primary_attributes],
                                   ignore_index=True)
        print(bed_df[bed_id])
        ''' some other calculation'''
        # throw out the alarm on another queue (phy_channel and truejson
        # come from the omitted setup/logic above)
        phy_channel.basic_publish(body=json.dumps(truejson),
                                  routing_key='',
                                  exchange='nicu')
        bed_df[bed_id] = bed_df[bed_id].tail(4)         # reset the size of the dataframe

def callback(ch, method, properties, body):
    body_data = body.decode()
    packet = json.loads(body_data)
    bed_id = packet['beds'][0]['bedId']
    print(bed_id)
    global bed_list
    if bed_id not in bed_list:
        bed_list.append(bed_id)
    # pseudo code: start a process per known bed and hand it the raw message
    for bed in bed_list:
        proc = Process(target=spo2, args=(body, bed))   # spo2 takes (body, bed)
        procs.append(proc)
        proc.start()
I am not able to figure out a way to create a thread/process for each patient (bed_id) so that whenever I receive a message for that patient I can direct it to that thread. I have checked Queues, but the documentation is not very clear on how to implement this case.
Before you go down this route, you should evaluate whether that is even necessary. One important limit is the RabbitMQ bandwidth.
Build a single-threaded app and start feeding it synthetic RabbitMQ messages. Increase the msg/s rate until it cannot keep up anymore.
If that rate is much higher than is likely to occur in practice, you are done. :-)
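Something along these lines (untested; the queue name, connection details and message shape are assumptions based on the code in your question) could serve as the synthetic feeder:

import json
import time

import pika   # assuming the consumer is built on pika, as phy_channel.basic_publish suggests

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='vitals_test')              # hypothetical test queue

def publish_synthetic(rate_per_sec, duration_sec=10):
    """Publish fake per-bed vitals packets at roughly rate_per_sec."""
    interval = 1.0 / rate_per_sec
    end = time.time() + duration_sec
    n = 0
    while time.time() < end:
        packet = {'beds': [{'bedId': 'bed-%d' % (n % 500), 'spo2': 97}]}
        channel.basic_publish(exchange='',
                              routing_key='vitals_test',
                              body=json.dumps(packet))
        n += 1
        time.sleep(interval)

# Ramp the load until the consumer on the other side starts falling behind.
for rate in (50, 100, 500, 1000):
    publish_synthetic(rate)

connection.close()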
If the consumer cannot keep up at realistic rates, then you start profiling your application to find which parts of it take the most time. Those are your bottlenecks. Only when you know what the bottlenecks are can you look at the relevant code and think about how to improve them.
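For the profiling itself, cProfile against recorded or synthetic message bodies is usually enough to show whether the time goes into JSON parsing, pandas or your alarm logic. A minimal sketch (the handle function here is only a stand-in for the body of your callback):

import cProfile
import json
import pstats

def handle(body):
    """Stand-in for the real callback: parse the packet and run the alarm logic."""
    packet = json.loads(body.decode())
    return packet['beds'][0]['bedId']

bodies = [json.dumps({'beds': [{'bedId': 'bed-%d' % (i % 500), 'spo2': 95}]}).encode()
          for i in range(10000)]

profiler = cProfile.Profile()
profiler.enable()
for body in bodies:
    handle(body)
profiler.disable()

# The top entries of this report are the bottlenecks worth looking at.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(15)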
Note that multiprocessing and threading do different things and have different applications. If your application is limited by the amount of calculations it can do, then multiprocessing can help by spreading the calculations over multiple CPU cores. Note that this only works well if the calculations are independent of each other. If your application spends a lot of time waiting for I/O, threading can help you do calculations in one thread while another is waiting for I/O.
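To make the difference concrete, here is a small self-contained sketch using concurrent.futures; the fake I/O delay, the SpO2 numbers and the threshold are made up purely for illustration:

import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def evaluate_vitals(values):
    # stand-in for the CPU-bound alarm calculation over the last five readings
    return sum(values) / len(values) < 90

def fetch_packet(bed_id):
    # stand-in for I/O: waiting on the broker, a socket, a database, ...
    time.sleep(0.1)
    return {'bedId': bed_id, 'spo2': [95, 94, 96, 93, 92]}

if __name__ == '__main__':
    # Threads help while the program is mostly *waiting*: the GIL is released
    # during blocking I/O, so the fetches overlap.
    with ThreadPoolExecutor(max_workers=8) as io_pool:
        packets = list(io_pool.map(fetch_packet, range(8)))

    # Processes help when the Python-level *calculation* is the limit: each
    # worker gets its own interpreter and can use its own CPU core.
    with ProcessPoolExecutor(max_workers=4) as cpu_pool:
        alarms = list(cpu_pool.map(evaluate_vitals,
                                   [p['spo2'] for p in packets]))
    print(alarms)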
But neither is free in terms of complexity. For example, with threading you have to protect reading and writing of your dataframes with locks so that only one thread at a time can read or modify a given dataframe. With multiprocessing you have to send data from the worker processes back to the parent process.
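Roughly, the two kinds of bookkeeping look like this (all names here are illustrative):

import threading
from multiprocessing import Process, Queue

# Threading: a lock guards the shared per-bed storage so that only one
# thread reads or modifies it at a time.
bed_history = {}
bed_lock = threading.Lock()

def store_packet(bed_id, row):
    with bed_lock:
        bed_history.setdefault(bed_id, []).append(row)

# Multiprocessing: a worker cannot touch the parent's memory, so results
# travel back over a queue instead.
def worker(alert_queue):
    alert_queue.put({'bedId': 'bed-17', 'alert': 'low SpO2'})   # made-up alert

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())        # the parent receives the alert dict
    p.join()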
In this case, I think that multiprocessing would be most useful. You could set up a number of processes, each responsible for part of the beds/patients. If RabbitMQ can have multiple listeners, you can have each worker process handle only the messages from the patients it is responsible for. Otherwise you have to distribute the messages to the appropriate process. Each worker process then processes the messages (and keeps the dataframes) for a number of patients. When an alert is triggered based on the calculations done on the data, the worker only has to send a message with the identifier of the patient and the nature of the alert to the parent process.
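A rough sketch of that layout (the hash-based routing, the should_alarm rule and all names are assumptions for illustration, not a drop-in implementation):

from multiprocessing import Process, Queue

NUM_WORKERS = 4                 # tune to the number of CPU cores available

def should_alarm(rows):
    # placeholder rule: alarm when the mean of the last five SpO2 values is low
    values = [r['beds'][0]['spo2'] for r in rows]
    return sum(values) / len(values) < 92

def worker(inbox, alerts):
    """Keeps the message history for every bed that is routed to it."""
    history = {}                                        # bed_id -> recent packets
    while True:
        packet = inbox.get()
        bed_id = packet['beds'][0]['bedId']
        rows = history.setdefault(bed_id, [])
        rows.append(packet)
        if len(rows) >= 5:                              # enough data to evaluate
            if should_alarm(rows):
                alerts.put({'bedId': bed_id, 'alert': 'check patient'})
            del rows[:-4]                               # keep the trailing four packets

def route(packet, inboxes):
    """Send a message to the worker that owns that bed."""
    bed_id = packet['beds'][0]['bedId']
    inboxes[hash(bed_id) % NUM_WORKERS].put(packet)

if __name__ == '__main__':
    alerts = Queue()
    inboxes = [Queue() for _ in range(NUM_WORKERS)]
    workers = [Process(target=worker, args=(inbox, alerts), daemon=True)
               for inbox in inboxes]
    for w in workers:
        w.start()

    # In the real consumer the pika callback would call route(packet, inboxes)
    # and the parent would drain `alerts` and publish them to the 'nicu' exchange.
    for reading in (90, 89, 91, 88, 90):                # demo: five low readings
        route({'beds': [{'bedId': 'bed-1', 'spo2': reading}]}, inboxes)
    print(alerts.get(timeout=5))

The important property is that one bed's history lives in exactly one process, so no locking is needed and the parent only ever sees small alert messages.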
Yes, it works for now since I have a limited number of beds. The only problem is when we scale up to, say, 500 beds; I don't want this to become a bottleneck, and I certainly don't want to break the code in production. Anyway, I will profile the code and see how things pan out...