This article is reproduced from serverfault.com.

OpenMP for loop with specific threads

Posted on 2020-11-30 10:38:00

I'm new to parallel programming, and I'm trying to parallelize a point cloud processing step. My program's structure is shown below. First, I split the point cloud into partial clouds. My aim is for each thread to call the fillFrustumCloud() function separately.

int num_threads = 12;

std::vector<CloudColored::Ptr> vector_colored_projected_clouds(num_threads);
std::vector<Cloud::Ptr> vector_projected_clouds(num_threads);

omp_set_num_threads(num_threads);

// private( ) shared()
#pragma omp parallel  shared(vector_colored_projected_clouds,vector_projected_clouds)
{
    
    for(int i=0; i<num_threads; i++)
    {

        #pragma omp critical
        {
            std::cout << "Thread id: " << omp_get_thread_num() << " loop id: " << i <<  std::endl;
        }

        const unsigned int  start_index = cloud_in->size()/num_threads*i;
        const unsigned int  end_index = cloud_in->size()/num_threads*(i+1);

        Cloud::Ptr partial_cloud(new Cloud);

        if(i==num_threads-1)
        {
            partial_cloud->points.assign(cloud_in->points.begin()+start_index, cloud_in->points.end());
        }else{
            partial_cloud->points.assign(cloud_in->points.begin()+start_index, cloud_in->points.begin()+end_index);
        }

        LidcamHelpers::fillFrustumCloud(partial_cloud, mat_point_transformer, img_size, vector_colored_projected_clouds,
                                        vector_projected_clouds, i, interested_detections, id, reshaped_img);
    }
}

But the output is:

Thread id: 0 loop id: 0
Thread id: 1 loop id: 0
Thread id: 2 loop id: 0
Thread id: 3 loop id: 0
Thread id: 0 loop id: 1
Thread id: 1 loop id: 1
Thread id: 2 loop id: 1
Thread id: 3 loop id: 1
Thread id: 0 loop id: 2
Thread id: 3 loop id: 2
Thread id: 2 loop id: 2
Thread id: 1 loop id: 2
Thread id: 3 loop id: 3
Thread id: 1 loop id: 3
Thread id: 2 loop id: 3
Thread id: 0 loop id: 3

What I am aiming for is output like this:

Thread id: 0 loop id: 0
Thread id: 1 loop id: 1
Thread id: 2 loop id: 2

Note that I pass vector_colored_projected_clouds and vector_projected_clouds into the function by reference in order to store the results, so I assume they should be shared variables.

Questioner: goktug_yildirim
Answered by dreamcrash, 2020-12-29 03:51:51

The #pragma omp parallel construct creates a parallel region with as many threads as you have configured. Hence, when you do:

#pragma omp parallel
{
    for(int i=0; i<num_threads; i++)
    {
       ... 
    }
}

every thread in the parallel region executes all the iterations of the loop. That is why you get 16 lines of output (i.e., 4 threads x 4 loop iterations).
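To make the "threads x iterations" arithmetic concrete, here is a minimal sketch that counts how often the loop body runs when a plain for loop sits inside a parallel region. The names RegionCount and count_redundant_executions are made up for this illustration, and the _OPENMP guards are only there so the code still builds when OpenMP is disabled.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type: how often the loop body ran, and how many
// threads were in the team that ran it.
struct RegionCount {
    int executions;
    int team_size;
};

// Runs a plain for loop inside a parallel region WITHOUT a worksharing
// directive, so every thread executes every iteration.
RegionCount count_redundant_executions(int iterations) {
    assert(iterations >= 0);
    int executions = 0;
    int team_size = 1;  // stays 1 when compiled without OpenMP

    #pragma omp parallel
    {
        // One thread records the team size; the implicit barrier at the end
        // of 'single' makes the value visible to the whole team.
        #pragma omp single
        {
#ifdef _OPENMP
            team_size = omp_get_num_threads();
#endif
        }

        // No '#pragma omp for' here: the loop is replicated, not distributed.
        for (int i = 0; i < iterations; i++) {
            #pragma omp atomic
            executions++;
        }
    }
    return {executions, team_size};
}
```

With a team of 4 threads and 4 iterations, executions comes out as 16, which matches the number of lines in the question's output.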

If you want to distribute the iterations of a loop among threads, you should use #pragma omp for instead. So in your code you can either do:

#pragma omp parallel
{
    #pragma omp for
    for(int i=0; i<num_threads; i++)
    {
       ... 
    }
}

or

#pragma omp parallel for
for(int i=0; i<num_threads; i++)
{
   ... 
}

Since you only want to distribute the iterations of the loop among threads, you can use the latter (i.e., #pragma omp parallel for).
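As a rough sketch of how this could look for the chunked processing in the question: the code below splits a container into num_chunks index ranges and lets #pragma omp parallel for hand each range to one thread. A std::vector<int> stands in for the point cloud, and process_chunk is a hypothetical placeholder for fillFrustumCloud that simply sums its range; neither is taken from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the per-chunk work (fillFrustumCloud in the
// question). Here it only sums the elements in [begin, end).
long process_chunk(const std::vector<int>& points, std::size_t begin, std::size_t end) {
    return std::accumulate(points.begin() + begin, points.begin() + end, 0L);
}

// Splits 'points' into num_chunks contiguous ranges and processes each range
// in exactly one loop iteration. The result vector is shared, but iteration i
// writes only to results[i], so no extra synchronization is needed.
std::vector<long> process_in_chunks(const std::vector<int>& points, int num_chunks) {
    assert(num_chunks > 0);
    std::vector<long> results(static_cast<std::size_t>(num_chunks), 0L);
    const std::size_t chunk_size = points.size() / num_chunks;

    #pragma omp parallel for
    for (int i = 0; i < num_chunks; i++) {
        const std::size_t begin = chunk_size * i;
        // The last chunk also takes the remainder, as in the question's code.
        const std::size_t end = (i == num_chunks - 1) ? points.size()
                                                      : chunk_size * (i + 1);
        results[i] = process_chunk(points, begin, end);
    }
    return results;
}
```

Because each iteration touches a distinct element of the shared vector, the default sharing rules are enough here; the loop variable i is automatically private to each thread.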

It looks as though you are using

#pragma omp critical
{
    std::cout << "Thread id: " << omp_get_thread_num() << " loop id: " << i <<  std::endl;
}

for debugging purposes. Bear in mind, however, that even with the critical region, the order in which threads produce their output is non-deterministic. If you would rather have the threads output deterministically, use #pragma omp ordered instead of critical. The ordered construct ensures that the block of code it wraps is executed in the same order as it would be if the loop ran sequentially. Note that the enclosing loop directive must also carry the ordered clause (e.g., #pragma omp parallel for ordered).
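A small sketch of the ordered variant, assuming the goal is simply to see the iterations reported in sequential order. The function name ordered_visit is invented for this example, and it records indices in a vector rather than printing so the order can be checked afterwards. The schedule(static, 1) clause is one possible choice, not a requirement.

```cpp
#include <cassert>
#include <vector>

// Records each iteration index in the order in which the ordered block runs.
std::vector<int> ordered_visit(int iterations) {
    assert(iterations >= 0);
    std::vector<int> visit_order;
    visit_order.reserve(iterations);

    // The 'ordered' clause on the loop directive is required for the
    // '#pragma omp ordered' block inside the loop body.
    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < iterations; i++) {
        // Work placed here, outside the ordered block, can still overlap
        // between threads.

        #pragma omp ordered
        {
            // Runs one iteration at a time, in increasing order of i,
            // so appending to the shared vector is safe here.
            visit_order.push_back(i);
        }
    }
    return visit_order;
}
```

Replacing the push_back with the std::cout line from the question would give the debug output in loop order. Keep in mind that the ordered block serializes that part of each iteration, so it is best kept short.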