c++ 如何使用OpenMP并行化最近邻居搜索

Qubit 2020-04-06 21:51

由于我们无法访问实际编译的示例（本来不错），因此我实际上没有尝试编译以下示例。尽管如此，除了一些小的错别字外，总体思路应该很清楚。

我们的任务是找到相似度的最大值和相应的标签，为此，我们确实可以使用reduction，但是由于我们需要找到一个最大值，然后存储相应的标签，因此我们使用一对来存储两个值立即将其实现为reductionOpenMP中的一个。

我已经稍微重写了您的代码，temp使用变量的原始命名（）可能会使阅读起来有些困难。基本上，我们并行执行搜索，因此每个线程都找到一个最佳值，然后要求OpenMP在线程（reduction）之间找到最佳解决方案，然后完成。

//Reduce by finding the maximum and also storing the corresponding label, this is why we use a std::pair. 
void reduce_custom (std::pair<float, unsigned int>& output, std::pair<float, unsigned int>& input) {
    if (input.first > output.first) output = input;
}
//Declare an OpenMP reduction with our pair and our custom reduction function. 
#pragma omp declare reduction(custom_reduction : \
    std::pair<float, unsigned int>: \
    reduce_custom(omp_out, omp_in)) \
    initializer(omp_priv(omp_orig))

bool identify(const float* data, unsigned int length, unsigned int& label, float& similarity) {
    std::pair<float, unsigned int> temp(0.0, label); //Stores thread local similarity and corresponding best label. 

#pragma omp parallel for reduction(custom_reduction:temp)
    for (size_t i = 0; i < collection.size(); ++i) {
        const float* candidateTemplate = collection[i].first.data(); 
        float consinSimilarity = getSimilarity(data, candidateTemplate, length);

        if (consinSimilarity > temp.first) {
            temp.first = consinSimilarity;
            temp.second = collection[i].second;
        }
    }

    if (temp.first > 0.f) {
        similarity = temp.first;
        label = temp.second;
        return true;
    }

    return false;
}

关于SIMD寄存器数量有限的问题，它们的数量取决于您使用的特定CPU。据我所知，每个内核都有一定数量的向量寄存器可用，因此，只要您使用的寄存器不多于现在可用的数量，就可以了，此外，例如AVX512还提供32个向量寄存器和2个向量寄存器。用于每个核的向量运算的算术单位，因此用尽计算资源并非易事，由于内存局部性较差，您更有可能遭受苦难（尤其是在向量被各处保存的情况下）。我当然可能是错的，如果是这样，请随时在评论中纠正我。

cyrusbehr 2020-02-04 02:16:18

感谢您的回答Qubit，这看起来很棒。同样你是正确的，它应该是一个指针const float* candidateTemplate 。我已对问题进行了修改以反映此更改。

cyrusbehr 2020-02-04 02:28:40

也许您可以再回答一个问题。该#pragma omp parallel指令将启动多少个线程？是否取决于系统？

Qubit 2020-02-04 18:58:19

通常，它将启动与该系统上可用逻辑处理器一样多的线程（即，不考虑它们是否被占用而与线程数量一样多）。您会发现，在某些应用程序中，运行更少的线程可能是有好处的（即，超线程并不总是您的朋友）–或者您可能只想将部分资源用于任务。您可以通过添加num_threads(some_value_here)到omp pragma指令来控制线程数。您还应该考虑添加schedule(static)，我认为这对您来说最有意义。

cyrusbehr 2020-02-05 03:38:17

太好了，谢谢您的帮助！

c++ - 如何使用OpenMP并行化最近邻居搜索

相关问题

热门github