Why is std::string_view faster than const char*?

Posted on 2020-04-09 22:58:19

Or am I measuring something else?

In this code I have a stack of tags (integers). Each tag has a string representation (const char* or std::string_view). In the loop, stack values are converted to the corresponding string values. Those values are either appended to a preallocated string or assigned to an array element.

The results show that the version with std::string_view is slightly faster than the version with const char*.

Code:

#include <array>
#include <iostream>
#include <chrono>
#include <stack>
#include <string>
#include <string_view>

using namespace std;

int main()
{
    enum Tag : int { TAG_A, TAG_B, TAG_C, TAG_D, TAG_E, TAG_F };
    constexpr const char* tag_value[] = 
        { "AAA", "BBB", "CCC", "DDD", "EEE", "FFF" };
    constexpr std::string_view tag_values[] =
        { "AAA", "BBB", "CCC", "DDD", "EEE", "FFF" };

    const size_t iterations = 10000;
    std::stack<Tag> stack_tag;
    std::string out;
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;

    auto prepareForBecnhmark = [&stack_tag, &out](){
        for(size_t i=0; i<iterations; i++)
            stack_tag.push(static_cast<Tag>(i%6));
        out.clear();
        out.reserve(iterations*10);
    };

// Append to string
    prepareForBecnhmark();
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        out.append(tag_value[stack_tag.top()]);
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << out[100] << "append string const char* = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;

    prepareForBecnhmark();
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        out.append(tag_values[stack_tag.top()]);
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << out[100] << "append string string_view= " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;

// Add to array
    prepareForBecnhmark();
    std::array<const char*, iterations> cca;
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        cca[i] = tag_value[stack_tag.top()];
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << "fill array const char* = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;

    prepareForBecnhmark();
    std::array<std::string_view, iterations> ccsv;
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        ccsv[i] = tag_values[stack_tag.top()];
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << "fill array string_view = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;
    std::cout << ccsv[ccsv.size()-1] << cca[cca.size()-1] << std::endl;

    return 0;
}

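Results on my machine are (the leading "A" on the two append lines is the character out[100], which the program prints just before each label):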

Aappend string const char* = 97[µs]
Aappend string string_view= 72[µs]
fill array const char* = 35[µs]
fill array string_view = 18[µs]

Godbolt compiler explorer url: https://godbolt.org/z/SMrevx

UPD: Results after more careful benchmarking (averaged over 500 runs of 300000 iterations each):

Caverage append string const char* = 2636[µs]
Caverage append string string_view= 2096[µs]
average fill array const char* = 526[µs]
average fill array string_view = 568[µs]

Godbolt url: https://godbolt.org/z/aU7zL_

So in the array case const char* is faster, as expected. The string-append case was explained in the answers.

Asked by: uni

uni 2020-02-01 18:19:06

What about the benchmark with the array? Isn't it just a pointer being copied into the array? Or is the string literal it references copied too?

Peter Cordes 2020-02-01 18:32:01

@uni: Do those numbers change if you benchmark the other one first? Your total benchmark is over so quickly that the CPU might just be ramping up to max turbo around then. Or the first array pays more in page-fault cost than the second array. TL;DR: that part of your results is probably down to naive microbenchmarking methodology.

Peter Cordes 2020-02-01 18:46:43

@uni: Or maybe it's real; I tried reversing them, and running on Godbolt still shows fill array string_view = 19[µs] vs. 61[µs] for const char*: godbolt.org/z/MyUxqE. The loops look basically equivalent, assuming they never fall through to the part that calls operator delete. (Of course, the string_view objects are 16 bytes wide and get copied with movdqa / movaps.) IDK, I would have to try it locally with perf counters, or single-step to see whether delete calls happen. Increasing the iteration count reduces the difference ratio somewhat: godbolt.org/z/jvM8Cr

uni 2020-02-01 19:16:10

@PeterCordes I followed your suggestion: I increased the iteration count and averaged the time across 500 benchmark runs. Here are the results (times in µs):

iterations    10000  50000  100000  200000  300000
string_view      17     87     183     368     588
const char*      17     88     177     353     526
delta             0     -1       6      15      62

In this case the difference is minuscule, and const char* is now faster.

Peter Cordes 2020-02-01 19:20:31

@uni: That makes more sense. Probably const char* is faster for large iteration counts because it's smaller, and those large sizes mean bigger arrays that start to get L1 or even L2 cache misses. (Modern x86-64 can copy an 8-byte object just as fast as a 16-byte object, especially when they're aligned.) Still not sure what was slowing down the small size without a repeat loop.