Or am I measuring something else?
In this code I have a stack of tags (integers). Each tag has a string representation (const char* or std::string_view).
In the loop, stack values are converted to the corresponding string values. Those values are appended to a preallocated string or assigned to an array element.
The results show that the version with std::string_view is slightly faster than the version with const char*.
Code:
#include <array>
#include <iostream>
#include <chrono>
#include <stack>
#include <string>
#include <string_view>
using namespace std;
int main()
{
    enum Tag : int { TAG_A, TAG_B, TAG_C, TAG_D, TAG_E, TAG_F };
    // The same tag names, stored once as C strings and once as string_views
    constexpr const char* tag_value[] =
        { "AAA", "BBB", "CCC", "DDD", "EEE", "FFF" };
    constexpr std::string_view tag_values[] =
        { "AAA", "BBB", "CCC", "DDD", "EEE", "FFF" };
    const size_t iterations = 10000;
    std::stack<Tag> stack_tag;
    std::string out;
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;
    // Refill the stack and reset the preallocated output string before each measurement
    auto prepareForBenchmark = [&stack_tag, &out](){
        for(size_t i=0; i<iterations; i++)
            stack_tag.push(static_cast<Tag>(i%6));
        out.clear();
        out.reserve(iterations*10);
    };
    // Append to string
    prepareForBenchmark();
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        out.append(tag_value[stack_tag.top()]);
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    // Printing out[100] keeps the compiler from optimizing the loop away
    std::cout << out[100] << "append string const char* = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;
    prepareForBenchmark();
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        out.append(tag_values[stack_tag.top()]);
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << out[100] << "append string string_view= " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;
    // Add to array
    prepareForBenchmark();
    std::array<const char*, iterations> cca;
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        cca[i] = tag_value[stack_tag.top()];
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << "fill array const char* = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;
    prepareForBenchmark();
    std::array<std::string_view, iterations> ccsv;
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        ccsv[i] = tag_values[stack_tag.top()];
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << "fill array string_view = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;
    // Use the last elements so the array fills are not optimized away
    std::cout << ccsv[ccsv.size()-1] << cca[cca.size()-1] << std::endl;
    return 0;
}
Results on my machine are:
Aappend string const char* = 97[µs]
Aappend string string_view= 72[µs]
fill array const char* = 35[µs]
fill array string_view = 18[µs]
Godbolt compiler explorer url: https://godbolt.org/z/SMrevx
UPD: Results after more accurate benchmarking (500 runs 300000 iterations):
Caverage append string const char* = 2636[µs]
Caverage append string string_view= 2096[µs]
average fill array const char* = 526[µs]
average fill array string_view = 568[µs]
Godbolt url: https://godbolt.org/z/aU7zL_
So in the second case const char* is faster, as expected. The first case was explained in the answers.
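For readers who want to reproduce the averaged measurement, here is a minimal sketch of a repeat-and-average harness. It is not the code behind the Godbolt link above; the name average_microseconds and its parameters are hypothetical. It assumes the setup step (refilling the stack, clearing the output) should stay outside the timed region.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls setup() then body() `runs` times, timing only
// body(), and returns the average duration of one body() call in microseconds.
template <typename Setup, typename Body>
double average_microseconds(Setup&& setup, Body&& body, std::size_t runs) {
    assert(runs > 0 && "need at least one run");
    using clock = std::chrono::steady_clock;
    std::chrono::duration<double, std::micro> total{0};
    for (std::size_t r = 0; r < runs; ++r) {
        setup();                        // not timed: e.g. refill the stack, clear the string
        const auto begin = clock::now();
        body();                         // timed section: e.g. the append or fill loop
        total += clock::now() - begin;  // accumulate as fractional microseconds
    }
    return total.count() / static_cast<double>(runs);
}
```

With the benchmark above, the setup callable would be the prepare lambda and the body callable would be one of the four loops, for example with runs set to 500.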
Simply because with std::string_view you're passed the length, and you don't have to insert a null char whenever you want a new string. A char* has to search for the end every time, and if you want a substring you'll probably have to copy, as you'll need a null char at the end of the substring.
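To make the difference concrete, here is a small sketch of what each kind of append conceptually has to do. The helper names are hypothetical and written only for illustration; a real standard library may implement std::string::append differently internally.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: appending a C string requires finding its length first,
// which means scanning for the terminating '\0' on every call.
inline void append_c_string(std::string& out, const char* s) {
    const std::size_t len = std::strlen(s);  // O(length) scan
    out.append(s, len);
}

// Hypothetical helper: a string_view already carries its length, so no scan is needed.
inline void append_view(std::string& out, std::string_view sv) {
    out.append(sv.data(), sv.size());
}

// Substring of a string_view: only the pointer and length change, nothing is copied.
inline std::string_view first_two_view(std::string_view sv) {
    return sv.substr(0, 2);
}

// Null-terminated substring: needs its own storage, so the characters are copied.
// Assumes s has at least two characters.
inline std::string first_two_copy(const char* s) {
    return std::string(s, 2);
}
```

Both append helpers produce the same string; the only difference is whether the length has to be recomputed on each call.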
What about the benchmark with the array? Isn't it just a pointer copied to an array? Or is the string literal it references copied too?
@uni: do those numbers change if you benchmark the other one first? Your total benchmark is over so quickly that the CPU might just be ramping up to max turbo around then. Or the first array pays more in page-fault cost than the 2nd array. TL:DR: that part of your results is probably down to naive microbenchmarking methodology.
@uni: or maybe it's real; I tried reversing them, and running on Godbolt still shows fill array string_view = 19[µs] vs. 61[µs] for const char*. godbolt.org/z/MyUxqE. The loops look basically equivalent, assuming they never fall through to the part that calls operator delete. (Of course the string_view objects are 16 bytes wide and get copied with movdqa/movaps.) IDK, would have to try it locally with perf counters, or single-step to see if delete calls happen. Increasing the iteration count reduces the difference ratio some: godbolt.org/z/jvM8Cr
@PeterCordes I followed your suggestion and increased the iteration count and the number of benchmark runs to get the average time across 500 runs. Here are the results:
iterations    10000   50000   100000   200000   300000
string_view      17      87      183      368      588
const char*      17      88      177      353      526
delta             0      -1        6       15       62
In this case the difference is minuscule and const char* is now faster.
@uni: That makes more sense. Probably const char* is faster for large iteration counts because it's smaller, and those large sizes mean bigger arrays that start to get L1 or even L2 cache misses. (Modern x86-64 can copy an 8-byte object just as fast as a 16-byte object, especially when they're aligned.) Still not sure what was slowing down the small size without a repeat loop.
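The footprint argument can be checked with a few lines. This is a sketch with a hypothetical helper, array_bytes. It assumes a typical 64-bit implementation where a pointer is 8 bytes and std::string_view is 16 bytes (the standard does not fix these sizes). Under that assumption, 300000 elements take about 2.4 MB as const char* and about 4.8 MB as std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: bytes of storage needed for n contiguous elements of type T.
template <typename T>
constexpr std::size_t array_bytes(std::size_t n) {
    return n * sizeof(T);
}

// On mainstream implementations a string_view stores a pointer plus a length,
// so it is at least as large as a bare pointer.
static_assert(sizeof(std::string_view) >= sizeof(const char*));
```

Comparing array_bytes<const char*>(n) and array_bytes<std::string_view>(n) against the cache sizes of the machine gives a rough idea of where the larger element type starts to miss in cache first.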