I have this array: var arr = new int[] { 1, 1, 0, -1, -1 };
where I need to count the number of positive, negative and zero values. I did it with a foreach loop and with Linq, and I tried to compare the performance of the two methods using Stopwatch. Here is my code:
int pos = 0, neg = 0, zeros = 0;
int p = 0, n = 0, z = 0;
Stopwatch sw = new Stopwatch();
sw.Start();
pos = arr.Sum(e => e > 0 ? 1 : 0);
neg = arr.Sum(e => e < 0 ? 1 : 0);
zeros = arr.Sum(e => e == 0 ? 1 : 0);
sw.Stop();
Stopwatch sw2 = new Stopwatch();
sw2.Start();
foreach (var item in arr)
{
    if (item > 0) p++;
    else if (item < 0) n++;
    else z++;
}
sw2.Stop();
Console.WriteLine("Elapsed={0}", sw.Elapsed); //Elapsed=00:00:00.0008311
Console.WriteLine("Elapsed2={0}", sw2.Elapsed); //Elapsed2=00:00:00.0000028
The results showed me that the foreach loop was much faster (about 2.8 µs) than the Linq method (about 831 µs). So my question is: why is there such a difference in performance?
I even tried three foreach loops, one to count the negatives, one to count the positives and the third to count the zeros, but the performance was still better than the Linq method!
Thanks in advance for your help!
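For what it's worth, the three Sum calls can also be collapsed into a single Linq pass over the array using Aggregate. A minimal sketch (the Count helper name is mine, not from the code above):

```csharp
using System;
using System.Linq;

class SinglePassDemo
{
    // One pass over the array instead of three separate Sum calls:
    // the accumulator is a (pos, neg, zeros) tuple updated per element.
    public static (int pos, int neg, int zeros) Count(int[] arr) =>
        arr.Aggregate(
            (pos: 0, neg: 0, zeros: 0),
            (acc, e) => e > 0 ? (acc.pos + 1, acc.neg, acc.zeros)
                      : e < 0 ? (acc.pos, acc.neg + 1, acc.zeros)
                      : (acc.pos, acc.neg, acc.zeros + 1));

    static void Main()
    {
        var arr = new int[] { 1, 1, 0, -1, -1 };
        var (pos, neg, zeros) = Count(arr);
        Console.WriteLine($"{pos} {neg} {zeros}"); // prints "2 2 1"
    }
}
```

This removes the triple iteration, though the per-element delegate call still costs more than a plain foreach.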
Let's run the horse race in a slightly different manner:
private static string UnderTest(int size) {
    int pos = 0, neg = 0, zeros = 0;
    int p = 0, n = 0, z = 0;
    Random random = new Random(0);
    int[] arr = Enumerable
        .Range(0, size)
        .Select(x => random.Next(-1, 2))
        .ToArray();
    GC.Collect(GC.MaxGeneration);
    Stopwatch sw = new Stopwatch();
    sw.Start();
    // Three Linq loops (expected to be about three times slower)
    pos = arr.Sum(e => e > 0 ? 1 : 0);
    neg = arr.Sum(e => e < 0 ? 1 : 0);
    zeros = arr.Sum(e => e == 0 ? 1 : 0);
    sw.Stop();
    Stopwatch sw2 = new Stopwatch();
    sw2.Start();
    // Just 1 loop
    foreach (var item in arr) {
        if (item > 0) p++;
        else if (item < 0) n++;
        else z++;
    }
    sw2.Stop();
    return $"{sw.Elapsed} vs. {sw2.Elapsed} ratio: {((double)sw.Elapsed.Ticks) / sw2.Elapsed.Ticks:f3}";
}
Races for different array sizes:
int[] loops = new int[] {
    1000,
    10_000,
    100_000,
    1_000_000,
    10_000_000,
    100_000_000,
    1000, // <- 1000 again
};
string report = string.Join(Environment.NewLine, loops
    .Select(loop => $"loops: {loop,10} : {UnderTest(loop)}"));
Console.Write(report);
Outcome:
loops: 1000 : 00:00:00.0006471 vs. 00:00:00.0000114 ratio: 56.763 // <- Warming up
loops: 10000 : 00:00:00.0003195 vs. 00:00:00.0001074 ratio: 2.975
loops: 100000 : 00:00:00.0037131 vs. 00:00:00.0010910 ratio: 3.403
loops: 1000000 : 00:00:00.0351574 vs. 00:00:00.0118858 ratio: 2.958
loops: 10000000 : 00:00:00.3729617 vs. 00:00:00.1198276 ratio: 3.112
loops: 100000000 : 00:00:03.7002508 vs. 00:00:01.1808595 ratio: 3.134
loops: 1000 : 00:00:00.0000322 vs. 00:00:00.0000099 ratio: 3.253 // <- Expected
What's going on: we have three Linq-based loops (three calls to Sum), so the Linq version is roughly three times slower (no surprise).
However, there is a huge outlier at the very first run, when the system loads the assembly, JIT-compiles the IL code, etc. This is the so-called warm-up effect.
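The warm-up effect is easy to neutralize in a hand-rolled benchmark: make one untimed throwaway call first, so the JIT work is already done when the stopwatch starts. A minimal sketch (the CountPositives helper is hypothetical, not from the code above):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class WarmUpDemo
{
    // Hypothetical helper: the Linq code we actually want to time.
    public static int CountPositives(int[] arr) => arr.Sum(e => e > 0 ? 1 : 0);

    static void Main()
    {
        int[] arr = Enumerable.Range(0, 1_000_000).Select(i => i % 3 - 1).ToArray();

        // Throwaway run: forces JIT compilation of Sum and the lambda
        // before any measurement begins.
        CountPositives(arr);

        var sw = Stopwatch.StartNew();
        int pos = CountPositives(arr);
        sw.Stop();
        Console.WriteLine($"pos={pos}, elapsed={sw.Elapsed}");
    }
}
```

This is exactly why the benchmark above re-runs the 1000-element case at the end: the second measurement, taken after warm-up, is the trustworthy one.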
Very nice answer. However, would you please explain why you are collecting garbage after creating the array?
@Cid: Thank you! A better place for GC.Collect(GC.MaxGeneration); is just before the races (within UnderTest): we don't want the GC to occasionally start and disturb the measurements.
This makes sense. I'm not used to calling the GC manually, so collecting directly in the code would prevent the garbage from stacking (heap, eh) up and the collector being called automatically?
@Cid: We try to keep unwanted processes (esp. garbage collection, which stops the world) from interfering. If we collect all the garbage (e.g., the array with millions of items from the previous run), we can hope that the following run will not be interrupted (say, in the middle of the foreach).
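The advice in this comment thread boils down to: force a full collection right before starting the stopwatch, so a pending GC doesn't pause the world mid-measurement. A sketch of that placement (the SumArray helper is mine, just to have something to time):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class GcBeforeRaceDemo
{
    public static long SumArray(int[] arr)
    {
        long sum = 0;
        foreach (var item in arr) sum += item;
        return sum;
    }

    static void Main()
    {
        int[] arr = Enumerable.Range(0, 1_000_000).ToArray();

        // Clean the heap now, on our terms, instead of letting the GC
        // pause the world somewhere inside the timed loop.
        GC.Collect(GC.MaxGeneration);
        GC.WaitForPendingFinalizers();
        GC.Collect(GC.MaxGeneration);

        var sw = Stopwatch.StartNew();
        long sum = SumArray(arr);
        sw.Stop();
        Console.WriteLine($"sum={sum}, elapsed={sw.Elapsed}");
    }
}
```

For anything beyond a quick check, a benchmarking library such as BenchmarkDotNet handles warm-up and GC isolation automatically.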