Question:
How can you time arithmetic operations in c++?
Three answers:
quatto
2016-11-08 07:36:52 UTC
C++ is far more useful than C, yet C++ shares 99.9% of C's syntax and only drops or changes a few things. No, those operators are not all handled like /; only / is handled as /, and an integer divide is not the same as a floating-point divide. +, -, *, and / are context-driven. I *think* ^ and % are integer-only; I have never seen them used with floats, although it could presumably be done if you wanted to manipulate mantissas and exponents in a software IEEE-754 floating-point implementation or something. Yes, ++ and -- belong to C++, and to my knowledge they are integer-only.
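For reference, a small stand-alone illustration of how these operators behave in standard C++ (general language rules, not anything specific to the benchmark discussed below); note that ++ and -- are in fact also defined for floating-point operands:

#include <iostream>
#include <cmath>

int main()
{
     std::cout << 7 / 2 << "\n";               // integer division truncates: prints 3
     std::cout << 7.0 / 2.0 << "\n";           // floating-point division: prints 3.5

     std::cout << (7 % 2) << "\n";             // % requires integer operands: prints 1
     std::cout << (7 ^ 2) << "\n";             // ^ is bitwise XOR, integer only: prints 5
     std::cout << std::fmod(7.0, 2.0) << "\n"; // floating-point remainder: prints 1

     double d = 1.5;
     ++d;                                      // ++ and -- do work on floating-point types
     std::cout << d << "\n";                   // prints 2.5
}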
green meklar
2011-11-14 20:24:36 UTC
If you are getting results, then you must have some code that is giving you those results. Can we see it?
?
2011-11-14 19:02:53 UTC
12 ns is pretty slow by today's standards; I would expect less than 1 ns!



Note that it's a bit tricky to set up such a benchmark.



At the very least, you should:

1. make sure all variables are volatile (otherwise the compiler will optimize the code away)

2. repeat the test many times (I'll do one billion)



Given these two conditions, you can get something like:



#include <iostream>
#include <chrono>

using namespace std::chrono;

const unsigned long long CNT = 1000000000; // 10^9

// must be volatile to avoid optimization
volatile double f = 0.0, g = 1.0, h = 2.0;

int main()
{
     auto t1 = high_resolution_clock::now();
     // load from g, load from h, multiplication, store to f
     for(unsigned long long cnt = 0; cnt < CNT; ++cnt)
         f = g*h;
     auto t2 = high_resolution_clock::now();
     std::cout << "two loads, one multiplication, and one store took "
               << duration_cast<nanoseconds>( t2 - t1 ).count()/(double)CNT << " ns\n";
}



On my home computer, this results in 1.13 ns per iteration.
ideone.com gets a very similar 1.16 ns: https://ideone.com/VShyV



But this is measuring more than just the multiplication. On my computer, the body of that loop is:

.L2:
     movsd g(%rip), %xmm0 // load g from RAM to CPU
     subq $1, %rax // decrement loop counter
     movsd h(%rip), %xmm1 // load h from RAM to CPU
     mulsd %xmm1, %xmm0 // multiply
     movsd %xmm0, f(%rip) // store f from CPU to RAM
     jne .L2 // test the loop condition and loop if not reached



So I measured two loads from RAM, one store to RAM, one integer decrement, and one zero test in addition to one multiplication. To get just multiplication, you need to benchmark a loop that does the same things *except* multiplying:



     for(unsigned long long cnt = 0; cnt < CNT; ++cnt)
         f = g,h; // load, load, store



and subtract the time this loop took from the time the multiplying loop took.
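As a rough sketch of that subtraction (just the two loops above placed back to back; the extra time points t3/t4, the variable names with_mul/baseline, and the printed label are my own, and the caveat below about what GCC emits for the second loop still applies):

#include <iostream>
#include <chrono>

using namespace std::chrono;

const unsigned long long CNT = 1000000000; // 10^9

// must be volatile to avoid optimization
volatile double f = 0.0, g = 1.0, h = 2.0;

int main()
{
     // loop 1: load g, load h, multiply, store to f
     auto t1 = high_resolution_clock::now();
     for(unsigned long long cnt = 0; cnt < CNT; ++cnt)
         f = g*h;
     auto t2 = high_resolution_clock::now();

     // loop 2: same loads and store, but no multiplication
     auto t3 = high_resolution_clock::now();
     for(unsigned long long cnt = 0; cnt < CNT; ++cnt)
         f = g,h;
     auto t4 = high_resolution_clock::now();

     double with_mul = duration_cast<nanoseconds>( t2 - t1 ).count()/(double)CNT;
     double baseline = duration_cast<nanoseconds>( t4 - t3 ).count()/(double)CNT;
     std::cout << "multiplication alone: roughly " << with_mul - baseline << " ns\n";
}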



But here's a catch: the compiler I use, GCC, loads g and h into integer CPU registers in this case, since no floating-point math is requested, so the timings of the second loop are not comparable.



I had to use assembly language to craft the second loop properly (I took the code of the first loop and commented out the multiplication) in order to measure the throughput of floating-point multiplication on my computer. I got 0.37 ns per multiplication, which is exactly one operation per CPU clock (I'm running a first-generation i7 at 2.66 GHz), as expected for a throughput-type benchmark.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.