Your code is OK, except that I would compute (end-start) instead of (start-end) to get a positive value.
The issue is that you are trying to measure the time your processor needs to do, basically, 500 additions and 500 comparisons. With today's processors, which include floating-point units and run at, let's say, 2.4GHz, that means a few microseconds.
Let's be more precise. Here is the disassembly of this code on Windows, in a non-optimized build:
for (double i = 0; i < 500000000; i++)
010013C0 fldz
010013C2 fstp qword ptr [i]
010013C5 jmp wmain+43h (10013D3h)
010013C7 fld qword ptr [i]
010013CA fadd qword ptr [__real@3ff0000000000000 (1005850h)]
010013D0 fstp qword ptr [i]
010013D3 fld qword ptr [__real@415312d000000000 (1005840h)]
010013D9 fcomp qword ptr [i]
010013DC fnstsw ax
010013DE test ah,41h
010013E1 jne wmain+55h (10013E5h)
{
}
010013E3 jmp wmain+37h (10013C7h)
9 instructions per iteration, everything in the cache, so let's say 9 cycles per iteration [actually it's both more and less due to the pipelined architecture of current processors].
My processor is running at 3.3GHz,
thus duration = 500 * 9 / 3.3e9
= 1.363e-6 s, thus around 1.4 microseconds: there is no way to measure such a short duration with clock().
To confirm my estimate, I ran this code using 500 000 000 iterations, i.e. 1 million times more, so it should take around 1.4s.
And I got 1.537s, so quite close to my estimate ;-)
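As a rough sketch, the estimate above can be written as a small helper. The name estimate_seconds and its parameters are made up for illustration, and the formula assumes one instruction per cycle, which is only an approximation on a pipelined processor.

```cpp
#include <cassert>

// Back-of-the-envelope estimate: iterations * cycles per iteration / clock rate.
// Hypothetical helper; assumes roughly one instruction retired per cycle.
double estimate_seconds(double iterations, double cycles_per_iteration, double clock_hz)
{
    assert(clock_hz > 0.0);  // a zero or negative clock rate makes no sense
    return iterations * cycles_per_iteration / clock_hz;
}
```

For example, estimate_seconds(500, 9, 3.3e9) gives about 1.4e-6 s, and estimate_seconds(500000000, 9, 3.3e9) gives about 1.4 s.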
Here is the modified code for reference:
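#include <ctime>
#include <iostream>
using namespace std;

int main()
{
    clock_t start, end;
    double diff;
    start = clock();
    // Empty loop: only the counter increment and the comparison are timed.
    for (double i = 0; i < 500000000; i++)
    {
    }
    end = clock();
    diff = double(end - start) / CLOCKS_PER_SEC;  // elapsed time in seconds
    cout << diff << endl;
}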
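If C++11 is available, std::chrono::steady_clock may be worth considering as an alternative to clock() for this kind of measurement. The sketch below is only an illustration and has not been benchmarked here: the helper name time_empty_loop is hypothetical, and the counter is declared volatile on the assumption that you might compile with optimizations, which could otherwise remove the empty loop entirely.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: times an empty counting loop and returns seconds.
// The volatile counter forces a load and a store on every iteration,
// so an optimizing compiler cannot delete the loop.
double time_empty_loop(long long iterations)
{
    assert(iterations >= 0);
    volatile double i = 0;
    auto start = std::chrono::steady_clock::now();
    while (i < iterations)
        i = i + 1;
    auto end = std::chrono::steady_clock::now();
    // duration<double> defaults to seconds as the unit
    return std::chrono::duration<double>(end - start).count();
}
```

Calling time_empty_loop(500000000) should land in the same range as the clock() version, while time_empty_loop(500) is still likely to be too short for a meaningful reading.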