How can I measure cold-code performance?
Suppose I have two methods, Foo and Bar, that do roughly the same thing, and I want to measure which one is faster. Also, a single execution of either Foo or Bar is too fast to measure reliably.
Normally, I'd simply run them both a huge number of times like this:
    var sw = new Stopwatch();
    sw.Start();
    for (int ii = 0; ii < HugeNumber; ++ii)
        Foo();
    sw.Stop();
    Console.WriteLine("Foo: " + sw.ElapsedMilliseconds);
    // and the same code for Bar
But this way, every run of Foo after the first will probably be working with the processor cache rather than main memory, which is probably much faster than in a real application. What can I do to ensure that my method runs cold every time?
Clarification: By "roughly the same thing" I mean that both methods are used in the same way, but the actual algorithms may differ significantly. For example, Foo might be doing some tricky math, while Bar skips it by using more memory.
And yes, I understand that methods running cold will not have much effect on overall performance. I'm still interested in which one is faster.
First of all, if Foo is working with the processor cache, then Bar will also be working with the processor cache, won't it? So both of your functions get the same privilege. Now suppose the first run of Foo takes time A, and every later run takes an average time B because it is working with the processor cache. The total time will then be
A + B*(hugenumber-1)
Similarly for Bar it will be
C + D*(hugenumber-1) // where C is the first run's time and D is the average run time using the processor cache
If I am not wrong, for a large hugenumber the result depends almost entirely on B and D, both of which are average run times using the processor cache. So if you only want to find out which of your functions is better, I think the processor cache is not a problem, as both functions are supposed to use it.
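To make the two formulas concrete, here is a small sketch that evaluates them. The class name `WarmLoopModel`, the method `TotalTime`, and all the timings in the usage note are made up for illustration; they are not measurements of any real Foo or Bar.

```csharp
using System;

static class WarmLoopModel
{
    // Total loop time = first (cold) run + average warm run * (runs - 1),
    // i.e. A + B*(hugenumber-1) for Foo and C + D*(hugenumber-1) for Bar.
    // All values are in the same unit, e.g. nanoseconds.
    public static long TotalTime(long firstRun, long averageWarmRun, long runs)
    {
        return firstRun + averageWarmRun * (runs - 1);
    }

    // Fraction of the total that comes from the single cold run.
    public static double ColdShare(long firstRun, long averageWarmRun, long runs)
    {
        return (double)firstRun / TotalTime(firstRun, averageWarmRun, runs);
    }
}
```

For example, with hypothetical values A = 1000 ns, B = 10 ns and one million runs, `WarmLoopModel.TotalTime(1000, 10, 1000000)` is 10,000,990 ns, so the cold run contributes only about 0.01% of the total. The loop therefore compares B against D and says almost nothing about A against C.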
I think it is clear now. Since Bar skips some tricky math by using more memory, it will have a slight advantage (maybe a matter of nanoseconds or picoseconds) when that memory is already in the cache. To take that advantage away, you have to flush your CPU cache inside your for loop. As you will be doing the same thing in both loops, you should then get a better idea of which function is faster. There is already a Stack Overflow discussion on how to flush the CPU cache; please have a look at it, I hope it helps.
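One way to approximate such a flush from managed code is to walk a buffer larger than the last-level cache before every timed call. The following is only a sketch under assumptions: the names `ColdBenchmark`, `EvictCache` and `MeasureCold` are hypothetical, the 64 MB buffer and the 64-byte cache line are guesses that should be adjusted to the actual CPU, and walking a buffer makes eviction likely but does not guarantee it.

```csharp
using System;
using System.Diagnostics;

static class ColdBenchmark
{
    // Assumption: 64 MB is larger than the last-level cache of the test machine.
    const int EvictionBufferBytes = 64 * 1024 * 1024;
    // Assumption: 64-byte cache lines, which is common but not universal.
    const int CacheLineBytes = 64;

    static readonly byte[] EvictionBuffer = new byte[EvictionBufferBytes];

    // Touches one byte per assumed cache line so that previously cached
    // data is likely to be pushed out of the cache.
    public static long EvictCache()
    {
        long checksum = 0;
        for (int i = 0; i < EvictionBuffer.Length; i += CacheLineBytes)
        {
            EvictionBuffer[i] ^= 1;
            checksum += EvictionBuffer[i];
        }
        return checksum; // returned so the loop has an observable result
    }

    // Runs `method` `runs` times, evicting the cache before each call,
    // and returns the accumulated Stopwatch ticks of the calls only.
    public static long MeasureCold(Action method, int runs)
    {
        method(); // one untimed call so that JIT compilation is not measured

        var sw = new Stopwatch();
        long sink = 0;
        for (int run = 0; run < runs; ++run)
        {
            sink += EvictCache(); // not timed: the stopwatch is stopped here
            sw.Start();           // resumes without resetting the elapsed time
            method();
            sw.Stop();
        }
        GC.KeepAlive(sink);
        return sw.ElapsedTicks;
    }
}
```

Usage would be something like `long fooTicks = ColdBenchmark.MeasureCold(Foo, 1000);` and the same for Bar, dividing by `Stopwatch.Frequency` to get seconds. Because each eviction pass is slow, the run count has to be much smaller than HugeNumber, and because each call is timed individually, the timer's own overhead and resolution become part of the result, so small differences between Foo and Bar should be treated with caution.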