Back to basics; for-loops, arrays/vectors/lists, and optimization

I was working on some code recently and came across a method that had 3 for-loops that worked on 2 different arrays.

Basically, what was happening was a foreach loop would walk through a vector and convert a DateTime from an object, and then another foreach loop would convert a long value from an object. Each of these loops would store the converted value into lists.

The final loop would go through these two lists and store those values into yet another list because one final conversion needed to be done for the date.

Then, after all that is said and done, the final two lists are converted to arrays using ToArray().

Ok, bear with me, I'm finally getting to my question.

So, I decided to make a single for loop to replace the first two foreach loops and convert the values in one fell swoop (the third loop is quasi-necessary, although, I'm sure with some working I could also put it into the single loop).
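A rough sketch of the refactoring described above. The real code was not posted, so everything here is an assumption: the array names, the use of `Convert.ToDateTime` and `Convert.ToInt64`, the helper names `ConvertInTwoLoops` and `ConvertInOneLoop`, and the idea that both inputs have the same length. The third loop (the final date conversion) is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample input: two parallel object arrays of equal length.
object[] sampleDates = { new DateTime(2009, 1, 5), new DateTime(2009, 1, 6) };
object[] sampleValues = { 42, 1000L };

var (twoLoopDates, twoLoopValues) = ConvertInTwoLoops(sampleDates, sampleValues);
var (oneLoopDates, oneLoopValues) = ConvertInOneLoop(sampleDates, sampleValues);
Console.WriteLine($"two loops: {twoLoopDates.Length} dates, {twoLoopValues.Length} values");
Console.WriteLine($"one loop:  {oneLoopDates.Length} dates, {oneLoopValues.Length} values");

// Shape described in the question: one foreach per source array,
// each filling a list that is later turned into an array.
static (DateTime[] Dates, long[] Values) ConvertInTwoLoops(object[] rawDates, object[] rawValues)
{
    var dates = new List<DateTime>(rawDates.Length);
    foreach (object raw in rawDates)
        dates.Add(Convert.ToDateTime(raw));

    var values = new List<long>(rawValues.Length);
    foreach (object raw in rawValues)
        values.Add(Convert.ToInt64(raw));

    return (dates.ToArray(), values.ToArray());
}

// Fused version: one indexed loop touches both arrays on every iteration.
// Assumes (hypothetically) that both inputs have the same length.
static (DateTime[] Dates, long[] Values) ConvertInOneLoop(object[] rawDates, object[] rawValues)
{
    if (rawDates.Length != rawValues.Length)
        throw new ArgumentException("Both inputs must have the same length.");

    var dates = new DateTime[rawDates.Length];
    var values = new long[rawValues.Length];
    for (int i = 0; i < rawDates.Length; i++)
    {
        dates[i] = Convert.ToDateTime(rawDates[i]);
        values[i] = Convert.ToInt64(rawValues[i]);
    }
    return (dates, values);
}
```

Because the lengths are known up front, the fused version writes straight into arrays, which also skips the extra copy that ToArray() makes from a list. That saving is independent of how many loops are used.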

But then I read the article "What your computer does while you wait" by Gustavo Duarte and started thinking about memory management and what happens to the data while it's being accessed in a for-loop where two lists are accessed simultaneously.

So my question is: what is the best approach for something like this? Should I condense the for-loops so the work happens in as few loops as possible, at the cost of accessing several lists in each iteration? Or should I keep the multiple loops and let the system bring in the data it's anticipating? These lists and arrays can potentially be large, and looping through 3 lists, perhaps 4 depending on how ToArray() is implemented, can get very costly (O(n^3)??). But from what I understood from that article and from my CS classes, having to fetch data can be expensive too.

Would anyone like to provide any insight? Or have I completely gone off my rocker and need to relearn what I have unlearned?

Thank you


Well, you've got complications if the two vectors are of different sizes. As has already been pointed out, this doesn't increase the overall complexity of the issue, so I'd stick with the simplest code - which is probably 2 loops, rather than 1 loop with complicated test conditions re the two different lengths.

Actually, these length tests could easily make the two loops quicker than a single loop. You might also get better memory fetch performance with 2 loops - i.e. you are looking at contiguous memory - i.e. A[0],A[1],A[2]... B[0],B[1],B[2]..., rather than A[0],B[0],A[1],B[1],A[2],B[2]...
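To make the two access patterns concrete, here is a minimal sketch. The arrays, the helper names `DoubleInTwoLoops` and `DoubleInOneLoop`, and the "double each element" operation are all hypothetical placeholders for whatever the real loops do. It shows only the shape of the code and says nothing about which version is faster.

```csharp
using System;

// Hypothetical inputs of different lengths.
int[] sampleA = { 1, 2, 3, 4, 5 };
int[] sampleB = { 10, 20, 30 };

var (separateA, separateB) = DoubleInTwoLoops(sampleA, sampleB);
var (fusedA, fusedB) = DoubleInOneLoop(sampleA, sampleB);
Console.WriteLine(string.Join(",", separateA) + " | " + string.Join(",", separateB));
Console.WriteLine(string.Join(",", fusedA) + " | " + string.Join(",", fusedB));

// Two loops: each walks one array front to back
// (A[0],A[1],A[2],... then B[0],B[1],B[2],...).
static (int[] A, int[] B) DoubleInTwoLoops(int[] a, int[] b)
{
    var outA = new int[a.Length];
    for (int i = 0; i < a.Length; i++) outA[i] = a[i] * 2;

    var outB = new int[b.Length];
    for (int i = 0; i < b.Length; i++) outB[i] = b[i] * 2;

    return (outA, outB);
}

// One loop: alternates between the arrays (A[0],B[0],A[1],B[1],...)
// and needs a length test per array on every iteration.
static (int[] A, int[] B) DoubleInOneLoop(int[] a, int[] b)
{
    var outA = new int[a.Length];
    var outB = new int[b.Length];
    int n = Math.Max(a.Length, b.Length);
    for (int i = 0; i < n; i++)
    {
        if (i < a.Length) outA[i] = a[i] * 2;
        if (i < b.Length) outB[i] = b[i] * 2;
    }
    return (outA, outB);
}
```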

So in every way, I'd go with 2 separate loops ;-p

The best approach? Write the most readable code, work out its complexity, and work out if that's actually a problem.

If each of your loops is O(n), then you've still only got an O(n) operation.
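Spelled out, assuming each of the three loops and the ToArray() copy does a constant amount of work per element (the constants c_1 to c_4 are just illustrative names for those per-element costs):

```latex
T_{\text{sequential}}(n) = c_1 n + c_2 n + c_3 n + c_4 n
                         = (c_1 + c_2 + c_3 + c_4)\, n
                         = O(n)

% O(n^3) would only arise if the loops were nested inside one another:
T_{\text{nested}}(n) = c \cdot n \cdot n \cdot n = O(n^3)
```

Merging loops changes the constant factor at most, not the order of growth.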

Having said that, it does sound like a LINQ approach would be more readable... and quite possibly more efficient as well. Admittedly we haven't seen the code, but I suspect it's the kind of thing which is ideal for LINQ.
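Since the code was never posted, the following is only a guess at what a LINQ version might look like. The sample arrays are made up, and `.Date` is a placeholder for the unspecified final date conversion. `Select` and `ToArray` are the standard `System.Linq` operators.

```csharp
using System;
using System.Linq;

// Hypothetical sample input, standing in for the question's two source arrays.
object[] sampleDates = { new DateTime(2009, 1, 5, 8, 30, 0), new DateTime(2009, 1, 6, 17, 45, 0) };
object[] sampleValues = { 42, 1000L };

// Select describes the per-element conversion; ToArray() runs it in a single pass.
// ".Date" is only a placeholder for the real final date conversion.
DateTime[] dates = sampleDates
    .Select(raw => Convert.ToDateTime(raw).Date)
    .ToArray();

long[] values = sampleValues
    .Select(raw => Convert.ToInt64(raw))
    .ToArray();

Console.WriteLine(string.Join(", ", dates.Select(d => d.ToString("yyyy-MM-dd"))));
Console.WriteLine(string.Join(", ", values));
```

Each pipeline states the conversion once, and the intermediate lists disappear because the result is materialized directly by ToArray().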

For reference, the article is "What your computer does while you wait" by Gustavo Duarte.

Also there's a guide to big-O notation.

It's impossible to answer the question without being able to see code/pseudocode. The only reliable answer is "use a profiler". Assuming what your loops are doing is a disservice to you and anyone who reads this question.

Am I understanding you correctly in this?

You have these loops:

for (...) {
  // Do A
}

for (...) {
  // Do B
}

for (...) {
  // Do C
}

And you converted it into

for (...) {
  // Do A
  // Do B
}

for (...) {
  // Do C
}

and you're wondering which is faster?

If not, some pseudocode would be nice, so we could see what you meant. :)

Impossible to say. It could go either way. You're right, fetching data is expensive, but locality is also important. The first version may be better for data locality, but on the other hand, the second has bigger blocks with no branches, allowing more efficient instruction scheduling.

If the extra performance really matters (as Jon Skeet says, it probably doesn't, and you should pick whatever is most readable), you really need to measure both options to see which is faster.
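A minimal way to measure both options is a `System.Diagnostics.Stopwatch` around each variant. This is a naive sketch, not a rigorous benchmark: the array size, the summing workload and the helper names `SeparateLoops` and `FusedLoop` are all made up, and a dedicated tool such as BenchmarkDotNet or a profiler would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;

const int N = 1_000_000;   // hypothetical size; use something close to the real data
int[] first = new int[N];
int[] second = new int[N];
for (int i = 0; i < N; i++) { first[i] = i; second[i] = 2 * i; }

// Warm-up calls so JIT compilation is not included in the timings.
SeparateLoops(first, second);
FusedLoop(first, second);

var sw = Stopwatch.StartNew();
long separateSum = SeparateLoops(first, second);
sw.Stop();
Console.WriteLine($"separate loops: {sw.Elapsed.TotalMilliseconds:F2} ms");

sw.Restart();
long fusedSum = FusedLoop(first, second);
sw.Stop();
Console.WriteLine($"fused loop:     {sw.Elapsed.TotalMilliseconds:F2} ms");

// Both variants must compute the same result, or the comparison is meaningless.
Console.WriteLine($"results match: {separateSum == fusedSum}");

// Variant 1: one pass per array.
static long SeparateLoops(int[] xs, int[] ys)
{
    long sum = 0;
    for (int i = 0; i < xs.Length; i++) sum += xs[i];
    for (int i = 0; i < ys.Length; i++) sum += ys[i];
    return sum;
}

// Variant 2: a single pass that reads both arrays (assumes equal lengths).
static long FusedLoop(int[] xs, int[] ys)
{
    long sum = 0;
    for (int i = 0; i < xs.Length; i++) sum += xs[i] + ys[i];
    return sum;
}
```

Run it several times in a release build before drawing any conclusion. A single timing of a loop this short is easily dominated by noise.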

My gut feeling says the second, with more work being done between jump instructions, would be more efficient, but it's just a hunch, and it can easily be wrong.
