Parallelize a list consolidation in Python

I have a sorted list of "box definitions" that I would like to consolidate. The list looks something like:

big_list = [\
# ...
# ...
[3, 4, 5, 4, 5, 6, 65],\
[3, 4, 5, 4, 5, 6, 60],\
[3, 4, 5, 4, 5, 6, 55],\
[3, 4, 5, 4, 5, 6, 52],\
[3, 4, 5, 4, 5, 6, 23],\
[3, 4, 5, 4, 5, 6, 17],\
[3, 4, 5, 4, 5, 6, 0],\
[5, 8, 9, 6, 9, 10, 90],\
[5, 8, 9, 6, 9, 10, 84],\
[5, 8, 9, 6, 9, 10, 32],\
[5, 8, 9, 6, 9, 10, 0],\
# ...
# ...
[750, 800, 900, 751, 801, 901, 97],\
[750, 800, 900, 751, 801, 901, 24],\
[750, 800, 900, 751, 801, 901, 17],\
[750, 800, 900, 751, 801, 901, 16],\
[750, 800, 900, 751, 801, 901, 0]\
# ...
# ...
]

Where the box "format" is: [x1, y1, z1, x2, y2, z2, attribute], and we can assume dx=1, dy=1, dz=1

Also, we can assume the list has already been sorted by something like:

big_list=sorted(big_list, key=lambda n:n[6], reverse=True)
big_list=sorted(big_list, key=lambda n:n[2])
big_list=sorted(big_list, key=lambda n:n[1])
big_list=sorted(big_list, key=lambda n:n[0])

The list may be several millions of items long, and I would like to reduce the list so that any discrete "box" only gets the highest "attribute"...so something in this case like:

reduced_big_list = [\
[3, 4, 5, 4, 5, 6, 65],\
[5, 8, 9, 6, 9, 10, 90],\
[750, 800, 900, 751, 801, 901, 97]\
]

The method I am currently using on this list is something like:

i = 0

while i < len(big_list)-1:
     if big_list[i][0]==big_list[i+1][0]\
     and big_list[i][1]==big_list[i+1][1]\
     and big_list[i][2]==big_list[i+1][2] \
     and big_list[i][6] >= big_list[i+1][6]:
          del big_list[i+1]
     else:
          i=i+1

The problem is that when the list is "long" (10 million+ "boxes"), the process is very, very slow.

Is there a clever way to parallelize this list "decimation" process or perhaps quicken this process?

Answers


The slowness is the call to del, which moves the items of the complete tail of the list by one step. In your case, simply don't use del. Make instead a new list, starting from an empty list and appending the items that you want to keep.


The Boolean and evaluates the left expression first. It only evaluates the right hand expression if the first one is true. Since you've sorted your list, the adjacent elements are maybe more likely to have identical 0th elements than later elements. Try

i = 0

while i < len(big_list)-1:
    if big_list[i][2]==big_list[i+1][2]\
    and big_list[i][1]==big_list[i+1][1]\
    and big_list[i][0]==big_list[i+1][0]\
    and big_list[i][6] >= big_list[i+1][6]:
        del big_list[i+1]
    else:
        i=i+1

The reason is slow is that each time you del a line takes a linear amount of time, making the total process O(n^2).

If instead of deleting lines from the original list, you append the lines you want to keep into a new list, it should be much faster.

But there are other, possibly more Pythonic, ways to perform the same thing. For example, using itertools.groupby (assuming the list is sorted as you specified):

from itertools import groupby
new_list = [next(group) for val,group in groupby(big_list, key=lambda x: x[:3])]

This will group the list items by the first 3 elements, and return a list of the first item in each group.


Need Your Help

UIPopoverController, backgroundColor = clearColor

iphone uipopovercontroller uicolor

I had a UITableView that I replaced with a UIPopoverController. My UITableView's background was set to clearColor and it showed the custom background through the table. The table cells also match...

ASP.NET Show Div Conditionally

c# asp.net webforms

I have googled this and didn't find what I needed. All of the existing solutions I found say to set "visibility" to false. This will not seem to work for me as my application renders PDF which si...

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.