Parallelizing the “Reduce” in “MapReduce”

I understand how Map is easily parallelizable - each computer/CPU can just operate on a small portion of the array.

Is Reduce/foldl parallelizable? It seems like each computation depends on the previous one. Is it just parallelizable for certain types of functions?

Answers


If your reduction underlying operation is associative*, you can play with the order of operations and locality. Therefore you often have a tree-like structure in the 'gather' phase, so you can do it in several passes in logarithmic time:

a  +  b  +  c  +  d
 \   /       \   /
 (a+b)       (c+d)
     \       /
   ((a+b)+(c+d))

instead of (((a+b)+c)+d)

If your operation is commutative, further optimization are possible as you can gather in different order (it may be important for data alignment when those operations are vector operations for example)

[*] your real desired mathematical operations, not those on effective types like floats of course.


Need Your Help

Using a templated parameter's value_type

c++ templates std

How is one supposed to use a std container's value_type?

Enable submit button on selecting an option along with checkbox

javascript checkbox

i have a javascript below which enables the submit button if check the checkbox(atlease one).

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.