Parallelizing the “Reduce” in “MapReduce”

I understand how Map is easily parallelizable - each computer/CPU can just operate on a small portion of the array.

Is Reduce/foldl parallelizable? It seems like each computation depends on the previous one. Is it just parallelizable for certain types of functions?

Answers


If your reduction underlying operation is associative*, you can play with the order of operations and locality. Therefore you often have a tree-like structure in the 'gather' phase, so you can do it in several passes in logarithmic time:

a  +  b  +  c  +  d
 \   /       \   /
 (a+b)       (c+d)
     \       /
   ((a+b)+(c+d))

instead of (((a+b)+c)+d)

If your operation is commutative, further optimization are possible as you can gather in different order (it may be important for data alignment when those operations are vector operations for example)

[*] your real desired mathematical operations, not those on effective types like floats of course.


Need Your Help

Using a templated parameter's value_type

c++ templates std

How is one supposed to use a std container's value_type?

Enable submit button on selecting an option along with checkbox

javascript checkbox

i have a javascript below which enables the submit button if check the checkbox(atlease one).