Map/Reduce in Python

My interest in Grid Computing over the last weeks begins to show. After reading the Google MapReduce paper, I tried my fingers on a client side toy problem.

For formatting purposes, I was interested in the size of the longest string in a sequence. There are lots of ways to do this, but I wanted to try Python's map and reduce functions.

This is my sample input sequence:

values = ('abc', 'xy', 'abcdef', 'xyz')

The first step is to transform the values sequence into a new list which contains the lengths of each string:

mapped = map(lambda x: len(x), values)

After that, the mapped list is reduced to a single value:

max_len = reduce(lambda x, y: max(x, y), mapped)

Both user-defined map and reduce functions are associative and commutative, so this task can be parallelized easily. In fact, I could in theory hand each map operation off to a different node in a grid. Because of the max function's properties, the reduce could be executed in multiple steps on a grid, too.

BTW: While map/reduce was a nice experiment, I used a different implementation in the end, using list comprehension and the powerful max() function:

max( [len(x) for x in values] )

If I had been interested in the longest string itself, I would have used the following (python-2.5 only):

max(values, key=lambda x: len(x))

social