My interest in Grid Computing over the last weeks begins to show. After reading the Google MapReduce paper, I tried my fingers on a client side toy problem.
For formatting purposes, I was interested in the size of the longest string in a sequence. There are lots of ways to do this, but I wanted to try Python’s
This is my sample input sequence:
values = ('abc', 'xy', 'abcdef', 'xyz')
The first step is to transform the
values sequence into a new list which contains the lengths of each string:
mapped = map(lambda x: len(x), values)
After that, the
mapped list is reduced to a single value:
max_len = reduce(lambda x, y: max(x, y), mapped)
Both user-defined map and reduce functions are associative and commutative, so this task can be parallelized easily. In fact, I could in theory hand each map operation off to a different node in a grid. Because of the
max function’s properties, the reduce could be executed in multiple steps on a grid, too.
BTW: While map/reduce was a nice experiment, I used a different implementation in the end, using list comprehension and the powerful
max( [len(x) for x in values] )
If I had been interested in the longest string itself, I would have used the following (python-2.5 only):
max(values, key=lambda x: len(x))