Quick Tip #3: Creating Histograms in Python

Since Python 2.5, creating histograms has become easier. Instead of a plain dict, we can now use collections.defaultdict, which behaves much like awk’s associative arrays: instead of raising a KeyError for a missing key, defaultdict calls a user-supplied factory, stores the result under that key, and returns it.

I’ll demonstrate this with a simple program that analyzes line length distribution in a file. In older Python versions, you’d typically write code like this:

hist = {}
for line in open(filename):
    hist[len(line)] = hist.get(len(line), 0) + 1

The code using defaultdict is much clearer and more elegant (although an additional import is needed):

from collections import defaultdict

hist = defaultdict(int)
for line in open(filename):
    hist[len(line)] += 1

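Note that defaultdict’s constructor expects a factory function that is called without arguments to create the value for a missing key. Calling int() returns 0, which is why int works as the factory for a counter.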
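To make the factory behavior concrete, here is a small sketch. The variable names and sample keys are made up for illustration; only defaultdict, int and list come from the standard library.

```python
from collections import defaultdict

counts = defaultdict(int)    # int() returns 0
groups = defaultdict(list)   # list() returns a new empty list

counts["a"] += 1             # missing key: the factory supplies 0, then 1 is added
groups[3].append("abc")      # missing key: the factory supplies [], then we append

# Merely reading a missing key also inserts it.
missing = counts["never seen"]

# Use .get() (or the `in` operator) to look up a key without inserting it.
peek = counts.get("also unseen", 0)
```

One thing to watch for: because a plain lookup inserts the key, probing a defaultdict for keys that may not exist can silently grow it.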

Unless you dump the contents of hist to gnuplot or similar, you might want to sort the dict by value. There are several ways to do this, but I learned from a related blog posting that this is the most efficient way:

from operator import itemgetter
sorted(hist.iteritems(), key=itemgetter(1))

The min and max builtins support the key parameter, too, by the way.
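As a rough sketch of both ideas, assuming Python 3 (where hist.items() takes the place of the Python 2 hist.iteritems() used above) and a made-up list of lines instead of a real file:

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical sample input standing in for the lines of a file.
lines = ["a\n", "bb\n", "cc\n", "dddd\n", "ee\n", "f\n"]

hist = defaultdict(int)
for line in lines:
    hist[len(line)] += 1      # len() counts the trailing newline too

# (length, count) pairs ordered by ascending count.
by_count = sorted(hist.items(), key=itemgetter(1))

# Most and least frequent line length, without sorting everything.
most_common = max(hist.items(), key=itemgetter(1))
least_common = min(hist.items(), key=itemgetter(1))
```

If you only need the extremes, min and max make a single pass over the items, whereas sorted has to order all of them.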


2 Responses to Quick Tip #3: Creating Histograms in Python

  1. geparada says:

    Very useful tip!!
    But… with sorted(hist.iteritems(), key=itemgetter(1)) I get an alphanumeric sort like:
    [('10', 1), ('2', 1), ('5', 1), ('8', 1), ('6', 1)]

    And I would like to get a numeric sort:
    [('2', 1), ('5', 1), ('8', 1), ('6', 1), ('10', 1)]

    • mafr says:

      It seems you’re putting strings in your defaultdict (my example uses line lengths which are ints), so you get alphanumeric order. Just convert the data to int before adding it to hist and you should be fine.
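A minimal sketch of that suggestion, assuming Python 3, string input like the values in the comment above, and that the goal is to order the entries by key rather than by count. The name raw_values is hypothetical.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical string data, as it might come from splitting a text file.
raw_values = ["10", "2", "5", "8", "6"]

hist = defaultdict(int)
for value in raw_values:
    hist[int(value)] += 1     # convert to int before using it as a key

# Sort by key (itemgetter(0)); int keys compare numerically, not as text.
by_key = sorted(hist.items(), key=itemgetter(0))
```

With string keys, '10' sorts before '2' because strings are compared character by character. With int keys, 10 lands after 8 as expected.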
