Plotting Time Series Data with Matplotlib

It's been a while since my last article on Matplotlib. Today we're going to plot time series data, which lets us visualize things like web page impressions or stock prices over time.

If you haven't already, install Matplotlib (package python-matplotlib, or python3-matplotlib for Python 3, on Debian-based systems) and fire up a Python interpreter. For the rest of this article, we'll need the following imports:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import matplotlib.dates as mdates

The usual way to plot a diagram is to create two arrays of the same length, one for the x axis and one for the y axis. Plotting time series data works the same way, except that the data points on one axis (usually the x axis) are times or dates.
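Here is a minimal sketch of that idea. The x values are NumPy datetime64 days and the y values are the first three impressions from the sample data below. Recent Matplotlib versions accept datetime64 arrays directly on an axis. The Agg backend and in-memory rendering are assumptions made so that the snippet runs without a display.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import matplotlib.dates  # noqa: F401  (registers Matplotlib's date converters)

# x values: three consecutive days as NumPy datetime64 values
days = np.arange("2012-01-23", "2012-01-26", dtype="datetime64[D]")
# y values: page impressions for those days (from the sample data)
impressions = np.array([147, 157, 156])

# One entry per data point, so both arrays must have the same length
assert len(days) == len(impressions)

fig, ax = plt.subplots()
ax.plot(days, impressions)
fig.savefig(io.BytesIO(), format="png")  # render the figure in memory
```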

To get us started quickly, I have prepared sample data to play with:

2012-01-23    147
2012-01-24    157
2012-01-25    156
...
2012-03-09    184

The first column is a date in ISO format and the second column is the number of page impressions on that day. Despite the .csv extension, the columns are separated by whitespace, which is the default delimiter of loadtxt(). To work with this data, we read it from the file into two one-dimensional arrays, days and impressions (without the unpack parameter, we would get a single two-dimensional array instead):

>>> days, impressions = np.loadtxt("page-impressions.csv", unpack=True,
...         converters={0: mdates.strpdate2num('%Y-%m-%d')})

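What's interesting here is the converters parameter. The loadtxt() function expects floating point data, so we have to register a converter that turns the date strings in column 0 into floating point numbers. This is no problem for us, because Matplotlib represents dates and times as floats. In the Matplotlib version used here, a date is the number of days since January 1st, year 0001, counted so that this day itself is 1.0. The mdates.strpdate2num() function is a factory function that returns such a converter for the specified format. The format string uses the same conversion directives as strftime(). Later releases changed both of these details. strpdate2num() was deprecated in Matplotlib 3.1 and has since been removed. Since Matplotlib 3.3, the default epoch is 1970-01-01, so the same dates map to much smaller numbers there.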
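Current Matplotlib releases no longer provide strpdate2num(). Below is a minimal sketch of an equivalent factory built on datetime.strptime() and mdates.date2num(). The name make_date_converter is hypothetical. The data comes from an in-memory copy of the sample rows shown above rather than from page-impressions.csv. The converter accepts both bytes and str because older NumPy releases passed bytes to converters and newer ones pass str. The block also shows the difference the unpack parameter makes.

```python
import io
from datetime import datetime

import numpy as np
import matplotlib.dates as mdates


def make_date_converter(fmt):
    """Hypothetical stand-in for mdates.strpdate2num(): return a converter
    that parses a date string with `fmt` into a Matplotlib date float."""
    def convert(text):
        # Older NumPy versions hand converters bytes, newer ones str
        if isinstance(text, bytes):
            text = text.decode("ascii")
        return mdates.date2num(datetime.strptime(text, fmt))
    return convert


# In-memory copy of the sample rows shown above (the "..." rows are omitted)
data = """2012-01-23    147
2012-01-24    157
2012-01-25    156
2012-03-09    184
"""

converters = {0: make_date_converter("%Y-%m-%d")}

# With unpack=True, each column becomes its own 1-D array
days, impressions = np.loadtxt(io.StringIO(data), unpack=True,
                               converters=converters)

# Without unpack, loadtxt returns one 2-D array of shape (rows, columns)
table = np.loadtxt(io.StringIO(data), converters=converters)
print(table.shape)  # (4, 2)
```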

Let's have a look at the result:

>>> days[0:2]
array([ 734525.,  734526.])

The first array element represents 2012-01-23, the second 2012-01-24, and so on. We could easily convert those numbers back to dates using mdates.num2date() if we wanted to. In fact, Matplotlib does exactly this when it labels our x axis.
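A short sketch of that round trip follows. mdates.date2num() turns a datetime into Matplotlib's float representation, and mdates.num2date() turns it back. The concrete float depends on the epoch of your Matplotlib version: the 734525. above comes from the 0001-01-01 epoch, and newer releases default to 1970-01-01. The example therefore only relies on the difference between consecutive days.

```python
from datetime import datetime

import matplotlib.dates as mdates

# Convert two consecutive days to Matplotlib's float representation
first = mdates.date2num(datetime(2012, 1, 23))
second = mdates.date2num(datetime(2012, 1, 24))

# One day corresponds to a difference of exactly 1.0
print(second - first)  # 1.0

# num2date() converts back; it returns a timezone-aware datetime in UTC
restored = mdates.num2date(first)
print(restored.strftime("%Y-%m-%d"))  # 2012-01-23
```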

Now let's plot the data using Matplotlib's plot_date() function. We use days as x values and impressions as y values and keep the default settings:

>>> plt.plot_date(x=days, y=impressions)
>>> plt.show()

[Figure: page impressions plotted with the default settings, as blue dots with dates on the x axis]

The diagram isn't really impressive, but note how Matplotlib automatically scales the axes and adds date labels to the x axis, converting the floating point numbers back to strings.
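If the automatic labels don't suit you, you can set the x axis locator and formatter explicitly. The following is a minimal sketch that uses the object-oriented interface and the four sample rows shown earlier. WeekdayLocator, the "%b %d" label format, and the Agg backend are assumptions chosen for illustration, not settings from the original article.

```python
import io
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering, no window
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# The four sample rows shown earlier (the rows elided with "..." are not included)
days = [datetime(2012, 1, 23), datetime(2012, 1, 24),
        datetime(2012, 1, 25), datetime(2012, 3, 9)]
impressions = [147, 157, 156, 184]

fig, ax = plt.subplots()
ax.plot(days, impressions, "o")

# Put a major tick on every Monday and label it like "Jan 23"
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))

# Rotate the date labels so they don't overlap
fig.autofmt_xdate()

fig.savefig(io.BytesIO(), format="png")  # render the figure in memory
```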

To make the diagram easier to read, we'll change the blue dots to a red line ("r-") and add a title, a y axis label and a grid:

>>> plt.plot_date(x=days, y=impressions, fmt="r-")
>>> plt.title("Page impressions on example.com")
>>> plt.ylabel("Page impressions")
>>> plt.grid(True)
>>> plt.show()
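For comparison, here is a sketch of the same diagram using the object-oriented interface and plain plot(), which also accepts datetime x values. Recent Matplotlib documentation discourages plot_date() in favor of plot(). The in-memory sample rows and the Agg backend are assumptions so that the snippet runs on its own.

```python
import io
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: render to memory instead of a window
import matplotlib.pyplot as plt
import matplotlib.dates  # noqa: F401  (registers Matplotlib's date converters)

# The four sample rows shown earlier
days = [datetime(2012, 1, 23), datetime(2012, 1, 24),
        datetime(2012, 1, 25), datetime(2012, 3, 9)]
impressions = [147, 157, 156, 184]

fig, ax = plt.subplots()
ax.plot(days, impressions, "r-")  # red solid line, as with fmt="r-"
ax.set_title("Page impressions on example.com")
ax.set_ylabel("Page impressions")
ax.grid(True)

# With an interactive backend, plt.show() would open a window instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```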

[Figure: page impressions as a red line, with title, y axis label and grid]

That's better. In a future article, we'll use a bar chart, which works better for small data sets. Until then, here's the complete script for easy copying and pasting:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

days, impressions = np.loadtxt("page-impressions.csv", unpack=True,
        converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

plt.plot_date(x=days, y=impressions, fmt="r-")
plt.title("Page impressions on example.com")
plt.ylabel("Page impressions")
plt.grid(True)
plt.show()
