It's been a while since my last article on Matplotlib. Today we're going to plot time series data, for example to visualize web page impressions or stock prices over time.

If you haven't already, install Matplotlib (package `python3-matplotlib` on current Debian-based systems) and fire up a Python interpreter. For the rest of this article, we'll need the following imports:

    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> import matplotlib.dates as mdates

Usually, when plotting a diagram, the process is something like this: Create two arrays of the same length, one for the x axis and one for the y axis. Plotting time series data works the same way, but the data points on one axis (usually the x axis) are times or dates.

To get us started quickly, I have prepared sample data to play with:

    2012-01-23 147
    2012-01-24 157
    2012-01-25 156
    ...
    2012-03-09 184

The first column is a date in ISO format and the second column is the number of page impressions on that particular day. To work with this data, we read it from a file, creating two one-dimensional arrays, `days` and `impressions` (without the `unpack` parameter we would get a single two-dimensional array instead):

    >>> days, impressions = np.loadtxt("page-impressions.csv",
    ...     unpack=True,
    ...     converters={0: mdates.strpdate2num('%Y-%m-%d')})

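What's interesting here is the `converters` parameter. The `np.loadtxt()` function expects floating point data, so we have to register a converter that turns the date strings in column 0 into floating point numbers. Matplotlib represents dates and times as floats counting days from an epoch, so this is no problem for us. In Matplotlib versions before 3.3, which are used here, day 1 is January 1st of year 0001; since 3.3 the default epoch is January 1st, 1970. The `mdates.strpdate2num()` function is a factory function that returns a converter for the specified format. Note that it was deprecated in Matplotlib 3.1 and removed in 3.3, so the call above only works on older releases. The format string uses the same conversion directives as `strftime()`.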
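Since `mdates.strpdate2num()` is not available in Matplotlib 3.3 and later, here is a minimal sketch of an equivalent converter built from `datetime.strptime()` and `mdates.date2num()`. The helper name `iso_date_to_num` and the in-memory `SAMPLE` data are assumptions made for illustration and not part of the original script. The sketch also shows what the `unpack` parameter changes.

```python
import io
from datetime import datetime

import numpy as np
import matplotlib.dates as mdates

# Hypothetical stand-in for page-impressions.csv (first three rows of the sample data)
SAMPLE = "2012-01-23 147\n2012-01-24 157\n2012-01-25 156\n"


def iso_date_to_num(text):
    """Turn an ISO date string into a Matplotlib date number (a float)."""
    # Older NumPy versions hand converters bytes instead of str
    if isinstance(text, bytes):
        text = text.decode("ascii")
    return mdates.date2num(datetime.strptime(text, "%Y-%m-%d"))


# Without unpack: one two-dimensional array, one row per line of input
table = np.loadtxt(io.StringIO(SAMPLE), converters={0: iso_date_to_num})

# With unpack=True: one one-dimensional array per column
days, impressions = np.loadtxt(
    io.StringIO(SAMPLE), unpack=True, converters={0: iso_date_to_num})

print(table.shape)    # (3, 2)
print(impressions)    # [147. 157. 156.]
print(np.diff(days))  # [1. 1.]
```

The absolute values in `days` depend on the epoch of the installed Matplotlib version, which is why the example only prints the differences between consecutive days.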

Let's have a look at the result:

    >>> days[0:2]
    array([ 734525., 734526.])

The first array element represents 2012-01-23, the second 2012-01-24, and so on. We could easily convert those numbers back to dates using `mdates.num2date()` if we wanted to. In fact, this is the conversion Matplotlib performs later to label our x axis.
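The value `734525.` can be checked with the standard library alone, because a whole-day date number counted from January 1st of year 0001 coincides with Python's proleptic Gregorian ordinal. The sketch below is only a sanity check with made-up variable names. Its last part shows the value one would expect with the 1970-01-01 epoch that Matplotlib 3.3 and later use by default.

```python
from datetime import date

first_day = date(2012, 1, 23)

# Days counted from 0001-01-01 (which is day 1), as in the array above
old_style_number = first_day.toordinal()
print(old_style_number)          # 734525

# Going back from the second number to its date
second_day = date.fromordinal(734526)
print(second_day)                # 2012-01-24

# The same first day counted from 1970-01-01 (day 0)
new_style_number = first_day.toordinal() - date(1970, 1, 1).toordinal()
print(new_style_number)          # 15362
```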

Now let's plot the data using Matplotlib's `plot_date()` function (deprecated since Matplotlib 3.9 in favor of `plot()`). We use `days` as x values and `impressions` as y values and don't touch the default settings:

    >>> plt.plot_date(x=days, y=impressions)
    >>> plt.show()

The diagram isn't really impressive, but note how Matplotlib automatically scales the axes and adds date labels to the x axis, converting the floating point numbers back to strings.
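To illustrate that last step, the sketch below turns a date number back into a label by hand with `mdates.DateFormatter`, which is roughly what happens for each tick. The format string is my choice for the example. The labels Matplotlib picks automatically depend on the version and on the range of dates shown.

```python
from datetime import datetime

import matplotlib.dates as mdates

# A date number for the first day of the sample data
number = mdates.date2num(datetime(2012, 1, 23))

# A formatter is callable: it takes a date number and returns the label text
formatter = mdates.DateFormatter("%Y-%m-%d")
label = formatter(number)
print(label)  # 2012-01-23

# num2date() gives the intermediate datetime object (timezone-aware, UTC by default)
as_datetime = mdates.num2date(number)
print(as_datetime.date())  # 2012-01-23
```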

To make the diagram easier to read, we'll change the blue dots to a red line ("r-"), add some text and a grid:

    >>> plt.plot_date(x=days, y=impressions, fmt="r-")
    >>> plt.title("Page impressions on example.com")
    >>> plt.ylabel("Page impressions")
    >>> plt.grid(True)
    >>> plt.show()

That's better. In a future article, we'll use a bar chart, which looks a lot better for small data sets. Until then, here's the complete script for easy copying and pasting:

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    days, impressions = np.loadtxt("page-impressions.csv", unpack=True,
        converters={0: mdates.strpdate2num('%Y-%m-%d')})

    plt.plot_date(x=days, y=impressions, fmt="r-")
    plt.title("Page impressions on example.com")
    plt.ylabel("Page impressions")
    plt.grid(True)
    plt.show()
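
On current Matplotlib releases, where `strpdate2num()` is gone and `plot_date()` is deprecated, the script needs a few changes. What follows is a sketch under some assumptions rather than a drop-in replacement: the converter `iso_date_to_num` is a hypothetical helper, the data comes from an in-memory string instead of `page-impressions.csv`, and the figure is written to a file (name chosen arbitrarily) using the non-interactive Agg backend so that it runs without a display.

```python
import io
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so render off-screen
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for page-impressions.csv
SAMPLE = "2012-01-23 147\n2012-01-24 157\n2012-01-25 156\n"


def iso_date_to_num(text):
    """Hypothetical replacement for mdates.strpdate2num('%Y-%m-%d')."""
    if isinstance(text, bytes):  # older NumPy versions pass bytes
        text = text.decode("ascii")
    return mdates.date2num(datetime.strptime(text, "%Y-%m-%d"))


days, impressions = np.loadtxt(
    io.StringIO(SAMPLE), unpack=True, converters={0: iso_date_to_num})

fig, ax = plt.subplots()
ax.xaxis_date()                    # treat x values as dates, as plot_date() did
ax.plot(days, impressions, "r-")   # red line
ax.set_title("Page impressions on example.com")
ax.set_ylabel("Page impressions")
ax.grid(True)
fig.autofmt_xdate()                # slant the date labels so they don't overlap
fig.savefig("page-impressions.png")
```

To read the real file and get a window instead, pass the file name to `np.loadtxt()`, drop the `matplotlib.use("Agg")` line and call `plt.show()` in place of `fig.savefig()`.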