Plotting Time Series Data with Matplotlib

It’s been a while since my last article on Matplotlib. Today we’re going to plot time series data for visualizing web page impressions, stock prices and the like over time.

If you haven’t already, install Matplotlib (package python-matplotlib on Debian-based systems) and fire up a Python interpreter. For the rest of this article, we’ll need the following imports:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import matplotlib.dates as mdates

Usually, when plotting a diagram, the process is something like this: Create two arrays of the same length, one for the x axis and one for the y axis. Plotting time series data works the same way, but the data points on one axis (usually the x axis) are times or dates.

To get us started quickly, I have prepared sample data to play with:

2012-01-23    147
2012-01-24    157
2012-01-25    156
...
2012-03-09    184

The first column is a date in ISO format and the second column is the number of page impressions on that particular day. To work with this data, we read it from the file, creating two one-dimensional arrays, days and impressions (without the unpack parameter we would get a single two-dimensional array):

>>> days, impressions = np.loadtxt("page-impressions.csv", unpack=True,
        converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

What’s interesting here is the converters parameter. The loadtxt() function expects floating point data, so we have to register a converter that turns the date strings in column 0 into floating point numbers. Matplotlib represents dates and times as floats that count days, starting at January 1st, year 0001 in the version used for this article, so this is no problem for us. The mdates.strpdate2num() function is a factory function that returns a converter for the specified format. The format string uses the same conversion directives as strptime() and strftime().
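
Newer Matplotlib releases no longer seem to ship strpdate2num(), so the call above may fail there. As a minimal sketch of what such a converter does, here is a hand-written equivalent built from datetime.strptime() and mdates.date2num(). The helper name iso_date_to_num and the in-memory sample are assumptions made for illustration; they are not part of Matplotlib or of the original data file.

```python
import datetime
import io

import numpy as np
import matplotlib.dates as mdates


def iso_date_to_num(text):
    """Turn an ISO date string such as '2012-01-23' into a Matplotlib date number."""
    # Depending on the NumPy version, loadtxt() hands the converter bytes or str.
    if isinstance(text, bytes):
        text = text.decode("ascii")
    return mdates.date2num(datetime.datetime.strptime(text, "%Y-%m-%d"))


# In-memory stand-in for page-impressions.csv (the first three rows shown above).
sample = "2012-01-23    147\n2012-01-24    157\n2012-01-25    156\n"

# With unpack=True we get one one-dimensional array per column.
days, impressions = np.loadtxt(io.StringIO(sample), unpack=True,
                               converters={0: iso_date_to_num})

# Without unpack we get a single two-dimensional array (rows x columns).
table = np.loadtxt(io.StringIO(sample), converters={0: iso_date_to_num})

print(days.shape, impressions.shape)  # (3,) (3,)
print(table.shape)                    # (3, 2)
print(impressions)                    # [147. 157. 156.]
```

Passing converters={0: iso_date_to_num} to the loadtxt() call from the article should then work in place of mdates.strpdate2num('%Y-%m-%d').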

Let’s have a look at the result:

>>> days[0:2]
array([ 734525.,  734526.])

The first array element represents 2012-01-23, the second 2012-01-24, and so on. We could easily convert those numbers back to dates using mdates.num2date() if we wanted to. In fact, this is what we’ll need later to label our x axis.
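
To make the round trip concrete, here is a small sketch. It assumes nothing beyond the standard library and matplotlib.dates. The value 734525 shown above happens to equal Python's own date ordinal for 2012-01-23; as far as I know, newer Matplotlib releases count from 1970 by default, so the absolute floats you see may differ while the round trip itself still holds.

```python
import datetime

import matplotlib.dates as mdates

first_day = datetime.datetime(2012, 1, 23)

# Python's proleptic Gregorian ordinal also counts days from 0001-01-01.
print(first_day.toordinal())  # 734525

# Date -> float -> date. The float depends on Matplotlib's epoch,
# so we only look at the dates that come back out.
num = mdates.date2num(first_day)
print(mdates.num2date(num).strftime("%Y-%m-%d"))      # 2012-01-23
print(mdates.num2date(num + 1).strftime("%Y-%m-%d"))  # 2012-01-24
```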

Now let’s plot the data using Matplotlib’s plot_date() function. We use days as x values and impressions as y values and don’t touch the default settings:

>>> plt.plot_date(x=days, y=impressions)
>>> plt.show()

The diagram isn’t really impressive, but note how Matplotlib automatically scales the axes and adds date labels to the x axis, converting the floating point numbers back to strings.
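
If the automatic date labels are not what you want, they can be controlled explicitly. The following is only a sketch under a few assumptions: it uses plain plot() with datetime.date objects instead of plot_date(), takes the first three values of the sample data, and writes to a made-up file name instead of opening a window.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# The first three rows of the sample data, with real date objects on the x axis.
days = [datetime.date(2012, 1, 23) + datetime.timedelta(days=i) for i in range(3)]
impressions = [147, 157, 156]

fig, ax = plt.subplots()
ax.plot(days, impressions, "o")

# One tick per day, labelled like "Jan 23".
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap

fig.savefig("page-impressions.png")  # hypothetical output file name
```

DateFormatter takes the same strftime()-style directives as the converter format string, so the label format can be adapted in the same way.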

To make the diagram easier to read, we’ll change the blue dots to a red line ("r-"), add some text and a grid:

>>> plt.plot_date(x=days, y=impressions, fmt="r-")
>>> plt.title("Page impressions on example.com")
>>> plt.ylabel("Page impressions")
>>> plt.grid(True)
>>> plt.show()

That’s better. In a future article, we’ll use a bar chart that looks a lot better for small data sets. Until then, here’s the complete script for easy copying and pasting:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

days, impressions = np.loadtxt("page-impressions.csv", unpack=True,
        converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

plt.plot_date(x=days, y=impressions, fmt="r-")
plt.title("Page impressions on example.com")
plt.ylabel("Page impressions")
plt.grid(True)
plt.show()

8 Responses to Plotting Time Series Data with Matplotlib

  1. Adam says:

    Thanks for the writeup and info – I had some real issues but it turned out to mostly be date and csv formatting from excel to csv.

    I had previously been pulling data out of a larger set (50+ column spreadsheet) and trying to plot values against dates – but could not get them plotted correctly.

    I don’t have much experience with the “converter” terminology in the np.loadtxt command, I’ll look it up when I have some time, but does the ‘0: ‘ section mean that it is converting (or trying to convert) all elements of the text file?

    Thanks!

  2. mafr says:

    Glad you like it! The “0: func” part only converts the first column, loadtxt()’s column numbering is zero-based.

  3. George says:

    Fabulous little tutorial, really easy to understand.

  4. Pingback: Matplotlib: Plotting Bar Diagrams | Matthias Friedrich's Blog

  5. Geoff says:

    Is “strptime2num” a python 3 function ? I (using google) can’t find any mention of it anywhere ???

  6. Casey says:

    Thanks for the tutorial. It is the first I have seen to plot dates on a time series plot rather than numbers. Trying to apply it to my own needs, I have trouble getting a .csv file to format like yours did and also don’t have any experience with the “converter” terminology. Is there a way to do this with two data columns; one with date information and one with some secondary info or do they have to be combined into one column? When I tried, it viewed the second column entry and told me “unconverted data remains:” rather than assigning it to the second variable.

    • Matthias says:

      I’m not entirely sure I understand you, but you can pass the usecols parameter to loadtxt() to only read the columns you need from your file: np.loadtxt("filename", unpack=True, converters={ 0: mdates.strpdate2num('%Y-%m-%d') }, usecols=(0, 1)).

      This would select the first and second columns (0 and 1) with column 0 being a timestamp that is converted.
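
A short sketch of that idea, under stated assumptions: the comma-separated sample, its extra middle column and the helper iso_date_to_num are made up for illustration. One plausible cause of an “unconverted data remains” error is a comma-separated file read with loadtxt()’s default whitespace delimiter, because the whole line then reaches the date converter; passing delimiter="," avoids that.

```python
import datetime
import io

import numpy as np
import matplotlib.dates as mdates


def iso_date_to_num(text):
    """Hypothetical converter: ISO date string -> Matplotlib date number."""
    # Depending on the NumPy version, loadtxt() passes bytes or str.
    if isinstance(text, bytes):
        text = text.decode("ascii")
    return mdates.date2num(datetime.datetime.strptime(text, "%Y-%m-%d"))


# Made-up comma-separated export with a text column we do not need.
sample = "2012-01-23,homepage,147\n2012-01-24,homepage,157\n"

# Read only column 0 (date) and column 2 (impressions); column 1 is skipped.
days, impressions = np.loadtxt(io.StringIO(sample), delimiter=",",
                               usecols=(0, 2), unpack=True,
                               converters={0: iso_date_to_num})

print(impressions)  # [147. 157.]
```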
