Plotting Time Series Data with Matplotlib

It’s been a while since my last article on Matplotlib. Today we’re going to plot time series data for visualizing web page impressions, stock prices and the like over time.

If you haven’t already, install Matplotlib (package python-matplotlib on Debian-based systems) and fire up a Python interpreter. For the rest of this article, we’ll need the following imports:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import matplotlib.dates as mdates

Usually, when plotting a diagram, the process is something like this: Create two arrays of the same length, one for the x axis and one for the y axis. Plotting time series data works the same way, but the data points on one axis (usually the x axis) are times or dates.

To get us started quickly, I have prepared sample data to play with:

2012-01-23    147
2012-01-24    157
2012-01-25    156
2012-03-09    184

The first column is a date in ISO format and the second column is the number of page impressions on that particular day. To work with this data, we read it from a file, creating two one-dimensional arrays, days and impressions (we would get one two-dimensional array if it weren’t for the unpack parameter):

>>> days, impressions = np.loadtxt("page-impressions.csv", unpack=True,
        converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

What’s interesting here is the converters parameter. The loadtxt() function expects floating point data, so we have to register a converter that turns the date strings in column 0 into floating point numbers. The Matplotlib version used here represents dates and times as floats counting days from January 1st, year 0001, so this is no problem for us (newer releases count from 1970 by default, which changes the numbers but not the approach). The mdates.strpdate2num() function is a factory function that returns a converter for the specified format. The format string uses the same conversion directives as strftime().
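The converter idea can also be sketched without strpdate2num(), which is useful because recent Matplotlib releases no longer ship that helper. This is a minimal sketch under stated assumptions, not Matplotlib's implementation: make_date_converter is a made-up name, the sample rows are inlined through io.StringIO instead of being read from page-impressions.csv, and the day numbers follow the "January 1st, year 0001 is day 1" numbering.

```python
import io
from datetime import datetime

import numpy as np


def make_date_converter(fmt):
    """Return a function that turns a date string into a day number.

    Hypothetical stand-in for mdates.strpdate2num(fmt). The day number is
    the proleptic Gregorian ordinal (0001-01-01 is day 1) plus the
    fraction of the day that has already passed.
    """
    def converter(value):
        # Depending on the NumPy version, loadtxt() hands converters
        # either bytes or str, so accept both.
        if isinstance(value, bytes):
            value = value.decode("ascii")
        moment = datetime.strptime(value, fmt)
        seconds = moment.hour * 3600 + moment.minute * 60 + moment.second
        return moment.toordinal() + seconds / 86400.0
    return converter


# Inline sample data standing in for page-impressions.csv.
sample = io.StringIO("2012-01-23 147\n2012-01-24 157\n2012-01-25 156\n")

# unpack=True returns one array per column instead of one 2-D array.
days, impressions = np.loadtxt(
    sample, unpack=True,
    converters={0: make_date_converter("%Y-%m-%d")})

print(days)         # one float per row of column 0
print(impressions)  # one float per row of column 1
```

Because the converter keeps the fraction of the day, a format such as "%Y-%m-%d %H:%M" should also work for timestamps with hours and minutes, as long as the timestamp sits in a single column.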

Let’s have a look at the result:

>>> days[0:2]
array([ 734525.,  734526.])

The first array element represents 2012-01-23, the second 2012-01-24, and so on. We could easily convert those numbers back to dates using mdates.num2date() if we wanted to. In fact, this is what we’ll need later to label our x axis.
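To see what such a number encodes, here is a small sketch that turns a day number back into a datetime using only the standard library. The helper name day_number_to_datetime is made up for this example, and it assumes the numbering shown above (January 1st, year 0001 is day 1). In real code, mdates.num2date() is the tool to use; it also attaches a time zone.

```python
from datetime import datetime, timedelta


def day_number_to_datetime(number):
    """Convert a day number (0001-01-01 is day 1) into a naive datetime."""
    whole_days = int(number)
    fraction = number - whole_days  # part of the day that has passed
    return datetime.fromordinal(whole_days) + timedelta(days=fraction)


print(day_number_to_datetime(734525.0))  # 2012-01-23 00:00:00
print(day_number_to_datetime(734526.5))  # 2012-01-24 12:00:00
```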

Now let’s plot the data using Matplotlib’s plot_date() function. We use days as x values and impressions as y values and don’t touch the default settings:

>>> plt.plot_date(x=days, y=impressions)

The diagram isn’t really impressive, but note how Matplotlib automatically scales the axes and adds date labels to the x axis, converting the floating point numbers back to strings.
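As an alternative sketch, Matplotlib also accepts datetime objects directly in plot(), which sidesteps the float conversion entirely. The three data points are copied from the sample data; the Agg backend is an assumption made so the snippet runs without a display.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI needed
import matplotlib.pyplot as plt

days = [datetime(2012, 1, 23), datetime(2012, 1, 24), datetime(2012, 1, 25)]
impressions = [147, 157, 156]

fig, ax = plt.subplots()
# "o" draws markers only, similar to the default look of plot_date().
(line,) = ax.plot(days, impressions, "o")

# Render once so the date-aware tick labels on the x axis are computed.
fig.canvas.draw()
```

In an interactive session you would skip the matplotlib.use("Agg") line and call plt.show() instead of fig.canvas.draw().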

To make the diagram easier to read, we’ll change the blue dots to a red line ("r-"), add some text and a grid:

>>> plt.plot_date(x=days, y=impressions, fmt="r-")
>>> plt.title("Page impressions on")
>>> plt.ylabel("Page impressions")
>>> plt.grid(True)

That’s better. In a future article, we’ll use a bar chart that looks a lot better for small data sets. Until then, here’s the complete script for easy copying and pasting:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

days, impressions = np.loadtxt("page-impressions.csv", unpack=True,
        converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

plt.plot_date(x=days, y=impressions, fmt="r-")
plt.title("Page impressions on")
plt.ylabel("Page impressions")
plt.grid(True)
plt.show()
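If the automatic date labels are not what you want, the tick labels can be set explicitly. The following sketch is an addition to the script above, not part of it: it uses the inlined sample values, datetime objects instead of floats, one tick per day, and an ISO label format chosen purely for illustration. The Agg backend is again an assumption so that it runs without a display.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI needed
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

days = [datetime(2012, 1, 23), datetime(2012, 1, 24), datetime(2012, 1, 25)]
impressions = [147, 157, 156]

fig, ax = plt.subplots()
ax.plot(days, impressions, "r-")
ax.set_ylabel("Page impressions")
ax.grid(True)

# One tick per day, labelled in ISO format.
ax.xaxis.set_major_locator(mdates.DayLocator())
formatter = mdates.DateFormatter("%Y-%m-%d")
ax.xaxis.set_major_formatter(formatter)

# Rotate the labels so that neighbouring dates do not overlap.
fig.autofmt_xdate()

# A formatter is callable: it maps a date number to its label text.
label = formatter(mdates.date2num(datetime(2012, 1, 23)))
print(label)
```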

16 Responses to Plotting Time Series Data with Matplotlib

  1. Adam says:

    Thanks for the writeup and info – I had some real issues but it turned out to mostly be date and csv formatting from excel to csv.

    I had previously been pulling data out of a larger set (50+ column spreadsheet) and trying to plot values against dates – but could not get them plotted correctly.

    I don’t have much experience with the “converter” terminology in the np.loadtxt command, I’ll look it up when I have some time, but does the ‘0: ‘ section mean that it is converting (or trying to convert?) all elements of the text file?


  2. mafr says:

    Glad you like it! The “0: func” part only converts the first column, loadtxt()’s column numbering is zero-based.

  3. George says:

    Fabulous little tutorial, really easy to understand.

  4. Pingback: Matplotlib: Plotting Bar Diagrams | Matthias Friedrich's Blog

  5. Geoff says:

    Is “strptime2num” a python 3 function ? I (using google) can’t find any mention of it anywhere ???

  6. Casey says:

    Thanks for the tutorial. It is the first I have seen to plot dates on a time series plot rather than numbers. Trying to apply it to my own needs, I have trouble getting a .csv file to format like yours did and also don’t have any experience with the “converter” terminology. Is there a way to do this with two data columns; one with date information and one with some secondary info or do they have to be combined into one column? When I tried, it viewed the second column entry and told me “unconverted data remains:” rather than assigning it to the second variable.

    • Matthias says:

      I’m not entirely sure I get you completely, but you can pass the usecols parameter to loadtxt() to only read the columns you need from your file: np.loadtxt("filename", unpack=True, converters={ 0: mdates.strpdate2num('%Y-%m-%d') }, usecols=(0, 1)).

      This would select the first and second columns (0 and 1) with column 0 being a timestamp that is converted.

  7. RL says:

    How about an animated thing in a subplot? I managed to draw a 'single' plot with real time graph update, but subplots are just eluding me. Say you get quotes off the web every minute and then plot the stock prices in one subplot and the RSI in another one just below it. (Newbie to both Python and Matplotlib. Started about a week ago.)


  8. Alessandro says:

    Useful post, thank you!

    However, when using Python3.X it does not work (probably due to some issues related to the new handling of strings in Python3.X).
    I tested the script with some modifications, inspired by the following post and it works!

  9. andy says:

    Very nice tutorial. Here are some changes I made to get it to work with Python 3:

    import numpy as np
    import matplotlib.dates as mdates

    # loadtxt() passes bytes to the converter here, so decode before parsing
    def bytedate2num(fmt):
        def converter(b):
            return mdates.strpdate2num(fmt)(b.decode('ascii'))
        return converter

    date_converter = bytedate2num("%Y-%m-%d")
    csvData = np.loadtxt("page-impressions.csv", unpack=False,
            converters={ 0 : date_converter})
    days = csvData[:,0]
    impressions = csvData[:,1]

  10. Raja says:

    Thank you for this nice explanation. It is a good start for me – always start with something that is working before it gets more complex. Now the next problem comes: what if I have a date and then a time format, and several csv files which I would like to join into one time series? But I will figure it out. Just wanted to say thanks – this works so far. :)

  11. diedro says:

    Dear M,
    great post. How can I also add hours and minutes?
