Tag Archives: python

Scikit-learn: Feature Extraction From Text

I’ve recently been playing with scikit-learn, a machine learning package for Python. While there’s great documentation on many topics, feature extraction isn’t one of them. My use case was to turn article tags (like the ones I use on my blog) … Continue reading

Posted in python | 7 Comments

Plotting Time Series Data with Matplotlib

It’s been a while since my last article on Matplotlib. Today we’re going to plot time series data to visualize web page impressions, stock prices, and the like over time.

Posted in python | 16 Comments

Basics of Near Duplicate Detection

Finding duplicate files is easy; anyone can do it. Finding files that are almost identical is more difficult, but it’s useful for tasks like detecting plagiarism. In this article, I’ll present a simple Python program that calculates the textual … Continue reading

Posted in computer science | 9 Comments

Delicious shutting down?

Yesterday, I was quite surprised when I heard rumors that Delicious, my favorite link sharing site, is shutting down. According to their blog, they are looking for a way to continue the service outside of Yahoo, but it’s better to … Continue reading

Posted in python | 1 Comment

The Future of python-musicbrainz2

I started the python-musicbrainz2 project in January 2006 as the first client library for the newly designed MusicBrainz XML web service. It was my first Python project, and I learned quite a lot in the process. Now MusicBrainz is … Continue reading

Posted in python | Leave a comment

Finding the Majority Item in a Stream

Going through old CACM issues, I discovered a paper (PDF) on stream processing. A common problem in this field is to find frequent items in a data stream when you only get one pass through the data and you need … Continue reading

Posted in computer science | Leave a comment
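
To give a flavor of the one-pass problem mentioned in the excerpt above, here is a minimal sketch of the Boyer-Moore majority vote algorithm. This is an assumption on my part: the excerpt does not say which algorithm the paper or the full article uses, and the function names are made up for illustration. The first function keeps only one candidate and one counter while scanning the stream; the second does a verification pass, which requires the data to be replayable.

```python
def majority_candidate(stream):
    """Single pass, constant memory: return the only possible majority item.

    The result is just a candidate. If no item occurs in more than half of
    the positions, the returned value is arbitrary.
    """
    candidate, count = None, 0
    for item in stream:
        if count == 0:
            # No current candidate: adopt this item.
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            # A different item cancels one vote for the candidate.
            count -= 1
    return candidate


def majority_item(items):
    """Return the strict majority item of a replayable sequence, or None."""
    items = list(items)
    candidate = majority_candidate(items)
    # Second pass: confirm the candidate really occurs more than n/2 times.
    if items and items.count(candidate) * 2 > len(items):
        return candidate
    return None


print(majority_item(["a", "b", "a", "c", "a"]))
print(majority_item([1, 2, 3]))
```

With a true one-pass stream the verification step is not available, so the candidate can only be trusted if a majority item is known to exist.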

Fun with Context Managers

Sometimes I need a simple stopwatch in my Python scripts to find out how expensive my code is in wall clock time. The problem is trivial to solve, but I thought I’d give it a try using Python’s with … Continue reading

Posted in python | Leave a comment
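
As a rough illustration of the stopwatch idea from the excerpt above, here is a minimal sketch of a wall-clock timer built on the with statement. It is not necessarily the code from the full article: the name `stopwatch`, the `label` parameter, and the dictionary it yields are all hypothetical choices for this example.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label="block"):
    """Measure the wall-clock time spent inside a with block.

    Yields a dict whose "elapsed" entry is filled in (in seconds)
    when the block exits, even if it exits with an exception.
    """
    result = {"label": label, "elapsed": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["elapsed"] = time.perf_counter() - start


with stopwatch("sleep") as sw:
    time.sleep(0.01)

print(f"{sw['label']} took {sw['elapsed']:.4f} s")
```

Using `time.perf_counter()` rather than `time.time()` is a design choice here: it is a monotonic, high-resolution clock, so the measurement is not affected by system clock adjustments.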