Web sites need design updates from time to time, and this blog is no exception. While I was mostly happy with the Sapphire theme, code examples didn't look good and it didn't support widgets on the article pages. Most importantly, however, its tag cloud was quite ugly, which prevented me …
Basics of Near Duplicate Detection
Finding duplicate files is easy; anyone can do it. Finding files that are almost identical is more difficult, but it's useful for tasks like detecting plagiarism. In this article, I'll present a simple Python program that calculates the textual similarity of two documents.
The basic idea is to reduce …
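The article's own approach is cut off above. As a rough illustration of one common technique, here is a minimal sketch, assuming word shingling plus Jaccard similarity (which may differ from the article's method). Each document is reduced to its set of overlapping k-word sequences ("shingles"), and similarity is the size of the intersection divided by the size of the union. The function names `shingles` and `similarity` and the default `k = 3` are hypothetical choices for this example.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two documents' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents count as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog"
    doc2 = "the quick brown fox jumped over the lazy dog"
    print(f"{similarity(doc1, doc2):.2f}")  # prints 0.40
```

A single changed word breaks every shingle that contains it (three of them for `k = 3`), so the two sample sentences share only 4 of 10 distinct shingles. Larger `k` makes the measure stricter; smaller `k` makes it more tolerant of rewording.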
So Much for 2010
2010 is over and almost forgotten already, and my blog has now been running for four years. As in previous years, I haven't posted much, but I have posted fairly regularly. The helpful people from WordPress sent me an email with some statistics I'd like to share.
This blog has seen 31 …
Delicious shutting down?
Yesterday, I was quite surprised when I heard rumors that Delicious, my favorite link sharing site, is shutting down. According to their blog, they are looking for a way to continue the service outside of Yahoo, but it's better to be safe than sorry and back up all bookmarks.
I was …
Into the Future with IPv6
2011 may not be the year when IPv4 addresses finally run out, but the reserves are running low enough to warrant large IPv6 transition projects. Access providers and hosting providers will see the effects first, because the growth of their businesses depends directly on the supply of new IP addresses …
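To put the scarcity in numbers, here is a minimal sketch using Python's standard `ipaddress` module that compares the total size of the two address spaces. The variable names are illustrative only.

```python
import ipaddress

# The "all addresses" network of each protocol version spans its whole address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: {ipv4_total:,} addresses (2**32)")
print(f"IPv6: {ipv6_total:.3e} addresses (2**128)")
```

The IPv4 figure is only an upper bound. Large blocks are reserved for private networks, multicast and other special purposes, so fewer addresses than this are actually available for assignment on the public Internet.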
Java: Finding Package Cycles
JDepend is a tool for detecting cycles between your Java packages. It is often run through a Maven plugin to generate reports for the project's Maven site. In most teams, however, people only look at these reports from time to time. So when a cycle has been introduced, it takes …
Switching Displays via Keyboard Shortcuts
When I'm at my desk, I use an external LCD monitor with my netbook. I'm glad that switching displays finally works on Linux, but even on Ubuntu it takes a lot of clicks in the Monitors menu. Fortunately, I found a way to map this to keyboard shortcuts.
The following …
Software Developers: You Need Computer Science Education!
Computer science and software development are two entirely different things. The former is a science, the latter is mostly craftsmanship, still struggling to become an engineering discipline in its own right. Being a good computer scientist doesn't make you a good software developer and vice versa, but as a software …
A Maven Archetype for Hadoop Jobs
In my last article, I showed how to build a Hadoop job that contains all its dependencies. To make things even easier, I created a Maven archetype that turns project setup into a simple 30-second process.
To generate a new project run the following command (on one line):
mvn …
Maven: Building a Self-Contained Hadoop Job
Non-trivial Hadoop jobs usually have dependencies that go beyond those provided by the Hadoop runtime environment. If your job needs additional libraries, you have to make sure they are on Hadoop's classpath when the job is executed. This article shows how you can build a self-contained …