Finding duplicate files is easy; anyone can do it. Finding files that are almost identical is harder, but it's useful for tasks like detecting plagiarism. In this article, I'll present a simple Python program that calculates the textual similarity of two documents.
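To give a feel for what "textual similarity" can mean in practice, here is a minimal sketch that uses `difflib.SequenceMatcher` from Python's standard library. This is an assumption for illustration and not necessarily the approach taken in the article; the function name `similarity` and the choice to compare whitespace-separated words are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the word sequences are identical."""
    # Compare sequences of words rather than characters, so the score
    # reflects shared wording instead of shared letters.
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False turns off the heuristic that ignores very frequent
    # elements in long sequences, which would distort scores for long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "the quick brown fox leaps over the lazy dog"
    print(f"similarity: {similarity(original, suspect):.2f}")
```

The ratio is twice the number of matching words divided by the total number of words in both texts, so near-duplicates score close to 1.0 and unrelated texts score close to 0.0.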
Computer science and software development are two entirely different things. The former is a science, the latter is mostly craftsmanship, still struggling to become an engineering discipline in its own right. Being a good computer scientist doesn't make you a good software developer and vice versa, but as a software …
You can use RSS to easily follow a few high-profile websites, and link-sharing services like Slashdot or Digg to discover popular web content. But that's like reading a classic newspaper and some magazines: the information provided may have a higher chance of being relevant to you, but there's still …
Going through old CACM issues, I discovered a paper (PDF) on stream processing. A common problem in this field is finding frequent items in a data stream when you get only one pass through the data and need answers in real time. This is interesting in situations where …
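As a rough illustration of the one-pass constraint, here is a sketch of the Misra-Gries summary, a well-known algorithm for this problem. It is not necessarily the method described in the paper; the function name `frequent_items` and the parameter `k` are hypothetical names chosen for this example.

```python
def frequent_items(stream, k):
    """One-pass Misra-Gries summary that keeps at most k - 1 counters.

    Any item that occurs more than len(stream) / k times is guaranteed to
    be among the returned keys. The counts are lower bounds, not exact
    frequencies, and infrequent items may also appear in the result.
    Assumes k >= 2.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the ones
            # that reach zero. The new item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    clicks = ["x", "y", "x", "z", "x", "y", "x"]
    print(frequent_items(clicks, k=3))
```

Memory use depends only on `k`, not on the length of the stream, which is what makes this kind of summary suitable when the data can be read only once. A second pass (when one is available) can confirm which candidates are truly frequent.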
I've been reading up on distributed systems again. For quite a while, my monthly copy of CACM has been my only connection to computer science topics. This time, I followed a few references and came across interesting concepts (most of them familiar from back in university).