About
This is a technology blog by Matthias Friedrich, a software developer and architect from Karlsruhe, Germany.
Tag Archives: computer science
Scikit-learn: Feature Extraction From Text
I’ve been playing with scikit-learn recently, a machine learning package for Python. While there’s great documentation on many topics, feature extraction isn’t one of them. My use case was to turn article tags (like the ones I use on my blog) …
Basics of Near Duplicate Detection
Finding duplicate files is easy; anyone can do it. Finding files that are almost identical is more difficult, but it’s useful for use cases like detecting plagiarism. In this article, I’ll present a simple Python program that calculates the textual …
Software Developers: You Need Computer Science Education!
Computer science and software development are two entirely different things. The former is a science, the latter is mostly craftsmanship, still struggling to become an engineering discipline in its own right. Being a good computer scientist doesn’t make you a …
Are Link-Sharing Services Irrelevant?
You can use RSS to easily follow a few high-profile websites, and link-sharing services like Slashdot or Digg to discover popular web content. But that’s like reading a classic newspaper and some magazines: The information provided may have a …
Finding the Majority Item in a Stream
Going through old CACM issues, I discovered a paper (PDF) on stream processing. A common problem in this field is to find frequent items in a data stream when you only get one pass through the data and you need …
CAP, Consistent Hashing, etc.
I’ve been reading up on distributed systems again. For quite a while, my monthly copy of CACM has been my only connection to computer science topics. This time, I followed a few references and came across interesting concepts (most of …