About
This is a technology blog by Matthias Friedrich, a software developer and architect from Karlsruhe, Germany.
- best practices
- build systems
- computer science
- distributed systems
- java python
- machine learning
- quick tips
Tag Archives: hadoop
When setting up a Hadoop cluster using Debian packages, it’s often useful to work with a local mirror. In this article, I’ll walk you through creating an apt mirror for Cloudera’s Hadoop distribution.
Two years ago, I published a Maven archetype for Hadoop that turned out to be quite popular, judging from the comments I received and the access logs on my server. Today I’ve updated it to use the latest version of …
In my last article, I showed how to build a Hadoop job that contains all of its dependencies. To make things even easier, I created a Maven archetype that turns project setup into a simple 30-second process.
Non-trivial Hadoop jobs usually have dependencies that go beyond those provided by the Hadoop runtime environment. That means that if your job needs additional libraries, you have to make sure they are on Hadoop’s classpath as soon as the job is …
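One quick way to check whether a job jar really carries its own dependencies is to look inside it. The sketch below is a hypothetical helper (`JobJarInspector`, `bundledLibraries` and `entryNames` are illustrative names, not part of Hadoop). It assumes the common convention that dependency jars are packaged directly in a `lib/` directory inside the job jar, which Hadoop adds to the job's classpath when the job is launched.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: lists the dependency jars bundled in a Hadoop job jar.
 * Assumption: dependencies live directly under lib/ inside the job jar.
 */
public class JobJarInspector {

    /** Returns the sorted entries that look like bundled dependency jars (lib/*.jar). */
    static List<String> bundledLibraries(List<String> entryNames) {
        List<String> libs = new ArrayList<>();
        for (String name : entryNames) {
            // Only direct children of lib/ ending in .jar are counted;
            // jars in nested subdirectories are ignored.
            if (name.startsWith("lib/") && name.endsWith(".jar")
                    && name.indexOf('/', "lib/".length()) < 0) {
                libs.add(name);
            }
        }
        Collections.sort(libs);
        return libs;
    }

    /** Reads the names of all entries in the jar at the given path. */
    static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JobJarInspector <job.jar>");
            System.exit(1);
        }
        List<String> libs = bundledLibraries(entryNames(args[0]));
        if (libs.isEmpty()) {
            System.out.println("No bundled dependencies found under lib/");
        }
        for (String lib : libs) {
            System.out.println(lib);
        }
    }
}
```

Run it against your build output, for example `java JobJarInspector target/my-job.jar` (the path is only an example). An empty result suggests the dependencies were not packaged into the job jar and would have to reach Hadoop's classpath some other way.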