About: This is a technology blog by Matthias Friedrich, a software developer and architect from Karlsruhe, Germany.
Tag Archives: hadoop
When setting up a Hadoop cluster using Debian packages, it’s often useful to work with a local mirror. In this article, I’ll walk you through creating an apt mirror for Cloudera’s Hadoop distribution.
Two years ago, I published a Maven archetype for Hadoop that turned out to be quite popular, judging from the comments I received and the access logs on my server. Today I’ve updated it to use the latest version of …
In my last article, I showed how to build a Hadoop job that contains all its dependencies. To make things even easier, I created a Maven archetype that turns project setup into a simple 30-second process.
Non-trivial Hadoop jobs usually have dependencies that go beyond those provided by the Hadoop runtime environment. That means that if your job needs additional libraries, you have to make sure they are on Hadoop’s classpath as soon as the job is …