Tag Archives: hadoop

Mirroring an Apt Repository

When setting up a Hadoop cluster using Debian packages, it’s often useful to work with a local mirror. In this article, I’ll walk you through creating an apt mirror for Cloudera’s Hadoop distribution.

Posted in linux

Maven Archetypes Updated!

Two years ago, I published a Maven archetype for Hadoop that turned out to be quite popular, judging from the comments I received and the access logs on my server. Today I’ve updated it to use the latest version of …

Posted in java

A Maven Archetype for Hadoop Jobs

In my last article, I showed how to build a Hadoop job that contains all its dependencies. To make things even easier, I created a Maven archetype that turns project setup into a simple 30-second process.

Posted in java | 8 Comments

Maven: Building a Self-Contained Hadoop Job

Non-trivial Hadoop jobs usually have dependencies that go beyond those provided by the Hadoop runtime environment. That means that if your job needs additional libraries, you have to make sure they are on Hadoop’s classpath as soon as the job is …

Posted in java | 9 Comments