A Maven Archetype for Hadoop Jobs

In my last article I showed how to build a Hadoop job that contains all its dependencies. To make things even easier, I created a Maven archetype that turns project setup into a simple 30 second process.

To generate a new project run the following command (on one line):

mvn archetype:generate
   -DarchetypeCatalog=http://dev.mafr.de/repos/maven2/

Then follow the instructions on the screen: Pick the hadoop-job-basic archetype from the list and enter your project's coordinates (groupId, artifactId, etc.). If you use a different Hadoop version you can adjust the version number in the generated pom.xml. And that's it!

The Maven archetype:generate command above downloads my personal catalog of archetypes, which is just a simple XML file that I created manually.

Since my last time with the Maven Archetype plugin, things have really improved. They ditched the old descriptor format and introduced a new one that gives you almost complete control over the files that will be part of the generated project. More than that, creating the archetype was as simple as calling the archetype:create-from-project goal on an existing project.

I'm releasing the archetype under the Apache 2.0 license on Github, so feel free to use it as you see fit.

social