About
This is a technology blog by Matthias Friedrich, a software developer and architect from Karlsruhe, Germany.
Tag Archives: hadoop
Mirroring an Apt Repository
When setting up a Hadoop cluster using Debian packages, it’s often useful to work with a local mirror. In this article, I’ll walk you through creating an apt mirror for Cloudera’s Hadoop distribution.
Maven Archetypes Updated!
Two years ago, I published a Maven archetype for Hadoop that turned out to be quite popular, judging from the comments I received and the access logs on my server. Today I’ve updated it to use the latest version of …
A Maven Archetype for Hadoop Jobs
In my last article, I showed how to build a Hadoop job that contains all its dependencies. To make things even easier, I created a Maven archetype that turns project setup into a simple 30-second process.
Maven: Building a Self-Contained Hadoop Job
Non-trivial Hadoop jobs usually have dependencies that go beyond those provided by the Hadoop runtime environment. That means that if your job needs additional libraries, you have to make sure they are on Hadoop’s classpath as soon as the job is …