A Quick Look at GridGain

This weekend I finally had the time to take a look at GridGain, a computational grid package written in and for Java. GridGain is an open source product licensed under LGPL-2.1 (the same as JBoss) with minor portions under the Apache 2.0 license, so use in commercial products is possible. Since we're doing medium-sized number crunching at work, I definitely wanted to give it a try. The following article is not an in-depth evaluation but rather a first impression after a few days of reading and experimenting with version 1.6.1.

Like I said, GridGain is a computational grid, which means it focuses on the computational part. Getting metadata (job input parameters etc.) to your workers is easily possible, of course, but the transfer of large amounts of input data is entirely up to the user. GridGain is not a data grid, but it provides optional integration with Oracle Coherence, a commercial, closed source data grid and caching package.

The programming paradigm behind GridGain is Map/Reduce, a divide and conquer approach which has been made popular by Google. The underlying idea is a simple one: Split your problem into smaller parts and execute them in parallel on network nodes (the map part). When the nodes are done, aggregate the individual results to obtain the final result (reduce). If you're able to express your problem in terms of map and reduce operations, you can scale out to many nodes and solve large problems.
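To make the split/aggregate idea concrete, here is a minimal sketch in plain Java. It does not use GridGain at all: a local thread pool stands in for the network nodes, and the class and method names (MapReduceSketch, sumOfSquares) are made up for illustration. The problem (summing squares over a range) is only an example of something that splits cleanly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: a thread pool plays the role of the grid nodes.
public class MapReduceSketch {

    // "Map": split the range [from, to] into roughly equal chunks,
    // producing one independent job per chunk.
    static List<Callable<Long>> map(long from, long to, int parts) {
        List<Callable<Long>> jobs = new ArrayList<>();
        long total = to - from + 1;
        long chunk = (total + parts - 1) / parts; // ceiling division
        for (long start = from; start <= to; start += chunk) {
            final long lo = start;
            final long hi = Math.min(to, start + chunk - 1);
            jobs.add(() -> {
                long sum = 0;
                for (long i = lo; i <= hi; i++) {
                    sum += i * i;
                }
                return sum;
            });
        }
        return jobs;
    }

    // "Reduce": wait for every partial result and add them up.
    static long reduce(List<Future<Long>> partials) throws Exception {
        long total = 0;
        for (Future<Long> partial : partials) {
            total += partial.get();
        }
        return total;
    }

    // Runs the jobs in parallel and aggregates their results.
    public static long sumOfSquares(long from, long to, int parts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            return reduce(pool.invokeAll(map(from, to, parts)));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(1, 1000, 4)); // prints 333833500
    }
}
```

On a real grid the jobs would be serialized and shipped to other machines, which is exactly the part a package like GridGain takes care of.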

GridGain is a pretty young product, so there isn't much to be found about it on the web. That leaves the official documentation (and the source code) as the only sources of information. Fortunately, the documentation (Javadoc and the read-only wiki) is in a good state already and gets you up to speed quickly.

In retrospect, I spent too much time reading the wiki documentation though. It's excellent for getting a basic grasp of GridGain's concepts and features, but there's no substitute for actually writing a simple application yourself. As soon as I did that, a lot of things suddenly cleared up and I gained much more confidence. I quickly implemented a GridTask (containing the map and reduce functionality) and a GridJob (containing the actual processing logic) and distributed it to nodes on the local network. I even added an MBean for monitoring the entire task just because it was so simple to do. The comprehensive API documentation helped a lot in the process.

To make things easier for new users, I think it would help to restructure the tutorials a bit. In my opinion, the @Gridify annotation (a way to grid-enable a method) is a bit too magical for a beginner in his or her first two hours with GridGain, as it stands in the way of understanding the basics: the simple and elegant GridTask and GridJob abstractions. I'd also move the anonymous GridJob implementations out of the GridTask in the examples. That would make the code easier to read and understand.
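To show what I mean by moving the job out of the task, here is a rough sketch of the structure. The Job and Task interfaces below are simplified stand-ins that I made up for this post; they are not GridGain's GridJob and GridTask and do not mirror their actual signatures. A local loop replaces the grid so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical interfaces, not the GridGain API.
public class TaskJobSketch {

    // Stand-in for a job: one unit of processing logic.
    interface Job<R> {
        R execute();
    }

    // Stand-in for a task: splits the work (map) and aggregates results (reduce).
    interface Task<A, R, T> {
        List<Job<R>> map(A argument);
        T reduce(List<R> results);
    }

    // The job is a small named class of its own instead of an
    // anonymous class buried inside the task's map method.
    static final class WordLengthJob implements Job<Integer> {
        private final String word;

        WordLengthJob(String word) {
            this.word = word;
        }

        @Override
        public Integer execute() {
            return word.length();
        }
    }

    // The task now only describes how to split and how to aggregate.
    public static final class PhraseLengthTask implements Task<String, Integer, Integer> {
        @Override
        public List<Job<Integer>> map(String phrase) {
            List<Job<Integer>> jobs = new ArrayList<>();
            for (String word : phrase.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    jobs.add(new WordLengthJob(word));
                }
            }
            return jobs;
        }

        @Override
        public Integer reduce(List<Integer> results) {
            int total = 0;
            for (int result : results) {
                total += result;
            }
            return total;
        }
    }

    // Runs all jobs in the current thread; a grid would distribute them instead.
    public static <A, R, T> T runLocally(Task<A, R, T> task, A argument) {
        List<R> results = new ArrayList<>();
        for (Job<R> job : task.map(argument)) {
            results.add(job.execute());
        }
        return task.reduce(results);
    }

    public static void main(String[] args) {
        System.out.println(runLocally(new PhraseLengthTask(), "Hello Grid World")); // prints 14
    }
}
```

With the job in its own class, the task reads as two short methods, and the processing logic can be tested without any grid at all.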

From what I've seen, GridGain's source code is clean and well-written. The authors obviously have a great deal of design experience, which results in a modular, flexible system. Like many applications using the Spring Framework, GridGain is made up of several services implementing service provider interfaces (SPIs). For example, finding grid nodes on a network is done using a discovery service, which implements the DiscoverySpi interface. The default is a multicast-based discovery (with lots of other implementations available), but users could easily create their own service. It is possible to configure many aspects of the provided services or use your own services to change policies. To start with, I didn't have to do that, however, since things generally worked out of the box.
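The general shape of such an SPI-based design can be sketched in a few lines of Java. Everything below is hypothetical and exists only for illustration: NodeDiscovery, LocalOnlyDiscovery, StaticListDiscovery and MiniGrid are names I invented, not GridGain classes, and the real DiscoverySpi certainly looks different.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: hypothetical names, not the GridGain API.
public class SpiSketch {

    // The service provider interface: the core depends only on this.
    public interface NodeDiscovery {
        List<String> discoverNodes();
    }

    // Stand-in for a built-in default implementation.
    public static final class LocalOnlyDiscovery implements NodeDiscovery {
        @Override
        public List<String> discoverNodes() {
            return Collections.singletonList("localhost");
        }
    }

    // A user-supplied implementation that returns a fixed host list.
    public static final class StaticListDiscovery implements NodeDiscovery {
        private final List<String> hosts;

        public StaticListDiscovery(String... hosts) {
            this.hosts = Arrays.asList(hosts.clone());
        }

        @Override
        public List<String> discoverNodes() {
            return Collections.unmodifiableList(hosts);
        }
    }

    // The core works out of the box with the default,
    // but accepts any other implementation of the interface.
    public static final class MiniGrid {
        private final NodeDiscovery discovery;

        public MiniGrid() {
            this(new LocalOnlyDiscovery());
        }

        public MiniGrid(NodeDiscovery discovery) {
            this.discovery = discovery;
        }

        public int nodeCount() {
            return discovery.discoverNodes().size();
        }
    }

    public static void main(String[] args) {
        System.out.println(new MiniGrid().nodeCount()); // prints 1
        System.out.println(new MiniGrid(new StaticListDiscovery("a", "b", "c")).nodeCount()); // prints 3
    }
}
```

In a Spring-based setup the choice of implementation would typically live in a configuration file rather than in a constructor call, but the principle is the same.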

Over the next few months it will be quite interesting to see if GridGain is able to build a community around their software. A great step towards this is the public online forum. Employees from GridGain Systems (the company behind GridGain) answer questions in a polite and helpful manner. The forum has a good search function, which proved to be very useful when I had questions.

Personally, I'm also interested in a public Maven repository - something many users would benefit from - and also a public source code repository. I know that companies are often hesitant to go this far, but it has proven to be a great asset to open source projects, since it allows advanced users to track the project's progress in detail. I'd also appreciate build instructions (from what I've seen, the build system isn't shipped in version 1.6.1), in case I ever have to make emergency bug fixes to GridGain. Experience shows that things like that are necessary in any software product from time to time.

To sum things up, my overall impression is a very positive one: The package is really a lot of fun to work with, as the website promised. I think everybody who considers doing serious number crunching on the Java platform should take the time to evaluate GridGain.
