Using GridGain's Topology SPI

On a GridGain cluster, you sometimes want to execute your jobs on only a subset of the available nodes: those that meet a given condition. Say some nodes run an expensive piece of third-party software that is (fortunately) only needed for a couple of tasks. At other times, the jobs should be executed on a different subset, or maybe on all nodes of the cluster.

There are several ways to do this. Filtering out nodes in your map() method would work, but you usually don't want that if you're already using split(). Or you could put the nodes in different multicast groups, thus filtering via DiscoverySpi. In that case, however, it's more difficult to run a task's jobs on all nodes: you can no longer use the default discovery service provider, so you'd have to pick a different one or even implement your own.

Luckily, there's the TopologySpi. The framework uses it to filter the set of nodes returned by discovery. Any filtering strategy can be implemented, which makes it a perfect hook for a criteria-based filter. For example, the service provider shipped with GridGain, GridBasicTopologySpi, makes it possible to execute jobs only on the local node or only on remote nodes.

To solve the problem described above, I developed a simple group concept based on user attributes that can be assigned to each grid node in its configuration. A node may be part of one or more groups. This is added to the configuration file of the worker grid nodes like this:

<property name="userAttributes">
  <map>
    <entry key="grid.groups">
      <set>
        <value>foo</value>
        <value>bar</value>
      </set>
    </entry>
  </map>
</property>

A custom topology provider is used on the node where my tasks are deployed. It's derived from GridBasicTopologySpi and additionally lets you specify which groups a node has to be part of to qualify for the task. The provider is called GroupTopologySpi (the "i" is used consistently in GridGain, although it's counter-intuitive) and is activated in the master's configuration:

<property name="topologySpi">
  <bean class="de.mafr.grid.GroupTopologySpi">
    <property name="localNode" value="true"/>
    <property name="remoteNodes" value="true"/>
    <property name="requiredGroups">
      <set>
        <value>foo</value>
      </set>
    </property>
  </bean>
</property>

If the requiredGroups property isn't given or if it contains the empty set, the service provider works exactly the same as GridBasicTopologySpi, so it can be used as a drop-in replacement. In the example, only grid nodes in the group foo will take part in the task.
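
To make the filtering rule concrete, here is a minimal sketch in plain Java. It deliberately does not use the GridGain API: the GroupFilterSketch class, its Node type and the filter() method are hypothetical stand-ins for illustration, not the actual GroupTopologySpi code. It also assumes that a node must belong to every required group to qualify, which the description above does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupFilterSketch {
    /** Attribute key used in the worker configuration above. */
    static final String GROUPS_ATTR = "grid.groups";

    /** Hypothetical stand-in for a grid node; not a GridGain type. */
    static final class Node {
        final String id;
        final boolean local;
        final Map<String, Object> attributes;

        Node(String id, boolean local, Map<String, Object> attributes) {
            this.id = id;
            this.local = local;
            this.attributes = attributes;
        }
    }

    /**
     * Keeps the nodes allowed by the local/remote flags and, if any groups
     * are required, only those whose group attribute contains all of them
     * (assumption: "all", not "any").
     */
    static List<Node> filter(Collection<Node> nodes, boolean localNode,
                             boolean remoteNodes, Set<String> requiredGroups) {
        List<Node> result = new ArrayList<>();
        for (Node n : nodes) {
            // Local/remote check, analogous to the two boolean properties.
            boolean allowed = n.local ? localNode : remoteNodes;
            if (!allowed) {
                continue;
            }
            // No required groups: behave like the plain local/remote filter.
            if (requiredGroups != null && !requiredGroups.isEmpty()) {
                Object attr = n.attributes.get(GROUPS_ATTR);
                if (!(attr instanceof Collection)
                        || !((Collection<?>) attr).containsAll(requiredGroups)) {
                    continue;
                }
            }
            result.add(n);
        }
        return result;
    }

    /** Convenience factory for the example below. */
    static Node node(String id, boolean local, String... groups) {
        Map<String, Object> attrs = new HashMap<>();
        if (groups.length > 0) {
            attrs.put(GROUPS_ATTR, new HashSet<>(Arrays.asList(groups)));
        }
        return new Node(id, local, attrs);
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                node("master", true),
                node("worker1", false, "foo", "bar"),
                node("worker2", false, "bar"));
        // Same settings as the master configuration above.
        for (Node n : filter(nodes, true, true, Collections.singleton("foo"))) {
            System.out.println(n.id);
        }
    }
}
```

In this sketch, a master node without a grid.groups attribute is filtered out as soon as a group is required, even though localNode is true. Whether that is what you want depends on your setup; giving the master the same user attribute would include it again.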

The implementation is simple and consists of one short class and one interface: the actual topology provider, GroupTopologySpi, and its MBean interface.
