Monday, April 07, 2014

Running JGroups on Google Compute Engine

I recently started looking into Google Compute Engine (GCE) [1], an IAAS service similar to Amazon's EC2. I had been looking into EC2 a few years ago and created S3_PING back then, to simplify running JGroups clusters on EC2.

As GCE seems to be taking off, I wanted to be able to provide a simple way to run JGroups clusters on GCE, too. Of course, this also means it's simple to run Infinispan/JDG and WildFly(JBoss EAP) clusters on GCE.

So the first step was creating a discovery protocol named GOOGLE_PING.

Turns out this was surprisingly easy; only 27 lines of code as I could more or less reuse S3_PING. Google provides an S3 compatibility mode which only requires a change of the cloud storage host name !

Next, I wanted to run a number of UPerf instances on GCE and measure performance.

UPerf mimics Infinispan's partial replication, in which every node picks a backup for its data: every node invokes 20'000 synchronous RPCs, 80% of those are READS and 20% WRITES. Random destinations are picked for every request and when all members are done, the times to invoke the 20'000 requests are collected from all members, averaged and printed to the console.

The nice thing about UPerf is that parameters (such as the payload of each request, the number of invoker threads, the number of RPCs etc) can be changed dynamically; this is done in the form of a CONFIG RPC. All members apply the changes and therefore don't need to be restarted. (New members acquire the entire configuration from the coordinator).

I made 2 videos showing how this is done. Part 1 [3] shows how to setup and configure JGroups to run on 2 node cluster, and part 2 [4] shows a 100 node cluster and measures performance of UPerf. I hope to add a part 3 later, when I have a quota to run 1000 cores...

The first part can be seen here:

The second part is here:

Enjoy !






  1. You often mention about EC2 packet size limitation. Is there a similar limitation on GCE?

  2. I don't know about UDP (with ip_mcast=false), but since this test was run under TCP, there is no such problem.