Tuesday, November 30, 2010

Clustering between different sites / geographic failover

I just completed a new feature in JGroups which allows for transparent bridging of separate clusters, e.g. at different sites.

Let's say we have a (local) cluster in New York (NYC) and another cluster in San Francisco (SFO). They're completely autonomous and can even have different configurations.

RELAY [1] essentially has the coordinators of the local clusters relay local traffic to the remote cluster, and vice versa. The relaying (or bridging) is done via a separate cluster, usually based on TCP, as IP multicasting is typically not allowed between sites.
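
To make the relaying idea concrete, here is a toy, in-memory Java model of two sites. It is not the JGroups RELAY implementation or API. `RelaySketch`, `Site`, `Msg` and the `relayed` flag (used here to keep a message from bouncing back over the bridge) are hypothetical names for this sketch. The bridge is a direct method call rather than a TCP-based cluster, and the first member of each site is assumed to be its coordinator.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy in-memory model of relaying between two sites. NOT the JGroups API:
 * RelaySketch, Site and Msg are hypothetical names used only for illustration.
 */
public class RelaySketch {

    /** A message; 'relayed' marks a copy that arrived over the bridge (sketch assumption). */
    record Msg(String src, String payload, boolean relayed) {}

    static class Site {
        final String name;
        final List<String> members;                        // first member acts as coordinator
        final List<String> delivered = new ArrayList<>();  // sequence seen by every local member
        final List<String> relayLog = new ArrayList<>();   // coordinator-to-coordinator hops
        Site remote;                                       // the other site, reached via the bridge

        Site(String name, List<String> members) {
            this.name = name;
            this.members = members;
        }

        String coordinator() {
            return members.get(0);
        }

        /** Local multicast: every member of this site delivers the message. */
        void multicast(Msg msg) {
            delivered.add(msg.src() + ":" + msg.payload());
            // Only locally originated traffic is relayed, so a message that came over
            // the bridge is never bounced back to the site it came from.
            if (!msg.relayed() && remote != null) {
                relayLog.add(coordinator() + "->" + remote.coordinator());
                remote.receiveFromBridge(msg);
            }
        }

        /** Invoked on this site's coordinator when the bridge hands it a remote message. */
        void receiveFromBridge(Msg msg) {
            multicast(new Msg(msg.src(), msg.payload(), true));
        }
    }

    public static void main(String[] args) {
        Site nyc = new Site("NYC", List.of("A", "B", "C"));
        Site sfo = new Site("SFO", List.of("D", "E", "F"));
        nyc.remote = sfo;
        sfo.remote = nyc;

        nyc.multicast(new Msg("B", "hello", false)); // sent by B in NYC
        sfo.multicast(new Msg("E", "world", false)); // sent by E in SFO

        System.out.println(nyc.name + " delivered " + nyc.delivered + ", relayed " + nyc.relayLog);
        System.out.println(sfo.name + " delivered " + sfo.delivered + ", relayed " + sfo.relayLog);
    }
}
```

Running `main` should show both sites delivering the same sequence, `[B:hello, E:world]`, with NYC's coordinator A relaying to D and SFO's coordinator D relaying to A.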

SFO could be a backup of NYC, or both could be active, or we could think of a follow-the-sun model where each cluster is active during working hours at its site.

If we have nodes {A,B,C} in NYC and {D,E,F} in SFO, then there would be a global view, e.g. {D,E,F,A,B,C}, which is the same across all the nodes of both clusters.
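
One way every node can arrive at the same global view is to apply the same deterministic rule to the same inputs. The sketch below assumes (the post does not say how RELAY orders it) that the local views are concatenated in the order in which the site coordinators appear in the bridge cluster's view. `GlobalViewSketch` and `globalView` are hypothetical names, not part of JGroups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model of forming a global view from two local views. NOT the JGroups API;
 * the ordering rule (bridge-view order of the site coordinators) is an assumption.
 */
public class GlobalViewSketch {

    /**
     * Concatenates the local views in the order in which the site coordinators appear
     * in the bridge view. Every node applies the same rule to the same inputs, so every
     * node in both sites computes an identical global view.
     */
    static List<String> globalView(List<String> bridgeView, Map<String, List<String>> localViewByCoordinator) {
        List<String> global = new ArrayList<>();
        for (String coordinator : bridgeView) {
            List<String> local = localViewByCoordinator.get(coordinator);
            if (local == null) {
                throw new IllegalArgumentException("no local view for coordinator " + coordinator);
            }
            global.addAll(local);
        }
        return global;
    }

    public static void main(String[] args) {
        Map<String, List<String>> localViews = Map.of(
                "A", List.of("A", "B", "C"),  // NYC, coordinator A
                "D", List.of("D", "E", "F")); // SFO, coordinator D

        // Hypothetical bridge view in which SFO's coordinator D is listed first.
        List<String> bridgeView = List.of("D", "A");
        System.out.println(globalView(bridgeView, localViews)); // prints [D, E, F, A, B, C]
    }
}
```

With a bridge view listing D first, this reproduces the `{D,E,F,A,B,C}` example from the post.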

One use of RELAY could be to provide geographic failover in case of site failures. Because all of the data in NYC is also available in SFO, clients can simply fail over from NYC to SFO if the entire NYC site goes down, and continue to work.

Another use case is to have SFO act as a read-only copy of NYC, and run data analysis functions on SFO, without disturbing NYC, and with access to almost real-time data.

As you can guess, this feature is going to be used by Infinispan, and since Infinispan serves as the data replication / distribution layer in JBoss, we hope to be able to provide replication / distribution between sites in JBoss as well...

Exciting times ... stay tuned for more interesting news from the Infinispan team!

Read more on RELAY at [1] and provide feedback!
Cheers,


[1] http://www.jgroups.org/manual/html/user-advanced.html#RelayAdvanced

14 comments:

  1. Forgot to say, JGroups 2.12.0.Alpha1 can be downloaded from [1]

    [1] http://sourceforge.net/projects/javagroups/files/JGroups/2.12.0.Alpha1/jgroups-2.12.0.Alpha1.jar/download

  2. Anonymous, 5:35 AM

    Hi Bela,

    Suppose we have a cluster of 4 nodes, 2 of which are getting an OS upgrade (RHEL5 in this case). All nodes are in the same subnet. When the 2 nodes start up after the OS upgrade, they show "Error installing to Start: name=HAPartition state=Create java.lang.IllegalStateException: Node xxxx could not flush the cluster for state retrieval", so the deployments that depend on this HAPartition are in Error State. Can you point me to where the problem is?

    Thanks
    Vishy

  3. Anonymous, 8:30 AM

    Hi Ben,
    I want to create two clusters at 2 different sites,
    but I don't know how to use RELAY.
    Can you please provide some examples?

  4. There's documentation: http://www.jgroups.org/manual/html/user-advanced.html#RelayAdvanced

  5. Anonymous, 10:16 AM

    Can I have some examples of cluster communication?

  6. Anonymous, 11:33 AM

    I went through your example code in the demos.
    I need to know which properties are required to run that demo (https://github.com/belaban/JGroups/blob/JGroups_2_12_0_Beta1/src/org/jgroups/demos/RelayDemo.java).

  7. Let's discuss this on the mailing list, not my blog!

  8. Anonymous, 12:15 PM

    I sent a mail using community.jboss.org...
    Please respond to that mail as soon as possible.

  9. Anonymous, 1:15 PM

    Hi Bela,

    I'm a newbie to the RELAY API in JGroups. I've been trying to run your RelayDemo for the past 4-5 days.

    I'm stuck: I'm unable to run the demo successfully.

    Please provide a solution as soon as possible.

    Thanks

  10. Anonymous, 1:22 PM

    Hi Bela,
    can you suggest a solution for this:
    "WARNING: discarded message from different cluster "RELAY_1" (our cluster is "RELAY_2")."

  11. Awesome :)
    But I suppose that with RELAY you don't preserve total ordering...

  12. @Anonymous: I suggest subscribing to jg-users (https://sourceforge.net/mail/?group_id=6081) and posting your questions there...

    @Yann: it depends; if the destination cluster has a config that doesn't define total order, then you won't be able to keep total ordering. Maybe this wasn't clear, but the 2 clusters that are bridged do not *need* to have the same config!
    If both clusters do have total order, then the following happens:
    - The coordinator (relay) in cluster-1 receives messages M1 and M2
    - It forwards M1 and M2 to cluster-2 via the bridge cluster (e.g. TCP-based)
    - If the bridge cluster defines total ordering, it'll forward M1 and M2 in that order. If not, it could also forward M2 and M1...
    - The coordinator (relay) of cluster-2 receives M1 and M2
    - The coordinator will deliver M1 and M2 (in total order) to cluster-2
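
    The steps above can be illustrated with a toy Java model (hypothetical names, not JGroups code). The bridge hop is modeled by `crossBridge`: an ordered bridge keeps the sending order, while an unordered one is represented by a single deterministic reordering (reversal), which is only one of the orders such a bridge could produce. Cluster-2 is assumed to be configured with total order, so all of its members deliver whatever sequence its relay received.

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    /** Toy model of relaying M1, M2 from cluster-1 to cluster-2. NOT the JGroups API. */
    public class RelayOrderSketch {

        /**
         * The hop over the bridge cluster. An ordered bridge hands messages to cluster-2's
         * relay in sending order; without that guarantee they may arrive reordered.
         * Reversal is used here as one deterministic example of such a reordering.
         */
        static List<String> crossBridge(List<String> sent, boolean orderedBridge) {
            List<String> received = new ArrayList<>(sent);
            if (!orderedBridge) {
                Collections.reverse(received);
            }
            return received;
        }

        /**
         * Cluster-2's relay multicasts what it received. With total order configured in
         * cluster-2, every member delivers this same sequence, so it is returned once.
         */
        static List<String> deliverInCluster2(List<String> receivedByRelay) {
            return List.copyOf(receivedByRelay);
        }

        public static void main(String[] args) {
            List<String> cluster1Order = List.of("M1", "M2"); // order agreed in cluster-1

            List<String> ordered = deliverInCluster2(crossBridge(cluster1Order, true));
            List<String> unordered = deliverInCluster2(crossBridge(cluster1Order, false));

            // prints: ordered bridge:   [M1, M2], same order as cluster-1: true
            System.out.println("ordered bridge:   " + ordered + ", same order as cluster-1: " + ordered.equals(cluster1Order));
            // prints: unordered bridge: [M2, M1], same order as cluster-1: false
            System.out.println("unordered bridge: " + unordered + ", same order as cluster-1: " + unordered.equals(cluster1Order));
        }
    }
    ```

    In both cases all members of cluster-2 agree on one order; only with an ordered bridge does that order also match cluster-1's.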

  13. My presentation "Geographic Failover" at JBossWorld 2011 is now online: http://www.vimeo.com/24825312
