Tuesday, December 06, 2011

Repondez s'il vous plait !

No, this isn't a post in French (my school French would be too rusty for this !); this is about a new protocol in JGroups, called RSVP :-)

As the name possibly suggests, this feature allows for messages to get ack'ed by receivers before a message send returns. In other words, when A broadcasts a message M to {A,B,C,D}, then JChannel.send() will only return once itself, B, C and D have acknowledged that they delivered M to the application.

This differs from the default behavior of JGroups which always sends messages asynchronously, and guarantees that all non-faulty members will eventually receive the message. If we tag a message as RSVP, then we basically have 2 properties:
  1. The message send will only return when we've received all acks from the current members. Members leaving or crashing during the wait are treated as if they sent an ack. The send() method can also throw a (runtime) TimeoutException if a timeout was defined (in RSVP) and encountered.
  2. If A sent (asynchronous) messages #1-10, and tagged #10 as RSVP, then - when send() returns successfully - A is guaranteed that all members received A's message #10 and all messages prior to #10, that's #1-9.
This can be used for example when completing a unit of work, and needing to know that all current cluster members received all of the messages sent up to now by a given cluster member.

This is similar to FLUSH, but less strict in that it is a per-sender flush, there is no reconciliation phase, and it doesn't stop the world.

An alternative is to use a blocking RPC. However, I wanted to add the capability of synchronous messages directly into the base channel.

Note that this also solves another problem: if A sends messages #1-5, but some members drop #5, and A doesn't send more messages for some time, then A#5 won't get delivered at some members for quite a while (until stability (STABLE) kicks in).

RSVP will be available in JGroups 3.1. If you want to try it out, get the code from master [2]. The documentation is at [1], section 3.8.8.2.

For questions, I suggest one of the mailing lists.
Cheers,

[1] http://www.jgroups.org/manual-3.x/html/user-channel.html#SendingMessages

[2] https://github.com/belaban/JGroups


Thursday, November 17, 2011

JGroups 3.0.0.Final released

I'm happy to announce that JGroups 3.0.0.Final is here !

While originally intended to make only API changes (some of them queued for years), there are also several optimizations, most of them related to running JGroups in larger clusters.

For instance, the size of several messages has been reduced, and some protocol rounds have been eliminated, making JGroups more memory efficient and less chatty.

For the last couple of weeks, I've been working on making merging of 100-300 cluster nodes faster and making sure a merge never blocks. To this end, I've written a unit test, which creates N singleton nodes (= nodes which only see themselves in the cluster), then make them see each other and wait until a cluster of N has formed.

The test itself was a real challenge because I was hitting the max heap size pretty soon. For example, with 300 members, I had to increase the heap size to at least 900 MB, to make the test complete. This indicates that a JGroups member needs roughly a max of 3MBs of heap. Of course, I had to use shared thread pools, timers and do a fair amount of (memory) tuning on some of the protocols, to accommodate 300 members all running in the same JVM.

Running in such a memory constrained environment led to some more optimizations, which will benefit users, even if they're not running 300 members inside the same JVM ! :-)

One of them is that UNICAST / UNICAST2 maintain a structure for every member they talk to. So if member A sends a unicast to each and every member of a cluster of 300, it'll have 300 connections open.

The change is to close connections that have been idle for a given (configurable) time, and re-establish them when needed.

Further optimizations will be made in 3.1.

The release notes for 3.0.0.Final are here: https://github.com/belaban/JGroups/blob/master/doc/ReleaseNotes-3.0.0.txt

JGroups 3.0.0.Final can be downloaded here: https://sourceforge.net/projects/javagroups/files/JGroups/3.0.0.Final

As usual, if you have questions, use one of the mailing lists for questions.

Enjoy !


Monday, September 12, 2011

Publish-subscribe with JGroups

I've added a new demo program (org.jgroups.demos.PubSub), which shows how to use JGroups channels to do publish-subscribe.

Pub-sub is a pattern where instances subscribe to topics and receive only messages posted to those topics. For example, in a stock feed application, an instance could subscribe to topics "rht", "aapl" and "msft". Stock quote publishers could post to these topics to update a quote, and subscribers would get notified of the updates.

The simplest way to do this in JGroups is for each instance to join a cluster; publishers send topic posts as multicasts, and subscribers discard messages for topics to which they haven't subscribed.

The problem with this is that a lot of multicasts will make it all they way up to the application, only to be discarded there if the topic doesn't match. This means that a message is received by the transport protocols (by all instances in the cluster), passed up through all the protocols, and then handed over to the application. If the application discards the message, then all the work of fragmenting, retransmitting, ordering, flow-controlling, de-fragmenting, uncompressing and so on is unnecessary, resulting in wasted CPU cycles, lock acquisitions, cache and memory accesses, context switching and bandwidth.

A solution to this could be to do topic filtering at the publisher's side: a publisher maintains a hashmap of subscribers and topics they've subscribed to and sends updates only to instances which have a current subscription.

This has two drawbacks though: first the publishers have additional work maintaining those subscriptions, and the subscribers need to multicast subscribe or unsubscribe requests. In addition, new publishers need to somehow get the current subscriptions from an existing cluster member (via state transfer).

Secondly, to send updates only to instances with a subscription, we'd have to resort to unicasts: if 10 instances of a 100 instance cluster are subscribed to "rht", an update message to "rht" would entail sending 10 unicast messages rather than 1 multicast message. This generates more traffic than needed, especially when the cluster size increases.

Another solution, and that's the one chosen by PubSub, is to send all updates as multicast messages, but discard them as soon as possible at the receivers when there isn't a match. Instead of having to traverse the entire JGroups stack, a message that doesn't match is discarded directly by the transport, which is the first protocol that receives a message.

This is done by using a shared transport and creating a separate channel for each subscription: whenever a new topic is subscribed to, PubSub creates a new channel and joins a cluster whose name is the topic name. This is not overly costly, as the transport protocol - which contains almost all the resources of a stack, such as the thread pools, timers and sockets -  is only created once.

The first channel to join a cluster will create the shared transport. Subsequent channels will only link to the existing shared transport, but won't initialize it. Using reference counting, the last channel to leave the cluster will de-allocate the resources used by the shared transport and destroy it.

Every channel on top of the same shared transport will join a different cluster, named after the topic. PubSub maintains a hashmap of topic names as keys and channels as values. A "subscribe rht" operation simply creates a new channel (if there isn't one for topic "rht" yet), adds a listener, joins cluster "rht" and adds the topic/channel pair to the hashmap. An "unsubscribe rht" grabs the channel for "rht", closes it and removes it from the hashmap.

When a publishes posts an update for "rht", it essentially sends a multicast to the "rht" cluster.

The important point is that, when an update for "rht" is received by a shared transport, JGroups tries to find the channel which joined cluster "rht" and passes the message up to that channel (through its protocol stack), or discards it if there isn't a channel which joined cluster "rht".

For example, if we have 3 channels A, B and C over the same shared transport TP, and A joined cluster "rht", B joined "aapl" and C joined "msft", then when a message for "ibm" arrives, it will be discarded by TP as there is no cluster "ibm" present. When a message for "rht" arrives, it will be passed up the stack for "rht" to channel A.

As a non-matching message will be discarded at the transport level, and not the application level, we save the costs of passing the message up the stack, through all the protocols and delivering it to the application.

Note that PubSub uses the properties of IP multicasting, so the stack used by it should have UDP as shared transport. If TCP is used, then there are no benefits to the approach outlined above.

Wednesday, September 07, 2011

Speaking at the OpenBlend conference on Sept 15

FYI,

I'll be speaking at the OpenBlend conference in Ljubljana on Sept 15.

My talk will be about how to persist data without using a disk, by spreading it over a grid with a customizable degree of redundancy. Kind of the NoSQL stuff everybody and their grandmothers are talking about these days...

I'm excited to visit Ljubljana, as I've never been there before and I like seeing new towns.

The other reason, of course, is to beat Ales Justin's a**s in tennis :-)

If you happen to be in town, come and join us ! I mean not for tennis, but for the conference, or for a beer in the evening !

Cheers,
Bela

Thursday, September 01, 2011

Optimizations for large clusters

I've been working on making JGroups more efficient on large clusters. 'Large' is between 100 and 2000 nodes.

My focus has been on making the memory footprint smaller, and to reduce the wire size of certain types of messages.


Here are some of the optimizations that I implemented.

Discovery

Discovery is needed by a new member to find the coordinator when joining. It broadcasts a discovery request, and everybody in the cluster replies with a discovery response.

There were 2 problems with this: first, a cluster of 1000 nodes meant that a new joiner received 1000 messages at the same time, possibly clogging up network queues and causing messages to get dropped.

This was solved by staggering the sending of responses (stagger_timeout).

The second problem was that every discovery response included the current view. In a cluster of 1000, this meant that 1000 responses each contained a view of 1000 members !

The solution to this was that we only send back the address of the coordinator; as this is all that's needed to send a JOIN request to it. So instead of sending back (with every discovery response) 1000 addresses, we now only send back 1 address.


Digest

A digest used to contain the lowest, highest delivered and highest received sequence numbers (seqnos) for every member. They are sent back to a new joiner in a JOIN response, and they are also broadcast periodically by STABLE to purge messages delivered by everyone.

The wire size would be 2 longs for every address (UUID), and 3 longs for the 3 seqnos. That's roughly 1000 * 5 * 8 = 40000 bytes for a cluster of 1000 members. Bear in mind that that's the size of one digest; in a cluster of 1000, everyone broadcasts such a digest periodically (STABLE) !

The first optimization was to remove the 'low' seqno; I had to change some code in the retransmitters to allow for that, but - hey - who wouldn't do that to save 8 bytes / STABLE message ? :-)

This reduced the wire (and memory !) size of a 1000-member digest by another 8'000 bytes, down to 32'000 (from 40'000).

Having only highest delivered (HD) and highest received (HR) seqnos allowed for another optimization: HR is always >= HD, and the difference between HR and HD is usually small.

So the next optimization was to send HR as a delta to HD. So instead of sending 322649 | 322650, we'd send 322649 | 1.

The central optimization underlying that was that seqnos seldomly need 8 bytes: a seqno starts at 1 and increases monotonically. If a member sends 5 million messages, the seqno can still be encoded in 4 bytes (saving 4 bytes per seqno). If a member is restarted, the seqno starts again at 1 and can thus be encoded in 1 byte.

So now I could encode an HD/HR pair by sending a byte containing the number of bytes needed for the HD part in the lower 4 bits and the number of bytes needed for the delta in the higher 4 bits. The HD and the delta would then follow. Example: to encode HD=2000000 | HR=2000500, we'd generate the bytes:

| 50 | -128 | -124 | 30 | -12 | 1 |

  • 50 encodes a length of 3 for HD and 2 for HD-HR (500)
  • -128, -124 and 30 encode 2'000'000 in 3 bytes
  • -12 and 1 encode the delta (500)

So instead of using 16 bytes for the above sequence, we use only 6 bytes !

If we assume that we can encode 2 seqnos on average in 6 bytes, the wire size of a digest is now 1000 * (16 (UUID) + 6) = 22'000, that's down from 40'000 in a 1000 member cluster. In other words, we're saving almost 50% of the wire size of a digest !

Of course, we can not only encode seqno sequences, but also other longs, which is exactly what we did for another optimization. Examples of where this makes sense are:
  • Seqnos in NakackHeaders: every multicast message has such a header, so the savings here are significant
  • Range: this is used for retransmission requests, and is also a seqno sequence
  • RequestCorrelator IDs: used for every RPC
  • Fragmentation IDs (FRAG and FRAG2)
  • UNICAST and UNICAST2: sqnos and ranges
  • ViewId
An example of where this doesn't make sense are UUIDs: they are generated such that the bits are spread out over the entire 8 bytes, so encoding them would make 9 bytes out of 8 and that doesn't help.


JoinRsp

A JoinRsp used to contain a list of members twice: once in the view and once in the digest. The was eliminated, and now we're sending the member list only once. This also cut the wire size of a JoinRsp in half.



Further optimizations planned for 3.1 include delta views and better compressed STABLE messages:



Delta views

If we have a view of 1000 members, we always send the full address list with every view change. This is not necessary, as everybody has access to the previous view.

So, for example, when we have P, Q and R joining, and X and Y leaving in V22, then we can simply send a delta view; a view V22={V21+P+Q+R-X-Y}. This means, take the current view V21, remove members X and Y, and add members P, Q and R to the tail of the list, in order to generate a new view V22.

So, instead of sending a list of 1000 members, we simply send 5 members, and everybody creates the new view locally, based on the current view and the delta information.


Compressed STABLE messages

A STABLE message contains a digest with a list of all members and then the digest seqnos for HD and HR. Since STABLE messages are exchanged between members of the same cluster, they all have the same view, or else they would drop a STABLE message.

Hence, we can drop the View and instead send the ViewId, which is 1 address and a long. Everyone knows that the digest seqnos will be in order of the current view, e.g. seqno pair 1 belongs to the first member of the current view, seqno pair 2 to the second member and so on.

So instead of sending a list of 1000 members for a STABLE message, we only send 1 address.

This will reduce the wire size of a 1000-member digest sent by STABLE from roughly 40'000 bytes to ca. 6'000 bytes !



Download 3.0.0.CR1

The optimizations (exluding delta views and compressed STABLE messages) are available in JGroups 3.0.0.CR1, which can be downloaded from [1].

Enjoy (and feedback appreciated, on the mailing lists...) !

[1] https://sourceforge.net/projects/javagroups/files/JGroups/3.0.0.CR1

Tuesday, July 26, 2011

It's time for a change: JGroups 3.0

I'm happy to anounce that I just released a first beta of JGroups 3.0 !

It's been a long time since I released version 2.0 (Feb 2002); over 11 years and 77 2.x releases !

We've pushed a lot of API changes into 3.x, in order to provide more features, bug fixes and optimizations in 2.x releases, which were always (API) backwards compatible to previous 2.x releases.

However, now it was time to take that step and make all the changes we've accumulated over the years.

The bad thing is that 3.x will require code changes if you port your 2.x app to it... however I anticipate that those changes will be trivial. Please ask questions regarding porting on the JGroups mailing list (or forums), and also post suggestions for improvements !

The good thing is that I was able to remove a lot of code (ca. 25'000 lines compared to 2.12.1) and simplify JGroups significantly.

Just one example: the getState(OutputStream) callback in 2.x didn't have an exception in its signature, so an implementation would typically look like this:

public void getState(OutputStream output) {
    try {
        marshalStateToStream(output);
    }
    catch(Exception ex) {
         log.error(ex);
    }
}

In 3.x, getState() is allowed to throw an exception, so the code looks like this now:

public void getState(OutputStream output) throws Exception {
    marshalStateToStream(output);
}

First of all, we don't need to catch (and swallow !) the exception. Secondly, a possible exception will now actually be passed to the state requester, so that we know *why* a state transfer failed when we call JChannel.getState().

There are many small (or bigger) changes like this, which I hope will make using JGroups simpler. A list of all API changes can be found at [2].

The stability of 3 beta1 is about the same as 2.12.1 (very high), because there were mainly API changes, and only a few bug fixes or optimizations.

I've also created a new 3.x specific set of documentation (manual, tutorial, javadocs), for example see the 3.x manual at [3].

JGroups 3 beta1 can be downloaded from [1]. Please try it out and send me your feedback (mailing lists preferred) !

Enjoy !



[1] https://sourceforge.net/projects/javagroups/files/JGroups/3.0.0.Beta1

[2] https://github.com/belaban/JGroups/blob/JGroups_3_0_16_Final/doc/API_Changes.txt

[3] http://www.jgroups.org/manual-3.x/html/index.html

Friday, April 29, 2011

Largest JGroups cluster ever: 536 nodes !

I just returned from a trip to a customer who's working on creating a large scale JGroups cluster. The largest cluster I've ever created is 32 nodes, due to the fact that I don't have access to a larger lab...

I've heard of a customer who's running a 420 node cluster, but I haven't seen it with my own eyes.

However, this record was surpassed on Thursday April 28 2011: we managed to run a 536 node cluster !

The setup was 130 celeron based blades with 1GB of memory, each running 4 JVMs with 96MB of heap, plus 4 embedded devices with 4 JVMs running on each. Each blade had 2 1GB NICs setup with IP Bonding. Note that the 4 processes are competing for CPU time and network IO, so with more blades or more physical memory available, I'm convinced we could go to 1000+ nodes !

The configuration used was udp-largecluster.xml (with some modifications), recently created and shipped with JGroups 2.12.

We started the processes in batches of 130, then waited for 20 seconds, then launched the second batch and so on. The reason we staggered the startup was to reduce the number of merges, which would have increased the startup time.

Running this a couple of times (plus 50+ times over night), the cluster always formed fine, and most of the time we didn't have any merges at all.

It took around 150-200 seconds (including the 5 sleeps of 20 seconds each) to start the cluster; in the picture at the bottom we see a run that took 176 seconds.

Changes to JGroups

This large scale setup revealed that certain protocols need slight modifications to optimally support large clusters, a few of these changes are:
  • Discovery: the current view is sent back with every discovery response. This is not normally an issue, but if you have a 500+ view, then the size of a discovery response becomes huge. We'll fix this by returning only the coordinator's address and not the view. For discovery requests triggered by MERGE2, we'll return the ViewId instead of the entire view.
  • We're thinking about canonicalizing UUIDs with IDs, so nodes will be assigned unique (short) IDs instead of UUIDs. This means reducing the size for having 17 bytes (UUID) in memory in favor of 2 bytes (short).
  • STABLE messages: here, we return an array of members plus a digest (containing 3 longs) for *each* member. This also generates large messages (11K for 260 nodes).
  • The fix in general for these problems is to reduce the data sent, e.g. by compressing the view, or not sending it at all, if possible. For digests, we can also reduce the data sent by sending only diffs, by sending only 1 long and using shorts for diffs, by using bitsets representing offsets to a previously sent value, and so on. 
Ideas are abundant, we now need to see which one is the most efficient.

For now, 536 nodes is an excellent number and - remember - we got to this number *without* the changes discussed above ! I'm convinced we can easily go higher, e.g. to 1000 nodes, without any changes. However, to reach 2000 nodes, the above changes will probably be required.

Anyway, I'm very happy to see this new record !

If anyone has created an even larger cluster, I'd be very interested in hearing about it !
Cheers, and happy clustering,



Friday, April 01, 2011

JBossWorld 2011 around the corner

Wanted to let you know that I've got 2 talks at JBW (Boston, May 3-6).

The first talk [1] is about geographic failover of JBoss clusters. I'll show 2 clusters, one in NYC, the other one in ZRH. Both are completely independent and don't know about each other. However, they're bridged with a JGroups RELAY and therefore appear as if they were one big virtual cluster.

This can be used for geographic failover, but it could also be used for example to extend a private cloud with an external, public cloud without having to use a hardware VPN device.

As always with my talks, this will be demo'ed, so you know this isn't just vapor ware !

The second talk [2] discusses 5 different ways of running a JBoss cluster on EC2. I'll show 2 demos, one of which works only on EC2, the other works on all clouds.

This will be a fun week, followed by a week of biking in the Bay Area ! YEAH !!

Hope to see and meet many of you in Boston !
Cheers,


[1] http://www.redhat.com/summit/sessions/best-of.html#66

[2] http://www.redhat.com/summit/sessions/jboss.html#43

Friday, March 11, 2011

A quick update on performance of JGroups 2.12.0.Final

I forgot to add performance data to the release announcement of 2.1.0.Final, so here it is.

Caveat: this is a quick check to see if we have a performance regression, which I run routinely before a release, and my no means a comprehensive performance test !

I ran this both on my home cluster and our internal lab.


org.jgroups.tests.perf.Test

This test is described in detail in [1]. It forms a cluster of 4 nodes, and every node sends 1 million messages of varying size (1K, 5K, 20K). We measure how long it takes for every node to receive the 4 million messages, and compute the message rate and throughput, per second, per node.

This is my home cluster and consists of 4 HP ProLiant DL380G5 quad core servers (ca 3700 bogomips), connected to a GB switch, and running Linux 2.6. The JDK is 1.6 and the heap size is 600M. I ran 1 process on every box. The configuration used was udp.xml (using IP multicasting) shipped with JGroups.

Results
  •   1K message size: 140 MBytes / sec / node
  •   5K message size: 153 MBytes / sec / node
  • 20K message size: 154 MBytes / sec / node
 This shows that GB ethernet is saturated. The reason that every node receives more than the limit of GB ethernet (~ 125 MBytes/sec) is that every node loops back its own traffic, and therefore doesn't have to share it with other incoming packets. In theory, the max throughput should therefore be 4/3 * 125 ~= 166 MBytes/sec. We see that the numbers above are not too far away from this.


org.jgroups.tests.UnicastTestRpcDist

This test mimicks the way Infinispan's DIST mode works.

Again, we form a cluster of between 1 and 9 nodes. Every node is on a separate machine. The test then has every node invoke 2 unicast RPCs in randomly selected nodes. With a chance of 80% the RPCs are reads, and with a chance of 20% they're writes. The writes carry a payload of 1K, and the reads return a payload of 1K. Every node makes 20'000 RPCs.

The hardware is a bit more powerful than my home cluster; every machine has 5300 bogomips, and all machines are connected with GB ethernet.

Results
  • 1 node:   50'000 requests / sec /node
  • 2 nodes: 23'000 requests / sec / node
  • 3 nodes: 20'000 requests / sec / node
  • 4 nodes: 20'000 requests / sec / node
  • 5 nodes: 20'000 requests / sec / node
  • 6 nodes: 20'000 requests / sec / node
  • 7 nodes: 20'000 requests / sec / node
  • 8 nodes: 20'000 requests / sec / node
  • 9 nodes: 20'000 requests / sec / node
As can be seen, the number of requests per node is the same after 2-3 nodes. The 1 node scenario is somewhat contrived as there is no network communication involved.

This is actually good news, as it shows that performance grows linearly. As a matter of fact, with increasing cluster size, the chances of more than 2 nodes picking the same target decreases, therefore performance degradation due to (write) access conflicts are likely to decrease.

Caveat: I haven't tested this on a larger cluster yet, but the current performance is already very promising.

[1] http://community.jboss.org/docs/DOC-11594

Wednesday, March 09, 2011

It took me 9 years to go from JGroups 2.0.0 to 2.12.0

Yes, you heard right: I released JGroups 2.0.0, new, shiny and refactored, in Feb 2002.

I just released JGroups 2.12.0.Final, which will be the last minor release on the 2.x branch. (There won't be a 2.13; bug fixes will go into 2.12.x).

Time difference: 9 years and change...:-)

I'm still investigating why it took me so long !

Anyway, 2.12.0.Final is here and it is an important release, as it will be shipped in Infinispan 4.2.1 and JBoss 6.


Below are the major features and bug fixes.

On to 3.0 !
Cheers,




Release Notes JGroups 2.12


JGroups 2.12 is API-backwards compatible with previous versions (down to 2.2.7).



New features



RELAY: connecting local (autonomous) clusters into a large virtual cluster


[https://issues.jboss.org/browse/JGRP-747]

A new protocol to connect 2 geographically separate sites into 1 large virtual cluster. The local clusters are
completely autonomous, but RELAY makes them appear as if they were one.

This can for example be used to implement geographic failover

Blog: http://belaban.blogspot.com/2010/11/clustering-between-different-sites.html



LockService: a new distributed locking service

[https://issues.jboss.org/browse/JGRP-1249]
[https://issues.jboss.org/browse/JGRP-1298]
[https://issues.jboss.org/browse/JGRP-1278]

New distributed lock service, offering a java.util.concurrent.lock.Lock implementation (including conditions)
providing cluster wide locks.

Blog: http://belaban.blogspot.com/2011/01/new-distributed-locking-service-in.html



Distributed ExecutorService

[https://issues.jboss.org/browse/JGRP-1300]

New implementation of java.util.concurrent.ExecutorService over JGroups (contributed by William Burns).
Read the documentation at www.jgroups.org for details.



BPING (Broadcast Ping): new discovery protocol based on broadcasting

[https://issues.jboss.org/browse/JGRP-1269]

This is mainly used for discovery of JGroups on Android based phones. Apparently, IP multicasting is not correctly implemented / supported on Android (2.1), and so we have to resort to UPD broadcasting.

Blog: http://belaban.blogspot.com/2011/01/jgroups-on-android-phones.html



JDBC_PING: new discovery protocol using a shared database


[https://issues.jboss.org/browse/JGRP-1231]

All nodes use a shared DB (e.g. RDS on EC2) to place their location information into, and to read information from.
Thanks to Sanne for coming up with the idea and for implementing this !
Additional infos are on the wiki: community.jboss.org/wiki/JDBCPING


FD_SOCK: ability to pick the bind address and port for the client socket

[https://issues.jboss.org/browse/JGRP-1262]



Pluggable address generation


[https://issues.jboss.org/browse/JGRP-1297]

Address generation is now pluggable; JChannel.setAddressGenerator(AddressGenerator) allows for generation of specific implementations of Address. This can for example be used to pass additional information along with every address. Currently used by RELAY to pass the name of the sub cluster around with a UUID.





Optimizations



NAKACK: retransmitted messages don't need to be wrapped


[https://issues.jboss.org/browse/JGRP-1266]

Not serializing retransmitted messages at the retransmitter and deserializing them at the requester saves
1 serialization and 1 deserialization per retransmitted message.


Faster NakReceiverWindow

[https://issues.jboss.org/browse/JGRP-1133]

Various optimizations to reduce locking in NakReceiverWindow:
  • Use of RetransmitTable (array-based matrix) rather than HashMap (reduced memory need, reduced locking, compaction)
  • Removal of double locking






Bug fixes



NAKACK: incorrect digest on merge and state transfer

[https://issues.jboss.org/browse/JGRP-1251]

When calling JChannel.getState() on a merge, the fetched state would overwrite the digest incorrectly.


AUTH: merge can bypass authorization

[https://issues.jboss.org/browse/JGRP-1255]

AUTH would not check creds of other members in case of a merge. This allowed an unauthorized node to join a cluster by triggering a merge.


Custom SocketFactory ignored

[https://issues.jboss.org/browse/JGRP-1276]

Despite setting a custom SocketFactory, it was ignored.


UFC: crash of depleted member could hang node

[https://issues.jboss.org/browse/JGRP-1274]

Causing it to wait forever for credits from the crashed member.


Flow control: crash of member doesn't unblock sender


[https://issues.jboss.org/browse/JGRP-1283]
[https://issues.jboss.org/browse/JGRP-1287]
[https://issues.jboss.org/browse/JGRP-1274]

When a sender block on P sending credits, and P crashes before being able to send credits,
the sender blocks indefinitely.


UNICAST2: incorrect delivery order under stress

[https://issues.jboss.org/browse/JGRP-1267]

UNICAST2 could (in rare cases) deliver messages in incorrect order. Fixed by using the same (proven)
algorithm as NAKACK.


Incorrect conversion of TimeUnit if MILLISECONDS were not used

[https://issues.jboss.org/browse/JGRP-1277]


Check if bind_addr is correct

[https://issues.jboss.org/browse/JGRP-1280]

JGroups now verifies that the bind address is indeed a valid IP address: it has to be either the wildcard
address (0.0.0.0) or an address of a network interface that is up.


ENCRYPT: sym_provider ignored

[https://issues.jboss.org/browse/JGRP-1279]

Property sym_provider is ignored



Manual


The manual is online at http://www.jgroups.org/manual/html/index.html



The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.

Download the new release at https://sourceforge.net/projects/javagroups/files/JGroups/2.12.0.Final.

Bela Ban, Kreuzlingen, Switzerland
Vladimir Blagojevic, Toronto, Canada
Richard Achmatowicz, Toronto, Canada
Sanne Grinovero, Newcastle, Great Britain

March 2011

Saturday, January 22, 2011

JGroups on Android phones

Yann Sionneau recently completed a port of JGroups to Android (2.1+). He took the 2.11 version of JGroups and removed classes which weren't available on Android, and changed some code to make JGroups run on Android.

The QR code for a demo app (based on Draw) is available at [1]. Point a QR code scanner to it, download the app and run it on your Android based phone (I ran it on my HTC Desire). Then start Draw on your local computer, connected to the same wifi network as the phone. The instances, whether run on the phone or computers, should find each other and form a cluster.

It was cool to draw some lines on my HTC and see them getting drawn on all cluster instances as well !

[1] http://sionneau.net/index.php?option=com_content&view=article&id=12%3Atouchsurface-android-app-now-pc-compatible-&catid=3%3Adivers&Itemid=2&lang=en

Friday, January 21, 2011

New distributed locking service in JGroups

I just uploaded JGroups 2.12.0.Beta1, which contains a first version of the new distributed locking service (LockService), which replaces DistributedLockManager.

LockService provides a distributed implementation of java.util.concurrent.lock.Lock. A lock is named and locking granularity is per thread. Here's an example of how to use it:

// lock.xml has to have a locking protocol in it
JChannel ch=new JChannel("/home/bela/lock.xml");
LockService lock_service=new LockService(ch);
Lock lock=lock_service.getLock("mylock");
if(lock.tryLock(2000, TimeUnit.MILLISECONDS)) {
    try {
        // access the resource protected by "mylock"
    }
    finally {
        lock.unlock();
    }
}

If "mylock" is locked by a different thread, it doesn't matter whether inside the same JVM, on the same box, or somewhere in the same cluster, then tryLock() will return false after 2 seconds, else it'll return true.

Lock.newCondition() is currently not implemented - if there's a need for this, let us know on one of the JGroups mailing lists and we'll tackle this. If you have a chance to play with LockService, we're also grateful for feedback.

The new locking service is part of 2.12.0.Beta1, which can be downloaded at [1]. Documentation is at [2].
Cheers,


[1] http://sourceforge.net/projects/javagroups/files/JGroups/2.12.0.Beta1
[2] http://www.jgroups.org/manual/html/index.html, section 4.6