Monday, February 16, 2009

What's cool about logical addresses ?

Finally, logical addresses (https://jira.jboss.org/jira/browse/JGRP-129) will get implemented (in 2.8) !

Those of you who've used JGroups will know that the identity of a node has always been its IP address and the port on which the receiver thread was listening, e.g. 192.168.1.5:7800.

While this gives you a relatively compact and readable address (you can deduce from the address which host the node resides on), there's also a problem: this type of address is not unique over space and time.

Let's look at an example.

Say we have a cluster of {A,B,C}. C's address is 192.168.1.5:7800. Let's assume A has sent 25 messages to C and C has multicast 104 messages. We're using sequence numbers (seqnos) to order messages, attached to a message via a header.

So the next message that C will multicast is #105 and the next message it expects from A is #26.

This is state that is maintained by the respective protocols in a JGroups stack.

Now let's assume C is killed and restarted. Or C is shunned, therefore leaves the channel and then automatically (if configured) reconnects. Let's also assume that the failure detection protocol has not yet kicked in and therefore A and B will not have received a view {A,B} which excludes C.

Now C rejoins the cluster. Because this is a reincarnation of C, it creates a new protocol stack, and all the state mentioned above is gone. The reincarnated C now sends #1 as next seqno and expects #1 from A as well.

Two things happen now:
  1. When C multicasts its next message with seqno #1, both A and B will drop it, because each of them expects C's next message to be #105, not #1. As a matter of fact, they will drop the first 104 messages from C !
  2. A multicasts a message with seqno #26. However, C expects #1 from A and therefore buffers message #26. As a matter of fact, C will buffer all messages from A until it receives #1, which will never happen ! Consequence: C will run out of memory at some point. Even worse: C will prevent stability messages from purging messages seen by all cluster nodes, so in the worst case, all cluster nodes will run out of memory !

OK, while this is somewhat of an edge case and can be remedied by (a) waiting some time before restarting a node and/or (b) not pinning down ports, the fact remains that when this happens, it wreaks havoc.
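
To make the failure mode concrete, here's a minimal sketch of a receiver that tracks the next expected seqno per sender, keyed by the sender's address. This is not JGroups code: SeqnoReceiver, Outcome and the address used for A are made up for illustration, and the real protocols do a lot more (retransmission, stability and so on).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, heavily simplified receiver; NOT the JGroups implementation.
class SeqnoReceiver {
    enum Outcome { DELIVERED, DROPPED, BUFFERED }

    // Next expected seqno per sender, keyed by the sender's address string
    private final Map<String, Long> nextExpected = new HashMap<>();
    // Out-of-order messages per sender, waiting for the gap to be filled
    private final Map<String, TreeMap<Long, String>> buffered = new HashMap<>();

    Outcome receive(String sender, long seqno, String payload) {
        long expected = nextExpected.getOrDefault(sender, 1L);
        if (seqno < expected)
            return Outcome.DROPPED;   // looks like a duplicate of an old message
        if (seqno > expected) {
            buffered.computeIfAbsent(sender, k -> new TreeMap<>()).put(seqno, payload);
            return Outcome.BUFFERED;  // wait until the missing messages arrive
        }
        // In order: deliver it, plus any buffered messages that directly follow
        long next = expected + 1;
        TreeMap<Long, String> pending = buffered.get(sender);
        while (pending != null && pending.remove(next) != null)
            next++;
        nextExpected.put(sender, next);
        return Outcome.DELIVERED;
    }

    int bufferedCount(String sender) {
        TreeMap<Long, String> pending = buffered.get(sender);
        return pending == null ? 0 : pending.size();
    }
}

class SeqnoDemo {
    public static void main(String[] args) {
        String c = "192.168.1.5:7800";   // C's address, from the example above
        String a = "192.168.1.2:7800";   // A's address is an assumption

        SeqnoReceiver atA = new SeqnoReceiver();
        for (long i = 1; i <= 104; i++)
            atA.receive(c, i, "msg-" + i);            // C's first 104 multicasts

        // C restarts with the same address and starts at #1 again
        System.out.println(atA.receive(c, 1, "hello from new C"));   // DROPPED

        // The reincarnated C has fresh state and expects #1 from A
        SeqnoReceiver atNewC = new SeqnoReceiver();
        System.out.println(atNewC.receive(a, 26, "msg-26"));         // BUFFERED
    }
}
```

Because the state is keyed by the address, and the new C has the same address as the old one, neither side can tell the two incarnations apart.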

So how are logical addresses going to change this ?

A logical address consists of
  • an org.jgroups.util.UUID (copied from java.util.UUID and relieved of some useless fields) and
  • a logical name
The logical name is given to a channel when the channel is created, e.g.
JChannel channel=new JChannel("node-4", "/home/bela/udp.xml");

This means that the channel's address will always get printed as "node-4". Under the covers, however, we use a UUID (for equals() and hashCode()), which is unique over space and time. The UUID is recreated on channel connect, so a reincarnated C gets a new identity and is treated by A and B as a brand new member rather than as the old C. The above reincarnation issue will therefore not happen.
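
Here's a rough sketch of that idea: identity comes from the UUID, the logical name is only used for printing. LogicalAddressSketch is a made-up name and it uses java.util.UUID for simplicity; the actual org.jgroups.util.UUID class will certainly look different.

```java
import java.util.UUID;

// Hypothetical sketch only; the real class in JGroups may differ considerably.
final class LogicalAddressSketch {
    private final UUID uuid;           // identity: used by equals() and hashCode()
    private final String logicalName;  // display only: used by toString()

    LogicalAddressSketch(String logicalName) {
        this.uuid = UUID.randomUUID(); // a fresh UUID for every new instance
        this.logicalName = logicalName;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof LogicalAddressSketch
                && uuid.equals(((LogicalAddressSketch) other).uuid);
    }

    @Override
    public int hashCode() {
        return uuid.hashCode();
    }

    @Override
    public String toString() {
        // Fall back to the UUID if no logical name was given
        return logicalName != null ? logicalName : uuid.toString();
    }
}

class LogicalAddressDemo {
    public static void main(String[] args) {
        LogicalAddressSketch before = new LogicalAddressSketch("node-4");
        LogicalAddressSketch after = new LogicalAddressSketch("node-4"); // "reincarnation"
        System.out.println(before + " / " + after);   // both print as node-4
        System.out.println(before.equals(after));     // different UUIDs, so not equal
    }
}
```

Two instances with the same logical name print identically but never compare equal, which is the property that keeps per-sender state of an old incarnation from being applied to a new one.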

The logical name is syntactic sugar: views consisting of raw UUIDs (16 bytes each) are not a pretty sight, so views like {"node-1", "node-2", "node-3", "node-4"} look much better.

Note that the user will be able to pick whether to see UUIDs or logical names.

Also, if null is passed as logical name, JGroups will create a logical name (e.g. using the host name and a counter).

A UUID will get mapped to one or more physical addresses. The mapping is maintained by the transport and there will be an ARP-like protocol (handled by Discovery) to fetch the initial mappings, and to fetch a mapping if not available.
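
As a sketch of what such a mapping could look like: a cache from UUID to physical addresses, with a fallback that asks "someone else" when a mapping is missing. Everything here is an assumption for illustration: AddressCacheSketch is a made-up class, and the resolver function is just a stand-in for the ARP-like request that Discovery would send.

```java
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Hypothetical sketch of the UUID -> physical address mapping kept by a transport.
class AddressCacheSketch {
    private final Map<UUID, List<InetSocketAddress>> cache = new ConcurrentHashMap<>();
    // Stand-in for the ARP-like lookup; called only when the cache has no mapping
    private final Function<UUID, List<InetSocketAddress>> resolver;

    AddressCacheSketch(Function<UUID, List<InetSocketAddress>> resolver) {
        this.resolver = resolver;
    }

    // Associates one more physical address with the given UUID
    void add(UUID id, InetSocketAddress address) {
        cache.computeIfAbsent(id, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    // Removes an association, e.g. when a socket has been closed
    void remove(UUID id, InetSocketAddress address) {
        List<InetSocketAddress> known = cache.get(id);
        if (known != null)
            known.remove(address);
    }

    // Returns the known physical addresses, fetching them if necessary
    List<InetSocketAddress> get(UUID id) {
        List<InetSocketAddress> known = cache.get(id);
        if (known != null && !known.isEmpty())
            return known;
        List<InetSocketAddress> fetched = resolver.apply(id);
        if (fetched == null || fetched.isEmpty())
            return Collections.emptyList();
        for (InetSocketAddress address : fetched)
            add(id, address);
        return cache.get(id);
    }
}

class AddressCacheDemo {
    public static void main(String[] args) {
        UUID node = UUID.randomUUID();
        AddressCacheSketch cache = new AddressCacheSketch(
                id -> Collections.singletonList(new InetSocketAddress("192.168.1.5", 7800)));
        System.out.println(cache.get(node));   // first call goes through the resolver
        System.out.println(cache.get(node));   // second call is served from the cache
    }
}
```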

The detailed design is described in http://javagroups.cvs.sourceforge.net/viewvc/javagroups/JGroups/doc/design/LogicalAddresses.txt?revision=1.12&view=markup.

So the most important aspect of logical addresses is that they decouple the identity of a JGroups node from its physical address.

This opens up interesting possibilities.

We might, for example, associate multiple physical addresses with a UUID, and load balance over the physical addresses. We could open multiple sockets, and associate each (receiver) socket's physical address with the UUID. We could even change this at runtime: e.g. if a NIC is down and we get exceptions on the socket, simply create another socket, remove the old association across the cluster (there's a call for this) and associate the new physical address with the UUID.

Another possibility is to implement NATting, firewalling or STUNning this way !

I'll probably make the picking of a physical address given a UUID pluggable, so developers can even provide their own address translation in the transport !
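
To give an idea of what "pluggable" could mean here, below is a sketch of a picker interface plus a round-robin implementation that load balances over the physical addresses of a UUID. None of this exists in JGroups: PhysicalAddressPicker and RoundRobinPicker are made-up names, and the final design may well look different.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical plug-in point: given a UUID and its known physical addresses, pick one.
interface PhysicalAddressPicker {
    InetSocketAddress pick(UUID target, List<InetSocketAddress> candidates);
}

// One possible strategy: rotate through the candidates on every call.
class RoundRobinPicker implements PhysicalAddressPicker {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public InetSocketAddress pick(UUID target, List<InetSocketAddress> candidates) {
        if (candidates.isEmpty())
            throw new IllegalArgumentException("no physical address known for " + target);
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }
}

class PickerDemo {
    public static void main(String[] args) {
        UUID node = UUID.randomUUID();
        List<InetSocketAddress> addresses = Arrays.asList(
                new InetSocketAddress("192.168.1.5", 7800),
                new InetSocketAddress("192.168.1.5", 7801));   // second socket is an assumption
        PhysicalAddressPicker picker = new RoundRobinPicker();
        for (int i = 0; i < 4; i++)
            System.out.println(picker.pick(node, addresses));  // alternates between the two
    }
}
```

A developer could plug in a different implementation, for example one that translates addresses for NAT, without touching the rest of the transport.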

This change is overdue and I'm happy that work has finally started on this. If you want to follow this, the branch is Branch_JGroups_2_8_LogicalAddresses.