Saturday, March 27, 2010

Scopes: making message delivery in JGroups more concurrent

In JGroups, messages are delivered in the order in which they were sent by a given member. So when member X sends messages 1-3 to the cluster, then everyone will deliver them in the order X1 -> X2 -> X3 ('->' means 'followed by').

When a different member Y sends messages 4-6, they will get delivered in parallel to X's messages ('||' means 'parallel to'):
X1 -> X2 -> X3 || Y4 -> Y5 -> Y6

This is good, but what if X has 100 HTTP sessions and performs session replication?

All modifications to the sessions are sent to the cluster, and will get delivered in the order in which they were performed.

The problem here is that even updates to different sessions will be ordered, e.g. if X updates sessions A, B and C, then we could end up with the following delivery order (X is omitted for brevity):
A1 -> A2 -> B1 -> A3 -> C1 -> C2 -> C3

This means that update 1 to session C has to wait until updates A1-A3 and B1 have been processed; in other words, an update has to wait until all updates ahead of it in the queue have been processed!

This unnecessarily delays updates: since updates to A, B and C are unrelated, we could deliver them in parallel, e.g.:

A1 -> A2 -> A3 || B1 || C1 -> C2 -> C3

This means that all updates to A are delivered in order, but parallel to updates to B and updates to C.
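To make the idea concrete, the total order above can be split into per-session FIFO lists. This standalone sketch (not JGroups code) partitions the interleaved updates by session, yielding exactly the per-scope ordering shown:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerScopeOrdering {
    // Partition a totally ordered stream of updates into per-session FIFO lists.
    // Each list must be delivered in order; different lists can run in parallel.
    static Map<Character, List<String>> partition(List<String> updates) {
        Map<Character, List<String>> byScope = new LinkedHashMap<>();
        for (String u : updates)
            byScope.computeIfAbsent(u.charAt(0), k -> new ArrayList<>()).add(u);
        return byScope;
    }

    public static void main(String[] args) {
        List<String> totalOrder = Arrays.asList("A1", "A2", "B1", "A3", "C1", "C2", "C3");
        System.out.println(partition(totalOrder));
    }
}
```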

How is this done? Enter the SCOPE protocol.

SCOPE delivers messages in the order in which they were sent within a given scope. Place it somewhere above NAKACK and UNICAST (or SEQUENCER).

To give a message a scope, simply use Message.setScope(short). The argument should be as unique as possible, to prevent collisions.
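For the session replication use case, a natural choice is to derive the scope from the session ID. Message.setScope(short) is the real API; scopeFor below is a hypothetical helper, shown standalone so it runs without JGroups:

```java
public class ScopeExample {
    // Hypothetical helper: fold a session ID into a short scope.
    // A collision is harmless for correctness, but it needlessly serializes
    // updates of two unrelated sessions, hence "as unique as possible".
    static short scopeFor(String sessionId) {
        return (short) (sessionId.hashCode() & 0x7FFF);
    }

    public static void main(String[] args) {
        // In real code: msg.setScope(scopeFor(sessionId)) before sending the update
        System.out.println(scopeFor("sessionA") == scopeFor("sessionA")); // same session, same scope
        System.out.println(scopeFor("sessionA") != scopeFor("sessionB")); // different sessions, different scopes
    }
}
```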

The use case described above is real: we anticipate using this feature for HTTP session replication / distribution in the JBoss application server!

More detailed documentation of scopes can be found at [1]. Configuration of the SCOPE protocol is described in [2].

This is still an experimental feature, so feedback is appreciated!

[1] Scopes
[2] The SCOPE protocol

Friday, March 05, 2010

Status report: performance of JGroups 2.10.0.Alpha2

I already improved (mainly unicast) performance in Alpha1; here's a short list:

  • BARRIER: moved lock acquired by every up-message out of the critical path
  • IPv6: running a JGroups channel without any system props now works, as IPv4 addresses are mapped to IPv4-mapped IPv6 addresses under IPv6
  • NAKACK and UNICAST: streamlined marshalling of headers, drastically reducing the number of bytes streamed when marshalling headers
  • TCPGOSSIP: Vladimir fixed a bug in RouterStub which caused GossipRouters to return incorrect membership lists, resulting in JOIN failures
  • TP.Bundler:
    • Provided a new bundler implementation, which is faster than the default one (the new *is* actually the default in 2.10)
    • Sending of message lists (bundling): we don't ship the dest and src address for each message, but only ship them *once* for the entire list
  • AckReceiverWindow (used by UNICAST): I made this almost lock-free, so concurrent messages to the same recipient don't compete for the same lock. Should be a nice speedup for multiple unicasts from the same sender (e.g. OOB messages)
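The bundling change above is easy to quantify: shipping dest and src once per message list saves two addresses for every message after the first. A back-of-the-envelope sketch, where the 18-byte marshalled address size is my assumption, not a JGroups constant:

```java
public class BundlingSavings {
    // Bundling ships dest and src once for the whole list instead of
    // once per message: 2 addresses saved for every message after the first.
    static long bytesSaved(int msgsInList, int addrBytes) {
        return (long) (msgsInList - 1) * 2 * addrBytes;
    }

    public static void main(String[] args) {
        // 100 bundled messages, assuming 18 bytes per marshalled address
        System.out.println(bytesSaved(100, 18));
    }
}
```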
The complete list of features is at [1].

In 2.10.0.Alpha2 (that's actually the current CVS trunk), I replaced strings as header names with IDs [2]. This means that for each header, instead of marshalling "UNICAST" as a moniker for the UnicastHeader, we marshal a short.

The string (assuming a single-byte charset) uses up 9 bytes, whereas the short uses 2 bytes. We usually have 3-5 headers per message, so that's 21-35 bytes saved per message. If we send 10 million messages, those savings accumulate!
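The 9-vs-2 byte figure can be reproduced with plain java.io: writeUTF prefixes the 7 ASCII bytes of "UNICAST" with a 2-byte length, while a short ID always marshals to 2 bytes. This is an illustration, not JGroups' actual marshalling code:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class HeaderSize {
    public static void main(String[] args) throws IOException {
        // Marshal the old-style string header name
        ByteArrayOutputStream strBytes = new ByteArrayOutputStream();
        new DataOutputStream(strBytes).writeUTF("UNICAST"); // 2-byte length + 7 chars

        // Marshal a new-style short header ID (42 is an arbitrary example ID)
        ByteArrayOutputStream idBytes = new ByteArrayOutputStream();
        new DataOutputStream(idBytes).writeShort(42);

        System.out.println(strBytes.size()); // 9
        System.out.println(idBytes.size());  // 2
    }
}
```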

Not only does this change make the marshalled message smaller, it also means that a message kept in memory has a smaller footprint: as messages are kept in memory until they're garbage collected by STABLE (or ack'ed by UNICAST), the savings are really nice...

The downside? It's an API change for protocol implementers: methods getHeader(), putHeader() and putHeaderIfAbsent() in Message changed from taking a string to taking a short. Plus, if you implement headers, you have to register them in jg-magic-map.xml / jg-protocol-ids.xml and implement Streamable...

Now for some performance numbers. This is a quick and dirty benchmark, without many data points...

perf.Test (see [3] for details) has N senders send M messages of S size to all cluster nodes. This exercises the NAKACK code.

On my home cluster (4 blades with 4 cores each), 1GB ethernet, sending 1000-byte messages:
  • 4 senders, JGroups 2.9.0.GA:         128'000 messages / sec / member
  • 4 senders, JGroups 2.10.0.Alpha2: 137'000 messages / sec / member
  • 6 senders, JGroups 2.10.0.Alpha2: 100'000 messages / sec / member
  • 8 senders, JGroups 2.10.0.Alpha2:  78'000 messages / sec / member
2.10.0.Alpha2 is ca. 7% faster with 4 senders.

There is also a stress test for unicasts, UnicastTestRpcDist. It mimics DIST mode of Infinispan and has every member invoke 20'000 requests on 2 members; 80% of those requests are GETs (simple RPCs) and 20% are PUTs (2 RPCs in parallel). All RPCs are synchronous, so the caller always waits for the result and thus blocks for the round trip time. Every member has 25 threads invoking the RPCs concurrently.
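Given that mix (80% GETs at one RPC each, 20% PUTs at two RPCs each), the total number of RPCs issued per member follows directly; a small sanity check:

```java
public class RpcMix {
    public static void main(String[] args) {
        int requests = 20_000;
        int gets = requests * 80 / 100; // 16'000 GETs, one RPC each
        int puts = requests * 20 / 100; //  4'000 PUTs, two RPCs each (in parallel)
        System.out.println(gets + puts * 2); // total RPCs issued per member
    }
}
```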

On my home network, I got the following numbers:
  • 4 members, JGroups 2.9.0.GA:         4'500 requests / sec / member
  • 4 members, JGroups 2.10.0.Alpha2: 5'700 requests / sec / member
  • 6 members, JGroups 2.9.0.GA:         4'000 requests / sec / member
  • 6 members, JGroups 2.10.0.Alpha2: 5'000 requests / sec / member
  • 8 members, JGroups 2.9.0.GA:         3'800 requests / sec / member
  • 8 members, JGroups 2.10.0.Alpha2: 4'300 requests / sec / member

In our Atlanta lab (faster boxes), I got (unfortunately only for 2.10.0.Alpha2):

  • 4 members, JGroups 2.10.0.Alpha2: 10'900 requests / sec / member
  • 6 members, JGroups 2.10.0.Alpha2: 10'900 requests / sec / member
  • 8 members, JGroups 2.10.0.Alpha2: 10'900 requests / sec / member
Since the focus of the first half of 2.10.0 was on improving unicast performance, these numbers are already pretty good and show linear scalability (at least for up to 8 members): per-member throughput stays constant as members are added.