Thursday, September 08, 2016

Removing thread pools in JGroups 4.0

JGroups 3.x has 4 thread pools:
  • Regular thread pool: used for regular messages (default has a queue)
  • OOB thread pool: used for OOB messages (no queue)
  • Internal thread pool: used for JGroups internal messages only. The main raison d'etre for this pool was that internal messages such as heartbeats or credits should never get queued up behind other messages, and get processed immediately.
  • Timer thread pool: all tasks in a timer need to be executed by a thread pool as they can potentially block
Over time, I found out that - with most configurations - I had the queue in the regular thread pool disabled, as I wanted the pool to dynamically create new threads (up to max-threads) when required, and terminate them again after some idle time.

Hence the idea to club regular and OOB thread pools into one.

When I further thought about this, I realized that incoming messages could also be handled by a queue-less thread pool: by handling RejectedExecutionException thrown when the pool is full and simply spawning a new thread to process the internal message, so it wouldn't get dropped.

The same goes for timer tasks: a timer task (e.g. a message retransmission task) cannot get dropped, or retransmission would cease. However, by using the same mechanism as for internal messages, namely spawning a new thread when the thread pool is full, this can be solved.

Therefore, in 4.0 there's only a single thread pool handling regular, OOB and internal messages, plus timer tasks.

The new thread pool has no queue, or else it would not throw a RejectedExecutionException when a task is added, but simply queue it, which is not what we want for internal messages or timer tasks.

It also has a default rejection policy of "abort" which cannot be changed (only by substituting the thread pool with a custom pool).

This dramatically reduces configuration complexity: from 4 to 1 pools, and the new thread pool only exposes min-threads, max-threads, idle-time and enabled as configuration options.

Here's an example of a 3.x configuration:

        timer_type="new3"
        timer.min_threads="2"
        timer.max_threads="4"
        timer.keep_alive_time="3000"
        timer.queue_max_size="500"

        thread_pool.enabled="true"
        thread_pool.min_threads="2"
        thread_pool.max_threads="8"
        thread_pool.keep_alive_time="5000"
        thread_pool.queue_enabled="true"
        thread_pool.queue_max_size="10000"
        thread_pool.rejection_policy="discard"

        internal_thread_pool.enabled="true"
        internal_thread_pool.min_threads="1"
        internal_thread_pool.max_threads="4"
        internal_thread_pool.keep_alive_time="5000"
        internal_thread_pool.queue_enabled="false"
        internal_thread_pool.rejection_policy="discard"

        oob_thread_pool.enabled="true"
        oob_thread_pool.min_threads="1"
        oob_thread_pool.max_threads="8"
        oob_thread_pool.keep_alive_time="5000"
        oob_thread_pool.queue_enabled="false"
        oob_thread_pool.queue_max_size="100"
        oob_thread_pool.rejection_policy="discard"


and here's the 4.0 configuration:

        thread_pool.enabled="true"
        thread_pool.min_threads="2"
        thread_pool.max_threads="8"
        thread_pool.keep_alive_time="5000"



Nice, isn't it?

Cheers,

Tuesday, August 23, 2016

JGroups 4.0.0.Alpha1 released

Howdy folks,

I just released a first alpha of JGroups 4.0.0 to SourceForge and maven. There are 12+ issues left, and I expect a final release in October.

The major version number was incremented because there are a lot of API changes and removal of deprecated classes/methods.

I'd be happy to get feedback on the API; there's still time to make a few API changes until the final release.

Major API changes



New features 

 

Deliver message batches

Ability to handle message batches in addition to individual messages: receive(MessageBatch) was added to ReceiverAdapter: https://issues.jboss.org/browse/JGRP-2003

Encryption and authentication

ENCRYPT was removed and replaced by SYM_ENCRYPT (keystore-based) and ASYM_ENCRYPT (asymmetric key exchange): https://issues.jboss.org/browse/JGRP-2021.

ENCRYPT had a couple of security flaws, and since the code hasn't been touched for almost a decade, I took this chance and completely refactored it into the 2 protocols above.

The code is now much more readable and maintainable.


Optimizations

 

Removal of Java serialization

JGroups marshalling falls back to Java serialization if a class is not primitive or implements Streamable. This was completely removed for JGroups-internal classes:  https://issues.jboss.org/browse/JGRP-2033

MessageBundler improvements

The performance of existing bundlers has been improved, and new bundlers (RingBufferBundler, NoBundler) have been added. Since the shared transport feature was removed, SingletonAddress used as key into a bundler's hashmap could also be removed.

For instance, TransferQueueBundler now removes chunks of message from the queue, rather than one-by-one: https://issues.jboss.org/browse/JGRP-2076.

I also added the ability to switch bundlers at runtime (via (e.g.) probe.sh op=TCP.bundler["rb"]): https://issues.jboss.org/browse/JGRP-2057 / https://issues.jboss.org/browse/JGRP-2058


Reduction of event creation

For every message sent up or down, a new Event (wrapping the message) had to be created. Adding 2 separate up(Message) and down(Message) callbacks removed that unneeded creation: https://issues.jboss.org/browse/JGRP-2067


Faster header marshalling and creation

Headers used to do a hashmap lookup to get their magic-id when getting marshalled. Now they carry the magic ID directly in the header. This saves a hashmap lookup per header (usually we have 3-4 headers per message): https://issues.jboss.org/browse/JGRP-2042.

Header creation was getting the class using a lookup with the magic ID, and then creating an instance. This was replaced by each header class implementing a create() method, resulting in header creation getting from 300ns down to 25ns.
https://issues.jboss.org/browse/JGRP-2043.


Bug fixes


MFC and UFC headers getting mixed up

This is a regression that led to premature exhaustion of credits, slowing things down:
https://issues.jboss.org/browse/JGRP-2072


Download


<groupId>org.jgroups</groupId>
<artifactId>jgroups</artifactId>
<version>4.0.0-Alpha1</version>


The manual has not been changed yet, so browse the source code or javadocs for details.

Enjoy! and feedback via the mailing list, please!

Cheers,

Bela Ban


Wednesday, February 24, 2016

JGroups 3.6.8 released

FYI, 3.6.8.Final has been released.

Not a big release; it contains mostly optimizations and a nice probe improvement. The main issues are listed below.
Enjoy!

Bela

New features
============

Probe improvements
------------------
[https://issues.jboss.org/browse/JGRP-2004]
[https://issues.jboss.org/browse/JGRP-2005]

- Proper discarding of messages from a different cluster with '-cluster' option.
- Less information per cluster member; only the requested information is returned
- Detailed information about RPCs (number of sync, async RPCs, plus timings)
  - http://www.jgroups.org/manual/index.html#_looking_at_details_of_rpcs_with_probe



Optimizations
=============

DONT_BUNDLE and OOB: messages are not removed from batch when execution fails
-----------------------------------------------------------------------------
[https://issues.jboss.org/browse/JGRP-2015]

- Messages are not removed from batch when execution fails
- Rejections are not counted to num_rejected_msgs


COMPRESS: removed 1 byte[] buff copy
------------------------------------
[https://issues.jboss.org/browse/JGRP-2017]

An unneeded copy of the compressed payload was created when sending and compressing a message. The fix should reduce
memory allocation pressure quite a bit.


RpcDispatcher: don't copy the first anycast
-------------------------------------------
[https://issues.jboss.org/browse/JGRP-2010]

When sending an anycast to 3 destinations, JGroups sends a copy of the original message to all 3. However, the first
doesn't need to be copied (less memory allocation pressure). For an anycast to a single destination, no copy is
needed, either.


Compaction of in-memory size
----------------------------
[https://issues.jboss.org/browse/JGRP-2011]
[https://issues.jboss.org/browse/JGRP-2012]

- Reduced size of Rsp (used in every RPC) from 32 -> 24 bytes
- Request/UnicastRequest/GroupRequest: reduced size


RequestCorrelator.done() is slow
--------------------------------
[https://issues.jboss.org/browse/JGRP-2013]

Used by RpcDispatcher. Fixed by eliminating the linear search done previously.



Bug fixes
=========

FILE_PING: consider special characters in file names
----------------------------------------------------
[https://issues.jboss.org/browse/JGRP-2014]

Names like "A/web-cluster" would fail on Windows as the slash char ('/') was treated as demarcation char in some clouds.



Manual
======

The manual is at http://www.jgroups.org/manual/index.html.

The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.


Bela Ban, Kreuzlingen, Switzerland
Feb 2016

Wednesday, January 20, 2016

Dump RPC stats with JGroups

When using remote procedure calls (RPCs) across a cluster using RpcDispatcher, it would be interesting to know how many RPCs of which type (unicast, multicast, anycast) are invoked by whom to whom.

I added this feature in 3.6.8-SNAPSHOT [1]. The documentation is here: [2].

As a summary, since this feature is costly, it has to be enabled with
probe.sh rpcs-enable-details (and disabled with rpcs-disable-details).

From now on, invocation times of synchronous (blocking) RPCs will be recorded (async RPCs will be ignored).

RPC stats can be dumped with probe.sh rpcs-details:
[belasmac] /Users/bela/JGroups$ probe.sh rpcs rpcs-details

-- sending probe on /224.0.75.75:7500
#1 (481 bytes):
local_addr=C [ip=127.0.0.1:55535, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67480
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189064
rpcs-details=
D: async: 0, sync: 130434, min/max/avg (ms): 0.13/924.88/2.613
A: async: 0, sync: 130243, min/max/avg (ms): 0.11/926.35/2.541
B: async: 0, sync: 63346, min/max/avg (ms): 0.14/73.94/2.221

#2 (547 bytes):
local_addr=A [ip=127.0.0.1:65387, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=5
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67528
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189200
rpcs-details=
<all>: async: 0, sync: 5, min/max/avg (ms): 2.11/9255.10/4917.072
C: async: 0, sync: 130387, min/max/avg (ms): 0.13/929.71/2.467
D: async: 0, sync: 63340, min/max/avg (ms): 0.13/63.74/2.469
B: async: 0, sync: 130529, min/max/avg (ms): 0.13/929.71/2.328

#3 (481 bytes):
local_addr=B [ip=127.0.0.1:51000, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67255
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189494
rpcs-details=
C: async: 0, sync: 130616, min/max/avg (ms): 0.13/863.93/2.494
A: async: 0, sync: 63210, min/max/avg (ms): 0.14/54.35/2.066
D: async: 0, sync: 130177, min/max/avg (ms): 0.13/863.93/2.569

#4 (482 bytes):
local_addr=D [ip=127.0.0.1:54944, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67293
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189353
rpcs-details=
C: async: 0, sync: 63172, min/max/avg (ms): 0.13/860.72/2.399
A: async: 0, sync: 130342, min/max/avg (ms): 0.13/862.22/2.338
B: async: 0, sync: 130424, min/max/avg (ms): 0.13/866.39/2.350

This shows the stats for each member in a given cluster, e.g. number of unicast, multicast and anycast RPCs, per target destination, plus min/max and average invocation times for sync RPCs per target as well.

Probe just become even more powerful! :-)
Enjoy!

[1] https://issues.jboss.org/browse/JGRP-2005
[2] Documentation: http://www.jgroups.org/manual/index.html#_looking_at_details_of_rpcs_with_probe

Monday, January 18, 2016

JGroups workshop in Munich April 4-8 2016


I'm happy to announce another JGroups workshop in Munich April 4-8 2016 !

The registration is now open at [2].

The agenda is at [3] and includes an overview of the basic API, building blocks, advanced topics and an in-depth look at the most frequently used protocols, plus some admin (debugging, tracing,diagnosis) stuff.

We'll be doing some hands-on demos, looking at code and I'm always trying to make the workshops as hands-on as possible.

I'll be teaching the workshop myself, and I'm looking forward to meeting some of you and having beers in Munich downtown! For attendee feedback on courses last year check out [1].

Note that the exact location in Munich has not yet been picked, I'll update the registration and send out an email to already registered attendees once this is the case (by the end of January the latest).

The course has a min limit of 5 and a max limit of 15 attendees.

I'm planning to do another course in Boston or New York in the fall of 2016, but plans have not yet finalized.

Cheers, and I hope to see many of you in Munich!
Bela Ban


[1] http://www.jgroups.org/workshops.html
[2] http://www.amiando.com/WorkshopMunich
[3] https://github.com/belaban/workshop/blob/master/slides/toc.adoc

Tuesday, January 12, 2016

JGroups 3.6.7.Final released

I'm happy to announce that 3.6.7.Final has been released!

This release contains a few bug fixes, but is mainly about optimizations reducing memory consumption and allocation rates.
The other optimization was in TCP_NIO2, which is now as fast as TCP. It is slated to become the successor to TCP, as it uses fewer threads and since it's built on NIO, should be much more scalable.

3.6.7.Final can be downloaded from SourceForge [1] or used via maven (groupId=org.jgroups / artifactId=jgroups, version=3.6.7.Final).

Below is a list of the major issues resolved.
Enjoy!


[1] https://sourceforge.net/projects/javagroups/files/JGroups/3.6.7.Final/


New features


Interoperability between TCP and TCP_NIO2


[https://issues.jboss.org/browse/JGRP-1952]
This allows nodes that have TCP as transport to talk to nodes that have TCP_NIO2 as transport, and vice versa.

Optimizations


Transport: reuse of receive buffers

[https://issues.jboss.org/browse/JGRP-1998]
On a message reception, the transport would create a new buffer in TCP and TCP_NIO2 (not in UDP), read the message into that buffer and then pass it to the one of thread pools, copying single messages (not batches).
This was changed to reusing the same buffers in UDP, TCP and TCP_NIO2, by reading the network data into one of those buffers, de-serializing the message (or message batch) and then passing it to one of the thread pools.
The effect is a much lower memory allocation rate.

Message bundling: reuse of send buffers

[https://issues.jboss.org/browse/JGRP-1989]
When sending messages, a new buffer would be created for marshalling for every message (or message bundle). This was changed to reuse the same buffer for all messages or message bundles.
The effect is a smaller memory allocation rate on the send path.

TCP_NIO2: copy on-demand when sending messages

[https://issues.jboss.org/browse/JGRP-1991]
If a message sent by TCP_NIO2 cannot be put entirely into the network buffer of the OS, then the remainder of that message is copied. This is needed to implement reusing of send buffers, see JGRP-1989 above.

TCP_NIO2: single selector slows down writes and reads

[https://issues.jboss.org/browse/JGRP-1999]
This transport used to have a single selector, processing both writes and reads in the same thread. Writes are not expensive, but reads can be, as de-serialization adds up.
We now have a reader thread for every NioConnection which processes reads (using work stealing) separate from the selector thread. When idle for some time, the reader thread terminates and a new thread is created on subsequent data available to be read.
UPerf (4 nodes) showed a perf increase from 15'000 msgs/sec/node to 24'000. TCP_NIO2's speed is now roughly the same as TCP.

Headers: collapse 2 arrays into 1

[https://issues.jboss.org/browse/JGRP-1990]
A Message had a Headers instance which had an array for header IDs and another one for the actual headers. These 2 arrays were collapsed into a single array and Headers is not a separate class anymore, but the array is managed directly inside Message.
This reduces the memory needed for a message by ca 22 bytes!

RpcDispatcher: removal of unneeded field in a request

[https://issues.jboss.org/browse/JGRP-2001]
The request-id was carried in both the Request (UnicastRequest or MulticastRequest) and the header, which is duplicate and a waste. Removed from Request and also removed rsp_expected from the header, total savings ca. 9 bytes per RPC.

Switched back from DatagramSocket to MulticastSocket for sending of IP multicasts

[https://issues.jboss.org/browse/JGRP-1970]
This caused some issues in MacOS based systems: when the routing table was not setup correctly, multicasting would not work (nodes wouldn't find each other).
Also, on Windows, IPv6 wouldn't work: https://github.com/belaban/JGroups/wiki/FAQ.

Make the default number of headers in a message configurable

[https://issues.jboss.org/browse/JGRP-1985]
The default was 3 (changed to 4 now) and if we had more headers, then the headers array needed to be resized (unneeded memory allocation).

Message bundling

[https://issues.jboss.org/browse/JGRP-1986]
When the threshold of the send queue was exceeded, the bundler thread would send messages one-by-one, leading to bad performance.

TransferQueueBundler: switch to array from linked list for queue

[https://issues.jboss.org/browse/JGRP-1987]
Less memory allocation overhead.

Bug fixes

SASL now handles merges correctly

[https://issues.jboss.org/browse/JGRP-1967]

FRAG2: message corruption when thread pools are disabled

[https://issues.jboss.org/browse/JGRP-1973]

Discovery leaks responses

[https://issues.jboss.org/browse/JGRP-1983]



Manual

The manual is at http://www.jgroups.org/manual/index.html.

The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.


Tuesday, November 03, 2015

Talk at Berlin JUG Nov 19

For those of you living in Berlin, mark your calendars: there's an event [1] held by the JUG Berlin-Brandenburg Nov 19 on
  • JGroups (yours truly)
  • New features of Infinispan 8 (Galder Zamarreno)
  • Infinispan (Tristan Tarrant) and
  • Wildfly clustering (Paul Ferraro)
Free food and beverages will be provided, and - because we're having our clustering team meeting the same week - most clustering devs will be present to mingle with after the talks... :-)

Hope to see many of you there !


[1] http://www.jug-berlin-brandenburg.de/

Wednesday, September 09, 2015

JGroups 3.6.6.Final released

I don't like releasing a week after I released 3.6.5, but the Infinispan team found 2 critical bugs in TCP_NIO2:
  • Messages would get corrupted as they were sent asynchronously and yet the buffer was reused and modified while the send was in transit (JGRP-1961)
  • TCP_NIO2 could start dropping messages because selection key registration was not thread safe: JGRP-1963
But bugs affect TCP_NIO2 only, and no other protocols.

So, there it is: 3.6.6.Final ! :-)

Enjoy (and find more bugs in TCP_NIO2) !

Thursday, September 03, 2015

JGroups 3.6.5 released

I'm happy to announce that 3.6.5 has been released !

One more patch release (3.6.6) is planned, and then I'll start working on 4.0 which will require Java 8. I'm looking forward to finally also being able to start using functional programming ! :-) (Note that I wrote my diploma thesis in Common Lisp back in the days...)

The major feature of 3.6.5 is certainly support for non-blocking TCP, based on NIO.2. While I don't usually add features to a patch release, I didn't want to create a 3.7.0, and I wanted users to be able to still use Java 7, and not require 8 in order to use the NIO stuff.

Here's a summary of the more important changes in 3.6.5:



TCP_NIO2: new non-blocking transport based on NIO.2

[https://issues.jboss.org/browse/JGRP-886]

This new transport is based on NIO.2 and non-blocking, ie. no reads or writes will ever block. The biggest advantage compared to TCP is that we moved from the 1-thread-per-connection model to the 1-selector-for-all-connections model.
This means that we use 1 thread for N connections in TCP_NIO2, while TCP used N threads.
To use this, new classes TcpClient / NioClient and TcpServer / NioServer have been created.
More details at http://belaban.blogspot.ch/2015/07/a-new-nio.html.


Fork channels now support state transfer

[https://issues.jboss.org/browse/JGRP-1941]

Fork channels used to throw an exception on calling ForkChannel.getState(). This is now supported; details in the JIRA issue.



GossipRouter has been reimplemented using NIO

[https://issues.jboss.org/browse/JGRP-1943]

GossipRouter can now use a blocking (TcpServer) or a non-blocking (NioServer) implementation. On the client side, RouterStub (TUNNEL and TCPGOSSIP) can do the same, using TcpClient or NioClient.
Which implementation is used is governed by the -nio flag when starting the router, or in the configuration of TUNNEL / TCPGOSSIP (use_nio).
Blocking clients can interact with a non-blocking GossipRouter, and vice versa.


Retransmissions use the INTERNAL flag

[https://issues.jboss.org/browse/JGRP-1940]

Retransmissions use the internal flag: when a retransmission is a request, a potential response was also flagged as internal. This flag is now cleared on reception of a request.


Lock.tryLock() can wait forever

[https://issues.jboss.org/browse/JGRP-1949]

Caused by a conversion from nanos to millis.



TCPPING: access initial_hosts in the defined order

[https://issues.jboss.org/browse/JGRP-1959]

Was not the case as we used a HashSet which reordered elements.

SWIFT_PING: support JSON

[https://issues.jboss.org/browse/JGRP-1954]

Request/response format has changed from application/xml to application/json in the Identity API.




The manual is at http://www.jgroups.org/manual/index.html.

The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.

Enjoy !

Bela Ban, Kreuzlingen, Switzerland, Sept 2015

Monday, July 27, 2015

A new NIO.2 based transport

I'm happy to announce a new transport based on NIO.2: TCP_NIO2 !

The new transport is completely non-blocking, so - contrary to TCP - never blocks on a socket connect, read or write.

The big advantage of TCP_NIO2 over TCP is that it doesn't need to create one reader thread per connection (and possibly a writer thread as well, if send queues are enabled).

With a cluster of 1000 nodes, in TCP every node would have 999 reader threads and 999 connections. While we still have 999 TCP connections open (max), in TCP_NIO2 we only have a single selector thread servicing all connections. When data is available to be read, we read as much data as we can without blocking, and then pass the read message(s) off to the regular or OOB thread pools for processing.

This makes TCP_NIO2 a more scalable and non-blocking alternative to TCP.

Performance

I ran the UPerf and MPerf tests [3] on a 9 node cluster (8-core boxes with ~5300 bogomips and 1 GB networking) and got the following results:

UPerf (500'000 requests/node, 50 invoker threads/node):
TCP: 62'858 reqs/sec/node, TCP_NIO2: 65'387 reqs/sec/node

MPerf (1 million messages/node, 50 sender threads/node):
TCP: 69'799 msgs/sec/node, TCP_NIO2: 77'126 msgs/sec/node

So TCP_NIO2 was better in both cases, which surprised me a bit as there have been reports claiming that the BIO approach was faster.

I therefore recommend run the tests in your own environment, with your own application, to get numbers that are meaningful in your system.

The documentation is here: [1].
Cheers,


[1] http://www.jgroups.org/manual/index.html#TCP_NIO2

[2] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/TCP_NIO2.java

[3] http://www.jgroups.org/manual/index.html#PerformanceTests

Friday, May 15, 2015

Release of jgroups-raft 0.2

I'm happy to announce the first usable release of jgroups-raft [1] !

Compared to 0.1, which was a mere prototype, 0.2 has a lot more features and is a lot more robust. Besides fixing quite a few bugs and adding unit tests to prevent future regressions, I
  • switched to Java 8
  • implemented dynamic addition and removal of servers
  • wrote the manual, and
  • wrote a consensus based replicated counter
The full list is at [2]. For questions, feedback etc use the mailing list [3].
Cheers,


[1] http://belaban.github.io/jgroups-raft

[2] https://github.com/belaban/jgroups-raft/issues?q=milestone%3A0.2+is%3Aclosed

[3] https://groups.google.com/forum/#!forum/jgroups-raft


Wednesday, April 29, 2015

JGroups workshops in New York and Mountain View

I'm happy to announce that we're offering 2 JGroups trainings in the US: in New York and Mountain View in Sept 2015 !

The workshop will be interactive and is for medium to advanced developers. I'm teaching both workshops, so I should be able to answer all JGroups related questions ... :-)

An overview of what we'll be doing over the 4.5 days is here:
https://github.com/belaban/workshop/blob/master/slides/toc.adoc.

To get more info and to register visit http://www.jgroups.org/workshops.html.

Registration is now open. The class size is limited to 20 each.

Hope to see someof you at a workshop this year !






Tuesday, March 17, 2015

Everything you always wanted to know about JGroups (but were afraid to ask): JGroups workshop in Berlin

I'm happy to announce a JGroups workshop in Berlin June 1-5 2015 !

This is your chance to learn everything you always wanted to know about JGroups... and more :-)

This is the second in a series of 4 workshops I'll teach this year; 2 in Europe and 2 in the US (NYC and Mountain View, more on the US workshops to be announced here soon).

Rome is unfortunately already sold out, but Berlin's a nice place, too...

The workshop is 5 days and attendees will learn the following [1]:
  • Monday: API [introductory]
  • Tuesday: Building blocks (RPCs, distributed locks, counters etc) [medium]
  • Wednesday/Thursday: advanced topics and protocols [advanced]
  • Friday: admin stuff [medium]
I've written some nice labs and I'm trying to make this as interactive and hands-on as possible. Be aware though that the workshop (especially the middle part) is not for the faint of heart and complete JGroups newbies are not going to benefit as much as people who've already used JGroups...

The price is 1'500 EUR (early bird: 1'000 EUR). This gets you a week of total immersion into JGroups and beers in the evening with me (not sure this is a good thing though :-))...

Registration [2] is now open (15 tickets only because I want to have a max of 20 attendees - 5 already registered). There's an early bird registration rate (500 EUR off) valid until April 10. Use code JGRP2015 to get the early bird.

The recommended hotel is nhow Berlin [3]. Workshop attendees will get a special rate; check here again in a few days (end of March the latest) on how to book a room at a discounted rate.

Hope to see some of you in Berlin in June !
Cheers,

[1] https://github.com/belaban/workshop/blob/master/slides/toc.adoc

[2] http://www.amiando.com/JGroupsWorkshopBerlin

[3] http://www.nh-hotels.de/hotel/nhow-berlin
 
 


Thursday, January 15, 2015

JGroups workshop

I'm happy to announce that I'm putting the finishing touches to a JGroups workshop [1].

It consists of 4 modules with labs:
  1. Using JGroups: API (beginner level, 1 day)
  2. Using JGroups: building blocks (beginner level, 1 day)
  3. Advanced (medium to advanced level, 2 days)
  4. Admin (medium level, 1 day)
The modules can be mixed and matched, but I think that a public workshop will present them in this order. Beginners may wish to attend only the first 2 days, while others may want to skip the first 2 days and only attend the Advanced and Admin parts.

We're also thinking about offering a consulting package which includes selected modules and a few consulting days. Also, a combined JDG and JGroups workshop is being discussed. But this is all up for discussion at our Berlin meeting this February.

The first workshop will probably be a Red Hat internal one somewhere in EMEA.

As for public workshops, I'm shooting for 2 in Europe and 2 in the US (East and West coast) this year.

If you have suggestions regarding locations and dates, please send me an email (belaban at yahoo dot com).

Registration is not yet open, but if you want to pre-register, send me an email and you'll get a notification when it opens. I promise that you won't get any marketing emails, and I'll delete that list after sending that one email... :-)
Cheers,


[1] https://github.com/belaban/workshop/blob/master/slides/toc.adoc

Tuesday, January 13, 2015

RAFT consensus in JGroups

I'm happy to announce the first alpha release of jgroups-raft, which is an implementation of RAFT [2,3] in JGroups. The jgroups-raft project is currently a separate project on GitHub [1], but may be integrated into JGroups at a later stage.

The functionality includes leader election (section 5.2 in [3]), log replication (5.3), snapshotting and log compaction (7). Cluster membership changes (6) has not yet been implemented; the system currently requires a static membership.

The persistent log is implemented using LevelDB (MapDB support is not complete yet). Also, leader election based on the log commit status (and length) (5.4.1) has not been implemented.

The code quality is alpha at best, and the functionality hasn't been tested with unit tests. Use at your own risk.


So what can jgroups-raft currently be used for ?


Mainly to experiment with RAFT consensus in JGroups. The system comes with a demo of a replicated state machine (replicated hashmap) which can be used to update state in a fixed-size cluster with consensus. The majority (RAFT.majority) is 2, so nore more than 3 instances should be started.
Start the 3 instances like this:

bin/demo.sh -name B -follower
bin/demo.sh -name C -follower
bin/demo.sh -name A

The -follower flag is optional, but it skips leader election for a quick startup (and issues with the missing implementation of 5.4.1).

Note that the -name flag is used as both the logical name of a member and the name of the log. So, after starting the 3 instances, the temp directory will contain logs A.log, B.log and C.log (using LevelDB).

If we kill B and start it again as B, then B.log will be used again. If we start a member D, then this is considered a new member and a log D.log will be created.

Here's the output at C after adding a new entry foo=bar and printing the log:
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=3, commit-index=3, log size=55b

1
key: foo
value: bar
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=3, log size=70b

-- put(foo, bar) -> null
5

index (term): command
---------------------
1 (1): put(name, Bela)
2 (1): put(id, 322649)
3 (1): put(name, Bela Ban)
4 (7): put(foo, bar)

[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b

4
{foo=bar, name=Bela Ban, id=322649}
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b


We can see that the state consists of 3 entries and the log has 4 elements (name was changed twice).

When a node is killed and restarted, the state machine is reinitialized from the log:

[mac] /Users/bela/jgroups-raft$ bin/demo.sh -name C -follower
LOG is existent, must not be initialized
777 [DEBUG] RAFT: set last_applied=4, commit_index=4, current_term=7
778 [DEBUG] RAFT: snapshot /tmp/C.snapshot not found, initializing state machine from persistent log
781 [DEBUG] RAFT: applied 3 log entries (2 - 4) to the state machine

-------------------------------------------------------------------
GMS: address=C, cluster=rsm, physical address=192.168.1.3:54886
-------------------------------------------------------------------
-- view change: [B|4] (3) [B, A, C]
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b

4
{foo=bar, name=Bela Ban, id=322649}


We can see that the state machine was initialized from the persistent log.

If a member is down for a considerable amount of time, and then started again, it may be out of sync, and - if a snapshot was taken at the leader - the first log entry of the leader might be higher than the last commited log entry at the member. In this case, the leader will transfer its snapshot to the restarted member first, and then the usual algorithm is used to bring the restarted member up to date.


What's next ?

We're currently experimenting with an implementation of etcd [5] over jgroups-raft. Also, we're looking into how to use RAFT consensus in Infinispan [6].

I'm currently putting the finishing touches on a JGroups workshop (more on this soon), and will return to work on jgroups-raft after that. The next work items include
  • unit tests and code reviews
  • leader election comparing logs (5.4.1)
  • alternative ELECTION protocol using the JGroups built-in features (reduces code)
  • cluster membership changes
  • consistent reads; reads are currently dirty (section 8 has not yet been implemented)


Please use the mailing list [4] for feedback, questions and discussions.

Cheers,
Bela

[1] https://github.com/belaban/jgroups-raft
[2] http://raftconsensus.github.io/
[3] http://ramcloud.stanford.edu/raft.pdf
[4] https://groups.google.com/forum/#!forum/jgroups-raft
[5] https://github.com/redhat-italy/jgroups-etcd
[6] http://www.infinispan.org



Tuesday, October 21, 2014

JGroups 3.6.0.Final released

I just released 3.6.0.Final to SourceForge [1] and Nexus. It contains a few new features, but mostly optimizations and a few bug fixes. It is a small release before starting work on the big 4.0.

A quick look over what was changed:


New features


CENTRAL_LOCK: lock owners can be node rather than individual threads

[https://issues.jboss.org/browse/JGRP-1886]

Added an option to make the node the lock owner rather than the combo node:thread. This was needed by the JGroups clustering plugin for vert.x.


RSVP: option to not block caller

[https://issues.jboss.org/browse/JGRP-1851]

This way, a caller knows that its message will get resent, but it doesn't have to block. Also added code to skip RSVP if the message was unicast and UNICAST3 was used as protocol (only used for multicast messages).


Docker image for JGroups

[https://issues.jboss.org/browse/JGRP-1840]

mcast: new multicast diagnosis tool

[https://issues.jboss.org/browse/JGRP-1832]




Optimizations


UNICAST3 / NAKACK2: limit the max size of retransmission requests

[https://issues.jboss.org/browse/JGRP-1868]

When we have a retransmission request in UNICAST3 or NAKACK2 for a large number of messages, then the size of the retransmit message may become larger than what the transport can handle, and is therefore dropped. This leads to endless retransmissions.

The fix here is to retransmit only the oldest N messages such that the retransmit message doesn't exceed the max bundling size of the transport and to use a bitmap to identify missing messages to be retransmitted.

Also, the bitmaps used by SeqnoList reduce the size of a retransmission message.


Channel creation has unneeded synchronization

[https://issues.jboss.org/browse/JGRP-1887]

Slowing down parallel creation of many channels; this was removed.


UNICAST3: sync receiver table with sender table

[https://issues.jboss.org/browse/JGRP-1875]

In some edge cases, a receiver would not sync correctly with a sender.



Bug fixes





JMX: unregistering a channel which didn't register any protocols issues warnings

[https://issues.jboss.org/browse/JGRP-1894]


UDP: ip_ttl was ignored and is always 1

[https://issues.jboss.org/browse/JGRP-1880]


MERGE3: in some cases, the information about subgroups is incorrect

[https://issues.jboss.org/browse/JGRP-1876]

The actual MergeView was always correct, but the information about subgroups wasn't.


RELAY2 doesn't work with FORK

[https://issues.jboss.org/browse/JGRP-1875]


Enjoy !


[1] http://sourceforge.net/projects/javagroups/files/JGroups/3.6.0.Final/

Monday, October 06, 2014

JGroups and Docker

I've uploaded an image with JGroups and a few demos to DockerHub [2].

The image is based on Fedora 20 and contains JGroups, plus a few scripts which run a chat demo, a distributed lock demo and a distributed counter demo.

To run this, it's as simple as executing

docker run -it belaban/jgroups

This will drop you into a bash shell and you're ready to run any of the three demos.

Start multiple containers and you have a cluster in which you can try out things, e.g. a cluster node acquiring a lock, another node trying to acquire it and blocking, the first node crashing and the second node finally acquiring the lock etc.

Note that this currently runs a cluster on one physical box. I'll still need to investigate what needs to be done to run Docker containers on different hosts and cluster them [3].

The Dockerfile is at [1] and can be used to derive another image from it, or to build the image locally.

[1] https://github.com/belaban/jgroups-docker

[2] https://registry.hub.docker.com/u/belaban/jgroups/

[3] https://issues.jboss.org/browse/JGRP-1840

Friday, August 29, 2014

JGroups 3.5.0.Final released

I'm happy to announce that JGroups 3.5.0.Final has been released !

It's quite a big release with 137 issues resolved.

Downloads at the usual place (SourceForge) or via Maven:

<dependency>
    <groupId>org.jgroups</groupId>
    <artifactId>jgroups</artifactId>
    <version>3.5.0.Final</version>
</dependency>

 

The release notes are here.

Enjoy !

Bela Ban




 











Wednesday, July 30, 2014

New record for a large JDG cluster: 500 nodes

Today I ran a 500 node JDG (JBoss Data Grid) cluster on Google Compute Engine, topping the previous record of 256 nodes. The test used is IspnPerfTest [2] and it does the following:
  • The data grid has keys [1..20000], values are 1KB byte buffers
  • The cache is DIST-SYNC with 2 owners for every key
  • Every node invokes 20'000 requests on random keys
  • A request is a read (with an 80% chance) or a write (with a 20% chance)
    • A read returns a 1KB buffer
    • A write takes a 1KB buffer and sends it to the primary node and the backup node
  • The test invoker collects the results from all 500 nodes and computes an average which it prints to stdout.
But see for yourself; the YouTube video is here: [1].

In terms of cluster size, I could probably have gone higher: it was pretty effortless to go to 500 instances... any takers ? :-)


[1] http://youtu.be/Cz3CDr31EDU
[2] https://github.com/belaban/IspnPerfTest

Wednesday, July 23, 2014

Running a JGroups cluster in Google Compute Engine: part 3 (1000 nodes)

A quick update on my GCE perf tests:

I successfully ran a 1000 node UPerf cluster on Google Compute Engine, the YouTube video is at http://youtu.be/Imn1M7EUTGE. The perf tests are started at around 14'13".

I also managed to get 2286 nodes running, however didn't record it... :-)

Monday, April 07, 2014

Running JGroups on Google Compute Engine


I recently started looking into Google Compute Engine (GCE) [1], an IAAS service similar to Amazon's EC2. I had been looking into EC2 a few years ago and created S3_PING back then, to simplify running JGroups clusters on EC2.

As GCE seems to be taking off, I wanted to be able to provide a simple way to run JGroups clusters on GCE, too. Of course, this also means it's simple to run Infinispan/JDG and WildFly(JBoss EAP) clusters on GCE.

So the first step was creating a discovery protocol named GOOGLE_PING.

Turns out this was surprisingly easy; only 27 lines of code as I could more or less reuse S3_PING. Google provides an S3 compatibility mode which only requires a change of the cloud storage host name !

Next, I wanted to run a number of UPerf instances on GCE and measure performance.

UPerf mimics Infinispan's partial replication, in which every node picks a backup for its data: every node invokes 20'000 synchronous RPCs, 80% of those are READS and 20% WRITES. Random destinations are picked for every request and when all members are done, the times to invoke the 20'000 requests are collected from all members, averaged and printed to the console.

The nice thing about UPerf is that parameters (such as the payload of each request, the number of invoker threads, the number of RPCs etc) can be changed dynamically; this is done in the form of a CONFIG RPC. All members apply the changes and therefore don't need to be restarted. (New members acquire the entire configuration from the coordinator).

I made 2 videos showing how this is done. Part 1 [3] shows how to setup and configure JGroups to run on 2 node cluster, and part 2 [4] shows a 100 node cluster and measures performance of UPerf. I hope to add a part 3 later, when I have a quota to run 1000 cores...

The first part can be seen here:

The second part is here:




Enjoy !

[1] https://cloud.google.com/products/compute-engine/

[2] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/GOOGLE_PING.java

[3] http://youtu.be/xq7JxeIQTrU

[4]  http://youtu.be/fokCUvB6UNM

Friday, January 24, 2014

JGroups: status and outlook

For our clustering team meeting in Palma, I wrote a presentation on status and future of JGroups. As there isn't anything confidential in it, I thought I might as well share it.
Feedback to the users mailing list, please.
Cheers,

https://github.com/belaban/workshop/blob/master/slides/status2014.adoc

Sunday, October 06, 2013

JGroups 3.4.0.Final released

I'm happy to announce that JGroups 3.4.0.Final has been released !

The major features, bug fixes and optimizations are:
  • View creation and coordinator selection is now pluggable
  • Cross site replication can have more than 1 site master
  • Fork channels: light weight channels
  • Reduction of memory and wire size for views, merge views, digests and various headers
  • Various optimizations for large clusters: the largest JGroups cluster is now at 1'538 nodes !
  • The license is now Apache License 2.0
3.4.0.Final can be downloaded from SourceForge [1] or Maven (central). The complete list of issues is at [2]. Below is a summary of the changes.
Enjoy !

[1] https://sourceforge.net/projects/javagroups/files/JGroups/3.4.0.Final
[2] https://issues.jboss.org/browse/JGRP/fixforversion/12320869


Changes:
****************************************************************************************************
Note that the license was changed from LGPL 2.1 to AL 2.0: https://issues.jboss.org/browse/JGRP-1643
****************************************************************************************************


New features
============


Pluggable policy for picking coordinator
----------------------------------------
[https://issues.jboss.org/browse/JGRP-592]

View and merge-view creation is now pluggable; this means that an application can determine which member is
the coordinator.
Documentation: http://www.jgroups.org/manual/html/user-advanced.html#MembershipChangePolicy.


RELAY2: allow for more than one site master
-------------------------------------------
[https://issues.jboss.org/browse/JGRP-1649]

If we have a lot of traffic between sites, having more than 1 site master increases performance and reduces stress
on the single site master


Fork channels: private light-weight channels
--------------------------------------------
[https://issues.jboss.org/browse/JGRP-1613]

This allows multiple light-weight channels to be created over the same (base) channel. The fork channels are
private to the app which creates them and the app can also add protocols over the default stack. These protocols are
also private to the app.

Doc: http://www.jgroups.org/manual/html/user-advanced.html#ForkChannel
Blog: http://belaban.blogspot.ch/2013/08/how-to-hijack-jgroups-channel-inside.html



Kerberos based authentication
-----------------------------
[https://issues.jboss.org/browse/JGRP-1657]

New AUTH plugin contributed by Martin Swales. Experimental, needs more work


Probe now works with TCP too
----------------------------
[https://issues.jboss.org/browse/JGRP-1568]

If multicasting is not enabled, probe.sh can be started as follows:
probe.sh -addr 192.168.1.5 -port 12345
, where 192.168.1.5:12345 is the physical address:port of a node.
Probe will ask that node for the addresses of all other members and then send the request to all members.



Optimizations
=============

UNICAST3: ack messages sooner
-----------------------------
[https://issues.jboss.org/browse/JGRP-1664]

A message would get acked after delivery, not reception. This was changed, so that long running app code would not
delay acking the message, which could lead to unneeded retransmission by the sender.


Compress Digest and MutableDigest
---------------------------------
[https://issues.jboss.org/browse/JGRP-1317]
[https://issues.jboss.org/browse/JGRP-1354]
[https://issues.jboss.org/browse/JGRP-1391]
[https://issues.jboss.org/browse/JGRP-1690]

- In some cases, when a digest and a view are the same, the members
  field of the digest points to the members field of the view,
  resulting in reduced memory use.
- When a view and digest are the same, we marshal the members only
  once (in JOIN-RSP, MERGE-RSP and INSTALL-MERGE-VIEW).
- We don't send the digest with a VIEW to *existing members*; the full
  view and digest is only sent to the joiner. This means that new
  views are smaller, which is useful in large clusters.
- JIRA:  https://issues.jboss.org/browse/JGRP-1317
- View and MergeView now use arrays rather than lists to store
  membership and subgroups
- Make sure digest matches view when returning JOIN-RSP or installing
  MergeView (https://issues.jboss.org/browse/JGRP-1690)
- More efficient marshalling of GMS$GmsHeader: when view and digest
  are present, we only marshal the members once


Large clusters:
---------------
- https://issues.jboss.org/browse/JGRP-1700: STABLE uses a bitset rather than a list for STABLE msgs, reducing
  memory consumption
- https://issues.jboss.org/browse/JGRP-1704: don't print the full list of members
- https://issues.jboss.org/browse/JGRP-1705: suppression of fake merge-views
- https://issues.jboss.org/browse/JGRP-1710: move contents of GMS headers into message body (otherwise packet at
  transport gets too big)
- https://issues.jboss.org/browse/JGRP-1713: ditto for VIRE-RSP in MERGE3
- https://issues.jboss.org/browse/JGRP-1714: move large data in headers to message body



Bug fixes
=========

FRAG/FRAG2: incorrect ordering with message batches
[https://issues.jboss.org/browse/JGRP-1648]

Reassembled messages would get reinserted into the batch at the end instead of at their original position


RSVP: incorrect ordering with message batches
---------------------------------------------
[https://issues.jboss.org/browse/JGRP-1641]

RSVP-tagged messages in a batch would get delivered immediately, instead of waiting for their turn


Memory leak in STABLE
---------------------
[https://issues.jboss.org/browse/JGRP-1638]

Occurred when send_stable_msg_to_coord_only was enabled.


NAKACK2/UNICAST3: problems with flow control
--------------------------------------------
[https://issues.jboss.org/browse/JGRP-1665]

If an incoming message sent out other messages before returning, it could block forever as new credits would not be
processed. Caused by a regression (removal of ignore_sync_thread) in FlowControl.



AUTH: nodes without AUTH protocol can join cluster
--------------------------------------------------
[https://issues.jboss.org/browse/JGRP-1661]

If a member didn't ship an AuthHeader, the message would get passed up instead of rejected.


LockService issues
------------------
[https://issues.jboss.org/browse/JGRP-1634]
[https://issues.jboss.org/browse/JGRP-1639]
[https://issues.jboss.org/browse/JGRP-1660]

Bug fix for concurrent tryLock() blocks and various optimizations.


Logical name cache is cleared, affecting other channels
-------------------------------------------------------
[https://issues.jboss.org/browse/JGRP-1654]

If we have multiple channels in the same JVM, the a new view in one channel triggers removal of the entries
of all other caches




Manual
======

The manual is at http://www.jgroups.org/manual-3.x/html/index.html.

Monday, September 30, 2013

New record for a large JGroups cluster: 1538 nodes

The largest JGroups cluster so far has been 536 nodes [1], but since last week we have a new record: 1538 nodes !

The specs for this cluster are:
  • 4 X Intel Xeon CPU E3-1220 V2 @ 3.10GHz 4GB: 8 boxes of 13 members = 104
  • 1 X Intel Celeron M CPU 440 @ 1.86GHz 1GB: 8 boxes of 3 members = 24
  • 1 X Intel Celeron M CPU 440 @ 1.86GHz 2GB: 141 boxes of 10 members =1410
    • Total = 1538 members
  • JGroups 3.4.0.Beta2 (custom build with JGRP-1705 [2] included)
  • 120 MB of heap for each node
The time to form a 1538 node cluster was around 115 seconds and it took a total 11 views (due to view bundling being enabled).

Note that the physical memory listed for the hardware above is shared by all the nodes on that box, e.g. in the last HW config of 141 boxes, 10 nodes share the 2GB of physical memory; each member has 120 MB max of heap available.

The picture below shows the management GUI. The last line shows member number 1538, which took roughly 11 seconds to join (due to view bundling) and the total time for the cluster to form was ca. 115 seconds.




[1] http://belaban.blogspot.ch/2011/04/largest-jgroups-cluster-ever-536-nodes.html
[2] https://issues.jboss.org/browse/JGRP-1705

Wednesday, August 21, 2013

How to hijack a JGroups channel inside Infinispan / JBoss ... (and get away with it !)

Have you ever used an Infinispan or JBoss (WildFly) cluster ?
If no --> skip this article.

Have you ever had the need to send messages around the cluster, without resorting to RpcManager offered by Infinispan or HAPartition provided by WildFly ?
If no --> skip this article.

Have you ever wanted to hijack the JGroups channel used by Infinispan and WildFly and use it for your own purposes ?
If no --> skip this article.

I recently added light-weight channels [1] to JGroups. They provide the ability to send and receive messages over an existing channel, yet those messages are private to the light-weight channel. And the hijacked channel doesn't see those private messages either.

This is good when an application wants to reuse an existing channel, and doesn't want to create/configure/maintain its own channel, which would increase resource consumption.

Also, applications can create many (hundreds) of light-weight channels, as they don't use a lot of resources and are easy and quick to create and destroy.

Even better: if an application would like to add capabilities to an existing stack, e.g. atomic counters, a distributed execution service or distributed locking, then it can define the protocol(s) it wants to add and the private channel will add these. Again, the hijacked channel is unaffected by this.

There's documentation on this at [1], so let me show you how to hijack a channel inside of an Infinispan application. The application can be downloaded from [3].

The (original but edited) code that creates an Infinispan cache looks like this:

    protected void start() throws Exception {
        mgr=new DefaultCacheManager("infinispan.xml");
        cache=mgr.getCache("clusteredCache");

        eventLoop();
    }


It creates a CacheManager, then creates a Cache instance off of it. This is it. Now, we want to grab the JGroups channel and hijack it to run a Draw instance on it (HijackTest:53, [4]):


    protected void start() throws Exception {
        mgr=new DefaultCacheManager("infinispan.xml");
        cache=mgr.getCache("clusteredCache");
        Transport   tp;

        ForkChannel fork_ch;
        tp=cache.getAdvancedCache().getRpcManager().getTransport();
        Channel main_ch=((JGroupsTransport)transport).getChannel();
        fork_ch=new ForkChannel(main_ch, 

                                "hijack-stack", 
                                "lead-hijacker",
                                true,

                                ProtocolStack.ABOVE,
                                FRAG2.class);
        Draw draw=new Draw((JChannel)fork_ch);
        fork_ch.connect("ignored");
        draw.go();

        eventLoop();
    }


The code in bold was inserted. So what does it do ? It grabs the JGroups channel from the Infinispan cache, creates a light-weight (fork) channel  and runs Draw [5] over it. Draw is a replicated whiteboard GUI application. To explain this step-by-step:
  • First the Transport instance is retrieved from the cache, then the JGroups channel from it
  • Next a ForkChannel is created. It is passed the hijacked JGroups channel, the name of the newly created fork stack ("hijack-stack") and the name of the fork channel ("lead-hijacker"). Then we state that we want to dynamically insert a FORK protocol if it doesn't exist, above FRAG2. FORK [2] is needed to create fork channels
  • Next, we create an instance of Draw (shipped with JGroups) on the newly created fork channel
  • Then we connect the fork channel and call go() on Draw which starts up the GUI
The result is shown below:




The hijacked application is a text-based Infinispan perf test (shown in the 2 shell windows). The hijacking Draw windows are shown above. Pressing and moving the mouse around will send multicasts of coordinates/color around the cluster over the fork channel, but the hijacked channel will not notice anything.

Seem like we got away with hijacking the JGroups channel after all ! :-)



[1] http://www.jgroups.org/manual/html/user-advanced.html#ForkChannel

[2] https://issues.jboss.org/browse/JGRP-1613

[3] https://github.com/belaban/HijackTest

[4] https://github.com/belaban/HijackTest/blob/master/src/main/java/org/dubious/HijackTest.java#L53

[5] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/demos/Draw.java


Wednesday, July 31, 2013

BYOC: Bring Your Own Coordinator

I've finally resolved one of the older issues in JGroups, created 6 years ago: pluggable policy for picking the coordinator [1].

This adds the ability to let application code create the new View on a join, leave or crash-leave, or the MergeView on a merge.

Why is this important ?

There are 2 scenarios that benefit from this:
  1. The new default merge policy always picks a previous coordinator if available, so that moving around of the coordinatorship is minimized. Because singleton services (services which are instantiated only on one node in the cluster, usually on the coordinator) need to shutdown services and start them every time the coordinatorship moves, minimizing this means less work for singletons.
  2. A node with a coordinator role has more work to do, e.g. join/leave/merge handling, plus it may be running some singleton services. Some systems will therefore prefer bulkier machines (faster CPU, more memory) to run the coordinator. But so far, this hasn't been possible, as JGroups always preferred every node equally. This changes now, and the puggable policy allows for pinning the coordinatorship to a number of machines (a subset of the cluster nodes).
The documentation is at [2]. For questions and feedback, please use the mailing list. 
Cheers,

[1] https://issues.jboss.org/browse/JGRP-592

[2] http://www.jgroups.org/manual/html/user-advanced.html#MembershipChangePolicy

Monday, July 29, 2013

First JGroups release under Apache License 2.0

I'm happy to announce that I just released 3.4.0.Alpha1, which is the first release under the Apache License 2.0 !

In order to change the license, I had to track down all contributors who still have code in the code base and get their permissions [1], which was time consuming.

Given that JGroups came into existence in 1998, even though a lot of code has since been added/removed/replaced, there are still quite a number of people who still have contributions in the current master.

Finding some of those folks turned out to be hard. I ran into a few cases of abandoned email addresses, and sometimes googling for "firstname lastname" led me to people with the same name but no connection to JGroups whatsoever...

Anyway, the change is done now and I wanted to release a first alpha under AL 2.0 as soon as possible, so other projects can start using JGroups with the new license.

Although 'alpha' suggests bad code quality, the opposite is actually true: this release is very stable, at least as stable as the latest 3.3.x. The major changes are listed below.
Enjoy !

RELAY2: allow for more than one site master
-------------------------------------------
[https://issues.jboss.org/browse/JGRP-1649]

If we have a lot of traffic between sites, having more than 1 site master 
increases performance and reduces stress on the single site master


Kerberos based authentication
-----------------------------
[https://issues.jboss.org/browse/JGRP-1657]

New AUTH plugin contributed by Martin Swales. Experimental, needs more work


Probe now works with TCP too
----------------------------
[https://issues.jboss.org/browse/JGRP-1568]

If multicasting is not enabled, probe.sh can be started as follows:
probe.sh -addr 192.168.1.5 -port 12345
, where 192.168.1.5:12345 is the physical address:port of a node.
Probe will ask that node for the addresses of all other members and 
then send the request to all members.



Optimizations
=============

UNICAST3: ack messages sooner
-----------------------------
[https://issues.jboss.org/browse/JGRP-1664]

A message would get acked after delivery, not reception. This was changed, 
so that long running app code would not delay acking the message, which 
could lead to unneeded retransmission by the sender.


Bug fixes
=========

FRAG/FRAG2: incorrect ordering with message batches
[https://issues.jboss.org/browse/JGRP-1648]

Reassembled messages would get reinserted into the batch at the end instead of 
at their original position


RSVP: incorrect ordering with message batches
---------------------------------------------
[https://issues.jboss.org/browse/JGRP-1641]

RSVP-tagged messages in a batch would get delivered immediately,
instead of waiting for their turn


Memory leak in STABLE
---------------------
[https://issues.jboss.org/browse/JGRP-1638]

Occurred when send_stable_msg_to_coord_only was enabled.


NAKACK2/UNICAST3: problems with flow control
--------------------------------------------
[https://issues.jboss.org/browse/JGRP-1665]

If an incoming message sent out other messages before returning, it could 
block forever as new credits would not be processed.
Caused by a regression (removal of ignore_sync_thread) in FlowControl.



AUTH: nodes without AUTH protocol can join cluster
--------------------------------------------------
[https://issues.jboss.org/browse/JGRP-1661]

If a member didn't ship an AuthHeader, the message would get passed up
instead of rejected.


LockService issues
------------------
[https://issues.jboss.org/browse/JGRP-1634]
[https://issues.jboss.org/browse/JGRP-1639]
[https://issues.jboss.org/browse/JGRP-1660]

Bug fix for concurrent tryLock() blocks and various optimizations.


Logical name cache is cleared, affecting other channels
-------------------------------------------------------
[https://issues.jboss.org/browse/JGRP-1654]

If we have multiple channels in the same JVM, the a new view in one channel
triggers removal of the entries of all other caches



Manual
======

The manual is at http://www.jgroups.org/manual-3.x/html/index.html.

The complete list of features and bug fixes can be found at 
http://jira.jboss.com/jira/browse/JGRP.


Bela Ban, Kreuzlingen, Switzerland
Aug 2013


[1] https://issues.jboss.org/browse/JGRP-1643




Tuesday, May 28, 2013

JGroups to investigate adopting Apache License 2.0

We've started to investigate whether JGroups can change its license from LGPL 2.1 [1] to Apache License (AL) 2.0 [2]. If possible, the next JGroups release will be licensed under AL 2.0.

The reason for the switch is that AL 2.0 is more permissive than LGPL 2.1 and can be integrated with other projects / products more easily. Also, AL 2.0 is one of the most frequently used licenses in the open source world [3], and you can't go wrong with that, can you ? :-)

I've received quite a number of requests to change the license over the years, from the open source community and from companies trying to integrate JGroups into their products.

This change should increase acceptance of JGroups with other open source projects or commercial products.

So what would the impact of this change be ?
  • For projects / products using a current (<= 3.3) JGroups release, nothing would change (they're still licensed under LGPL 2.1)
  • For projects / products upgrading to the AL licensed release (once it's out), nothing would change; the AL 2.0 license is even more permissive
Let me repeat this, nothing would change except that LGPL skeptics should now be able to use JGroups.

Note that Infinispan is looking into making the same change, see [4] for details. As Infinispan consumes JGroups, it is important for JGroups to have the same license as Infinispan, or else integration would be a nightmare; it's hard (if not impossible) for an AL project to consume an LGPL project.

Changing the license would be done in the next JGroups release. Whether this will be 3.4 or whether a license change warrants going directly to a 4.0 is still undecided.

We're currently working with Red Hat's legal department and the community to see whether this switch is possible and what needs to be done to make it happen.

Opinions ? Questions ? Feedback ? Please post to this blog, or the JGroups mailing list.

Cheers,

[1] http://opensource.org/licenses/LGPL-2.1
[2] http://opensource.org/licenses/Apache-2.0
[3] http://osrc.blackducksoftware.com/data/licenses
[4] http://infinispan.blogspot.co.uk/2013/05/infinispan-to-adopt-apache-software.html



Tuesday, May 07, 2013

JGroups 3.3.0.Final released

I'm happy to announce that JGroups 3.3.0.Final was released today.

It contains quite a few optimizations and new features, the most prominent ones being:
  • Message batching: messages received as bundles by the transport are passed up as batches. Compared to passing individual messages up the stack, the advantage is that we have to acquire resources (such as locks) only once per batch instead of once per message. This reduces the number of lock acquisitions and should lead to less context switching and better performance in contended scenarios.
  • Asynchronous Invocation API: This allows the recipient of a message in MessageDispatcher or RpcDispatcher to make the delivering thread return immediately (making it available for other requests) and to send the response later. The advantage is that it is the application which now decides how to deliver messages (e.g. sequentially or in parallel), and not JGroups. The documentation is here: http://www.jgroups.org/manual-3.x/html/user-building-blocks.html#AsyncInvocation
  • UNICAST3: this is a new reliable unicast protocol, combining the advantages of UNICAST (immediate delivery) and UNICAST3 (speed). It is the default in 3.3.
  • New internal thread pool: to be used for internal JGroups messages only. This way, important internal JGroups messages are not impeded by the delivery of application messages ahead of them in delivery order. While this pool is not supposed to be used by application messages, it will help for example to decrease unneeded blocking (by credit messages getting queued up behind application messages), or reduce false suspicions (due to heartbeats getting handled too late, or even getting dropped).
  • RELAY2 improvements: RELAY2 is the protocol which relays traffic between geographically separate sites. This will be the topic of my talk at Red Hat Summit in June.
  • New timer implementation: a better, simpler and faster implementation; the default in 3.3.
  • New message bundler: this new bundler handles sending of individual messages and message batches equally well. Default in 3.3.
A more detailed version of these release notes can be found at [1]. The documentation can be found at [2]. The JIRA is at [3].

Please post questions and feedback as usual on the mailing list.


[1] https://github.com/belaban/JGroups/blob/3.3/doc/ReleaseNotes-3.3.0.txt

[2]  http://www.jgroups.org/manual-3.x/html/index.html

[3] https://issues.jboss.org/browse/JGRP

Monday, April 22, 2013

JBossWorld 2013 in Boston is around the corner

FYI,

I'm going to have a talk at JBossWorld / Red Hat Summit 2013 in Boston on June 13: http://www.redhat.com/summit/sessions/index.html#54.

This will be about a feature in JBoss Data Grid (JDG) which provides geographic failover between sites. I'm going to run a couple of JDG clusters (Boston, London and San Francisco), and initially route all clients to London. Then, as London shuts down at the end of their day, I'll route all clients over to Boston, and finally to San Francisco.

I have a demo that can be run by anyone with internet access, in their browser. Users can punch in some data, and will see their clients fail over between sites seamlessly, without data loss.

Hope to see / meet some of you in Boston !
Cheers,