Thursday, September 08, 2016

Removing thread pools in JGroups 4.0

JGroups 3.x has 4 thread pools:
  • Regular thread pool: used for regular messages (default has a queue)
  • OOB thread pool: used for OOB messages (no queue)
  • Internal thread pool: used for JGroups-internal messages only. The main raison d'être for this pool was that internal messages such as heartbeats or credits should never get queued up behind other messages, but should be processed immediately.
  • Timer thread pool: all tasks in a timer need to be executed by a thread pool as they can potentially block
Over time, I found that, with most configurations, I had disabled the queue in the regular thread pool, as I wanted the pool to dynamically create new threads (up to max-threads) when required, and to terminate them again after some idle time.

Hence the idea to merge the regular and OOB thread pools into one.

When I thought about this further, I realized that internal messages could also be handled by a queue-less thread pool: when the pool is full, the resulting RejectedExecutionException is caught and a new thread is simply spawned to process the internal message, so it doesn't get dropped.

The same goes for timer tasks: a timer task (e.g. a message retransmission task) cannot be dropped, or retransmission would cease. However, this can be solved with the same mechanism as for internal messages, namely spawning a new thread when the thread pool is full (see the sketch below).
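
Here's a minimal sketch of this mechanism (not the actual JGroups code; class, field and parameter names are made up):

    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class SpawningThreadPool {
        // Queue-less pool: a SynchronousQueue hands tasks directly to idle threads.
        // If no thread is idle and max-threads has been reached, execute() throws
        // a RejectedExecutionException (AbortPolicy) instead of queueing the task.
        protected final ThreadPoolExecutor pool=new ThreadPoolExecutor(
            2, 8, 30, TimeUnit.SECONDS,
            new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy());

        public void execute(Runnable task, boolean spawn_if_full) {
            try {
                pool.execute(task);
            }
            catch(RejectedExecutionException rejected) {
                if(spawn_if_full)          // internal message or timer task: must not be dropped
                    new Thread(task).start();
                // else: a regular/OOB message may be dropped; the sender will retransmit
            }
        }
    }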

Therefore, in 4.0 there's only a single thread pool handling regular, OOB and internal messages, plus timer tasks.

The new thread pool has no queue; otherwise it would not throw a RejectedExecutionException when a task is added, but would simply queue it, which is not what we want for internal messages or timer tasks.

It also has a fixed rejection policy of "abort", which can only be changed by substituting the thread pool with a custom pool.

This dramatically reduces configuration complexity: from 4 pools down to 1, and the new thread pool only exposes min-threads, max-threads, idle-time and enabled as configuration options.

Here's an example of a 3.x configuration:

        timer_type="new3"
        timer.min_threads="2"
        timer.max_threads="4"
        timer.keep_alive_time="3000"
        timer.queue_max_size="500"

        thread_pool.enabled="true"
        thread_pool.min_threads="2"
        thread_pool.max_threads="8"
        thread_pool.keep_alive_time="5000"
        thread_pool.queue_enabled="true"
        thread_pool.queue_max_size="10000"
        thread_pool.rejection_policy="discard"

        internal_thread_pool.enabled="true"
        internal_thread_pool.min_threads="1"
        internal_thread_pool.max_threads="4"
        internal_thread_pool.keep_alive_time="5000"
        internal_thread_pool.queue_enabled="false"
        internal_thread_pool.rejection_policy="discard"

        oob_thread_pool.enabled="true"
        oob_thread_pool.min_threads="1"
        oob_thread_pool.max_threads="8"
        oob_thread_pool.keep_alive_time="5000"
        oob_thread_pool.queue_enabled="false"
        oob_thread_pool.queue_max_size="100"
        oob_thread_pool.rejection_policy="discard"


and here's the 4.0 configuration:

        thread_pool.enabled="true"
        thread_pool.min_threads="2"
        thread_pool.max_threads="8"
        thread_pool.keep_alive_time="5000"



Nice, isn't it?

Cheers,

Tuesday, August 23, 2016

JGroups 4.0.0.Alpha1 released

Howdy folks,

I just released a first alpha of JGroups 4.0.0 to SourceForge and maven. There are 12+ issues left, and I expect a final release in October.

The major version number was incremented because there are a lot of API changes and removal of deprecated classes/methods.

I'd be happy to get feedback on the API; there's still time to make a few API changes before the final release.

Major API changes



New features 

 

Deliver message batches

Ability to handle message batches in addition to individual messages: receive(MessageBatch) was added to ReceiverAdapter: https://issues.jboss.org/browse/JGRP-2003
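
A receiver can now process a whole batch in a single callback. A minimal sketch (the payload handling is made up):

    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;
    import org.jgroups.util.MessageBatch;

    public class BatchReceiver extends ReceiverAdapter {
        @Override
        public void receive(MessageBatch batch) {
            // a MessageBatch is iterable; all of its messages arrived together
            for(Message msg: batch)
                System.out.println("received " + msg.getObject());
        }
    }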

Encryption and authentication

ENCRYPT was removed and replaced by SYM_ENCRYPT (keystore-based) and ASYM_ENCRYPT (asymmetric key exchange): https://issues.jboss.org/browse/JGRP-2021.

ENCRYPT had a couple of security flaws, and since the code hadn't been touched in almost a decade, I took the chance and completely refactored it into the 2 protocols above.

The code is now much more readable and maintainable.
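
For example, symmetric encryption using a shared keystore might be configured like this (a sketch; keystore name, password and alias are placeholders):

        <SYM_ENCRYPT keystore_name="defaultStore.keystore"
                     store_password="changeit"
                     alias="myKey"/>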


Optimizations

 

Removal of Java serialization

JGroups marshalling falls back to Java serialization if a class is not a primitive type and does not implement Streamable. This fallback was completely removed for JGroups-internal classes: https://issues.jboss.org/browse/JGRP-2033
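
For reference, a Streamable class marshals itself without Java serialization. A minimal sketch (the fields are made up):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.jgroups.util.Streamable;

    public class MyData implements Streamable {
        protected int    counter;
        protected String name;

        public void writeTo(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeUTF(name);
        }

        public void readFrom(DataInput in) throws IOException {
            counter=in.readInt();
            name=in.readUTF();
        }
    }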

MessageBundler improvements

The performance of the existing bundlers has been improved, and new bundlers (RingBufferBundler, NoBundler) have been added. Since the shared transport feature was removed, SingletonAddress, which was used as a key into a bundler's hashmap, could also be removed.

For instance, TransferQueueBundler now removes chunks of messages from the queue, rather than removing them one by one: https://issues.jboss.org/browse/JGRP-2076.

I also added the ability to switch bundlers at runtime, e.g. via probe.sh op=TCP.bundler["rb"]: https://issues.jboss.org/browse/JGRP-2057 / https://issues.jboss.org/browse/JGRP-2058


Reduction of event creation

For every message sent up or down, a new Event (wrapping the message) had to be created. Adding 2 separate up(Message) and down(Message) callbacks removed that unneeded creation: https://issues.jboss.org/browse/JGRP-2067
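
A protocol can now override the message-only callbacks directly. A sketch (the protocol itself is made up):

    import org.jgroups.Message;
    import org.jgroups.stack.Protocol;

    public class CountingProtocol extends Protocol {
        protected long msgs_up;

        @Override
        public Object up(Message msg) {   // dedicated callback: no Event wrapper is created
            msgs_up++;
            return up_prot.up(msg);       // pass the message further up the stack
        }
    }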


Faster header marshalling and creation

Headers used to do a hashmap lookup to get their magic ID when getting marshalled. Now they carry the magic ID directly. This saves a hashmap lookup per header (usually there are 3-4 headers per message): https://issues.jboss.org/browse/JGRP-2042.

Header creation used to look up the header's class by its magic ID and then create an instance from that class. This was replaced by each header class implementing a create() method, bringing header creation down from 300ns to 25ns:
https://issues.jboss.org/browse/JGRP-2043.
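
A 4.0 header thus looks roughly like this (a sketch; the magic ID and the counter field are made up, and real magic IDs have to be registered with JGroups):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.function.Supplier;
    import org.jgroups.Header;

    public class CounterHeader extends Header {
        protected int counter;

        // the magic ID is carried directly, no hashmap lookup needed
        public short getMagicId() {return 1111;} // made-up ID

        // replaces the reflective creation via magic-ID lookup
        public Supplier<? extends Header> create() {return CounterHeader::new;}

        public int serializedSize() {return Integer.BYTES;}
        public void writeTo(DataOutput out) throws IOException {out.writeInt(counter);}
        public void readFrom(DataInput in) throws IOException {counter=in.readInt();}
    }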


Bug fixes


MFC and UFC headers getting mixed up

This is a regression that led to premature exhaustion of credits, slowing things down:
https://issues.jboss.org/browse/JGRP-2072


Download


<dependency>
    <groupId>org.jgroups</groupId>
    <artifactId>jgroups</artifactId>
    <version>4.0.0.Alpha1</version>
</dependency>


The manual has not been changed yet, so browse the source code or javadocs for details.

Enjoy! And please send feedback via the mailing list!

Cheers,

Bela Ban


Wednesday, February 24, 2016

JGroups 3.6.8 released

FYI, 3.6.8.Final has been released.

Not a big release; it contains mostly optimizations and a nice probe improvement. The main issues are listed below.
Enjoy!

Bela

New features
============

Probe improvements
------------------
[https://issues.jboss.org/browse/JGRP-2004]
[https://issues.jboss.org/browse/JGRP-2005]

- Proper discarding of messages from a different cluster with the '-cluster' option.
- Less information per cluster member; only the requested information is returned
- Detailed information about RPCs (number of sync, async RPCs, plus timings)
  - http://www.jgroups.org/manual/index.html#_looking_at_details_of_rpcs_with_probe



Optimizations
=============

DONT_BUNDLE and OOB: messages are not removed from batch when execution fails
-----------------------------------------------------------------------------
[https://issues.jboss.org/browse/JGRP-2015]

- Messages are not removed from the batch when execution fails
- Rejections are not counted towards num_rejected_msgs


COMPRESS: removed 1 byte[] buff copy
------------------------------------
[https://issues.jboss.org/browse/JGRP-2017]

An unneeded copy of the compressed payload was created when sending and compressing a message. The fix should reduce
memory allocation pressure quite a bit.


RpcDispatcher: don't copy the first anycast
-------------------------------------------
[https://issues.jboss.org/browse/JGRP-2010]

When sending an anycast to 3 destinations, JGroups sends a copy of the original message to all 3. However, the first
doesn't need to be copied (less memory allocation pressure). For an anycast to a single destination, no copy is
needed, either.


Compaction of in-memory size
----------------------------
[https://issues.jboss.org/browse/JGRP-2011]
[https://issues.jboss.org/browse/JGRP-2012]

- Reduced size of Rsp (used in every RPC) from 32 -> 24 bytes
- Request/UnicastRequest/GroupRequest: reduced size


RequestCorrelator.done() is slow
--------------------------------
[https://issues.jboss.org/browse/JGRP-2013]

Used by RpcDispatcher. Fixed by eliminating the linear search done previously.



Bug fixes
=========

FILE_PING: consider special characters in file names
----------------------------------------------------
[https://issues.jboss.org/browse/JGRP-2014]

Names like "A/web-cluster" would fail on Windows, as the slash char ('/') was treated as a demarcation character in some clouds.



Manual
======

The manual is at http://www.jgroups.org/manual/index.html.

The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.


Bela Ban, Kreuzlingen, Switzerland
Feb 2016

Wednesday, January 20, 2016

Dump RPC stats with JGroups

When using remote procedure calls (RPCs) across a cluster with RpcDispatcher, it is interesting to know how many RPCs of which type (unicast, multicast, anycast) are invoked, by whom, and to whom.
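
As a reminder, these are the 3 RPC types probe reports on. A sketch (assuming a method print(int) was registered with the RpcDispatcher's server object):

    import java.util.List;
    import org.jgroups.Address;
    import org.jgroups.blocks.MethodCall;
    import org.jgroups.blocks.RequestOptions;
    import org.jgroups.blocks.RpcDispatcher;
    import org.jgroups.util.RspList;

    public class RpcTypes {
        static void invoke(RpcDispatcher disp, List<Address> some_members, Address target) throws Exception {
            MethodCall call=new MethodCall("print", new Object[]{5}, new Class[]{int.class});

            // multicast RPC: dests == null means 'invoke in all cluster members'
            RspList<Object> all=disp.callRemoteMethods(null, call, RequestOptions.SYNC());

            // anycast RPC: sent as N unicasts to the given subset of members
            RspList<Object> few=disp.callRemoteMethods(some_members, call,
                                                       RequestOptions.SYNC().setAnycasting(true));

            // unicast RPC: invoked in a single member
            Object one=disp.callRemoteMethod(target, call, RequestOptions.SYNC());
        }
    }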

I added this feature in 3.6.8-SNAPSHOT [1]. The documentation is here: [2].

As a summary: since this feature is costly, it has to be enabled with
probe.sh rpcs-enable-details (and disabled with rpcs-disable-details).

Once enabled, invocation times of synchronous (blocking) RPCs are recorded (async RPCs are ignored).

RPC stats can be dumped with probe.sh rpcs-details:
[belasmac] /Users/bela/JGroups$ probe.sh rpcs rpcs-details

-- sending probe on /224.0.75.75:7500
#1 (481 bytes):
local_addr=C [ip=127.0.0.1:55535, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67480
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189064
rpcs-details=
D: async: 0, sync: 130434, min/max/avg (ms): 0.13/924.88/2.613
A: async: 0, sync: 130243, min/max/avg (ms): 0.11/926.35/2.541
B: async: 0, sync: 63346, min/max/avg (ms): 0.14/73.94/2.221

#2 (547 bytes):
local_addr=A [ip=127.0.0.1:65387, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=5
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67528
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189200
rpcs-details=
<all>: async: 0, sync: 5, min/max/avg (ms): 2.11/9255.10/4917.072
C: async: 0, sync: 130387, min/max/avg (ms): 0.13/929.71/2.467
D: async: 0, sync: 63340, min/max/avg (ms): 0.13/63.74/2.469
B: async: 0, sync: 130529, min/max/avg (ms): 0.13/929.71/2.328

#3 (481 bytes):
local_addr=B [ip=127.0.0.1:51000, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67255
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189494
rpcs-details=
C: async: 0, sync: 130616, min/max/avg (ms): 0.13/863.93/2.494
A: async: 0, sync: 63210, min/max/avg (ms): 0.14/54.35/2.066
D: async: 0, sync: 130177, min/max/avg (ms): 0.13/863.93/2.569

#4 (482 bytes):
local_addr=D [ip=127.0.0.1:54944, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67293
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189353
rpcs-details=
C: async: 0, sync: 63172, min/max/avg (ms): 0.13/860.72/2.399
A: async: 0, sync: 130342, min/max/avg (ms): 0.13/862.22/2.338
B: async: 0, sync: 130424, min/max/avg (ms): 0.13/866.39/2.350

This shows the stats for each member of a given cluster: the number of unicast, multicast and anycast RPCs per target destination, plus the min/max and average invocation times for sync RPCs per target.

Probe just became even more powerful! :-)
Enjoy!

[1] https://issues.jboss.org/browse/JGRP-2005
[2] Documentation: http://www.jgroups.org/manual/index.html#_looking_at_details_of_rpcs_with_probe

Monday, January 18, 2016

JGroups workshop in Munich April 4-8 2016


I'm happy to announce another JGroups workshop in Munich, April 4-8 2016!

The registration is now open at [2].

The agenda is at [3] and includes an overview of the basic API, building blocks, advanced topics and an in-depth look at the most frequently used protocols, plus some admin (debugging, tracing, diagnosis) stuff.

We'll be doing demos and looking at code; I always try to make the workshops as hands-on as possible.

I'll be teaching the workshop myself, and I'm looking forward to meeting some of you and having beers in downtown Munich! For attendee feedback on last year's courses, check out [1].

Note that the exact location in Munich has not yet been picked; I'll update the registration page and send out an email to already registered attendees once this is the case (by the end of January at the latest).

The course has a minimum of 5 and a maximum of 15 attendees.

I'm planning to do another course in Boston or New York in the fall of 2016, but those plans have not been finalized yet.

Cheers, and I hope to see many of you in Munich!
Bela Ban


[1] http://www.jgroups.org/workshops.html
[2] http://www.amiando.com/WorkshopMunich
[3] https://github.com/belaban/workshop/blob/master/slides/toc.adoc

Tuesday, January 12, 2016

JGroups 3.6.7.Final released

I'm happy to announce that 3.6.7.Final has been released!

This release contains a few bug fixes, but is mainly about optimizations that reduce memory consumption and allocation rates.
The other big optimization was in TCP_NIO2, which is now as fast as TCP. TCP_NIO2 is slated to become the successor to TCP, as it uses fewer threads and, being built on NIO, should be much more scalable.

3.6.7.Final can be downloaded from SourceForge [1] or used via maven (groupId=org.jgroups / artifactId=jgroups, version=3.6.7.Final).

Below is a list of the major issues resolved.
Enjoy!


[1] https://sourceforge.net/projects/javagroups/files/JGroups/3.6.7.Final/


New features


Interoperability between TCP and TCP_NIO2


[https://issues.jboss.org/browse/JGRP-1952]
This allows nodes that have TCP as transport to talk to nodes that have TCP_NIO2 as transport, and vice versa.
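
For example (a sketch showing only the transport element; all other attributes are omitted), a node whose stack starts with

        <TCP bind_port="7800" ... />

can now form a cluster with a node whose stack starts with

        <TCP_NIO2 bind_port="7800" ... />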

Optimizations


Transport: reuse of receive buffers

[https://issues.jboss.org/browse/JGRP-1998]
On message reception, the transport would create a new buffer in TCP and TCP_NIO2 (not in UDP), read the message into that buffer and then pass it to one of the thread pools, copying single messages (but not batches).
This was changed to reuse the same buffers in UDP, TCP and TCP_NIO2: the network data is read into one of those buffers, de-serialized into a message (or message batch) and then passed to one of the thread pools.
The effect is a much lower memory allocation rate.

Message bundling: reuse of send buffers

[https://issues.jboss.org/browse/JGRP-1989]
When sending messages, a new buffer would be created for marshalling every message (or message bundle). This was changed to reuse the same buffer for all messages or message bundles.
The effect is a smaller memory allocation rate on the send path.

TCP_NIO2: copy on-demand when sending messages

[https://issues.jboss.org/browse/JGRP-1991]
If a message sent by TCP_NIO2 cannot be put entirely into the network buffer of the OS, then the remainder of that message is copied. This is needed to implement the reuse of send buffers; see JGRP-1989 above.

TCP_NIO2: single selector slows down writes and reads

[https://issues.jboss.org/browse/JGRP-1999]
This transport used to have a single selector, processing both writes and reads in the same thread. Writes are not expensive, but reads can be, as de-serialization adds up.
We now have a reader thread for every NioConnection which processes reads (using work stealing), separate from the selector thread. When idle for some time, the reader thread terminates, and a new thread is created when there is again data to be read.
UPerf (4 nodes) showed a perf increase from 15'000 msgs/sec/node to 24'000. TCP_NIO2's speed is now roughly the same as TCP's.

Headers: collapse 2 arrays into 1

[https://issues.jboss.org/browse/JGRP-1990]
A Message had a Headers instance, which held one array for the header IDs and another for the actual headers. These 2 arrays were collapsed into a single array, and Headers is no longer a separate class; the array is now managed directly inside Message.
This reduces the memory needed for a message by ca. 22 bytes!

RpcDispatcher: removal of unneeded field in a request

[https://issues.jboss.org/browse/JGRP-2001]
The request ID was carried in both the Request (UnicastRequest or MulticastRequest) and the header, which was duplication and a waste. It was removed from Request, and rsp_expected was also removed from the header, for total savings of ca. 9 bytes per RPC.

Switched back from DatagramSocket to MulticastSocket for sending of IP multicasts

[https://issues.jboss.org/browse/JGRP-1970]
Sending multicasts via a DatagramSocket caused issues on MacOS-based systems: when the routing table was not set up correctly, multicasting would not work (nodes wouldn't find each other).
Also, on Windows, IPv6 wouldn't work: https://github.com/belaban/JGroups/wiki/FAQ.

Make the default number of headers in a message configurable

[https://issues.jboss.org/browse/JGRP-1985]
The default was 3 (changed to 4 now); if a message had more headers than that, the headers array needed to be resized (an unneeded memory allocation).

Message bundling

[https://issues.jboss.org/browse/JGRP-1986]
When the threshold of the send queue was exceeded, the bundler thread would send messages one-by-one, leading to bad performance.

TransferQueueBundler: switch to array from linked list for queue

[https://issues.jboss.org/browse/JGRP-1987]
Less memory allocation overhead.

Bug fixes

SASL now handles merges correctly

[https://issues.jboss.org/browse/JGRP-1967]

FRAG2: message corruption when thread pools are disabled

[https://issues.jboss.org/browse/JGRP-1973]

Discovery leaks responses

[https://issues.jboss.org/browse/JGRP-1983]



Manual

The manual is at http://www.jgroups.org/manual/index.html.

The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.