Thursday, January 15, 2015

JGroups workshop

I'm happy to announce that I'm putting the finishing touches to a JGroups workshop [1].

It consists of 4 modules with labs:
  1. Using JGroups: API (beginner level, 1 day)
  2. Using JGroups: building blocks (beginner level, 1 day)
  3. Advanced (medium to advanced level, 2 days)
  4. Admin (medium level, 1 day)
The modules can be mixed and matched, but I think that a public workshop will present them in this order. Beginners may wish to attend only the first 2 days, while others may want to skip the first 2 days and only attend the Advanced and Admin parts.

We're also thinking about offering a consulting package which includes selected modules and a few consulting days. Also, a combined JDG and JGroups workshop is being discussed. But this is all up for discussion at our Berlin meeting this February.

The first workshop will probably be a Red Hat internal one somewhere in EMEA.

As for public workshops, I'm shooting for 2 in Europe and 2 in the US (East and West coast) this year.

If you have suggestions regarding locations and dates, please send me an email (belaban at yahoo dot com).

Registration is not yet open, but if you want to pre-register, send me an email and you'll get a notification when it opens. I promise that you won't get any marketing emails, and I'll delete that list after sending that one email... :-)
Cheers,


[1] https://github.com/belaban/workshop/blob/master/slides/toc.adoc

Tuesday, January 13, 2015

RAFT consensus in JGroups

I'm happy to announce the first alpha release of jgroups-raft, which is an implementation of RAFT [2,3] in JGroups. The jgroups-raft project is currently a separate project on GitHub [1], but may be integrated into JGroups at a later stage.

The functionality includes leader election (section 5.2 in [3]), log replication (5.3), snapshotting and log compaction (7). Cluster membership changes (6) has not yet been implemented; the system currently requires a static membership.

The persistent log is implemented using LevelDB (MapDB support is not complete yet). Also, leader election based on the log commit status (and length) (5.4.1) has not been implemented.

The code quality is alpha at best, and the functionality hasn't been tested with unit tests. Use at your own risk.


So what can jgroups-raft currently be used for ?


Mainly to experiment with RAFT consensus in JGroups. The system comes with a demo of a replicated state machine (replicated hashmap) which can be used to update state in a fixed-size cluster with consensus. The majority (RAFT.majority) is 2, so nore more than 3 instances should be started.
Start the 3 instances like this:

bin/demo.sh -name B -follower
bin/demo.sh -name C -follower
bin/demo.sh -name A

The -follower flag is optional, but it skips leader election for a quick startup (and issues with the missing implementation of 5.4.1).

Note that the -name flag is used as both the logical name of a member and the name of the log. So, after starting the 3 instances, the temp directory will contain logs A.log, B.log and C.log (using LevelDB).

If we kill B and start it again as B, then B.log will be used again. If we start a member D, then this is considered a new member and a log D.log will be created.

Here's the output at C after adding a new entry foo=bar and printing the log:
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=3, commit-index=3, log size=55b

1
key: foo
value: bar
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=3, log size=70b

-- put(foo, bar) -> null
5

index (term): command
---------------------
1 (1): put(name, Bela)
2 (1): put(id, 322649)
3 (1): put(name, Bela Ban)
4 (7): put(foo, bar)

[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b

4
{foo=bar, name=Bela Ban, id=322649}
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b


We can see that the state consists of 3 entries and the log has 4 elements (name was changed twice).

When a node is killed and restarted, the state machine is reinitialized from the log:

[mac] /Users/bela/jgroups-raft$ bin/demo.sh -name C -follower
LOG is existent, must not be initialized
777 [DEBUG] RAFT: set last_applied=4, commit_index=4, current_term=7
778 [DEBUG] RAFT: snapshot /tmp/C.snapshot not found, initializing state machine from persistent log
781 [DEBUG] RAFT: applied 3 log entries (2 - 4) to the state machine

-------------------------------------------------------------------
GMS: address=C, cluster=rsm, physical address=192.168.1.3:54886
-------------------------------------------------------------------
-- view change: [B|4] (3) [B, A, C]
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b

4
{foo=bar, name=Bela Ban, id=322649}


We can see that the state machine was initialized from the persistent log.

If a member is down for a considerable amount of time, and then started again, it may be out of sync, and - if a snapshot was taken at the leader - the first log entry of the leader might be higher than the last commited log entry at the member. In this case, the leader will transfer its snapshot to the restarted member first, and then the usual algorithm is used to bring the restarted member up to date.


What's next ?

We're currently experimenting with an implementation of etcd [5] over jgroups-raft. Also, we're looking into how to use RAFT consensus in Infinispan [6].

I'm currently putting the finishing touches on a JGroups workshop (more on this soon), and will return to work on jgroups-raft after that. The next work items include
  • unit tests and code reviews
  • leader election comparing logs (5.4.1)
  • alternative ELECTION protocol using the JGroups built-in features (reduces code)
  • cluster membership changes
  • consistent reads; reads are currently dirty (section 8 has not yet been implemented)


Please use the mailing list [4] for feedback, questions and discussions.

Cheers,
Bela

[1] https://github.com/belaban/jgroups-raft
[2] http://raftconsensus.github.io/
[3] http://ramcloud.stanford.edu/raft.pdf
[4] https://groups.google.com/forum/#!forum/jgroups-raft
[5] https://github.com/redhat-italy/jgroups-etcd
[6] http://www.infinispan.org