Monday, May 08, 2017

Running an Infinispan cluster with Kubernetes on Google Container Engine (GKE)

In this post, I'm going to show the steps needed to get a 10 node Infinispan cluster up and running on Google Container Engine (GKE).

The test we'll be running is IspnPerfTest and the corresponding docker image is belaban/ispn_perf_test on dockerhub.

All that's needed to run this youselves is a Google Compute Engine account, so head on over there now and create one if you want to reproduce this demo! :-)

Alternatively, the cluster could be run locally in minikube, but for this post I chose GKE instead.

Ready? Then let's get cracking...

First, let's create a 10-node cluster in GKE. The screen shot below shows the form that needs to be filled out to create a 10 node cluster in GKE. This results in 10 nodes getting created in Google Compute Engine (GCE):


































































As shown, we'll use 10 v16-cpu instances with 14GB of memory each.

Press "Create" and the cluster is being created:



If you logged into your Google Compute Engine console, it would show the 10 nodes that are getting created.

When the cluster has been created, click on "Connect" and execute the "gcloud" command that's shown as a result:

gcloud container clusters get-credentials ispn-cluster \ --zone us-central1-a --project ispnperftest

We can now see the 10 GCE nodes:
[belasmac] /Users/bela/kubetest$ kubectl get nodes
NAME                                          STATUS    AGE       VERSION
gke-ispn-cluster-default-pool-59ed0e14-1zdb   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-33pk   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-3t95   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-5sn9   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-9lmz   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-j646   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-k797   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-q80q   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-r96s   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-zhdj   Ready     4m        v1.5.7


Next we'll run an interactive instance of IspnPerfTest and 3 non-interactive (no need for a TTY) instances, forming a cluster of 4. First, we start the interactive instance. Note that it might take a while until Kubernetes has downloaded image belaban/ispn_perf_test from dockerhub. When done, the following is shown:

[belasmac] /Users/bela/kubetest$ kubectl run infinispan --rm=true -it --image=belaban/ispn_perf_test kube.sh
If you don't see a command prompt, try pressing enter.

-------------------------------------------------------------------
GMS: address=infinispan-749052960-pl066-27417, cluster=default, physical address=10.40.0.4:7800
-------------------------------------------------------------------

-------------------------------------------------------------------
GMS: address=infinispan-749052960-pl066-18029, cluster=cfg, physical address=10.40.0.4:7900
-------------------------------------------------------------------
created 100,000 keys: [1-100,000]
[1] Start test [2] View [3] Cache size [4] Threads (100)
[5] Keys (100,000) [6] Time (secs) (60) [7] Value size (1.00KB) [8] Validate
[p] Populate cache [c] Clear cache [v] Versions
[r] Read percentage (1.00)
[d] Details (true)  [i] Invokers (false) [l] dump local cache
[q] Quit [X] Quit all


Now we'll start 3 more instances, the definition of which is taken from a Yaml file (ispn.yaml):

## Creates a number of pods on kubernetes running IspnPerfTest
## Run with: kubectl create -f ispn.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ispn
  namespace: default
spec:
  replicas: 3
  template:
    metadata:
      labels:
        run: ispn-perf-test
    spec:
      hostNetwork: false
      containers:
      - args:
        - kube.sh
        - -nohup
        name: ispn-perf-test
        image: belaban/ispn_perf_test
        # imagePullPolicy: IfNotPresent


The YAML file creates 3 instances (replicas: 3):

belasmac] /Users/bela/kubetest$ kubectl create -f ispn.yaml
deployment "ispn" created


Running "kubectl get pods -o wide" shows:

[belasmac] /Users/bela/kubetest$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP          NODE
infinispan-749052960-pl066   1/1       Running   0          4m        10.40.0.4   gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-hm785        1/1       Running   0          46s       10.40.2.4   gke-ispn-cluster-default-pool-59ed0e14-r96s
ispn-1255975377-jx70d        1/1       Running   0          46s       10.40.4.3   gke-ispn-cluster-default-pool-59ed0e14-1zdb
ispn-1255975377-xf9r8        1/1       Running   0          46s       10.40.5.3   gke-ispn-cluster-default-pool-59ed0e14-3t95


This shows 1 infinispan instance (interactive terminal)  and 3 ispn instances (non-interactive).

We can now exec into one of the instances are run probe.sh to verify that a cluster of 4 has formed:

[belasmac] /Users/bela/kubetest$ kubectl exec -it ispn-1255975377-hm785 bash
bash-4.3$ probe.sh -addr localhost                                                                                                                                         
-- sending probe request to /10.40.5.3:7500
-- sending probe request to /10.40.4.3:7500
-- sending probe request to /10.40.2.4:7500
-- sending probe request to /10.40.0.4:7500

#1 (287 bytes):
local_addr=ispn-1255975377-xf9r8-60537
physical_addr=10.40.5.3:7800
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#2 (287 bytes):
local_addr=ispn-1255975377-hm785-39319
physical_addr=10.40.2.4:7800
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#3 (284 bytes):
local_addr=ispn-1255975377-jx70d-16
physical_addr=10.40.4.3:7800
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#4 (292 bytes):
local_addr=infinispan-749052960-pl066-27417
physical_addr=10.40.0.4:7800
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

4 responses (4 matches, 0 non matches)


The 'view' item shows the same view (list of cluster members) across all 4 nodes, so the cluster has formed successfully. Also, if we look at our interactive instance, pressing [2] shows the cluster has 4 members:

-- local: infinispan-749052960-pl066-18029
-- view: [infinispan-749052960-pl066-18029|3] (4) [infinispan-749052960-pl066-18029, ispn-1255975377-xf9r8-34843, ispn-1255975377-jx70d-20952, ispn-1255975377-hm785-48520]


This means we have view number 4 (0-3) and 4 cluster members (the number in parens).

Next, let's scale the cluster to 10 members. To do this, we'll tell Kubernetes to scale the ispn deployment to 9 instances (from 3):

[belasmac] /Users/bela/IspnPerfTest$ kubectl scale deployment ispn --replicas=9
deployment "ispn" scaled


After a few seconds, the interactive terminal shows a new view containing 10 members:

** view: [infinispan-749052960-pl066-27417|9] (10) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319, ispn-1255975377-6191p-9724, ispn-1255975377-1g2kx-5547, ispn-1255975377-333rl-13052, ispn-1255975377-57zgl-28575, ispn-1255975377-j8ckh-35528, ispn-1255975377-lgvmt-32173]

We also see a minor inconvenience when looking at the pods:

[belasmac] /Users/bela/jgroups-docker$ lubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP          NODE
infinispan-749052960-pl066   1/1       Running   0          13m       10.40.0.4   gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-1g2kx        1/1       Running   0          1m        10.40.7.4   gke-ispn-cluster-default-pool-59ed0e14-k797
ispn-1255975377-333rl        1/1       Running   0          1m        10.40.9.3   gke-ispn-cluster-default-pool-59ed0e14-9lmz
ispn-1255975377-57zgl        1/1       Running   0          1m        10.40.1.4   gke-ispn-cluster-default-pool-59ed0e14-q80q
ispn-1255975377-6191p        1/1       Running   0          1m        10.40.0.5   gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-hm785        1/1       Running   0          10m       10.40.2.4   gke-ispn-cluster-default-pool-59ed0e14-r96s
ispn-1255975377-j8ckh        1/1       Running   0          1m        10.40.6.4   gke-ispn-cluster-default-pool-59ed0e14-j646
ispn-1255975377-jx70d        1/1       Running   0          10m       10.40.4.3   gke-ispn-cluster-default-pool-59ed0e14-1zdb
ispn-1255975377-lgvmt        1/1       Running   0          1m        10.40.8.3   gke-ispn-cluster-default-pool-59ed0e14-33pk
ispn-1255975377-xf9r8        1/1       Running   0          10m       10.40.5.3   gke-ispn-cluster-default-pool-59ed0e14-3t95


The infinispan pod  and one of the ispn pods have been created on the same GCE node gke-ispn-cluster-default-pool-59ed0e14-5sn9. The reason is that they are different deployments, and so GKE deploys them in an unrelated manner. Had all pods been created in the same depoyment, Kubernetes would have assigned pods to nodes in a round-robin fashion.

This could be fixed by using labels, but I didn't want to complicate the demo. Note that running more that one pod per GCE node will harm performance slightly...

Now we can run the performance test by populating the cluster (grid) with key/value pairs ([p]) and then running the test ([1]). This inserts 100'000 key/value pairs into the grid and executes read operations on every node for 1 minute (for details on IspnPerfTest consult the Github URL given earlier):

 Running test for 60 seconds:
1: 47,742 reqs/sec (286,351 reads 0 writes)
2: 56,854 reqs/sec (682,203 reads 0 writes)
3: 60,628 reqs/sec (1,091,264 reads 0 writes)
4: 62,922 reqs/sec (1,510,092 reads 0 writes)
5: 64,413 reqs/sec (1,932,427 reads 0 writes)
6: 65,050 reqs/sec (2,341,846 reads 0 writes)
7: 65,517 reqs/sec (2,751,828 reads 0 writes)
8: 66,172 reqs/sec (3,176,344 reads 0 writes)
9: 66,588 reqs/sec (3,595,839 reads 0 writes)
10: 67,183 reqs/sec (4,031,168 reads 0 writes)

done (in 60020 ms)


all: get 1 / 1,486.77 / 364,799.00, put: 0 / 0.00 / 0.00

======================= Results: ===========================
ispn-1255975377-1g2kx-51998: 100,707.90 reqs/sec (6,045,193 GETs, 0 PUTs), avg RTT (us) = 988.79 get, 0.00 put
ispn-1255975377-xf9r8-34843: 95,986.15 reqs/sec (5,760,705 GETs, 0 PUTs), avg RTT (us) = 1,036.78 get, 0.00 put
ispn-1255975377-jx70d-20952: 103,935.58 reqs/sec (6,239,149 GETs, 0 PUTs), avg RTT (us) = 961.14 get, 0.00 put
ispn-1255975377-j8ckh-11479: 100,869.08 reqs/sec (6,054,263 GETs, 0 PUTs), avg RTT (us) = 987.95 get, 0.00 put
ispn-1255975377-lgvmt-26968: 104,007.33 reqs/sec (6,243,144 GETs, 0 PUTs), avg RTT (us) = 960.05 get, 0.00 put
ispn-1255975377-6191p-15331: 69,004.31 reqs/sec (4,142,053 GETs, 0 PUTs), avg RTT (us) = 1,442.04 get, 0.00 put
ispn-1255975377-57zgl-58007: 92,282.75 reqs/sec (5,538,903 GETs, 0 PUTs), avg RTT (us) = 1,078.14 get, 0.00 put
ispn-1255975377-333rl-8583: 99,130.95 reqs/sec (5,949,542 GETs, 0 PUTs), avg RTT (us) = 1,004.08 get, 0.00 put
infinispan-749052960-pl066-18029: 67,166.91 reqs/sec (4,031,358 GETs, 0 PUTs), avg RTT (us) = 1,486.77 get, 0.00 put
ispn-1255975377-hm785-48520: 79,616.70 reqs/sec (4,778,196 GETs, 0 PUTs), avg RTT (us) = 1,254.87 get, 0.00 put


Throughput: 91,271 reqs/sec/node (91.27MB/sec) 912,601 reqs/sec/cluster
Roundtrip:  gets min/avg/max = 0/1,092.14/371,711.00 us (50.0=938 90.0=1,869 95.0=2,419 99.0=5,711 99.9=20,319 [percentile at mean: 62.50]),
            puts n/a


As suspected, instances infinispan-749052960-pl066-18029 and ispn-1255975377-6191p-15331 show lower performance than the other nodes, as they are co-located on the same GCE node.


The Kubernetes integration in JGroups is done by KUBE_PING which interacts with the Kubernetes master (API server) to fetch the IP addresses of the pods that have been started by Kubernetes.

The KUBE_PING protocol is new, so direct problems, issues, configuration questions etc to the JGroups mailing list.