NATS Streaming with Fault Tolerance

NATS Streaming Cluster with FT Mode on AWS

Preparation

First, we need a Kubernetes cluster on a provider that offers a filesystem service with ReadWriteMany access. In this short guide, we will create the cluster on AWS and use EFS for the filesystem:
# Create a 3-node Kubernetes cluster
eksctl create cluster --name stan-k8s \
  --nodes 3 \
  --node-type=t3.large \
  --region=us-east-2

# Get the credentials for your cluster
eksctl utils write-kubeconfig --name stan-k8s --region us-east-2
For the FT mode to work, we will need to create an EFS volume that can be shared by more than one pod. Go into the AWS console and create one, making sure that it is in a security group to which the k8s nodes have access. For clusters created via eksctl, this will be the security group named ClusterSharedNodeSecurityGroup.
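The same can be done from the command line; this is a minimal sketch, where the subnet and security group IDs are placeholders you would replace with the ones from your own VPC:

# Create the EFS filesystem and note the FileSystemId in the output
aws efs create-file-system --creation-token stan-efs --region us-east-2

# Create a mount target in each subnet used by the worker nodes,
# attaching it to the ClusterSharedNodeSecurityGroup
# (the FileSystemId is the one returned above; fs-c22a24bb in this guide)
aws efs create-mount-target --region us-east-2 \
  --file-system-id fs-c22a24bb \
  --subnet-id subnet-00000000 \
  --security-groups sg-00000000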

Creating the EFS provisioner

Confirm the FileSystemId and the DNS name of the volume; we will use those values to create an EFS provisioner controller within the K8S cluster.
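If needed, the FileSystemId can also be retrieved with the CLI; the DNS name follows the pattern <FileSystemId>.efs.<region>.amazonaws.com:

aws efs describe-file-systems --region us-east-2 --query 'FileSystems[].FileSystemId'

With those values in hand, the following manifest creates the provisioner along with its RBAC rules, storage class, and a shared PVC: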
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: efs-provisioner
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: efs-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-efs-provisioner
subjects:
  - kind: ServiceAccount
    name: efs-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: ClusterRole
  name: efs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-efs-provisioner
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-efs-provisioner
subjects:
  - kind: ServiceAccount
    name: efs-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-efs-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: efs-provisioner
data:
  file.system.id: fs-c22a24bb
  aws.region: us-east-2
  provisioner.name: synadia.com/aws-efs
  dns.name: ""
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: efs-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: efs-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: efs-provisioner
    spec:
      serviceAccountName: efs-provisioner
      containers:
        - name: efs-provisioner
          image: quay.io/external_storage/efs-provisioner:latest
          env:
            - name: FILE_SYSTEM_ID
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: file.system.id
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: aws.region
            - name: DNS_NAME
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: dns.name
            - name: PROVISIONER_NAME
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: provisioner.name
          volumeMounts:
            - name: pv-volume
              mountPath: /efs
      volumes:
        - name: pv-volume
          nfs:
            server: fs-c22a24bb.efs.us-east-2.amazonaws.com
            path: /
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-efs
provisioner: synadia.com/aws-efs
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: efs
  annotations:
    volume.beta.kubernetes.io/storage-class: "aws-efs"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
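Save the manifest and deploy it (the filename efs-provisioner.yaml here is just illustrative):

kubectl apply -f efs-provisioner.yaml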
Result of deploying the manifest:
serviceaccount/efs-provisioner created
clusterrole.rbac.authorization.k8s.io/efs-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-efs-provisioner created
role.rbac.authorization.k8s.io/leader-locking-efs-provisioner created
rolebinding.rbac.authorization.k8s.io/leader-locking-efs-provisioner created
configmap/efs-provisioner created
deployment.extensions/efs-provisioner created
storageclass.storage.k8s.io/aws-efs created
persistentvolumeclaim/efs created

Setting up the NATS Streaming cluster

Now create a NATS Streaming cluster with FT mode enabled, running NATS in embedded mode and mounting the EFS volume:
---
apiVersion: v1
kind: Service
metadata:
  name: stan
  labels:
    app: stan
spec:
  selector:
    app: stan
  clusterIP: None
  ports:
    - name: client
      port: 4222
    - name: cluster
      port: 6222
    - name: monitor
      port: 8222
    - name: metrics
      port: 7777
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: stan-config
data:
  stan.conf: |
    http: 8222

    cluster {
      port: 6222
      routes [
        nats://stan-0.stan:6222
        nats://stan-1.stan:6222
        nats://stan-2.stan:6222
      ]
      cluster_advertise: $CLUSTER_ADVERTISE
      connect_retries: 10
    }

    streaming {
      id: test-cluster
      store: file
      dir: /data/stan/store
      ft_group_name: "test-cluster"
      file_options {
        buffer_size: 32mb
        sync_on_flush: false
        slice_max_bytes: 512mb
        parallel_recovery: 64
      }
      store_limits {
        max_channels: 10
        max_msgs: 0
        max_bytes: 256gb
        max_age: 1h
        max_subs: 128
      }
    }
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stan
  labels:
    app: stan
spec:
  selector:
    matchLabels:
      app: stan
  serviceName: stan
  replicas: 3
  template:
    metadata:
      labels:
        app: stan
    spec:
      # STAN Server
      terminationGracePeriodSeconds: 30

      containers:
        - name: stan
          image: nats-streaming:alpine

          ports:
            # In case of NATS embedded mode expose these ports
            - containerPort: 4222
              name: client
            - containerPort: 6222
              name: cluster
            - containerPort: 8222
              name: monitor
          args:
            - "-sc"
            - "/etc/stan-config/stan.conf"

          # Required to be able to define an environment variable
          # that refers to other environment variables. This env var
          # is later used as part of the configuration file.
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CLUSTER_ADVERTISE
              value: $(POD_NAME).stan.$(POD_NAMESPACE).svc
          volumeMounts:
            - name: config-volume
              mountPath: /etc/stan-config
            - name: efs
              mountPath: /data/stan
          resources:
            requests:
              cpu: 0
          livenessProbe:
            httpGet:
              path: /
              port: 8222
            initialDelaySeconds: 10
            timeoutSeconds: 5
        - name: metrics
          image: synadia/prometheus-nats-exporter:0.5.0
          args:
            - -connz
            - -routez
            - -subz
            - -varz
            - -channelz
            - -serverz
            - http://localhost:8222
          ports:
            - containerPort: 7777
              name: metrics
      volumes:
        - name: config-volume
          configMap:
            name: stan-config
        - name: efs
          persistentVolumeClaim:
            claimName: efs
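Save and deploy the manifest (again, the filename is illustrative):

kubectl apply -f stan-ft.yaml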
Your cluster should now look something like this:

kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
efs-provisioner-6b7866dd4-4k5wx   1/1     Running   0          21m
stan-0                            2/2     Running   0          6m35s
stan-1                            2/2     Running   0          4m56s
stan-2                            2/2     Running   0          4m42s
If everything was set up properly, one of the servers will be the active node:
$ kubectl logs stan-0 -c stan
[1] 2019/12/04 20:40:40.429359 [INF] STREAM: Starting nats-streaming-server[test-cluster] version 0.16.2
[1] 2019/12/04 20:40:40.429385 [INF] STREAM: ServerID: 7j3t3Ii7e2tifWqanYKwFX
[1] 2019/12/04 20:40:40.429389 [INF] STREAM: Go version: go1.11.13
[1] 2019/12/04 20:40:40.429392 [INF] STREAM: Git commit: [910d6e1]
[1] 2019/12/04 20:40:40.454212 [INF] Starting nats-server version 2.0.4
[1] 2019/12/04 20:40:40.454360 [INF] Git commit [c8ca58e]
[1] 2019/12/04 20:40:40.454522 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2019/12/04 20:40:40.454830 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2019/12/04 20:40:40.454841 [INF] Server id is NB3A5RSGABLJP3WUYG6VYA36ZGE7MP5GVQIQVRG6WUYSRJA7B2NNMW57
[1] 2019/12/04 20:40:40.454844 [INF] Server is ready
[1] 2019/12/04 20:40:40.456360 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2019/12/04 20:40:40.481927 [INF] STREAM: Starting in standby mode
[1] 2019/12/04 20:40:40.488193 [ERR] Error trying to connect to route (attempt 1): dial tcp: lookup stan on 10.100.0.10:53: no such host
[1] 2019/12/04 20:40:41.489688 [INF] 192.168.52.76:40992 - rid:6 - Route connection created
[1] 2019/12/04 20:40:41.489788 [INF] 192.168.52.76:40992 - rid:6 - Router connection closed
[1] 2019/12/04 20:40:41.489695 [INF] 192.168.52.76:6222 - rid:5 - Route connection created
[1] 2019/12/04 20:40:41.489955 [INF] 192.168.52.76:6222 - rid:5 - Router connection closed
[1] 2019/12/04 20:40:41.634944 [INF] STREAM: Server is active
[1] 2019/12/04 20:40:41.634976 [INF] STREAM: Recovering the state...
[1] 2019/12/04 20:40:41.655526 [INF] STREAM: No recovered state
[1] 2019/12/04 20:40:41.671435 [INF] STREAM: Message store is FILE
[1] 2019/12/04 20:40:41.671448 [INF] STREAM: Store location: /data/stan/store
[1] 2019/12/04 20:40:41.671524 [INF] STREAM: ---------- Store Limits ----------
[1] 2019/12/04 20:40:41.671527 [INF] STREAM: Channels:                   10
[1] 2019/12/04 20:40:41.671529 [INF] STREAM: --------- Channels Limits --------
[1] 2019/12/04 20:40:41.671531 [INF] STREAM: Subscriptions:             128
[1] 2019/12/04 20:40:41.671533 [INF] STREAM: Messages     :       unlimited
[1] 2019/12/04 20:40:41.671535 [INF] STREAM: Bytes        :       256.00 GB
[1] 2019/12/04 20:40:41.671537 [INF] STREAM: Age          :          1h0m0s
[1] 2019/12/04 20:40:41.671539 [INF] STREAM: Inactivity   :       unlimited *
[1] 2019/12/04 20:40:41.671541 [INF] STREAM: ----------------------------------
[1] 2019/12/04 20:40:41.671546 [INF] STREAM: Streaming Server is ready
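The two remaining pods stay on standby, logging "STREAM: Starting in standby mode" but never "STREAM: Server is active". Since this is an FT setup, deleting the active pod should cause one of the standby servers to take over; a quick way to verify the failover:

# stan-1 and stan-2 should only report standby mode for now
kubectl logs stan-1 -c stan

# Delete the active server; a standby should recover the state from
# the shared EFS volume and log "STREAM: Server is active"
kubectl delete pod stan-0
kubectl logs stan-1 -c stan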

NATS Streaming Cluster with FT Mode on Azure

First, we need to create a PVC (PersistentVolumeClaim); on Azure, we can use the azurefile storage class to get a volume with ReadWriteMany access:
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: stan-efs
  annotations:
    volume.beta.kubernetes.io/storage-class: "azurefile"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
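Apply the claim (the filename stan-pvc.yaml is illustrative):

kubectl apply -f stan-pvc.yaml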
Next, create a NATS cluster using the Helm charts:

helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm install nats nats/nats
To create an FT setup using AzureFile, you can use the following Helm chart values file:
stan:
  image: nats-streaming:alpine
  replicas: 2
  nats:
    url: nats://nats:4222
store:
  type: file
  ft:
    group: my-group
  file:
    path: /data/stan/store
  volume:
    enabled: true

    # Mount path for the volume.
    mount: /data/stan

    # FT mode requires a single shared ReadWriteMany PVC volume.
    persistentVolumeClaim:
      claimName: stan-efs
Now deploy with Helm, passing the values file above (saved here as ./examples/deploy-stan-ft-file.yaml):

helm install stan nats/stan -f ./examples/deploy-stan-ft-file.yaml
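The steps below use the stan-pub and stan-sub example clients from the stan.go repository. If you do not have them locally, they can be built with a Go toolchain; the module paths here assume the examples keep their current upstream layout:

go install github.com/nats-io/stan.go/examples/stan-pub@latest
go install github.com/nats-io/stan.go/examples/stan-sub@latest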
Publish a few messages to the NATS Server to which STAN/NATS Streaming is connected:
kubectl port-forward nats-0 4222:4222 &

stan-pub -c stan foo bar.1
stan-pub -c stan foo bar.2
stan-pub -c stan foo bar.3
Subscribe to get all the messages:
stan-sub -c stan -all foo