NATS supports running each server in clustered mode. You can cluster servers together for high-volume messaging systems, resiliency, and high availability.
NATS servers achieve this by gossiping about, and connecting to, all of the servers they know, thus dynamically forming a full mesh. Once clients connect or reconnect to a particular server, they are informed about the current cluster members. Because of this behavior, a cluster can grow, shrink and self-heal. The full mesh does not necessarily have to be explicitly configured either.
Note that NATS clustered servers have a forwarding limit of one hop. This means that each nats-server instance will only forward messages that it has received from a client to the immediately adjacent nats-server instances to which it has routes. Messages received from a route will only be distributed to local clients.
For the cluster to successfully form a full mesh, and for NATS to function as intended and described throughout the documentation - temporary errors permitting - it is necessary that servers can connect to each other and that clients can connect to each server in the cluster.
In addition to a port for listening for clients, nats-server listens on a "cluster" URL (the -cluster option). Additional nats-server servers can then add that URL to their -routes argument to join the cluster. These options can also be specified in a config file, but only the command-line version is shown in this overview for simplicity.
Here is a simple cluster running on the same machine:
Server A - the 'seed server'
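For example, the seed server could be started with the client port 4222 and cluster port 4248 used throughout this overview:

```shell
nats-server -p 4222 -cluster nats://localhost:4248
```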
Server B
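A sketch of joining a second server; here the port value -1 is assumed as a way to let the server pick free client and cluster ports, which is why the output needs to be checked for the selected ports:

```shell
nats-server -p -1 -cluster nats://localhost:-1 -routes nats://localhost:4248
```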
Check the output of the server for the selected client and route ports.
Server C
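The third server can be started the same way - again assuming automatically selected ports and a route to the seed's cluster URL:

```shell
nats-server -p -1 -cluster nats://localhost:-1 -routes nats://localhost:4248
```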
Check the output of the server for the selected client and route ports.
Each server has a client and cluster port specified. Servers with the routes option establish a route to the seed server. Because the clustering protocol gossips members of the cluster, all servers are able to discover other servers in the cluster. When a server is discovered, the discovering server will automatically attempt to connect to it in order to form a full mesh. Typically only one instance of the server will run per machine, so you can reuse the client port (4222) and the cluster port (4248), and simply route to the host/port of the seed server.
Similarly, clients connecting to any server in the cluster will discover other servers in the cluster. If the connection to the server is interrupted, the client will attempt to connect to all other known servers.
There is no explicit configuration for a seed server. It simply serves as the starting point for server discovery by other members of the cluster, as well as by clients. As such, seed servers are the servers that clients have in their list of connect URLs and that cluster members have in their list of routes. They reduce configuration, as not every server needs to be in these lists. However, the ability of other servers and clients to successfully connect depends on the seed servers running. If multiple seed servers are used, they make use of the routes option as well, so they can establish routes to one another.
The following cluster options are supported:
When a NATS server routes to a specified URL, it will advertise its own cluster URL to all other servers in the route, effectively creating a routing mesh to all other servers.
Note: when using the -routes option, you must also specify a -cluster option.
Clustering can also be configured using the server config file.
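As a sketch, the equivalent config-file form might look like this; the cluster name and seed host are placeholders:

```
cluster {
  name: "my-cluster"        # cluster name (recommended for NATS v2.2+)
  listen: "0.0.0.0:4248"    # host:port where route connections are accepted
  routes: [
    "nats://seed-host:4248" # cluster URL of a seed server
  ]
}
```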
The following example demonstrates how to run a cluster of 3 servers on the same host. We will start with the seed server and use the -D command-line parameter to produce debug information.
Alternatively, you could use a configuration file, let's call it seed.conf, with content similar to this:
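A minimal sketch of such a seed.conf, assuming the client port 4222 and cluster port 4248 used in the rest of this example:

```
# Client port of 4222 on all interfaces
port: 4222

# Cluster definition: accept route connections on port 4248 on all interfaces
cluster {
  port: 4248
}
```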
And start the server like this:
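For example (the -D flag is the debug flag mentioned above):

```shell
nats-server -config ./seed.conf -D
```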
This will produce an output similar to:
It is also possible to specify the hostname and port independently. At a minimum, the port is required. If you leave the hostname off, it will bind to all interfaces ('0.0.0.0').
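For instance, a cluster block with the host and port given separately might look like:

```
cluster {
  host: "127.0.0.1"
  port: 4248
}
```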
Now let's start two more servers, each one connecting to the seed server.
When running on the same host, we need to pick different ports for the client connections, -p, and for the port used to accept other routes, -cluster. Note that -routes points to the -cluster address of the seed server (localhost:4248).
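For example, the second server might be started like this - client port 5222 matches the tests further below, while the cluster port 5248 is just an illustrative choice:

```shell
nats-server -p 5222 -cluster nats://localhost:5248 -routes nats://localhost:4248 -D
```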
Here is the log produced. See how it connects and registers a route to the seed server (...GzM).
From the seed's server log, we see that the route is indeed accepted:
Finally, let's start the third server:
Again, notice that we use a different client port and cluster address, but still point to the same seed server at the address nats://localhost:4248:
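Continuing the sketch, the third server uses client port 6222 and another illustrative cluster port, 6248:

```shell
nats-server -p 6222 -cluster nats://localhost:6248 -routes nats://localhost:4248 -D
```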
First a route is created to the seed server (...IOW) and after that, a route from ...3PK - which is the ID of the second server - is accepted.
The log from the seed server shows that it accepted the route from the third server:
And the log from the second server shows that it connected to the third.
At this point, there is a full mesh cluster of NATS servers.
Now, the following should work: make a subscription to the first server (port 4222). Then publish to each server (ports 4222, 5222, 6222). You should be able to receive messages without problems.
Testing server A
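A sketch using the nats CLI - subscribe and publish through server A on port 4222 (the subject hello and the payloads are arbitrary):

```shell
nats sub -s nats://localhost:4222 hello &
nats pub -s nats://localhost:4222 hello world_A
```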
Testing server B
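Publishing through server B (port 5222) should reach the same subscription:

```shell
nats pub -s nats://localhost:5222 hello world_B
```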
Testing server C
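And likewise through server C (port 6222):

```shell
nats pub -s nats://localhost:6222 hello world_C
```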
Testing using seed (i.e. A, B and C) server URLs
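A client can also be given the URLs of several cluster members, comma-separated, so it can fail over to any of them; for example:

```shell
nats sub -s "nats://localhost:4222,nats://localhost:5222,nats://localhost:6222" hello &
nats pub -s "nats://localhost:4222,nats://localhost:5222,nats://localhost:6222" hello world
```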
The cluster configuration map has the following configuration options (a combined example is sketched after the list):
host - Interface where the server will listen for incoming route connections.
port - Port where the server will listen for incoming route connections.
name - Name of the cluster (recommended for NATS v2.2+).
listen - Combines host and port as <host>:<port>.
tls - A tls configuration map for securing the clustering connection. verify is always enabled and cert_file is used for both client and server. See the TLS documentation for certificate pitfalls.
advertise (or cluster_advertise) - Host and port (<host>:<port>) to advertise to other cluster members about how this server can be contacted. This is useful in setups with NAT. When using TLS, it is important to set this in order to control the hostname that clients will use when discovering the route, since by default this will be an IP; otherwise TLS hostname verification may fail with an IP SANs error.
no_advertise - When set to 'true', the server will not gossip its server URLs to clients. This also disables client_advertise in the general configuration section. Disabling advertise completely can be useful if the server can be reached via different load balancers and interfaces from different networks. Automatically advertising an internal IP address could result in reconnect delays due to failed attempts.
routes - A list of other servers (URLs) to cluster with. Self-routes are ignored. Should authentication via token or username/password be required, specify them as part of the URL.
connect_retries - After how many failed connect attempts to give up establishing a connection to a discovered route. Default is 0, do not retry. When enabled, attempts will be made once per second. This does not apply to explicitly configured routes.
authorization - Authorization map for configuring cluster routes. When a single username/password is used, it defines the authentication mechanism this server expects, and how this server will authenticate itself when establishing a connection to a discovered route. This will not be used for routes explicitly listed in routes; for those, credentials have to be provided as part of the URL. With this authentication mode, either use the same credentials throughout the system or list every route explicitly on every server. If the tls configuration map specifies verify_and_map, only provide the expected username; here different certificates can be used, but they have to map to the same username. The authorization map also allows for timeout, which is honored, but users and token configuration are not supported and will prevent the server from starting. The permissions block is ignored.
pool_size - The size of the connection pool used to distribute load across non-pinned accounts. Default is 3. Refer to v2 Routes for details.
accounts - An optional list of accounts that will have a pinned (dedicated) route connection. Refer to v2 Routes for details.
compression - The compression mode the server will use when connecting with peer servers. Default is accept. Refer to v2 Routes for details.
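Putting several of these options together, here is a sketched (not exhaustive) cluster block; the cluster name, host names, and credentials are placeholders:

```
cluster {
  name: "my-cluster"
  listen: "0.0.0.0:4248"

  # Credentials other cluster members must use when connecting to this server,
  # and that this server uses when connecting to discovered routes.
  authorization {
    user: route_user
    password: s3cr3t
    timeout: 2
  }

  # Explicitly configured routes carry their credentials in the URL.
  routes: [
    "nats://route_user:s3cr3t@seed-host-1:4248",
    "nats://route_user:s3cr3t@seed-host-2:4248"
  ]

  # Do not gossip this server's URLs to clients.
  no_advertise: true
}
```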
Introduced in NATS v2.10.0
Before the v2.10.0 release, two servers in a cluster had only one connection to transmit all messages for all accounts, which could lead to slowdowns and increased memory usage when data was not transmitted fast enough.
The v2.10.0 release introduces the ability to have multiple route connections between two servers. By default, without any configuration change, clustering two v2.10.0+ servers together will create 3 route connections. This can of course be configured to a different value by explicitly configuring the pool size.
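For example, a sketch of explicitly setting the pool size:

```
cluster {
  port: 4248
  pool_size: 5
}
```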
Each of those connections will handle a specific subset of the accounts, and the assignment of an account to a specific connection index in the pool is the same on every server in the cluster. Each server in the cluster is required to have the same pool size value; otherwise, clustering will fail to be established with an error similar to:
In the event that a given connection of the pool breaks, automatic reconnection occurs as usual. However, while the disconnection is happening, traffic for accounts handled by that connection is stopped (that is, traffic is not routed through other connections), the same way that it was when there was a single route connection.
Note that although the cluster's pool_size configuration parameter can be changed through a configuration reload, connections between servers will likely break since there will be a mismatch between servers. It is possible, though, to do a rolling reload by setting the same value on all servers. Client and other connections (including dedicated account routes - see next section) will not be closed.
When monitoring is enabled, you will see that the /routez page now has more connections than before, which is expected.
It is possible to disable route connection pooling by setting the pool_size configuration parameter to the value -1. When that is the case, a server will behave like a pre-v2.10.0 server and create a single route connection to each of its peers.
Note that in that mode, no accounts list can be defined (see "Accounts Pinning" below).
In addition to connection pooling, the release v2.10.0 has introduced the ability to configure a list of accounts that will have a dedicated route connection.
Note that by default, the server will create a dedicated route for the system account (no specific configuration is needed).
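A sketch of pinning accounts, assuming accounts named A and B are defined elsewhere in the configuration:

```
cluster {
  port: 4248
  accounts: [A, B]   # each listed account gets its own dedicated route connection
}
```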
Having a dedicated route improves performance and reduces latency, but another benefit is that since the route is dedicated to an account, the account name does not need to be added to the message route protocols. Since account names can be quite long, this reduces the number of bytes that need to be transmitted in all other cases.
In the event that an account route connection breaks, automatic reconnection occurs as usual. However, while the disconnection is happening, traffic for this account is stopped.
The accounts list can be modified and a configuration reload signal sent to the server. All servers need to have the same list; however, it is possible to perform a rolling configuration reload.
For instance, adding an account to the list of server A and issuing a configuration reload will not produce an error, even though the other server in the cluster does not have that account in the list yet. A dedicated connection will not yet be established, but traffic for this account in the pooled connection currently handling it will stop. When the configuration reload happens on the other server, a dedicated connection will then be established and this account's traffic will resume.
When removing an account from the list and issuing a configuration reload, the connection for this account will be closed, and traffic for this account will stop. Other server(s) that still have this account configured with a dedicated connection will fail to reconnect. When they are also sent the configuration reload (with the updated accounts configuration), the account traffic will then be handled by a connection in the pool.
Note that a configuration reload of changes in the accounts list does not affect existing pool connections, and therefore should not affect traffic for other accounts.
When monitoring is enabled, the route connection's information now has a new field called account that displays the name of the account this route is for.
As indicated in the connection pooling section, if pool_size is set to -1, the accounts list cannot be configured, nor would it be in use.
Although it is recommended that all servers part of the same cluster be at the same version number, a v2.10.0 server will be able to connect to an older server and in that case create a single route to that server. This allows for an easy deployment of a v2.10.0 release into an existing cluster running an older release.
Release v2.10.0 introduces the ability to configure compression between servers having route connections. The current compression algorithm used is S2, an extension of Snappy. By default, routes do not use compression and it needs to be explicitly enabled.
There are several modes of compression and there is no requirement to have the same mode between routed servers.
off - Explicitly disables compression for any route between the server and a peer.
accept (default) - Does not initiate compression, but will accept the compression mode of the peer it connects to.
s2_fast - Applies compression, but optimizes for speed over compression ratio.
s2_better - Applies compression, providing a balance of speed and compression ratio.
s2_best - Applies compression and optimizes for compression ratio over speed.
s2_auto - Chooses the appropriate s2_* mode relative to the round-trip time (RTT) measured between the server and the peer. See rtt_thresholds below.
When s2_auto compression is used, it relies on an rtt_thresholds option, which is a list of three latency thresholds that dictate increasing or decreasing the compression mode.
The default rtt_thresholds value is [10ms, 50ms, 100ms]. The way to read this is that if the RTT is under 10ms, no compression is applied. Once 10ms is reached, s2_fast is applied, and so on with the remaining two thresholds for s2_better and s2_best.
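As a sketch, enabling automatic compression with explicit thresholds might look like this (mirroring the defaults described above):

```
cluster {
  port: 4248
  compression: {
    mode: s2_auto
    rtt_thresholds: [10ms, 50ms, 100ms]
  }
}
```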
The compression configuration can be changed through a configuration reload. If the value is changed from off to anything else, then connections are closed and recreated, the same as if the compression mode was set to something and then disabled by setting it to off.
For all other compression modes, the mode is changed dynamically without a need to close the route connections.
When monitoring is enabled, the route connection's information now has a new field called compression that displays the current compression mode. It can be off or any other mode described above. Note that s2_auto is not displayed; instead, what will be displayed is the current mode, say s2_best or s2_uncompressed.
If connected to an older server, the compression field will display not supported.
It is possible to have a v2.10.0+ server, with a compression mode configured, connect to an older server that does not support compression. The connection will simply not use compression.
Clustering in JetStream is required for a highly available and scalable system. Behind clustering is RAFT. There's no need to understand RAFT in depth to use clustering, but knowing a little explains some of the requirements behind setting up JetStream clusters.
JetStream uses a NATS-optimized RAFT algorithm for clustering. Typically RAFT generates a lot of traffic, but the NATS server optimizes this by combining the data plane for replicating messages with the messages RAFT would normally use to ensure consensus. Each participating server requires a unique server_name (this only applies within the same domain).
The RAFT groups include API handlers, streams, consumers, and an internal algorithm designates which servers handle which streams and consumers.
The RAFT algorithm has a few requirements:
A log to persist state
A quorum for consensus
In order to ensure data consistency across complete restarts, a quorum of servers is required. A quorum is ½ cluster size + 1. This is the minimum number of nodes to ensure at least one node has the most recent data and state after a catastrophic failure. So for a cluster size of 3, you’ll need at least two JetStream enabled NATS servers available to store new messages. For a cluster size of 5, you’ll need at least 3 NATS servers, and so forth.
Meta Group - all servers join the Meta Group, and the JetStream API is managed by this group. A leader is elected, and this leader owns the API and takes care of server placement.
Stream Group - each Stream creates a RAFT group; this group synchronizes state and data between its members. The elected leader handles ACKs and so forth; if there is no leader, the stream will not accept messages.
Consumer Group - each Consumer creates a RAFT group; this group synchronizes consumer state between its members. The group will live on the machines where the Stream Group is and handles consumption ACKs etc. Each Consumer will have its own group.
Generally, we recommend 3 or 5 JetStream enabled servers in a NATS cluster. This balances scalability with a tolerance for failure. For example, if 5 servers are JetStream enabled you would want two servers in one “zone”, two servers in another, and the remaining server in a third. This means you can lose any one “zone” at any time and continue operating.
Mixing JetStream enabled servers with standard NATS servers in a cluster is possible, and even recommended in some cases. By mixing server types you can dedicate certain machines optimized for storage to JetStream and others optimized solely for compute to standard NATS servers, reducing operational expense. With the right configuration, the standard servers would handle non-persistent NATS traffic and the JetStream enabled servers would handle JetStream traffic.
To configure JetStream clusters, just configure clusters as you normally would by specifying a cluster block in the configuration. Any JetStream enabled servers in the list of clusters will automatically chatter and set themselves up. Unlike core NATS clustering though, each JetStream node must specify a server name and cluster name.
Below is an explicitly listed server configuration for a three-node cluster across three machines, n1-c1, n2-c1, and n3-c1.
A user and password under the system account ($SYS) should be configured. The following configuration uses a bcrypted password: a very long s3cr3t! password.
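As a sketch, the configuration for the first node might look like the following; the host names, ports, store directory, and bcrypt hash are placeholders (a hash can be generated with nats server passwd):

```
# n1-c1.conf (illustrative)
server_name: n1-c1
listen: 4222

jetstream {
  store_dir: "/data/jetstream"   # ideally a fast SSD
}

cluster {
  name: C1
  listen: 0.0.0.0:4248
  routes: [
    "nats://n1-c1:4248",
    "nats://n2-c1:4248",
    "nats://n3-c1:4248"
  ]
}

accounts {
  $SYS {
    users: [
      # bcrypt of "a very long s3cr3t! password" (placeholder hash)
      { user: admin, password: "$2a$11$..." }
    ]
  }
}
```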
Add nodes as necessary. Choose a data directory that makes sense for your environment, ideally a fast SSD, and launch each server. After two servers are running you'll be ready to use JetStream.
Once a JetStream cluster is operating, interactions with the nats CLI are the same as before. For these examples, let's assume we have a 5-server cluster, n1-n5, in a cluster named C1.
Within an account there are operations and reports that show where users' data is placed and that allow them some basic interactions with the RAFT system.
When adding a stream using the nats CLI, you will be asked for the number of replicas; when you choose a number greater than 1 (we suggest 1, 3 or 5), the data will be stored on multiple nodes in your cluster using the RAFT protocol described above.
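A sketch of creating a replicated stream non-interactively; the ORDERS stream name matches the examples below, and any settings not supplied on the command line will be prompted for:

```shell
nats stream add ORDERS --subjects "ORDERS.*" --storage file --replicas 3
```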
Example output extract:
Above you can see that the cluster information will be reported in all cases where Stream info is shown, such as after add or when using nats stream info.
Here we have a stream in the NATS cluster C1; its current leader is the node n1-c1 and it has 2 followers - n4-c1 and n3-c1.
The current field indicates that followers are up to date and have all the messages; here both cluster peers were seen very recently.
The replica count cannot be edited once configured.
Users can get overall statistics about their streams and also where these streams are placed:
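For example, the stream report (run within the account) gives that overview:

```shell
nats stream report
```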
Every RAFT group has a leader that's elected by the group when needed. Generally there is no reason to interfere with this process, but you might want to trigger a leader change at a convenient time. Leader elections will represent short interruptions to the stream so if you know you will work on a node later it might be worth moving leadership away from it ahead of time.
Moving leadership away from a node does not remove it from the cluster and does not prevent it from becoming a leader again, this is merely a triggered leader election.
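A sketch of triggering such an election for the ORDERS stream used in these examples:

```shell
nats stream cluster step-down ORDERS
```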
The same is true for consumers: nats consumer cluster step-down ORDERS NEW.
System users can view the state of the Meta Group - but not of individual Streams or Consumers.
We have a high level report of cluster state:
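For example, run with system-account credentials:

```shell
nats server report jetstream
```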
This is a full cluster-wide report; the report can be limited to a specific account using --account.
Here we see the distribution of streams, messages, API calls, etc. across 2 super-clusters and an overview of the RAFT meta group.
In the Meta Group report, the server n2-c1 is not current and has not been seen for 9 seconds; it's also behind by 2 raft operations.
This report is built using raw data that can be obtained from the monitor port on the /jsz URL, or over NATS using:
This will produce a wealth of raw information about the current state of your cluster - here requesting it from the leader only.
Similar to Streams and Consumers above, the Meta Group allows leader stand-down. The Meta Group is cluster-wide and spans all accounts, therefore to manage the meta group you have to use a SYSTEM user.
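For example, as a system user:

```shell
nats server cluster step-down
```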
Generally when shutting down NATS, including using Lame Duck Mode, the cluster will notice this and continue to function. A 5 node cluster can withstand 2 nodes being down.
There might be a case though where you know a machine will never return, and you want to signal to JetStream that the machine will not return. This will remove it from the Stream in question and all of its Consumers.
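A sketch of removing such a peer (here the node n4-c1 discussed below) from the ORDERS stream:

```shell
nats stream cluster peer-remove ORDERS n4-c1
```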
After the node is removed the cluster will notice that the replica count is not honored anymore and will immediately pick a new node and start replicating data to it. The new node will be selected using the same placement rules as the existing stream.
At this point the stream and all consumers will have removed n4-c1 from the group; they will all start new peer selection and data replication.
We can see a new replica was picked, the stream is back to a replication level of 3, and n4-c1 is not active any more in this Stream or any of its Consumers.
Diagnosing problems in NATS JetStream clusters requires:
knowledge of JetStream concepts
knowledge of the NATS Command Line Interface (CLI)
The following tips and commands (while not an exhaustive list) can be useful when diagnosing problems in NATS JetStream clusters:
Look at the nats-server logs. By default, only warning and error logs are produced, but debug and trace logs can be turned on from the command line using -D and -DV, respectively, or by enabling debug or trace in the server config.
Make sure that in the NATS JetStream configuration, at least one system user is configured in this section: { $SYS { users } }.
nats account commands:
nats account info - Verify that JetStream is enabled on account

nats server commands:
nats server ls - List known servers
nats server ping - Ping all servers
nats server info - Show information about a single server
nats server check - Health check for NATS servers

nats server report commands:
nats server report connections - Report on connections
nats server report accounts - Report on account activity
nats server report jetstream - Report on JetStream activity

nats server request commands:
nats server request jetstream - Show JetStream details
nats server request subscriptions - Show subscription information
nats server request variables - Show runtime variables
nats server request connections - Show connection details
nats server request routes - Show route details
nats server request gateways - Show gateway details
nats server request leafnodes - Show leafnode details
nats server request accounts - Show account details

nats server cluster commands:
nats server cluster step-down - Force a new leader election by standing down the current meta leader
nats server cluster peer-remove - Remove a server from a JetStream cluster

nats traffic - Monitor NATS traffic (experimental command)