About FireCluster Failover
The FireCluster failover process is the same for an active/active cluster or an active/passive cluster. With both types of clusters, each cluster member maintains state and session information at all times. When failover occurs, the packet filter connections, branch office VPN tunnels, and user sessions from the failed device fail over automatically to the other device in the cluster.
One Firebox is the cluster master and the other device is the backup master. The backup master uses the primary cluster interface to synchronize connection and session information with the cluster master. If the primary cluster interface fails or is disconnected, the backup master uses the backup cluster interface to communicate with the cluster master. The cluster master also uses both the primary and backup cluster interfaces to send a heartbeat packet once per second to the backup master. We recommend that you always configure both a primary cluster interface and a backup cluster interface.
Events that Trigger a Failover
There are three types of events that can trigger a failover of the cluster master.
Health index of the cluster master is lower than the health index of the backup master
Each cluster member has a calculated health index that indicates the overall health of the device. If the health index of the cluster master is lower than the health index of the backup master, this triggers failover of the cluster master.
For more information about the cluster health index, see Monitor Cluster Health.
Lost heartbeat from the cluster master
The cluster master sends a heartbeat packet through the primary and backup cluster interfaces once per second. If the backup master misses a number of consecutive heartbeats equal to the lost heartbeat threshold, this triggers failover of the cluster master. The default threshold is three lost heartbeats. You can increase the threshold in the FireCluster Advanced settings.
For more information about the lost heartbeat threshold, see Configure FireCluster Advanced Settings.
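The lost heartbeat rule can be illustrated with a short Python sketch. This is not Firebox code: the class and function names are hypothetical, and the sketch assumes that any received heartbeat resets the count of consecutive misses.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of the backup master counting missed heartbeats."""

    lost_threshold: int = 3  # default threshold stated in the text
    missed: int = 0

    def tick(self, heartbeat_received: bool) -> bool:
        """Process one 1-second interval; return True if failover triggers."""
        if heartbeat_received:
            self.missed = 0  # assumption: a received heartbeat resets the count
        else:
            self.missed += 1
        return self.missed >= self.lost_threshold


def first_failover_second(intervals, lost_threshold=3):
    """Return the 1-based second at which failover triggers, or None."""
    monitor = HeartbeatMonitor(lost_threshold=lost_threshold)
    for second, received in enumerate(intervals, start=1):
        if monitor.tick(received):
            return second
    return None
```

With the default threshold, two missed heartbeats followed by a received one do not trigger a failover. Only three misses in a row do.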
Cluster receives the Failover Master command
In Firebox System Manager, when you select Tools > Cluster > Failover Master, you force a failover from the cluster master to the backup master.
For more information about this command, see Force a Failover of the Cluster Master.
What Happens When a Failover Occurs
When a failover of the cluster master occurs, the backup master becomes the cluster master. Then, the original cluster master rejoins the cluster as the backup master. When a failover occurs, the cluster maintains all packet filter connections, branch office VPN tunnels, and user sessions. This behavior is the same for an active/active or an active/passive FireCluster.
In an active/active cluster, if the backup master fails, the cluster master maintains all packet filter connections, branch office VPN tunnels, and user sessions. Proxy connections and Mobile VPN connections can be interrupted, as described in the table below. In an active/passive cluster, if the backup master fails, there is no interruption of connections or sessions because no traffic is assigned to the backup master.
|Connection/Session Type||Impact of a Failover Event|
|Packet filter connections||Connections fail over to the other cluster member.|
|Branch office VPN tunnels||Tunnels fail over to the other cluster member.|
|User sessions||Sessions fail over to the other cluster member.|
|Proxy connections||Connections assigned to the failed device (master or backup master) must be restarted. Connections assigned to the other device are not interrupted.|
|Mobile VPN with IPSec||If the cluster master fails over, all sessions must be restarted. If the backup master fails, only the sessions assigned to the backup master must be restarted. Sessions assigned to the cluster master are not interrupted.|
|Mobile VPN with SSL||If either device fails over, all sessions must be restarted.|
|Mobile VPN with L2TP||All L2TP sessions are assigned to the cluster master, even for an active/active cluster. If the cluster master fails over, all sessions must be restarted. If the backup master fails, L2TP sessions are not interrupted.|
|Mobile VPN with PPTP||All PPTP sessions are assigned to the cluster master, even for an active/active cluster. If the cluster master fails over, all sessions must be restarted. If the backup master fails, PPTP sessions are not interrupted.|
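The rows of the table above can be read as a decision rule. The following Python sketch encodes them for an active/active cluster. The `Role` enum, the session type strings, and the `must_restart` function are hypothetical names used for illustration only and are not part of any WatchGuard interface.

```python
from enum import Enum


class Role(Enum):
    MASTER = "cluster master"
    BACKUP = "backup master"


# Types that fail over to the other member without a restart.
FAILS_OVER = {"packet filter", "branch office vpn", "user session"}


def must_restart(kind: str, failed: Role, assigned_to: Role) -> bool:
    """Return True if a connection or session must be restarted.

    kind: hypothetical label for the connection/session type.
    failed: the role of the device that failed.
    assigned_to: the role of the device that handled the session.
    """
    if kind in FAILS_OVER:
        return False
    if kind == "proxy":
        # Only connections assigned to the failed device are affected.
        return assigned_to is failed
    if kind == "mobile vpn ipsec":
        # Master failure restarts all sessions; backup failure restarts
        # only the sessions assigned to the backup master.
        return failed is Role.MASTER or assigned_to is Role.BACKUP
    if kind == "mobile vpn ssl":
        return True  # a failure of either device restarts all sessions
    if kind in {"mobile vpn l2tp", "mobile vpn pptp"}:
        # All sessions are assigned to the cluster master.
        return failed is Role.MASTER
    raise ValueError(f"unknown connection type: {kind}")
```

For example, `must_restart("proxy", Role.MASTER, Role.BACKUP)` returns `False`, because the proxy connection was assigned to the device that did not fail.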
FireCluster Failover and Server Load Balancing
If you use server load balancing to balance connections between your internal servers, server load balancing information is not synchronized in real time between cluster members when a FireCluster failover event occurs. After a failover, the new cluster master sends connections to all servers in the server load balancing list to discover which servers are available. It then applies the server load balancing algorithm to all available servers.
For information about server load balancing, see Configure Server Load Balancing.
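The two steps after a failover, discover the available servers and then apply the balancing algorithm, can be sketched in Python as follows. The `probe` callable, the function names, and the IP addresses are hypothetical, and round-robin is used here only as an example algorithm. The sketch does not show how the Firebox implements either step.

```python
from itertools import cycle


def discover_available(servers, probe):
    """Try every server in the list and keep those that respond.

    probe is a hypothetical callable that returns True if a server
    accepts a connection.
    """
    return [server for server in servers if probe(server)]


def round_robin(servers, connection_count):
    """Assign connections to servers in rotation (example algorithm only)."""
    if not servers:
        return []
    rotation = cycle(servers)
    return [next(rotation) for _ in range(connection_count)]


# Usage: one of three servers is down when the new cluster master probes.
server_list = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
reachable = {"10.0.1.10", "10.0.1.12"}
available = discover_available(server_list, lambda s: s in reachable)
assignments = round_robin(available, 3)
```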
FireCluster Failover and Dynamic Routing
When you enable dynamic routing on a FireCluster, only the cluster master participates directly in the dynamic routing domain. The cluster master synchronizes dynamic route information to the other cluster member. When a failover occurs, the new cluster master initially uses the previously learned dynamic routes. The new cluster master then participates in the dynamic routing domain and uses the configured dynamic routing protocol to discover the latest routes to all destination networks. When the new cluster master discovers the updated dynamic routes, the old dynamic routes are purged and replaced with the new ones.
The time it takes for the new cluster master and all connected routers to agree on a common set of routes (the convergence time) depends on the dynamic routing protocol.
For RIPv1 and RIPv2
The peer RIP router does not detect the FireCluster failover event if the connection itself is not interrupted during the failover.
For OSPF
The peer router detects the FireCluster failover event. The convergence time for OSPF is from 10 to 40 seconds. The convergence time could be shorter, because the new cluster master uses a set of known dynamic routes synchronized from the previous cluster master until it discovers the updated dynamic routes.
For BGP
The peer router detects the FireCluster failover event. The convergence time for BGP is from 1 to 3 minutes. The convergence time could be shorter, because the new cluster master uses a set of known dynamic routes synchronized from the previous cluster master until it discovers the updated dynamic routes.
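The route handling described in this section (use the synchronized routes first, then purge them once the routing protocol has discovered updated routes) can be modeled with a minimal Python sketch. The function name, the dictionary layout, and the addresses are assumptions for illustration only.

```python
def active_routes(synced_routes, discovered_routes=None):
    """Return the dynamic routes the new cluster master uses.

    synced_routes: routes synchronized from the previous cluster master,
        as a mapping of destination network to next hop (assumed layout).
    discovered_routes: routes learned after the new cluster master joins
        the dynamic routing domain, or None if discovery has not finished.
    """
    if discovered_routes is None:
        # Before convergence, the previously learned routes stay in use.
        return dict(synced_routes)
    # After discovery, the old routes are purged and replaced.
    return dict(discovered_routes)


# Usage: a stale route to 10.0.30.0/24 disappears after discovery.
synced = {"10.0.20.0/24": "192.168.1.2", "10.0.30.0/24": "192.168.1.2"}
discovered = {"10.0.20.0/24": "192.168.1.3"}
before = active_routes(synced)
after = active_routes(synced, discovered)
```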
Monitor the Cluster During a Failover
The role of each device in the cluster appears after the member name on the Firebox System Manager Front Panel tab. If you look at the Front Panel tab during a failover of the cluster master, you can see the cluster master role move from one device to another. During a failover, you see:
- The role of the old backup master changes from backup master to master.
- The role of the old cluster master changes to inactive and then to idle while the device restarts.
- The role of the old cluster master changes to backup master after the device restarts.
For more information, see Monitor and Control FireCluster Members.
FireCluster Failover and Subscription Services
If you enable licensed subscription services for your FireCluster, the services continue to operate after the failover, as long as you have purchased the required subscription services for FireCluster members. The requirements are different for an active/active FireCluster than for an active/passive FireCluster.
- Active/Active: You must have the same subscription services enabled in the feature keys for both cluster members. Each cluster member applies the services from its own feature key.
- Active/Passive: The subscription services need to be enabled in the feature key of only one cluster member. The active cluster member uses the subscription services that are active in the feature key of either cluster member.
For more information about feature keys and FireCluster, see About Feature Keys and FireCluster.
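As a rough illustration of the two licensing rules, the following Python sketch treats each feature key as a set of service names. The function name and the set-based model are assumptions. For active/active, the sketch interprets the rule as "a service operates on both members only if it is in both feature keys".

```python
def cluster_services(mode, key_member_1, key_member_2):
    """Return the subscription services that operate across the cluster.

    mode: "active/active" or "active/passive".
    key_member_1, key_member_2: sets of service names enabled in each
        member's feature key (hypothetical representation).
    """
    if mode == "active/active":
        # Each member applies only the services in its own feature key,
        # so a service runs on both members only if both keys include it.
        return key_member_1 & key_member_2
    if mode == "active/passive":
        # The active member uses services from either member's key.
        return key_member_1 | key_member_2
    raise ValueError(f"unknown cluster mode: {mode}")
```

For example, with a service enabled in only one feature key, `cluster_services("active/passive", {"WebBlocker"}, set())` returns `{"WebBlocker"}`, while the same keys in `"active/active"` mode return an empty set.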