6.6.2. Recover a failed Primary

When a Primary datasource has been automatically failed over in the AUTOMATIC policy mode, and provided the fault has been resolved and the host is viable, the datasource can be brought back into the dataservice as a Replica by using the recover command:

[LOGICAL:EXPERT] /alpha > ls

...
+---------------------------------------------------------------------------------+
|db3(master:SHUNNED(FAILED-OVER-TO-db2), progress=43, THL latency=0.073)          |
|STATUS [SHUNNED] [2025/01/28 09:07:59 AM UTC]                                    |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=master, state=DEGRADED)                                        |
|  DATASERVER(state=STOPPED)                                                      |
|  CONNECTIONS(created=6, active=0)                                               |
+---------------------------------------------------------------------------------+
...

[LOGICAL:EXPERT] /alpha > datasource db3 recover
RECOVERING DATASOURCE 'db3@alpha'
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'db3'
Verified that DB server notification 'db3' is in state 'ONLINE'
DATA SERVER 'db3' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'db3@alpha' TO A SLAVE USING 'db2@alpha' AS THE MASTER
SETTING THE ROLE OF DATASOURCE 'db3@alpha' FROM 'master' TO 'slave'
RECOVERY OF 'db3@alpha' WAS SUCCESSFUL

The recovered datasource will be added back to the dataservice as a Replica.

6.6.2.1. Recover when there are no Primaries

When there are no Primaries available, whether due to a failed failover of a Primary or to multiple host failures, it is possible to bring the cluster back online by manually forcing a node to become the new Primary.

This action should not be taken blindly: first understand why the failure happened, and ensure that the node you wish to bring online as the Primary is viable and the most up to date.

If this is not immediately known or obvious, the timestamps within each individual datasource block may provide some assistance. If multiple nodes are labelled as master, this indicates a series of rolling failovers; the latest timestamp identifies the last node to hold the Primary role.

  • Warning

    If in any doubt about the actions to be taken, contact Continuent Support for assistance.

    To proceed, first examine the state of the dataservice and choose the datasource that is the most up to date or canonical. For example, in the following output only db3 is labelled as a master and is therefore the most obvious candidate:

    [LOGICAL] /alpha > ls
    
    COORDINATOR[db3:AUTOMATIC:ONLINE]
    
    ROUTERS:
    +---------------------------------------------------------------------------------+
    |connector@db1[2096](ONLINE, created=0, active=0)                                 |
    |connector@db2[2092](ONLINE, created=0, active=0)                                 |
    |connector@db3[2107](ONLINE, created=0, active=0)                                 |
    +---------------------------------------------------------------------------------+
    
    DATASOURCES:
    +---------------------------------------------------------------------------------+
    |db1(slave:FAILED(DATASERVER 'db1@alpha' STOPPED), progress=40, latency=1.000)    |
    |STATUS [CRITICAL] [2025/01/27 04:10:47 PM UTC]                                   |
    |REASON[DATASERVER 'db1@alpha' STOPPED]                                           |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=slave, master=db3, state=SYNCHRONIZING)                        |
    |  DATASERVER(state=ONLINE)                                                       |
    |  CONNECTIONS(created=0, active=0)                                               |
    +---------------------------------------------------------------------------------+
    +---------------------------------------------------------------------------------+
    |db2(slave:FAILED(DATASERVER 'db2@alpha' STOPPED), progress=40, latency=0.000)    |
    |STATUS [CRITICAL] [2025/01/27 04:10:47 PM UTC]                                   |
    |REASON[DATASERVER 'db2@alpha' STOPPED]                                           |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=slave, master=db3, state=SYNCHRONIZING)                        |
    |  DATASERVER(state=ONLINE)                                                       |
    |  CONNECTIONS(created=0, active=0)                                               |
    +---------------------------------------------------------------------------------+
    +---------------------------------------------------------------------------------+
    |db3(master:SHUNNED(FAILOVER-ABORTED AFTER UNABLE TO COMPLETE FAILOVER FOR        |
    |DATASOURCE 'db3'. CHECK COORDINATOR MANAGER LOG), progress=-1, THL               |
    |latency=-1.000)                                                                  |
    |STATUS [SHUNNED] [2025/01/27 04:10:50 PM UTC]                                    |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=master, state=OFFLINE)                                         |
    |  DATASERVER(state=ONLINE)                                                       |
    |  CONNECTIONS(created=0, active=0)                                               |
    +---------------------------------------------------------------------------------+

    Since all nodes are either SHUNNED or FAILED, a simple recover would fail, as there are no online master nodes available:

    [LOGICAL] /alpha > recover
    RECOVERING DATASERVICE 'alpha'
    SET POLICY: AUTOMATIC => MAINTENANCE
    REVERT POLICY: MAINTENANCE => AUTOMATIC
    DATA SERVICE 'alpha' DOES NOT HAVE AN ACTIVE PRIMARY

    Once a host has been chosen, issue the set force true command, followed by the welcome command specifying the full hostname of the chosen datasource:

    [LOGICAL] /alpha > set force true
    FORCE: true
    
    [LOGICAL] /alpha > datasource db3 welcome
    
    WARNING: This is an expert-level command:
    Incorrect use may cause data corruption
    or make the cluster unavailable.
    
    Do you want to continue? (y/n)> y
    DataSource 'db3@alpha' is now OFFLINE

    As the cluster is in the AUTOMATIC policy mode, and provided the remaining hosts are healthy, the cluster will recover itself automatically, as shown by the ls output taken shortly afterwards:

    [LOGICAL] /alpha > ls
    COORDINATOR[db3:AUTOMATIC:ONLINE]
    
    ROUTERS:
    +---------------------------------------------------------------------------------+
    |connector@db1[2096](ONLINE, created=2, active=0)                                 |
    |connector@db2[2092](ONLINE, created=2, active=0)                                 |
    |connector@db3[2107](ONLINE, created=2, active=0)                                 |
    +---------------------------------------------------------------------------------+
    
    DATASOURCES:
    +---------------------------------------------------------------------------------+
    |db1(slave:ONLINE, progress=42, latency=0.213)                                    |
    |STATUS [OK] [2025/01/28 08:36:37 AM UTC]                                         |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=slave, master=db3, state=ONLINE)                               |
    |  DATASERVER(state=ONLINE)                                                       |
    |  CONNECTIONS(created=0, active=0)                                               |
    +---------------------------------------------------------------------------------+
    +---------------------------------------------------------------------------------+
    |db2(slave:ONLINE, progress=42, latency=0.240)                                    |
    |STATUS [OK] [2025/01/28 08:34:29 AM UTC]                                         |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=slave, master=db3, state=ONLINE)                               |
    |  DATASERVER(state=ONLINE)                                                       |
    |  CONNECTIONS(created=0, active=0)                                               |
    +---------------------------------------------------------------------------------+
    +---------------------------------------------------------------------------------+
    |db3(master:ONLINE, progress=42, THL latency=0.161)                               |
    |STATUS [OK] [2025/01/28 08:34:28 AM UTC]                                         |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=master, state=ONLINE)                                          |
    |  DATASERVER(state=ONLINE)                                                       |
    |  CONNECTIONS(created=6, active=0)                                               |
    +---------------------------------------------------------------------------------+

    If the cluster is in the MAINTENANCE policy mode, or the nodes do not recover automatically, you can issue the recover command manually and then return the cluster to the AUTOMATIC policy mode, as shown in the sketch below.
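
    As a minimal sketch of that manual sequence (the service name /alpha matches the examples above; output is omitted here and will vary with the state of the cluster):

    [LOGICAL] /alpha > recover

    [LOGICAL] /alpha > set policy automatic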

  • If this does not recover the remaining Replicas within the cluster, they must be recovered manually. This can be achieved either by following Section 6.6.1, “Recover a failed Replica”, or, if that is not possible, by using Section 6.6.1.2, “Provision or Reprovision a Replica”.

6.6.2.2. Manually Failing over a Primary in MAINTENANCE policy mode

If the dataservice is in MAINTENANCE mode when the Primary fails, automatic recovery cannot sensibly make the decision about which node should be used as the Primary. In that case, the dataservice must be reconfigured manually.

In the sample below, db2 is the current Primary. The database has failed, but the datasource is still ONLINE; the following steps can be used to force a failover:

...
+---------------------------------------------------------------------------------+
|db2(master:ONLINE, progress=45, THL latency=0.680)                               |
|STATUS [OK] [2025/01/28 09:08:04 AM UTC]                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=master, state=ONLINE)                                          |
|  DATASERVER(state=STOPPED)                                                      |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+
...
  1. Enter expert mode in cctrl. This will suppress the "Do you want to continue?" prompts during the following steps.

    [LOGICAL] /alpha > expert
    
    WARNING: This is an expert-level command:
    Incorrect use may cause data corruption
    or make the cluster unavailable.
    
    Do you want to continue? (y/n)> y
    [LOGICAL:EXPERT] /alpha >
  2. We then mark the node as failed using the following command:

    [LOGICAL:EXPERT] /alpha > datasource db2 fail
    DataSource 'db2@alpha' set to FAILED

    We then see this reflected in the cctrl output:

    ...
    +---------------------------------------------------------------------------------+
    |db2(master:FAILED(MANUALLY-FAILED), progress=51, THL latency=0.272)              |
    |STATUS [CRITICAL] [2025/01/28 10:45:41 AM UTC]                                   |
    |REASON[MANUALLY-FAILED]                                                          |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=master, state=ONLINE)                                          |
    |  DATASERVER(state=STOPPED)                                                      |
    |  CONNECTIONS(created=0, active=0)                                               |
    +---------------------------------------------------------------------------------+
    ...
  3. Now we simply issue the failover command and let the managers perform all the necessary steps to promote a new Primary:

    [LOGICAL:EXPERT] /alpha > failover
    SET POLICY: MAINTENANCE => MAINTENANCE
    EVALUATING SLAVE: db1(stored=51, applied=51, latency=0.355, datasource-group-id=0)
    EVALUATING SLAVE: db3(stored=51, applied=51, latency=0.358, datasource-group-id=0)
    SELECTED SLAVE: db3@alpha
    EVALUATING SLAVE: db1(stored=51, applied=51, latency=0.355, datasource-group-id=0)
    EVALUATING SLAVE: db3(stored=51, applied=51, latency=0.358, datasource-group-id=0)
    SELECTED SLAVE: db3@alpha
    Replicator 'db1' is now OFFLINE
    THIS IS THE MOST UPTODATE SLAVE. NO ACTION IS NEEDED.
    Savepoint failover_3(cluster=alpha, source=db1, created=2025/01/28 10:45:45 UTC) created
    PURGE REMAINING ACTIVE SESSIONS ON CURRENT MASTER 'db2@alpha'
    WAITING FOR REPLICATOR 'db2' TO REACH STATE DEGRADED
    Replicator 'db2' is now in DEGRADED state
    SHUNNING PREVIOUS MASTER 'db2@alpha'
    PUT THE NEW MASTER 'db3@alpha' ONLINE
    FAILOVER TO 'db3' WAS COMPLETED
    ...
    +---------------------------------------------------------------------------------+
    |db2(master:SHUNNED(FAILED-OVER-TO-db3), progress=51, THL latency=0.272)          |
    |STATUS [SHUNNED] [2025/01/28 10:45:46 AM UTC]                                    |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=master, state=DEGRADED)                                        |
    |  DATASERVER(state=STOPPED)                                                      |
    |  CONNECTIONS(created=0, active=0)                                               |
    +---------------------------------------------------------------------------------+
    +---------------------------------------------------------------------------------+
    |db3(master:ONLINE, progress=53, THL latency=0.432)                               |
    |STATUS [OK] [2025/01/28 10:45:50 AM UTC]                                         |
    +---------------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                          |
    |  REPLICATOR(role=master, state=ONLINE)                                          |
    |  DATASERVER(state=ONLINE)                                                       |
    |  CONNECTIONS(created=6, active=0)                                               |
    +---------------------------------------------------------------------------------+
    ...
  4. The recovery of the failed node can now be carried out using the methods outlined in Section 6.6.2, “Recover a failed Primary”, as illustrated briefly below.
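
    As a brief illustration, and assuming the scenario above where db2 is the shunned former Primary, the command mirrors the one shown at the start of this section (output omitted and will vary):

    [LOGICAL:EXPERT] /alpha > datasource db2 recover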

6.6.2.3. Split-Brain Discussion

A split-brain occurs when a cluster that normally has a single write Primary ends up with two writeable Primaries.

This means that some writes that should go to the “real” Primary are instead sent to a different node that was promoted to write Primary by mistake.

Once that happens, some writes exist on one Primary and not on the other, creating two broken Primaries. Merging the two data sets is impossible, so recovery requires a full restore, which is clearly NOT desirable.

We can say that a split-brain scenario is to be strongly avoided.

A situation like this is most often encountered when there is a network partition of some sort, especially with the nodes spread over multiple availability zones in a single region of a cloud deployment.

This would potentially result in all nodes being isolated, without a clear majority within the voting quorum.

A poorly-designed cluster could elect more than one Primary under these conditions, leading to the split-brain scenario.

Since a network partition would potentially result in all nodes being isolated without a clear majority within the voting quorum, the default action of a Tungsten Cluster is to SHUN all of the nodes.

Shunning ALL of the nodes means that no client traffic is being processed by any node, both reads and writes are blocked.

When this happens, it is up to a human administrator to select the proper Primary and recover the cluster.

For more information, please see Section 6.6.2.1, “Recover when there are no Primaries”.