When the dataservice policy mode is AUTOMATIC, the dataservice automatically fails over to a new Primary host when the existing Primary is identified as having failed or become unavailable.
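The active policy mode is shown in the COORDINATOR line of the ls output (for example, db1:AUTOMATIC:ONLINE below), and can be changed from within cctrl using the set policy command; the exact response may vary by version:
[LOGICAL:EXPERT] /alpha > set policy automatic
policy mode is now AUTOMATIC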
For example, when the Primary host db1 becomes unavailable because of a network problem, the dataservice automatically switches to db3. The dataservice status is updated accordingly, showing the automatically shunned db1:
[LOGICAL:EXPERT] /alpha > ls
COORDINATOR[db1:AUTOMATIC:ONLINE]
ROUTERS:
+---------------------------------------------------------------------------------+
|connector@db1[7435](ONLINE, created=2, active=0) |
|connector@db2[7472](ONLINE, created=2, active=0) |
|connector@db3[7468](ONLINE, created=2, active=0) |
+---------------------------------------------------------------------------------+
DATASOURCES:
+---------------------------------------------------------------------------------+
|db1(master:SHUNNED(FAILED-OVER-TO-db3), progress=8, THL latency=0.981) |
|STATUS [SHUNNED] [2025/01/27 01:51:23 PM UTC] |
+---------------------------------------------------------------------------------+
| MANAGER(state=ONLINE) |
| REPLICATOR(role=master, state=DEGRADED) |
| DATASERVER(state=STOPPED) |
| CONNECTIONS(created=4, active=0) |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db2(slave:ONLINE, progress=8, latency=1.004) |
|STATUS [OK] [2025/01/27 01:51:40 PM UTC] |
+---------------------------------------------------------------------------------+
| MANAGER(state=ONLINE) |
| REPLICATOR(role=slave, master=db1, state=ONLINE) |
| DATASERVER(state=ONLINE) |
| CONNECTIONS(created=0, active=0) |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db3(master:ONLINE, progress=10, THL latency=0.380) |
|STATUS [OK] [2025/01/27 01:51:27 PM UTC] |
+---------------------------------------------------------------------------------+
| MANAGER(state=ONLINE) |
| REPLICATOR(role=master, state=ONLINE) |
| DATASERVER(state=ONLINE) |
| CONNECTIONS(created=2, active=0) |
+---------------------------------------------------------------------------------+
The status for the original Primary (db1) identifies the datasource as shunned, and the FAILED-OVER-TO-db3 annotation indicates which datasource was promoted to the Primary.
An automatic failover can be triggered by using the datasource fail command:
[LOGICAL:EXPERT] /alpha > datasource db1 fail
This triggers the automatic failover sequence, simulating what would happen if the specified host had failed.
If db1 becomes available again, the datasource is not automatically added back to the dataservice; it must be explicitly re-added. The status of the dataservice once db1 returns is shown below:
[LOGICAL:EXPERT] /alpha > ls
...
+---------------------------------------------------------------------------------+
|db1(master:SHUNNED(FAILED-OVER-TO-db3), progress=8, THL latency=0.981) |
|STATUS [SHUNNED] [2025/01/27 01:51:23 PM UTC] |
+---------------------------------------------------------------------------------+
| MANAGER(state=ONLINE) |
| REPLICATOR(role=master, state=DEGRADED) |
| DATASERVER(state=ONLINE) |
| CONNECTIONS(created=4, active=0) |
+---------------------------------------------------------------------------------+
...
Because db1 was previously the Primary, the datasource recover command verifies that the server is available, configures the node as a Replica of the newly promoted Primary, and re-enables the services:
[LOGICAL:EXPERT] /alpha > datasource db1 recover
RECOVERING DATASOURCE 'db1@alpha'
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'db1'
Verified that DB server notification 'db1' is in state 'ONLINE'
DATA SERVER 'db1' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'db1@alpha' TO A SLAVE USING 'db3@alpha' AS THE MASTER
SETTING THE ROLE OF DATASOURCE 'db1@alpha' FROM 'master' TO 'slave'
RECOVERY OF 'db1@alpha' WAS SUCCESSFUL
If the command is successful, then the node should be up and running as a Replica of the new Primary.
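Running ls again should now show db1 online as a Replica of the new Primary. The following excerpt is illustrative only (the values shown are not captured output), mirroring the format of the earlier listings:
[LOGICAL:EXPERT] /alpha > ls
...
+-----------------------------------------------------------------------------------+
|db1(slave:ONLINE, progress=10, latency=0.500)                                      |
|STATUS [OK] [2025/01/27 01:53:30 PM UTC]                                           |
+-----------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                            |
|  REPLICATOR(role=slave, master=db3, state=ONLINE)                                 |
|  DATASERVER(state=ONLINE)                                                         |
|  CONNECTIONS(created=4, active=0)                                                 |
+-----------------------------------------------------------------------------------+
...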
The recovery process can fail if the THL data and dataserver contents do not match, for example when statements have been executed directly on a Replica. For information on recovering from failures that recover cannot fix, see Section 6.6.1.3, “Replica Datasource Extended Recovery”.
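Before re-running recover after such a failure, it can help to compare the replication position on the failed node against the new Primary. A minimal sketch using trepctl (the status command and the appliedLastSeqno field are standard trepctl features; the host names follow the examples above):
shell> trepctl -host db1 status | grep appliedLastSeqno
shell> trepctl -host db3 status | grep appliedLastSeqno
If the positions do not line up, the extended recovery procedure referenced above is likely to be required.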