The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering in v6 onwards.
For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.3, “Deploying Composite Active/Active Clusters”.
Under certain conditions, dataservices in an active/active configuration may drift and/or become inconsistent with the data in another dataservice. If this occurs, you may need to re-provision the data on one or more of the dataservices after first determining the definitive source of the information.
In the following example the west service has been determined to be the definitive copy of the data. To fix the issue, all the datasources in the east service will be reprovisioned from one of the datasources in the west service.
The following is a guide to the steps that should be followed. In the example procedure it is the east service that has failed. It is assumed that the value of executable-prefix has been set to mm and that the env.sh script has been executed to configure the environment.
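For example, assuming a default installation directory of /opt/continuent (this path is an assumption; adjust it to match your own installation), the environment can be configured with:
shell> source /opt/continuent/share/env.sh   # example path; use your own installation directory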
Put the dataservice into MAINTENANCE mode. This ensures that Tungsten Cluster will not attempt to automatically recover the service.
cctrl [east]> set policy maintenance
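If required, the change of policy can be confirmed from within cctrl; the coordinator information reported by the ls command should show the MAINTENANCE policy mode:
cctrl [east]> ls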
On the failed east Tungsten Cluster service, put each Tungsten Connector offline:
cctrl [east]> router * offline
Reset the Tungsten Replicator service on all servers connected to the failed Tungsten Cluster service. For example, on west{1,2,3} reset the east Tungsten Replicator service:
shell west> mm_trepctl offline
shell west> mm_trepctl -service east reset -all -y
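The current state of each replication service on the west hosts can then be reviewed; the east service should now show as offline, for example:
shell west> mm_trepctl services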
Place all the Tungsten Replicator services on all servers in the failed Tungsten Cluster service offline:
shell east> mm_trepctl offline
Next we reprovision the primary node in the failed cluster (east1 in our example) with a manual backup taken from a replica node within the west cluster (west3 in this example).
Shun the east1 datasource to be restored, and put the replicator service offline, if not already in a failed state, using cctrl:
cctrl [east]> set force true
cctrl [east]> datasource east1 shun
cctrl [east]> replicator east1 offline
Shun the west3 datasource to be backed up, and put the replicator service offline using cctrl:
cctrl [west]> datasource west3 shun
cctrl [west]> replicator west3 offline
Stop the mysqld service on both hosts:
shell> sudo systemctl stop mysqld
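To verify that MySQL is no longer running before any files are removed or copied, check the service state on each host, for example:
shell> systemctl is-active mysqld
This should report inactive once the service has stopped.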
Delete the mysqld data directory on east1:
east1> sudo rm -rf /var/lib/mysql/*
If necessary, ensure the tungsten user can write to the MySQL directory on east1:
east1> sudo chmod 777 /var/lib/mysql
Use rsync on west3 to send the data files for MySQL to east1:
west3> rsync -aze ssh /var/lib/mysql/* east1:/var/lib/mysql/
You should synchronize all locations that contain data. This includes additional folders such as innodb_data_home_dir or innodb_log_group_home_dir. Check the my.cnf file to ensure you have the correct paths.
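For example, the configured data locations can be listed by searching the MySQL configuration file (the /etc/my.cnf path shown here is only an example; use the location of your own configuration file):
west3> grep -E 'datadir|innodb_data_home_dir|innodb_log_group_home_dir' /etc/my.cnf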
Once the files have been copied, update their ownership and permissions so that the Tungsten service can read them.
Recover west3 back to the dataservice (this process will automatically restart MySQL):
cctrl [west]> datasource west3 recover
Update the ownership and permissions on the data files on east1:
east1> sudo chown -R mysql:mysql /var/lib/mysql
east1> sudo chmod 770 /var/lib/mysql
Restart MySQL on east1:
east1> sudo systemctl start mysqld
Reset the local replication services on east1:
east1> trepctl offline
east1> trepctl -service east reset -all -y
east1> trepctl online
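The state of the local replication service on east1 can then be checked; it should report ONLINE, for example:
east1> trepctl status | grep state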
Recover east1 within cctrl:
cctrl [east]> set force true
cctrl [east]> datasource east1 welcome
cctrl [east]> datasource east1 online
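Before reprovisioning the remaining nodes, it is worth confirming from within cctrl that east1 is now reported as ONLINE:
cctrl [east]> ls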
Using tprovision, restore the remaining nodes (east{2,3}) in the failed east service from the newly recovered east1 host:
shell east{2,3}> tprovision -s east1 -m xtrabackup
For a full explanation of using tprovision, see Section 9.23, “The tprovision Script”.
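Once tprovision has completed, the state of the local replication service on each restored node can be checked, for example, with:
shell east{2,3}> trepctl services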
Place all the Tungsten Replicator services on east{1,2,3} back online:
shell east> mm_trepctl online
Place all the Tungsten Replicator services on west{1,2,3} back online:
shell west> mm_trepctl online
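The cross-site replicator services should now report ONLINE on both sites; this can be confirmed on the east hosts (and equivalently on the west hosts) with:
shell east> mm_trepctl services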
On the previously failed east Tungsten Cluster service, put each Tungsten Connector back online:
cctrl [east]> router * online
Set the policy back to AUTOMATIC:
cctrl> set policy automatic
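Finally, the overall state of each cluster can be reviewed from within cctrl; all datasources should be shown as ONLINE:
cctrl [east]> ls
cctrl [west]> ls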