The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering in version 6 onwards.
For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.3, “Deploying Composite Active/Active Clusters”.
Under certain conditions, dataservices in an active/active configuration may drift and/or become inconsistent with the data in another dataservice. If this occurs, you may need to re-provision the data on one or more of the dataservices after first determining the definitive source of the information.
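As a starting point for comparing the two sides, you can check the state and latest applied sequence number of the cross-site replicator for each service. The commands below are only a sketch, assuming the default trepctl is in your path and the services are named east and west as in the example that follows:
shell> trepctl -service east status | grep -E 'state|appliedLastSeqno'
shell> trepctl -service west status | grep -E 'state|appliedLastSeqno'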
In the following example the west service has been determined to be the definitive copy of the data. To fix the issue, all the datasources in the east service will be reprovisioned from one of the datasources in the west service.
The following is a guide to the steps that should be followed. In the example procedure it is the east service that has failed. It is assumed that the value of executable-prefix has been set to mm and that the env.sh script has been executed to configure the environment.
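For example, the environment can typically be configured by sourcing the env.sh script from the replicator installation. The path below is only an assumption based on a separate replicator installed under /opt/replicator; adjust it to match your own install directory:
shell> source /opt/replicator/share/env.sh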
Put the dataservice into MAINTENANCE mode. This ensures that Tungsten Cluster will not attempt to automatically recover the service.
cctrl [east]> set policy maintenance
On the failed east Tungsten Cluster service, put each Tungsten Connector offline:
cctrl [east]> router * offline
Reset the Tungsten Replicator service on all servers connected to the failed Tungsten Cluster service. For example, on west{1,2,3}, reset the east Tungsten Replicator service:
shell west> mm_trepctl offline
shell west> mm_trepctl -service east reset -all -y
Place all Tungsten Replicator services on all servers in the failed Tungsten Cluster service offline:
shell east> mm_trepctl offline
Next, we reprovision the primary node in the failed cluster (east1 in our example) with a manual backup taken from a replica node within the west cluster (west3 in this example).
Shun the east1 datasource to be restored, and put the replicator service offline (if not already in a failed state) using cctrl:
cctrl [east]> set force true
cctrl [east]> datasource east1 shun
cctrl [east]> replicator east1 offline
Shun the west3 datasource to be backed up, and put the replicator service offline using cctrl:
cctrl [west]> datasource west3 shun
cctrl [west]> replicator west3 offline
Stop the mysqld service on both hosts:
shell> sudo systemctl stop mysqld
Delete the mysqld data directory on east1:
east1> sudo rm -rf /var/lib/mysql/*
If necessary, ensure the tungsten user can write to the MySQL directory on east1:
east1> sudo chmod 777 /var/lib/mysql
Use rsync on west3 to send the data files for MySQL to east1:
west3> rsync -aze ssh /var/lib/mysql/* east1:/var/lib/mysql/
You should synchronize all locations that contain data. This includes additional directories such as innodb_data_home_dir or innodb_log_group_home_dir. Check the my.cnf file to ensure you have the correct paths.
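For example, a quick check for additional InnoDB locations, followed by copying one of them, might look like the following. The configuration file location and the /mysql/innodb-logs path are placeholders only; use the values found in your own my.cnf:
west3> grep -E 'innodb_data_home_dir|innodb_log_group_home_dir' /etc/my.cnf
west3> rsync -aze ssh /mysql/innodb-logs/ east1:/mysql/innodb-logs/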
Once the files have been copied, they should be updated to have the correct ownership and permissions so that the Tungsten service can read them.
Recover west3 back to the dataservice (this process will automatically restart MySQL):
cctrl [west]> datasource west3 recover
Update the ownership and permissions on the data files on east1:
east1> sudo chown -R mysql:mysql /var/lib/mysql
east1> sudo chmod 770 /var/lib/mysql
Restart MySQL on east1:
east1> sudo systemctl start mysqld
Reset the local replication services on east1:
east1> trepctl offline
east1> trepctl -service east reset -all -y
east1> trepctl online
Recover east1 within cctrl:
cctrl [east]> set force true
cctrl [east]> datasource east1 welcome
cctrl [east]> datasource east1 online
Using tprovision, restore the remaining nodes (east{2,3}) in the failed east service from the newly recovered east1 host:
shell east{2,3}> tprovision -s east1 -m xtrabackup
For a full explanation of using tprovision see Section 9.23, “The tprovision Script”.
Place all the Tungsten Replicator services on east{1,2,3} back online:
shell east> mm_trepctl online
Place all the Tungsten Replicator services on west{1,2,3} back online:
shell west> mm_trepctl online
On the failed east Tungsten Cluster service, put each Tungsten Connector online:
cctrl [east]> router * online
Set the policy back to AUTOMATIC:
cctrl> set policy automatic