3.4.4. Resetting a single dataservice

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures with Composite Active/Active Clustering in v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.3, “Deploying Composite Active/Active Clusters”

Under certain conditions, dataservices in an active/active configuration may drift and/or become inconsistent with the data in another dataservice. If this occurs, you may need to re-provision the data on one or more of the dataservices after first determining the definitive source of the information.

In the following example the west service has been determined to be the definitive copy of the data. To fix the issue, all the datasources in the east service will be reprovisioned from one of the datasources in the west service.

The following is a guide to the steps that should be followed. In the example procedure it is the east service that has failed. It is assumed that the value of executable-prefix has been set to mm and the env.sh script has been executed to configure the environment.

  1. Put the dataservice into MAINTENANCE mode. This ensures that Tungsten Cluster will not attempt to automatically recover the service.

    cctrl [east]> set policy maintenance
  2. On the failed east Tungsten Cluster service, put each Tungsten Connector offline:

    cctrl [east]> router * offline
  3. Reset the Tungsten Replicator service on all servers connected to the failed Tungsten Cluster service. For example, on west{1,2,3} reset the east Tungsten Replicator service:

    shell west> mm_trepctl offline
    shell west> mm_trepctl -service east reset -all -y
  4. Place all Tungsten Replicator services on all servers in the failed Tungsten Cluster service offline:

    shell east> mm_trepctl offline
  5. Next we reprovision the primary node in the failed cluster (east1 in our example) with a manual backup taken from a replica node within the west cluster (west3 in this example).

    Shun the east1 datasource to be restored, and put the replicator service offline, if not already in a failed state, using cctrl:

    cctrl [east]> set force true
    cctrl [east]> datasource east1 shun
    cctrl [east]> replicator east1 offline
  6. Shun the west3 datasource to be backed up, and put the replicator service offline using cctrl:

    cctrl [west]> datasource west3 shun
    cctrl [west]> replicator west3 offline
  7. Stop the mysqld service on both hosts:

    shell> sudo systemctl stop mysqld
  8. Delete the mysqld data directory on east1:

    east1> sudo rm -rf /var/lib/mysql/*
  9. If necessary, ensure the tungsten user can write to the MySQL directory on east1:

    east1> sudo chmod 777 /var/lib/mysql
  10. Use rsync on west3 to send the data files for MySQL to east1:

    west3> rsync -aze ssh /var/lib/mysql/* east1:/var/lib/mysql/

    You should synchronize all locations that contain data. This includes additional folders such as innodb_data_home_dir or innodb_log_group_home_dir. Check the my.cnf file to ensure you have the correct paths.

    Once the files have been copied, the files should be updated to have the correct ownership and permissions so that the Tungsten service can read them.
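    Before running rsync, it can help to enumerate every data location declared in my.cnf so that none is missed. The sketch below is a minimal, self-contained illustration: the option names (datadir, innodb_data_home_dir, innodb_log_group_home_dir) are standard MySQL settings, but the sample file and paths are hypothetical, not taken from this procedure.

    ```shell
    # Hedged sketch: build a sample my.cnf, then extract the directories
    # that would each need to be synchronized to east1. The paths here
    # are illustrative samples only.
    cat > /tmp/sample-my.cnf <<'EOF'
    [mysqld]
    datadir=/var/lib/mysql
    innodb_data_home_dir=/var/lib/mysql-innodb
    innodb_log_group_home_dir=/var/log/mysql-redo
    EOF

    # List every directory that contains data files:
    grep -E '^(datadir|innodb_data_home_dir|innodb_log_group_home_dir)' /tmp/sample-my.cnf \
      | cut -d= -f2
    ```

    Each path printed by the final command would need its own rsync invocation (or inclusion in a combined one) when copying from west3 to east1.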

  11. Recover west3 back to the dataservice (This process will automatically restart MySQL):

    cctrl [west]> datasource west3 recover
  12. Update the ownership and permissions on the data files on east1:

    east1> sudo chown -R mysql:mysql /var/lib/mysql
    east1> sudo chmod 770 /var/lib/mysql
  13. Restart MySQL on east1:

    east1> sudo systemctl start mysqld
  14. Reset the local replication services on east1:

    east1> trepctl offline
    east1> trepctl -service east reset -all -y
    east1> trepctl online
  15. Recover east1 within cctrl:

    cctrl [east]> set force true
    cctrl [east]> datasource east1 welcome
    cctrl [east]> datasource east1 online
  16. Using tprovision, restore the remaining nodes (east{2,3}) in the failed east service from the newly recovered east1 host:

    shell east{2,3}> tprovision -s east1 -m xtrabackup

    Note

    For a full explanation of using tprovision, see The tprovision Script

  17. Place all the Tungsten Replicator services on east{1,2,3} back online:

    shell east> mm_trepctl online
  18. Place all the Tungsten Replicator services on west{1,2,3} back online:

    shell west> mm_trepctl online
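    Before moving on to the connectors, it is worth confirming that each replicator actually reports ONLINE. The sketch below parses the state field the way you might check real output from mm_trepctl status; the sample text is invented for illustration, and only the state field name matches real trepctl output.

    ```shell
    # Hedged sketch: extract the replicator state from status-style output.
    # The sample lines are illustrative, not captured from a live cluster.
    sample='appliedLastSeqno : 45892
    state            : ONLINE
    uptimeSeconds    : 1082.3'
    state=$(printf '%s\n' "$sample" | awk -F':' '/^state/ {gsub(/ /, "", $2); print $2}')
    echo "$state"
    ```

    If any replicator reports something other than ONLINE at this point, resolve that before putting the routers back online in the next step.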
  19. On the recovered east Tungsten Cluster service, put each Tungsten Connector online:

    cctrl [east]> router * online
  20. Set the policy back to AUTOMATIC:

    cctrl> set policy automatic
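As a final sanity check, the ls output within cctrl can be scanned for any datasource that did not come back ONLINE. The sketch below demonstrates the scan against sample output lines; the lines are invented for illustration and not captured from a real cluster.

```shell
# Hedged sketch: count datasources in a non-ONLINE state (e.g. SHUNNED or
# OFFLINE) from ls-style output. Sample lines are illustrative only.
sample='east1(master:ONLINE, progress=45892)
east2(slave:ONLINE, progress=45892)
east3(slave:ONLINE, progress=45892)'
bad=$(printf '%s\n' "$sample" | grep -cE 'SHUNNED|OFFLINE' || true)
echo "non-online datasources: ${bad}"
```

A non-zero count would indicate a datasource that still needs attention before the recovery can be considered complete.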