6.16.3. Performing Maintenance on an Entire Dataservice

To perform maintenance on every machine within a dataservice, work through the machines one at a time in a structured, rolling sequence. In brief, the sequence is as follows:

  1. Perform maintenance on each of the current Replicas

  2. Switch the Primary to one of the already maintained Replicas

  3. Perform maintenance on the old Primary (now in Replica state)

  4. Switch the old Primary back to be the Primary again

Warning

The "Rolling Maintenance" procedure outlined here should NOT be used when upgrading Tungsten Software between major versions, for example from 6.1 to 7.0, or 7.0 to 7.1.

In most cases the switch will fail because of differences in the manager communications between versions, and this could cause unexpected outages.

See Section 6.15, “Upgrading Tungsten Cluster” for more details on upgrading Tungsten software.

A more detailed sequence of steps, including the status of each datasource in the dataservice, and the commands to be performed, is shown in the table below. The table assumes a three-node dataservice (one Primary, two Replicas), but the same principles can be applied to any Primary/Replica dataservice:

Step | Description                          | Command                  | host1   | host2   | host3
-----+--------------------------------------+--------------------------+---------+---------+--------
1    | Initial state                        |                          | Primary | Replica | Replica
2    | Set MAINTENANCE policy               | set policy maintenance   | Primary | Replica | Replica
3    | Shun Replica host2                   | datasource host2 shun    | Primary | Shunned | Replica
4    | Perform maintenance on host2         |                          | Primary | Shunned | Replica
5    | Recover Replica host2                | datasource host2 recover | Primary | Replica | Replica
6    | Ensure Replica (host2) has caught up |                          | Primary | Replica | Replica
7    | Shun Replica host3                   | datasource host3 shun    | Primary | Replica | Shunned
8    | Perform maintenance on host3         |                          | Primary | Replica | Shunned
9    | Recover Replica host3                | datasource host3 recover | Primary | Replica | Replica
10   | Ensure Replica (host3) has caught up |                          | Primary | Replica | Replica
11   | Switch Primary to host2              | switch to host2          | Replica | Primary | Replica
12   | Shun host1                           | datasource host1 shun    | Shunned | Primary | Replica
13   | Perform maintenance on host1         |                          | Shunned | Primary | Replica
14   | Recover Replica host1                | datasource host1 recover | Replica | Primary | Replica
15   | Ensure Replica (host1) has caught up |                          | Replica | Primary | Replica
16   | Switch Primary back to host1         | switch to host1          | Primary | Replica | Replica
17   | Set AUTOMATIC policy                 | set policy automatic     | Primary | Replica | Replica
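
Because the same principles apply to any Primary/Replica dataservice, the sequence can be generated for an arbitrary list of hosts. The following is a minimal sketch, not part of Tungsten Cluster: the function name rolling_maintenance_plan and its return shape are hypothetical, and it only builds the command strings shown in the table above together with the expected role of each host after every step. It does not run the commands, and it leaves the "has caught up" check as a manual step.

```python
from typing import Dict, List, Tuple

# (description, command to issue or "" for a manual step, roles after the step)
Step = Tuple[str, str, Dict[str, str]]


def rolling_maintenance_plan(primary: str, replicas: List[str]) -> List[Step]:
    """Build the rolling-maintenance sequence for one Primary and its Replicas.

    Hypothetical helper for illustration only: it returns strings and
    expected states, and never contacts a cluster.
    """
    if not replicas:
        raise ValueError("at least one Replica is needed to take over as Primary")

    roles = {host: "Replica" for host in replicas}
    roles[primary] = "Primary"
    plan: List[Step] = []

    def add(description: str, command: str = "") -> None:
        # Record a snapshot of the roles as they stand after this step.
        plan.append((description, command, dict(roles)))

    def maintain(host: str) -> None:
        # Shun, maintain, recover, then wait for the host to catch up.
        previous_role = roles[host]
        roles[host] = "Shunned"
        add(f"Shun {host}", f"datasource {host} shun")
        add(f"Perform maintenance on {host}")
        roles[host] = previous_role
        add(f"Recover {host}", f"datasource {host} recover")
        add(f"Ensure {host} has caught up")

    add("Initial state")
    add("Set MAINTENANCE policy", "set policy maintenance")

    # 1. Maintain every current Replica in turn.
    for replica in replicas:
        maintain(replica)

    # 2. Switch the Primary to an already maintained Replica.
    temporary_primary = replicas[0]
    roles[primary], roles[temporary_primary] = "Replica", "Primary"
    add(f"Switch Primary to {temporary_primary}", f"switch to {temporary_primary}")

    # 3. Maintain the old Primary, which is now a Replica.
    maintain(primary)

    # 4. Switch the old Primary back and restore the policy.
    roles[primary], roles[temporary_primary] = "Primary", "Replica"
    add(f"Switch Primary back to {primary}", f"switch to {primary}")
    add("Set AUTOMATIC policy", "set policy automatic")
    return plan


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3"]
    for number, (description, command, roles) in enumerate(
        rolling_maintenance_plan("host1", ["host2", "host3"]), start=1
    ):
        states = " ".join(roles[h] for h in hosts)
        print(f"{number:>2}  {description:<32} {command:<26} {states}")
```

Choosing the first Replica as the temporary Primary mirrors the three-node example; any Replica that has already been maintained and has caught up would serve equally well.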