To perform maintenance on all of the machines within a dataservice, maintenance must be carried out on each machine in turn, in a structured rolling sequence. In brief, the sequence is as follows:
1. Perform maintenance on each of the current Replicas
2. Switch the Primary to one of the already maintained Replicas
3. Perform maintenance on the old Primary (now in Replica state)
4. Switch the old Primary back to be the Primary again
The "Rolling Maintenance" procedure outlined here should NOT be used when upgrading Tungsten Software between major versions, for example from 6.1 to 7.0, or 7.0 to 7.1.
In most cases the switch will not work because manager communications differ between major versions, and this could cause unexpected outages.
See Section 6.15, “Upgrading Tungsten Cluster” for more details on upgrading Tungsten software.
A more detailed sequence of steps, including the status of each datasource in the dataservice, and the commands to be performed, is shown in the table below. The table assumes a three-node dataservice (one Primary, two Replicas), but the same principles can be applied to any Primary/Replica dataservice:
Step | Description | Command | host1 | host2 | host3 |
---|---|---|---|---|---|
1 | Initial state | | Primary | Replica | Replica |
2 | Set MAINTENANCE policy | set policy maintenance | Primary | Replica | Replica |
3 | Shun Replica host2 | datasource host2 shun | Primary | Shunned | Replica |
4 | Perform maintenance on host2 | | Primary | Shunned | Replica |
5 | Recover Replica host2 | datasource host2 recover | Primary | Replica | Replica |
6 | Ensure the Replica (host2) has caught up | | Primary | Replica | Replica |
7 | Shun Replica host3 | datasource host3 shun | Primary | Replica | Shunned |
8 | Perform maintenance on host3 | | Primary | Replica | Shunned |
9 | Recover Replica host3 | datasource host3 recover | Primary | Replica | Replica |
10 | Ensure the Replica (host3) has caught up | | Primary | Replica | Replica |
11 | Switch Primary to host2 | switch to host2 | Replica | Primary | Replica |
12 | Shun host1 | datasource host1 shun | Shunned | Primary | Replica |
13 | Perform maintenance on host1 | | Shunned | Primary | Replica |
14 | Recover Replica host1 | datasource host1 recover | Replica | Primary | Replica |
15 | Ensure the Replica (host1) has caught up | | Replica | Primary | Replica |
16 | Switch Primary back to host1 | switch to host1 | Primary | Replica | Replica |
17 | Set AUTOMATIC policy | set policy automatic | Primary | Replica | Replica |
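
Because the same principles apply to any Primary/Replica dataservice, the sequence in the table above can be sketched as a small planning helper that generates the steps for an arbitrary number of Replicas. The following Python is an illustration only and is not part of the Tungsten software: `Step` and `rolling_maintenance_plan` are hypothetical names, the script only builds command strings and never runs them, and the commands themselves are copied from the table above.

```python
from typing import Dict, List, NamedTuple


class Step(NamedTuple):
    description: str
    command: str            # command text from the table; "" for manual steps
    roles: Dict[str, str]   # expected state of each host after the step


def rolling_maintenance_plan(primary: str, replicas: List[str]) -> List[Step]:
    """Build the ordered steps for one Primary and any number of Replicas.

    Hypothetical helper: it only produces a checklist and executes nothing.
    """
    if not replicas:
        raise ValueError("at least one Replica is needed to switch the Primary")

    hosts = [primary] + list(replicas)
    roles = {host: "Replica" for host in hosts}
    roles[primary] = "Primary"
    steps: List[Step] = []

    def add(description: str, command: str = "") -> None:
        # Record a snapshot of the roles so later changes do not alter it.
        steps.append(Step(description, command, dict(roles)))

    def maintain(host: str) -> None:
        # Shun, maintain, recover, then wait for the host to catch up.
        roles[host] = "Shunned"
        add(f"Shun {host}", f"datasource {host} shun")
        add(f"Perform maintenance on {host}")
        roles[host] = "Replica"
        add(f"Recover {host}", f"datasource {host} recover")
        add(f"Ensure {host} has caught up")

    def switch(new_primary: str) -> None:
        # Swap roles between the current Primary and the chosen Replica.
        old_primary = next(h for h in hosts if roles[h] == "Primary")
        roles[old_primary] = "Replica"
        roles[new_primary] = "Primary"
        add(f"Switch Primary to {new_primary}", f"switch to {new_primary}")

    add("Initial state")
    add("Set MAINTENANCE policy", "set policy maintenance")
    for replica in replicas:
        maintain(replica)
    switch(replicas[0])   # any already maintained Replica would do
    maintain(primary)     # the old Primary is now in Replica state
    switch(primary)
    add("Set AUTOMATIC policy", "set policy automatic")
    return steps


if __name__ == "__main__":
    plan = rolling_maintenance_plan("host1", ["host2", "host3"])
    for number, step in enumerate(plan, start=1):
        state = " | ".join(step.roles[h] for h in ("host1", "host2", "host3"))
        print(f"{number:>2} | {step.description} | {step.command} | {state}")
```

For the three-node example, the sketch yields 17 steps in the same order as the table. Choosing the first Replica as the temporary Primary is an arbitrary decision made for this sketch; the procedure only requires that the target has already been maintained.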