If THL is lost on a Primary before the events contained within it have been applied to the Replica(s), the THL will need to be rebuilt from the existing MySQL binary logs.
If the MySQL binary logs no longer exist, then recovery of the lost transactions in THL will NOT be possible.
The basic sequence of operations for recovering the THL on both the Primary and the Replicas is:
Gather the failing requested sequence numbers from all Replicas:
shell> trepctl status
...
pendingError : Event extraction failed
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: Client handshake failure: Client response validation failed: Master log »
does not contain requested transaction: master source ID=db1 client source ID=db2 requested »
seqno=22 client epoch number=22 master min seqno=27 master max seqno=27
...
In the above example, when Replica db2 comes back online, it requests the event matching the last seqno in its local THL (22) from the Primary db1 so it can verify data integrity, but the Primary no longer has that event.
Keep a note of the lowest sequence number and the host that it is on across all Replicas for use in the next step.
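Gathering the failing seqno from each Replica can be scripted. The sketch below, assuming the pendingExceptionMessage format shown above, extracts the "requested seqno" value; the helper name and the embedded sample message are illustrative only — in practice you would feed it the output of trepctl status from each Replica.

```shell
# Illustrative helper: pull the failing requested seqno out of a
# pendingExceptionMessage as printed by "trepctl status".
extract_requested_seqno() {
    # The message contains "requested seqno=NN"; print just NN.
    echo "$1" | grep -o 'requested seqno=[0-9]*' | cut -d= -f2
}

# Sample message copied from the trepctl status output above.
msg='Client handshake failure: Client response validation failed: Master log does not contain requested transaction: master source ID=db1 client source ID=db2 requested seqno=22 client epoch number=22 master min seqno=27 master max seqno=27'

extract_requested_seqno "$msg"
```

Run this against the status output from every Replica and keep the lowest value returned.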
On the Replica with the lowest failing requested seqno, get the epoch, source-id and event-id (binlog position) from the THL using the thl list -seqno command, specifying the sequence number noted above. This information will be needed on the extractor (Primary) in a later step. You can add the -headers option to reduce the amount of output if required. For example:
[tungsten@db2]$ thl list -seqno 22 -headers
SEQ# = 22 / FRAG# = 0 (last frag)
- FILE = thl.data.0000000001
- TIME = 2025-02-10 11:04:42.0
- EPOCH# = 22
- EVENTID = mysql-bin.000011:0000000000004723;181
- SOURCEID = db1
The same information can also be obtained in two more ways using the dsctl command, so use the method you are most comfortable with:
[tungsten@db2]$ dsctl get
[{"extract_timestamp":"2025-02-10 11:04:42.0","eventid":"mysql-bin.000011:0000000000004723;181",»
"fragno":0,"last_frag":true,"seqno":22,"update_timestamp":"2025-02-10 11:04:47.0",»
"shard_id":"tungsten_alpha","applied_latency":5,"epoch_number":22,"task_id":0,"source_id":"db1"}]
[tungsten@db2]$ dsctl get -ascmd
dsctl set -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"
If you are using v7.0.2 or later, you can also use thl dsctl:
[tungsten@db2]$ thl dsctl -seqno 22
dsctl -service alpha set -reset -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"
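On versions without thl dsctl, the dsctl set command can be assembled from captured thl list -headers output. The sketch below, assuming the exact header layout shown earlier, parses the fields with awk; the variable names and the embedded sample are illustrative, not part of the product.

```shell
# Sample "thl list -seqno 22 -headers" output copied from above.
headers='SEQ# = 22 / FRAG# = 0 (last frag)
- FILE = thl.data.0000000001
- TIME = 2025-02-10 11:04:42.0
- EPOCH# = 22
- EVENTID = mysql-bin.000011:0000000000004723;181
- SOURCEID = db1'

# Split each line on " = " and pick out the fields dsctl needs.
seqno=$(echo "$headers"   | awk -F' = ' '/^SEQ#/   {split($2,a," "); print a[1]}')
epoch=$(echo "$headers"   | awk -F' = ' '/EPOCH#/  {print $2}')
eventid=$(echo "$headers" | awk -F' = ' '/EVENTID/ {print $2}')
sourceid=$(echo "$headers"| awk -F' = ' '/SOURCEID/{print $2}')

# Emit the command to run later on the Primary.
echo "dsctl set -reset -seqno $seqno -epoch $epoch -event-id \"$eventid\" -source-id \"$sourceid\""
```

The printed command matches what dsctl get -ascmd would produce, plus the -reset option required on the Primary.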
Place the cluster into MAINTENANCE mode and take the Primary replicator OFFLINE:
[tungsten@db1]$ cctrl
[LOGICAL] /alpha > set policy maintenance
[LOGICAL] /alpha > replicator db1 offline
Clear all THL on the Primary since it is no longer needed by any Replicas:
[tungsten@db1]$ thl purge
Use the dsctl command on the Primary with the values we got from the Replica with the lowest seqno to tell the Primary replicator to begin generating THL starting from that event in the MySQL binary logs:
Note: If you used dsctl get -ascmd or thl dsctl earlier, you may run the command it produced now. Ensure the -reset option is supplied.
[tungsten@db1]$ dsctl set -reset -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"
Place the cluster back into AUTOMATIC mode. This should also return the replicator to the ONLINE state.
[tungsten@db1]$ cctrl
[LOGICAL] /alpha > set policy automatic
Bring the Replicas back online once the Primary is fully online:
[tungsten@db2]$ trepctl online
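To confirm recovery, you can check each Replica's state after bringing it online. The sketch below, assuming the state field as printed by trepctl status, parses captured status output; the sample text and the "replicator healthy" message are illustrative.

```shell
# Sample fragment of "trepctl status" output; in practice capture it with
# status=$(trepctl status) on each Replica.
status='appliedLastSeqno       : 27
state                  : ONLINE'

# Split on ": " and read the state field.
state=$(echo "$status" | awk -F': ' '/^state/ {print $2}')

if [ "$state" = "ONLINE" ]; then
    echo "replicator healthy"
else
    echo "replicator not online: $state" >&2
fi
```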