Recover a Failed Coordinator
If the primary coordinator fails, the Apache Cloudberry system is not accessible and WAL replication stops. Use gpactivatestandby to activate the standby coordinator. Upon activation of the standby coordinator, Apache Cloudberry reconstructs the coordinator host state at the time of the last successfully committed transaction.
These steps assume a standby coordinator host is configured for the system. See Enable Coordinator Mirroring.
To activate the standby coordinator
- Run the gpactivatestandby utility from the standby coordinator host you are activating. For example:

      $ export PGPORT=5432
      $ gpactivatestandby -d /data/coordinator/gpseg-1

  Where -d specifies the data directory of the coordinator host you are activating. After you activate the standby, it becomes the active or primary coordinator for your Apache Cloudberry array.

  Note: Before running gpactivatestandby, be sure to run gpstate -f to confirm that the standby coordinator is synchronized with the current coordinator node. If synchronized, the final line of the gpstate -f output will look similar to this:

      20230607:06:50:06:004205 gpstate:test1-m:gpadmin-[INFO]:--Sync state: sync

- After the utility completes, run gpstate with the -b option to display a summary of the system status:

      $ gpstate -b

  The coordinator instance status should be Active. When a standby coordinator is not configured, the command displays "No coordinator standby configured" for the standby coordinator status. If you configured a new standby coordinator, its status is Passive.

- Optional: If you have not already done so while activating the prior standby coordinator, you can run gpinitstandby on the active coordinator host to configure a new standby coordinator.

  Note: You need to initialize a new standby coordinator to continue providing coordinator mirroring.
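The synchronization check from the note in the first step can be scripted so that activation only proceeds when the standby is in sync. The following is a minimal sketch, not part of the Cloudberry tooling: the function name standby_is_synced is hypothetical, and it assumes the final line of the gpstate -f output ends with the text "Sync state: sync", as in the sample line in that note. It reads the output from standard input, so it can be exercised without a running cluster.

```shell
# Hypothetical helper (not a Cloudberry utility): succeed only when the
# final non-empty line of `gpstate -f` output, read from stdin, ends with
# "Sync state: sync". The line format is assumed from the sample output
# shown in the documentation note.
standby_is_synced() {
    # Strip trailing whitespace, drop empty lines, keep the last line.
    _last_line=$(sed 's/[[:space:]]*$//' | grep -v '^$' | tail -n 1)
    case "$_last_line" in
        *"Sync state: sync") return 0 ;;   # standby reported as synchronized
        *) return 1 ;;                      # any other state, or no output
    esac
}
```

Under these assumptions, a command such as `gpstate -f | standby_is_synced && gpactivatestandby -d /data/coordinator/gpseg-1` would run the activation only when the check passes. The data directory is the example path used in the first step; substitute your own.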
For information about restoring the original coordinator and standby coordinator configuration, see Restore Coordinator Mirroring After a Recovery.
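The post-activation status check can also be scripted to flag a system that is running without coordinator mirroring. This is a rough sketch under stated assumptions: the function name standby_missing is hypothetical, and it relies only on the message "No coordinator standby configured" that the gpstate -b step above describes, not on any other detail of the gpstate -b output format.

```shell
# Hypothetical helper (not a Cloudberry utility): succeed when the
# `gpstate -b` output, read from stdin, contains the message reported
# when no standby coordinator is configured.
standby_missing() {
    # -F matches the message as a fixed string; -q suppresses output.
    grep -q -F 'No coordinator standby configured'
}
```

For example, `gpstate -b | standby_missing && echo "No standby: run gpinitstandby to restore coordinator mirroring"` would print a reminder when the active coordinator has no standby.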