gpshrink
Cloudberry Database uses the gpshrink utility to scale in a cluster. When cluster resources are underused (for example, disk usage stays below 20%, or CPU and memory usage is low), you can use gpshrink to reduce the size of the cluster and free server resources. gpshrink scales in the cluster by removing segments from servers that are no longer needed.
gpshrink operates in two phases:
- Preparation phase: Collects information about all user tables in the database that need redistribution.
- Data redistribution phase: Redistributes the data of all tables across the segments that remain after the cluster is scaled in.
Steps to Scale In a Cluster Using gpshrink
- Create a demo cluster with three segments:
make create-demo-cluster
- Create a test table and check its status before scaling in:
-- Create table and insert data
CREATE TABLE test(a INT);
INSERT INTO test SELECT i FROM generate_series(1,100) i;
-- Check data distribution of the test table
SELECT gp_segment_id, COUNT(*) FROM test GROUP BY gp_segment_id;
-- Check metadata status
SELECT * FROM gp_distribution_policy;
SELECT * FROM gp_segment_configuration;
- Create a shrinktest file that lists the segments to delete:
touch shrinktest
Each line describes one segment in the format hostname|address|port|datadir|dbid|content|role. List both the primary and the mirror of every segment you remove. When deleting multiple segments, list the segments with higher content IDs first. For each content ID, list the primary before the mirror, and make sure each segment's current role matches its preferred role.
For example, to delete the segment with content ID 2:
i-thd001y0|i-thd001y0|7004|/home/gpadmin/cloudberrydb/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2|4|2|p
i-thd001y0|i-thd001y0|7007|/home/gpadmin/cloudberrydb/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2|7|2|m
- Run the gpshrink command twice:
# Preparation phase
gpshrink -i shrinktest
# Redistribution phase
gpshrink -i shrinktest
Main parameters:
-i: Specifies the file that lists the segments to delete.
-c: Clears the collected table information.
-a: Collects statistics for tables after redistribution.
-d: Sets the maximum execution time for redistribution. gpshrink stops if this limit is exceeded.
Tip: How gpshrink works in two phases:
- The first gpshrink -i shrinktest command prepares for scale-in. It reads the segments to delete from the shrinktest file and creates two tables: gpshrink.status, which records the status of gpshrink, and gpshrink.status_detail, which records the status of each table. It then identifies all tables that need redistribution.
- The second gpshrink -i shrinktest command performs the data redistribution. It calculates the cluster size after the segments are deleted, redistributes data across the scaled-in cluster, and then removes the corresponding segments from gp_segment_configuration. Avoid creating new tables during this phase, because they cannot be redistributed across the scaled-in cluster. Some statements might also fail because tables are locked.
Tip:
- If the first gpshrink -i shrinktest fails, the shrinktest file might contain an error. Clear the collected data with gpshrink -c, correct the file, and rerun gpshrink -i shrinktest.
- If the second gpshrink -i shrinktest fails, log in to the database and check the status of the tables (for example, in gpshrink.status_detail). Then continue the redistribution or roll back as needed.
- Check the status of the test table after scaling in:
-- Check data distribution of the test table
SELECT gp_segment_id, COUNT(*) FROM test GROUP BY gp_segment_id;
-- Check metadata status
SELECT * FROM gp_distribution_policy;
SELECT * FROM gp_segment_configuration;
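The ordering rules for the shrinktest file and the run-twice workflow can be scripted. The following is a minimal bash sketch and is not part of Cloudberry Database.

- The function names make_shrinktest and run_gpshrink are hypothetical.
- It assumes the segment rows arrive on stdin already in the hostname|address|port|datadir|dbid|content|role format, for example exported with psql in unaligned mode, which uses "|" as the default field separator.
- It assumes gpshrink is on the PATH.
- It does not check that each segment's role matches its preferred role, so verify that separately.

```shell
#!/usr/bin/env bash
# Sketch only: make_shrinktest and run_gpshrink are hypothetical helpers,
# not tools shipped with Cloudberry Database.

# make_shrinktest <content>...
# Reads rows in the hostname|address|port|datadir|dbid|content|role format
# on stdin and keeps only those whose content ID is listed. It then sorts
# them with higher content IDs first and, within one content ID, primary (p)
# before mirror (m).
make_shrinktest() {
  local wanted=" $* "
  awk -F'|' -v wanted="$wanted" 'NF == 7 && index(wanted, " " $6 " ") > 0' \
    | sort -t'|' -k6,6nr -k7,7r
}

# run_gpshrink <file>
# Runs the same gpshrink command twice: first the preparation phase, then
# the redistribution phase. If preparation fails, the collected table
# information is cleared with gpshrink -c so the file can be fixed and the
# preparation rerun.
run_gpshrink() {
  local file="$1"
  if ! gpshrink -i "$file"; then
    echo "preparation failed; clearing collected table information" >&2
    gpshrink -c
    return 1
  fi
  if ! gpshrink -i "$file"; then
    echo "redistribution failed; check gpshrink.status and gpshrink.status_detail" >&2
    return 2
  fi
}

# Example usage (assumed environment; not executed here):
#   psql -At -c "SELECT hostname, address, port, datadir, dbid, content, role
#                FROM gp_segment_configuration" | make_shrinktest 2 > shrinktest
#   run_gpshrink shrinktest
```

The sort step does the ordering. Sorting the content field numerically in reverse, then the role field in reverse, puts higher content IDs first and p before m within each content ID. This matches the order the shrinktest file requires.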